Cerebro - AI Conference Paper Search Engine

Cerebro is a search engine for discovering and exploring papers from major AI/ML conferences. Built with Streamlit, it aggregates papers from venues such as NeurIPS, ICML, ICLR, and the ACL Anthology into a locally indexed database for fast full-text search, and adds live search against arXiv.

Video Demo: https://drive.google.com/file/d/13ajc0AyA_IDs1umJm0K_6Ymn0CLsY4qS/view?usp=sharing

✨ New Features & Improvements

Conference Paper Search

Smart Database Management:
- SQLite with FTS5 indexing for fast full-text search
- Async background updates using SQLite WAL mode
- Progress tracking for initial database population
- Real-time paper count display
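The background-update idea above can be sketched roughly as follows. This is a minimal illustration, not the code in `db/paper_db.py`: the function names (`open_db`, `populate`, `paper_count`), the `papers` table layout, and the `progress` dictionary are all assumptions made for the example.

```python
import sqlite3
import threading


def open_db(path):
    """Open a connection with write-ahead logging enabled."""
    conn = sqlite3.connect(path)
    # In WAL mode, readers can keep querying while one writer appends.
    conn.execute("PRAGMA journal_mode=WAL")
    return conn


def populate(path, papers, progress):
    """Hypothetical background job: insert papers and record progress."""
    conn = open_db(path)  # each thread uses its own connection
    conn.execute(
        "CREATE TABLE IF NOT EXISTS papers (title TEXT, venue TEXT, year INTEGER)"
    )
    for done, paper in enumerate(papers, start=1):
        conn.execute("INSERT INTO papers VALUES (?, ?, ?)", paper)
        conn.commit()
        progress["done"] = done  # the UI thread can poll this value
    conn.close()


def paper_count(path):
    """Return the number of indexed papers (assumes the table exists)."""
    conn = open_db(path)
    try:
        return conn.execute("SELECT COUNT(*) FROM papers").fetchone()[0]
    finally:
        conn.close()


def start_background_update(path, papers, progress):
    """Run populate() on a worker thread and return the thread."""
    worker = threading.Thread(target=populate, args=(path, papers, progress))
    worker.start()
    return worker
```

Giving the writer thread its own connection avoids sharing a `sqlite3.Connection` across threads, which the standard library disallows by default.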

arXiv Integration

Real-time arXiv Search:
- Direct integration with arXiv API
- XML response parsing
- Category-based filtering (cs.AI, cs.LG, cs.CL, cs.CV, etc.)
- Pagination support for search results
- Sorted by submission date (newest first)
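As a rough sketch of how these pieces fit together, the snippet below builds a paginated, date-sorted query URL for the public arXiv API and parses the Atom XML it returns. The helper names (`build_query_url`, `parse_feed`) and the returned dictionary keys are assumptions for illustration and may not match `parsers/arxiv_parser.py`. The network fetch is left out so the example stays self-contained.

```python
import urllib.parse
import xml.etree.ElementTree as ET

ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = {"atom": "http://www.w3.org/2005/Atom"}


def build_query_url(terms, category=None, page=0, page_size=10):
    """Build a query URL: newest submissions first, one page at a time."""
    query = f"all:{terms}"
    if category:
        query += f" AND cat:{category}"  # e.g. cs.AI, cs.LG
    params = {
        "search_query": query,
        "start": page * page_size,  # offset of the first result
        "max_results": page_size,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
    }
    return f"{ARXIV_API}?{urllib.parse.urlencode(params)}"


def _text(entry, tag):
    """Read a child element's text and collapse internal whitespace."""
    value = entry.findtext(f"atom:{tag}", default="", namespaces=ATOM)
    return " ".join(value.split())


def parse_feed(xml_text):
    """Turn an Atom feed string into a list of plain dictionaries."""
    root = ET.fromstring(xml_text)
    return [
        {
            "id": _text(entry, "id"),
            "title": _text(entry, "title"),
            "abstract": _text(entry, "summary"),
            "published": _text(entry, "published"),
        }
        for entry in root.findall("atom:entry", ATOM)
    ]
```

To fetch a page, the URL could be passed to `urllib.request.urlopen` and the decoded response body handed to `parse_feed`.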

Enhanced UI/UX

Tabbed Interface:
- Separate tabs for conference and arXiv papers
- Clean, responsive layout
- Abstract preview in dialog windows
- Paper metadata display

Database Architecture

Efficient Schema:
- FTS5 virtual tables for fast search
- Indexes on common query fields
- Optimized for concurrent reads
- Background write operations
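One way such a schema could look is sketched below: a regular `papers` table with an index on the filter columns, plus an FTS5 virtual table kept in sync by a trigger. The table, column, index, and function names are assumptions for illustration and are not taken from `db/paper_db.py`. The sketch also assumes the local SQLite build includes FTS5.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS papers (
    id       INTEGER PRIMARY KEY,
    title    TEXT NOT NULL,
    abstract TEXT,
    venue    TEXT,
    year     INTEGER,
    url      TEXT
);
-- Index on the fields used for filtering.
CREATE INDEX IF NOT EXISTS idx_papers_venue_year ON papers (venue, year);
-- Full-text index that reads its content from the papers table.
CREATE VIRTUAL TABLE IF NOT EXISTS papers_fts USING fts5(
    title, abstract, content='papers', content_rowid='id'
);
-- Keep the full-text index in sync on insert.
CREATE TRIGGER IF NOT EXISTS papers_ai AFTER INSERT ON papers BEGIN
    INSERT INTO papers_fts (rowid, title, abstract)
    VALUES (new.id, new.title, new.abstract);
END;
"""


def create_schema(conn):
    """Create the tables, index, and trigger if they do not exist yet."""
    conn.executescript(SCHEMA)


def search(conn, query, venue=None, year=None):
    """Full-text search with optional venue and year filters."""
    sql = (
        "SELECT p.title FROM papers_fts "
        "JOIN papers AS p ON p.id = papers_fts.rowid "
        "WHERE papers_fts MATCH ?"
    )
    params = [query]
    if venue is not None:
        sql += " AND p.venue = ?"
        params.append(venue)
    if year is not None:
        sql += " AND p.year = ?"
        params.append(year)
    sql += " ORDER BY p.year DESC"
    return [row[0] for row in conn.execute(sql, params)]
```

A complete version would also need update and delete triggers so the full-text index does not drift from the `papers` table.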

🚀 Installation

Clone the repository:

git clone https://github.com/nafis-neehal/Cerebro.git
cd Cerebro

Install dependencies:

pip install -r requirements.txt

Initialize and run:

make all    # Complete setup and launch
# OR
make init   # Just initialize database
make run    # Just run the application

💡 Usage

Conference Papers:
- Search across indexed conference papers
- Filter by venue and year
- View abstracts and paper links

arXiv Papers:
- Real-time search in arXiv database
- Filter by CS categories
- Navigate through paginated results
- Latest papers first

🏗️ Architecture

cerebro/
├── app.py              # Main Streamlit application
├── config.py           # Configuration and constants
├── utils.py            # Helper functions
├── db/
│   └── paper_db.py     # SQLite database management
├── parsers/
│   ├── base.py         # Abstract parser class
│   ├── acl_parser.py   # ACL Anthology parser
│   ├── arxiv_parser.py # arXiv API parser
│   └── ml_parser.py    # ML conference parser
└── assets/             # Static files
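The layout above suggests a shared parser interface in `parsers/base.py` that the venue-specific parsers implement. A minimal sketch of that pattern follows. The class names (`Paper`, `BaseParser`, `LineParser`), the `parse` method, and the tab-separated input format are all hypothetical and chosen only to show the shape of the design.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


@dataclass
class Paper:
    """Hypothetical record shared by all parsers."""
    title: str
    venue: str
    year: int
    url: str = ""
    abstract: str = ""


class BaseParser(ABC):
    """Abstract parser: subclasses turn raw source text into Paper records."""

    venue: str = ""

    @abstractmethod
    def parse(self, raw: str) -> List[Paper]:
        """Return the papers found in `raw`."""


class LineParser(BaseParser):
    """Toy parser for 'year<TAB>title' lines, used only for illustration."""

    venue = "DemoConf"

    def parse(self, raw: str) -> List[Paper]:
        papers = []
        for line in raw.splitlines():
            if not line.strip():
                continue  # skip blank lines
            year, title = line.split("\t", 1)
            papers.append(Paper(title=title.strip(), venue=self.venue, year=int(year)))
        return papers
```

With an interface like this, the application can loop over parsers without knowing how each venue's data is fetched or formatted.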

Key Components:

- Database: SQLite with FTS5 for full-text search
- Web UI: Streamlit for reactive interface
- APIs: arXiv API integration, ACL Anthology/NeurIPS/ICML/ICLR scraping
- Background Tasks: Async database updates
- Search Engine: SQLite FTS5 with ranking
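To illustrate what ranked FTS5 search can look like, the sketch below orders matches with SQLite's built-in `bm25()` function and weights title hits above abstract hits. The table name `papers_fts`, the column weights, and the function names are assumptions for this example. The ranking Cerebro actually uses may differ.

```python
import sqlite3


def make_index(conn):
    """Create a small standalone FTS5 table (assumes FTS5 is available)."""
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS papers_fts USING fts5(title, abstract)"
    )


def ranked_search(conn, query, limit=10):
    """Return matching titles, best match first.

    bm25() returns lower (more negative) values for better matches,
    so an ascending sort puts the most relevant rows first. The
    weights 10.0 and 1.0 favor hits in the title over the abstract.
    """
    rows = conn.execute(
        "SELECT title, bm25(papers_fts, 10.0, 1.0) AS score "
        "FROM papers_fts WHERE papers_fts MATCH ? "
        "ORDER BY score LIMIT ?",
        (query, limit),
    )
    return [title for title, _score in rows]
```

The per-column weights are a tuning choice: a larger title weight pushes papers whose titles contain the query terms above papers that only mention them in the abstract.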

🤝 Contributing

Contributions welcome! Feel free to submit a Pull Request.

📝 License

MIT License - see LICENSE file for details.

nafis-neehal/Cerebro

Folders and files

Latest commit

History

Repository files navigation

Cerebro - AI Conference Paper Search Engine

✨ New Features & Improvements

Conference Paper Search

arXiv Integration

Enhanced UI/UX

Database Architecture

🚀 Installation

💡 Usage

🏗️ Architecture

Key Components:

🤝 Contributing

📝 License

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages