Cerebro is a streamlined search engine for discovering and exploring papers from major AI/ML conferences. Built with Streamlit, it aggregates papers from venues such as NeurIPS, ICML, ICLR, and the ACL Anthology into a searchable SQLite index and adds live arXiv search.
Video Demo: https://drive.google.com/file/d/13ajc0AyA_IDs1umJm0K_6Ymn0CLsY4qS/view?usp=sharing
- Smart Database Management:
  - SQLite with FTS5 indexing for fast full-text search
  - Async background updates using SQLite WAL mode
  - Progress tracking for initial database population
  - Real-time paper count display
- Real-time arXiv Search:
  - Direct integration with arXiv API
  - XML response parsing
  - Category-based filtering (cs.AI, cs.LG, cs.CL, cs.CV, etc.)
  - Pagination support for search results
  - Sorted by submission date (newest first)
- Tabbed Interface:
  - Separate tabs for conference and arXiv papers
  - Clean, responsive layout
  - Abstract preview in dialog windows
  - Paper metadata display
- Efficient Schema:
  - FTS5 virtual tables for fast search
  - Indexes on common query fields
  - Optimized for concurrent reads
  - Background write operations
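
As a rough illustration of the FTS5-backed search described in the features above, here is a minimal sketch using Python's built-in `sqlite3` module. The table name `papers_fts`, its columns, and the helper functions are assumptions made for this example and are not taken from Cerebro's actual schema; the sample rows are made-up placeholders. It also assumes the local SQLite build has FTS5 enabled.

```python
import sqlite3

# Made-up placeholder rows: (title, abstract, venue, year)
SAMPLE_PAPERS = [
    ("Sample paper on attention", "A toy abstract about attention mechanisms.", "NeurIPS", 2023),
    ("Sample paper on graph networks", "A toy abstract about graph neural networks.", "ICML", 2022),
]


def build_index(conn, papers):
    """Create a hypothetical FTS5 table and load paper rows into it."""
    # venue and year are stored but not tokenized (UNINDEXED)
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS papers_fts "
        "USING fts5(title, abstract, venue UNINDEXED, year UNINDEXED)"
    )
    conn.executemany(
        "INSERT INTO papers_fts (title, abstract, venue, year) VALUES (?, ?, ?, ?)",
        papers,
    )
    conn.commit()


def search(conn, query, venue=None, limit=10):
    """Full-text search over title and abstract, best BM25 match first."""
    sql = "SELECT title, venue, year FROM papers_fts WHERE papers_fts MATCH ?"
    params = [query]
    if venue is not None:
        sql += " AND venue = ?"
        params.append(venue)
    # bm25() returns lower scores for better matches, so ascending order ranks best first
    sql += " ORDER BY bm25(papers_fts) LIMIT ?"
    params.append(limit)
    return conn.execute(sql, params).fetchall()
```

Calling `build_index(conn, SAMPLE_PAPERS)` on an in-memory connection and then `search(conn, "attention")` returns only the rows whose title or abstract contains the term; passing `venue="ICML"` narrows the result to that venue.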
- Clone the repository:

      git clone https://github.com/yourusername/cerebro.git
      cd cerebro

- Install dependencies:

      pip install -r requirements.txt

- Initialize and run:

      make all   # Complete setup and launch
      # OR
      make init  # Just initialize database
      make run   # Just run the application

- Conference Papers:
  - Search across indexed conference papers
  - Filter by venue and year
  - View abstracts and paper links
- arXiv Papers:
  - Real-time search in arXiv database
  - Filter by CS categories
  - Navigate through paginated results
  - Latest papers first
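
To make the arXiv workflow above more concrete, the following sketch builds a paginated, newest-first query URL for the public arXiv API and parses the Atom XML it returns. The function names `build_query_url` and `parse_feed` are hypothetical and may not match what `arxiv_parser.py` actually does; the query parameters (`search_query`, `start`, `max_results`, `sortBy`, `sortOrder`) are those documented by the arXiv API. The network request itself is left out so the example stays self-contained.

```python
import urllib.parse
import xml.etree.ElementTree as ET

ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = "{http://www.w3.org/2005/Atom}"  # Atom namespace used by arXiv feeds


def build_query_url(terms, category=None, page=0, page_size=10):
    """Build an arXiv API URL for one page of results, newest first."""
    search = f"all:{terms}"
    if category:
        # Category filter, e.g. cs.AI or cs.CL
        search += f" AND cat:{category}"
    params = {
        "search_query": search,
        "start": page * page_size,  # zero-based offset of the first result
        "max_results": page_size,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
    }
    return f"{ARXIV_API}?{urllib.parse.urlencode(params)}"


def parse_feed(xml_text):
    """Extract basic metadata from an Atom feed returned by the API."""
    root = ET.fromstring(xml_text)
    papers = []
    for entry in root.iter(f"{ATOM}entry"):
        papers.append({
            # Collapse the line breaks arXiv inserts into titles and summaries
            "title": " ".join(entry.findtext(f"{ATOM}title", "").split()),
            "summary": " ".join(entry.findtext(f"{ATOM}summary", "").split()),
            "published": entry.findtext(f"{ATOM}published", ""),
            "link": entry.findtext(f"{ATOM}id", ""),
        })
    return papers
```

For example, `build_query_url("transformer", category="cs.CL", page=2)` requests results 20 to 29 in the cs.CL category. A simplification in this sketch: the `all:` prefix applies to a single search term, so multi-word queries would need extra handling.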
cerebro/
├── app.py                  # Main Streamlit application
├── config.py               # Configuration and constants
├── utils.py                # Helper functions
├── db/
│   └── paper_db.py         # SQLite database management
├── parsers/
│   ├── base.py             # Abstract parser class
│   ├── acl_parser.py       # ACL Anthology parser
│   ├── arxiv_parser.py     # arXiv API parser
│   └── ml_parser.py        # ML conference parser
└── assets/                 # Static files
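
The `parsers/` layout above suggests one shared abstract interface with a concrete parser per source. A minimal sketch of that pattern might look like the following. The `Paper` record, the `fetch` method, the `StaticParser` stand-in, and the `collect` helper are all hypothetical names chosen for illustration; the real `base.py` may define a different interface.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Paper:
    """Hypothetical record shared by all parsers."""
    title: str
    abstract: str
    venue: str
    year: int
    url: str


class BaseParser(ABC):
    """Hypothetical abstract base: each source implements fetch()."""
    venue: str = ""

    @abstractmethod
    def fetch(self, year: int) -> List[Paper]:
        """Return the papers this source lists for the given year."""


class StaticParser(BaseParser):
    """Toy parser returning a canned record; stands in for a real scraper."""
    venue = "DemoConf"

    def fetch(self, year: int) -> List[Paper]:
        return [Paper(f"Placeholder paper {year}", "", self.venue, year, "")]


def collect(parsers, year):
    """Gather papers from every parser into a single list."""
    papers = []
    for parser in parsers:
        papers.extend(parser.fetch(year))
    return papers
```

With this shape, supporting a new venue means adding one subclass, and the code that populates the database only needs to call `collect` over the registered parsers.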
- Database: SQLite with FTS5 for full-text search
- Web UI: Streamlit for reactive interface
- APIs: arXiv API integration, ACL Anthology/NeurIPS/ICML/ICLR scraping
- Background Tasks: Async database updates
- Search Engine: SQLite FTS5 with ranking
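
The combination of background updates and concurrent reads listed above can be sketched with SQLite's WAL journal mode and a worker thread. This is an assumption-laden illustration rather than Cerebro's actual code: the `papers` table, the function names, and the use of `threading` (instead of whatever async mechanism the project uses) are all choices made for this example.

```python
import os
import sqlite3
import tempfile
import threading


def open_db(path):
    """Open a connection with WAL enabled so readers are not blocked by a writer."""
    conn = sqlite3.connect(path, timeout=30)
    conn.execute("PRAGMA journal_mode=WAL")
    return conn


def init_db(path):
    """Create the hypothetical papers table if it does not exist yet."""
    conn = open_db(path)
    conn.execute("CREATE TABLE IF NOT EXISTS papers (title TEXT, venue TEXT)")
    conn.commit()
    conn.close()


def background_update(path, rows):
    """Insert rows from a worker thread and return the thread handle."""
    def worker():
        conn = open_db(path)  # each thread opens its own connection
        try:
            with conn:  # commits on success, rolls back on error
                conn.executemany("INSERT INTO papers VALUES (?, ?)", rows)
        finally:
            conn.close()

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


def paper_count(path):
    """Read the current number of papers; safe to call while a write is running."""
    conn = open_db(path)
    try:
        return conn.execute("SELECT COUNT(*) FROM papers").fetchone()[0]
    finally:
        conn.close()
```

WAL mode requires a file-backed database, so an in-memory database will not work here. A UI could call `paper_count` periodically while the thread returned by `background_update` is still running to show a live paper count.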
Contributions welcome! Feel free to submit a Pull Request.
MIT License - see LICENSE file for details.