This project allows you to upload a PDF and ask questions about its content using Deepseek R1 via Ollama. The application processes PDFs, extracts text, indexes them into a vector store, and retrieves relevant context to generate concise answers.
- 📂 Upload a PDF: Select a PDF file to process.
- 🔍 Text Extraction & Indexing: Extracts content and indexes it for efficient search.
- 💡 Question-Answering: Ask questions related to the PDF content and get relevant answers.
- 🚀 Powered by Ollama & LangChain: Uses Deepseek R1 for embeddings and responses.
- Python 3.8+
- Ollama installed
- Dependencies installed via pip
- Clone this repository:

  ```shell
  git clone https://github.com/hasan-py/chat-with-pdf-RAG.git
  cd chat-with-pdf-RAG
  ```

  Activate your Python environment before installing the dependencies.
- Install the dependencies:

  ```shell
  pip install -r requirements.txt
  ```
- Run the Streamlit app:

  ```shell
  streamlit run pdf_rag.py
  ```
- Upload a PDF: Use the UI to upload a document.
- Processing: The app extracts text and chunks it for indexing.
- Ask Questions: Enter a question in the chat box.
- Get Answers: The system retrieves relevant text and responds concisely.
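The retrieval step above can be illustrated with a minimal, self-contained sketch. This is not the app's actual code (which uses LangChain and Deepseek R1 embeddings); it stands in simple bag-of-words vectors and cosine similarity to show the chunk-then-retrieve idea, and all function names here are hypothetical:

```python
# Illustrative sketch of chunking and retrieval (not the app's real pipeline):
# split text into overlapping chunks, embed each as a bag-of-words vector,
# and return the chunk most similar to the question.
import math
from collections import Counter

def chunk_text(text, size=200, overlap=50):
    """Split text into overlapping character chunks of at most `size` chars."""
    step = size - overlap
    return [text[start:start + size]
            for start in range(0, max(len(text) - overlap, 1), step)]

def bow_vector(text):
    """Toy embedding: lowercase word counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question, chunks):
    """Return the chunk most similar to the question."""
    qv = bow_vector(question)
    return max(chunks, key=lambda c: cosine(qv, bow_vector(c)))
```

In the real app, the toy embedding is replaced by Deepseek R1 embeddings served through Ollama, and the retrieved chunks are passed to the model as context for a concise answer.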
To change the model used for inference, modify the `LLM` variable in the `pdf_rag.py` file. The `LLM` variable is initialized with the `deepseek-r1:8b` model by default; you can replace it with any other model supported by Ollama.
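Assuming the app uses LangChain's Ollama integration, the swap might look like this (a sketch of the relevant lines, not the exact file contents):

```python
# Hypothetical excerpt from pdf_rag.py showing the model swap.
# Requires `pip install langchain-ollama` and the target model pulled
# locally first, e.g. `ollama pull llama3`.
from langchain_ollama import OllamaLLM

# Default:
# LLM = OllamaLLM(model="deepseek-r1:8b")
# Replace with any model available in your local Ollama install:
LLM = OllamaLLM(model="llama3")
```

Run `ollama list` to see which models are available locally before changing the name.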
```
chat-with-pdf/
│── pdfs/              # Directory for uploaded PDFs
│── pdf_rag.py         # Main Streamlit app
│── requirements.txt   # Dependencies
│── README.md          # Documentation
│── test_pdf_rag.py    # Unit tests
```
- Python
- Streamlit (for UI)
- LangChain (for text processing)
- Ollama (for LLM inference)
- PDFPlumber (for PDF extraction)
Feel free to submit issues and PRs to improve the project! When contributing, please follow these steps:
- Before submitting PRs, please update the corresponding test cases.
- Please attach a screen recording video to the PR description showing that all functionality is working properly.
Special thanks to the creators of LangChain, Ollama, and Streamlit, and to the community, for enabling this functionality.