
Chat with PDF using LangChain, Streamlit, Ollama (for LLM inference), and PDFPlumber: an example of a Retrieval-Augmented Generation (RAG) system built on the Deepseek R1 model.


hasan-py/chat-with-pdf-RAG


Chat with Your PDFs using RAG

This project allows you to upload a PDF and ask questions about its content using Deepseek R1 via Ollama. The application processes PDFs, extracts text, indexes them into a vector store, and retrieves relevant context to generate concise answers.

Features

  • 📂 Upload a PDF: Select a PDF file to process.
  • 🔍 Text Extraction & Indexing: Extracts content and indexes it for efficient search.
  • 💡 Question-Answering: Ask questions related to the PDF content and get relevant answers.
  • 🚀 Powered by Ollama & LangChain: Uses Deepseek R1 for embeddings and responses.

Installation

Prerequisites

  • Python 3.8+
  • Ollama installed
  • Dependencies installed via pip

Setup

  1. Clone this repository:

    git clone https://github.com/hasan-py/chat-with-pdf-RAG.git
    cd chat-with-pdf-RAG

    (Optional) Create and activate a Python virtual environment before installing dependencies.

  2. Install dependencies:

    pip install -r requirements.txt
  3. Run the Streamlit app:

    streamlit run pdf_rag.py

How It Works

  1. Upload a PDF: Use the UI to upload a document.
  2. Processing: The app extracts text and chunks it for indexing.
  3. Ask Questions: Enter a question in the chat box.
  4. Get Answers: The system retrieves relevant text and responds concisely.
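The chunk-and-retrieve flow above can be sketched in plain Python. This is a toy illustration only: the app itself uses LangChain splitters and a vector store with Deepseek R1 embeddings, while the word-overlap scoring below merely stands in for real semantic similarity.

```python
# Toy sketch of "chunk, index, retrieve" with no external dependencies.
# The real app delegates chunking and similarity search to LangChain.

def chunk_text(text, size=200, overlap=50):
    """Split text into overlapping character chunks for indexing."""
    chunks = []
    step = size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break
    return chunks

def retrieve(question, chunks, k=1):
    """Rank chunks by shared words with the question (toy similarity)."""
    q_words = set(question.lower().split())
    scored = sorted(
        chunks,
        key=lambda c: len(q_words & set(c.lower().split())),
        reverse=True,
    )
    return scored[:k]

doc = ("Ollama runs large language models locally. "
       "Streamlit builds the UI. "
       "PDFPlumber extracts text from PDF files.")
chunks = chunk_text(doc, size=60, overlap=20)
best = retrieve("Which library extracts text from PDFs?", chunks)
print(best[0])
```

In the real pipeline, the toy `retrieve` is replaced by a nearest-neighbor search over embedding vectors, so answers do not depend on exact word overlap.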

Changing the Model

To change the model used for inference, modify the LLM variable in the pdf_rag.py file. By default it is initialized with the deepseek-r1:8b model; you can replace it with any other model supported by Ollama (the model must first be pulled locally, e.g. with ollama pull).
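As a sketch, with LangChain's Ollama integration the swap might look like the following. The import and variable name here are assumptions; check pdf_rag.py for the actual names used by the app.

```python
from langchain_ollama import OllamaLLM  # assumes the langchain-ollama package

# Default model used by the app.
llm = OllamaLLM(model="deepseek-r1:8b")

# To switch models, change the tag to any model available in your local
# Ollama installation, for example:
# llm = OllamaLLM(model="llama3.1:8b")
```

Whatever tag you choose must already be available locally (ollama list shows what is pulled), since this is a configuration fragment rather than something Ollama downloads on demand.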

File Structure

chat-with-pdf/
│── pdfs/                   # Directory for uploaded PDFs
│── pdf_rag.py              # Main Streamlit app
│── requirements.txt        # Dependencies
│── README.md               # Documentation
│── test_pdf_rag.py         # Unit tests

Technologies Used

  • Python
  • Streamlit (for UI)
  • LangChain (for text processing)
  • Ollama (for LLM inference)
  • PDFPlumber (for PDF extraction)

Contributing

Feel free to submit issues and PRs to improve the project! Please follow these steps:

  • Before submitting PRs, please update the corresponding test cases.
  • Please attach a screen recording to the PR description showing that all functionality works properly.

Acknowledgments

Special thanks to the creators of LangChain, Ollama, and Streamlit, and to the community, for enabling this functionality.
