This project demonstrates a basic Retrieval-Augmented Generation (RAG) system that loads documents from text files, websites, and PDF files.

Natan-Asrat/langchain_simple_rag

Langchain Simple RAG Project

View Live Site here

The implementation can be found in rag/simplerag.ipynb and follows the steps outlined below:

Document Loading

  • Text Files: Loaded using TextLoader from langchain_community.document_loaders, with an example file located at rag/robotics.txt.

  • Web Content: Loaded from Wikipedia using WebBaseLoader.

  • PDF Files: Loaded using PyPDFLoader from the same library, with the PDF located at rag/robotics.pdf.
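The three loaders above can be combined roughly as follows. This is a minimal sketch, not the notebook's exact code: the function name `load_all_documents` is hypothetical, the Wikipedia URL is left as a parameter because this README does not state it, and the LangChain import is deferred into the function so the module can be imported without the packages installed.

```python
# File paths named in this README.
TEXT_PATH = "rag/robotics.txt"
PDF_PATH = "rag/robotics.pdf"


def load_all_documents(web_url, text_path=TEXT_PATH, pdf_path=PDF_PATH):
    """Load the text file, one web page, and the PDF into a single list.

    `web_url` is an assumption: pass the Wikipedia page you want to index.
    """
    # Imported lazily so that importing this module does not require LangChain.
    from langchain_community.document_loaders import (
        PyPDFLoader,
        TextLoader,
        WebBaseLoader,
    )

    docs = []
    docs.extend(TextLoader(text_path).load())   # one Document for the whole file
    docs.extend(WebBaseLoader(web_url).load())  # relies on bs4 to parse the HTML
    docs.extend(PyPDFLoader(pdf_path).load())   # one Document per PDF page
    return docs
```

Each loader returns a list of `Document` objects, so the combined list can be passed directly to the text splitter.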

Text Splitting

  • Used RecursiveCharacterTextSplitter from langchain.text_splitter to split documents into 1,000-character chunks with 200-character overlap.
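To make the chunk size and overlap concrete, the sketch below first shows a plain fixed-window splitter and then the LangChain call with the settings named above. The fixed-window function `sliding_chunks` is an illustration only and is not what `RecursiveCharacterTextSplitter` does internally: the real splitter prefers to break on separators such as paragraphs and sentences, so its chunks are often shorter than 1,000 characters. Both function names are hypothetical.

```python
def sliding_chunks(text, chunk_size=1000, chunk_overlap=200):
    """Fixed-window illustration of chunk size and overlap (not the recursive splitter)."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # 800 characters with the default values
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


def split_documents(docs):
    """Split loaded Documents with the splitter and settings named in this README."""
    # Imported lazily so the illustration above runs without LangChain installed.
    from langchain.text_splitter import RecursiveCharacterTextSplitter

    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    return splitter.split_documents(docs)
```

With these values, consecutive windows share their last and first 200 characters, which helps keep a sentence that straddles a boundary retrievable from at least one chunk.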

Embedding with HuggingFace

  • Tokenized and embedded document chunks using the sentence-transformers/all-MiniLM-L6-v2 model from HuggingFace.
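This README does not say which API the notebook uses for this step, so the following is only one plausible sketch. It assumes the `transformers` and `torch` packages listed under Libraries, tokenizes the texts, and mean-pools the token embeddings into one vector per text, which is the usual pattern for this model family. The function name `embed_texts` is hypothetical.

```python
MODEL_NAME = "sentence-transformers/all-MiniLM-L6-v2"


def embed_texts(texts):
    """Return one L2-normalized embedding per input string (assumed approach)."""
    # Imported lazily: both packages are heavy and the model is downloaded on first use.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModel.from_pretrained(MODEL_NAME)

    # Tokenize with padding so all sequences in the batch share one length.
    encoded = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        output = model(**encoded)

    # Mean pooling: average token vectors, ignoring padding via the attention mask.
    token_embeddings = output.last_hidden_state
    mask = encoded["attention_mask"].unsqueeze(-1).float()
    summed = (token_embeddings * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1e-9)
    sentence_embeddings = summed / counts

    return torch.nn.functional.normalize(sentence_embeddings, p=2, dim=1)
```

For this model, each returned vector has 384 dimensions.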

Vector Store with ChromaDB

  • Stored the chunk embeddings in a ChromaDB vector store so that chunks can be retrieved by similarity to a query.
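A vector store of this kind could be built along the following lines. This is a sketch under assumptions: it uses the `Chroma` wrapper from `langchain_community.vectorstores`, and `default_embedding` uses `HuggingFaceEmbeddings`, which additionally requires the `sentence-transformers` package; the notebook may wire the embedding model differently. All three function names are hypothetical.

```python
def default_embedding():
    """Assumed embedding wrapper for the model named in this README."""
    from langchain_community.embeddings import HuggingFaceEmbeddings

    return HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")


def build_vector_store(chunks, embedding, persist_directory=None):
    """Embed the chunks and index them in Chroma.

    Pass a directory to keep the index on disk; None keeps it in memory.
    """
    from langchain_community.vectorstores import Chroma

    return Chroma.from_documents(
        chunks, embedding, persist_directory=persist_directory
    )


def top_chunks(db, query, k=4):
    """Return the k chunks whose embeddings are closest to the query."""
    return db.similarity_search(query, k=k)
```

Calling `top_chunks` directly is a quick way to check what the store returns for a query before any language model is involved.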

Document Chain with Prompt Template

  • Created a document chain with create_stuff_documents_chain from langchain.chains.combine_documents. A custom prompt takes the user query as input and the relevant document chunks as context, and the ChatGroq model generates the answer.
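A document chain like the one described above might look as follows. The prompt wording is illustrative and not the notebook's actual prompt, `build_document_chain` is a hypothetical name, and the Groq model name is left as a required argument because this README does not specify one. The placeholder names `{context}` and `{input}` match what `create_stuff_documents_chain` and `create_retrieval_chain` supply by default.

```python
# Illustrative prompt: {context} receives the stuffed chunks, {input} the user query.
PROMPT_TEMPLATE = """Answer the question using only the context below.

<context>
{context}
</context>

Question: {input}"""


def build_document_chain(model_name):
    """Build a chain that stuffs retrieved chunks into the prompt and calls Groq."""
    # Imported lazily so the template above can be inspected without LangChain.
    from langchain.chains.combine_documents import create_stuff_documents_chain
    from langchain_core.prompts import ChatPromptTemplate
    from langchain_groq import ChatGroq

    llm = ChatGroq(model=model_name)  # reads GROQ_API_KEY from the environment
    prompt = ChatPromptTemplate.from_template(PROMPT_TEMPLATE)
    return create_stuff_documents_chain(llm, prompt)
```

"Stuffing" means every retrieved chunk is concatenated into the single `{context}` slot, which works as long as the chunks fit within the model's context window.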

Retrieval Chain with ChromaDB

  • Built a retriever using db.as_retriever() and constructed a retrieval chain via create_retrieval_chain from langchain.chains.retrieval.
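The retrieval step can be sketched as below, assuming `db` is the Chroma vector store and `document_chain` is the document chain from the previous steps. The helper names `build_retrieval_chain` and `ask` are hypothetical.

```python
def build_retrieval_chain(db, document_chain):
    """Connect the vector store retriever to the document chain."""
    # Imported lazily so this module can be imported without LangChain installed.
    from langchain.chains.retrieval import create_retrieval_chain

    retriever = db.as_retriever()
    return create_retrieval_chain(retriever, document_chain)


def ask(retrieval_chain, question):
    """Run one query and return only the generated answer text."""
    # The chain returns a dict that also holds the retrieved chunks under "context".
    result = retrieval_chain.invoke({"input": question})
    return result["answer"]
```

The result dictionary's `"context"` entry is useful for checking which chunks the answer was grounded in.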

Setup

  • Create virtual environment: python -m venv venv
  • Activate virtual environment: call venv\Scripts\activate.bat in cmd on Windows, or source venv/bin/activate on macOS/Linux
  • Install dependencies: pip install -r requirements.txt
  • Set the environment variables LANGCHAIN_API_KEY and GROQ_API_KEY. The keys are issued by LangChain and Groq respectively.
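Before running the notebook, it can help to confirm that both keys are visible to Python. The check below is a small sketch using only the standard library; `missing_keys` is a hypothetical helper and not part of the project.

```python
import os

# Environment variables named in the setup steps above.
REQUIRED_KEYS = ("LANGCHAIN_API_KEY", "GROQ_API_KEY")


def missing_keys(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

If the keys are kept in a `.env` file, the Python-Dotenv package listed under Libraries can load them into the environment before this check runs.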

Libraries

  • Langchain
  • Langchain Groq
  • Streamlit
  • Python-Dotenv
  • PyPDF
  • bs4
  • chromadb
  • transformers
  • torch
