RAG (Retrievel Augmented Generation) implementation using ChromaDB, Mistral-7B-Instruct-v0.1 and gte-base for embeddings.
This repository hosts the implementation of a sophisticated Retrieval Augmented Generation (RAG) model, leveraging the cutting-edge Mistral 7B model for Language Generation. It utilizes the gte-base model for embedding and ChromaDB as the vector database to store these embeddings. This project is embodied in a Google Colab notebook, fine-tuned for an A100 instance.
The implementation queries data from the “Climate Change 2023 Synthesis Report,” allowing for the extraction of in-depth, coherent, and relevant information pertaining to climate change. With a context window of 8000, the results demonstrate impressive coherence, precise data match-retrieval, and low latency, making it a valuable tool for processing extensive datasets.
RAG_Chromadb_mistral7b.ipynb
: The main Google Colab notebook containing the complete implementation and execution details.
- Google Colab with A100 instance.
- Familiarity with RAG, gte-base model, Mistral 7B, and ChromaDB.
- Clone the repository:
git clone https://github.com/mickymult/RAG-ChromaDB-Mistral7B.git
- Open the
RAG_Chromadb_mistral7b.ipynb
notebook in Google Colab. - Set up the environment with the necessary libraries and dependencies.
- Run the notebook cells in sequence.
Mistral 7B serves as the foundational Language Model, producing coherent, contextually relevant responses based on retrieved documents.
The gte-base model is used for embedding sentences, which facilitates efficient and semantically rich similarity search among them.
ChromaDB is used as the vector database for storing the embedded representations of the data, ensuring efficient data retrieval.
The implementation is based on the “Climate Change 2023 Synthesis Report,” enabling detailed inquiry into the comprehensive insights on climate change covered in the report.
The context window has been increased to 8000 tokens to allow for more extensive contextual understanding and coherence in the generated responses.
The implementation yields highly coherent responses with accurate data match-retrieval and minimal latency, demonstrating its effectiveness in handling extensive and complex datasets like the Climate Change 2023 Synthesis Report.
Execute the implementation by running the cells in sequence in the RAG_Chromadb_mistral7b.ipynb
notebook on Google Colab.
Feel free to contribute to the enhancement of this implementation.
This project is distributed under the MIT License.
- Mistral 7B for providing the advanced language model.
- gte-base for the robust embedding model.
- ChromaDB for efficient vector database storage.
- The authors and contributors to the Climate Change 2023 Synthesis Report.
For any inquiries, discussions, or clarifications related to this implementation, please create an issue in this GitHub repository.