NVIDIA ChatRTX User Guide

Updated 05/01/2024 06:11 AM

 
Introduction
ChatRTX is a demo app that lets you personalize a GPT large language model (LLM) connected to your own content—docs, notes, photos, or other data. Leveraging retrieval-augmented generation (RAG), TensorRT-LLM, and RTX acceleration, you can query a custom chatbot to quickly get contextually relevant answers. And because it all runs locally on your Windows RTX PC or workstation, you’ll get fast and secure results.
 
ChatRTX supports various file formats, including text, pdf, and doc/docx (with LLMs), and jpeg, gif, and png (with CLIP). Simply point the application at the folder containing your files and it will load them into the library in a matter of seconds. The ChatRTX tech demo is built from the TensorRT-LLM RAG developer reference project available on GitHub. Developers can use that reference to develop and deploy their own RAG-based applications for RTX, accelerated by TensorRT-LLM.

Prerequisites
  • ChatRTX is currently built for RTX 3xxx and RTX 4xxx series GPUs that have at least 8GB of GPU memory (vGPU configurations are not currently supported)
  • At least 100 GB of available hard disk space
  • Windows 10/11
  • Latest NVIDIA GPU drivers
 
Installation Tips
  • The installer will download various software libraries, AI model weights, and engine files. The total download size will be approximately 11 GB, depending on which models are selected. The download and installation should take between 10 and 30 minutes, depending on your internet connection and the load on the servers.
  • Please make sure that your system’s sleep functionality is disabled during the install process.
  • If the installation fails with an error message, rerun the installer; it will resume from where it stopped and continue with the installation process.
  • If the installation fails after some of the components have been installed, select ‘do a clean install’ on the next installation attempt.
  • Even though the installer includes most of the required large files, it still has to download a few files from public servers. If these servers are down, the installer may fail or stall temporarily.
  • If you choose to install the app in a folder other than the default install location, please make sure that there are no spaces in the folder path or folder name. This is a known issue that will be fixed in a future release.
  • If the installation keeps failing after multiple attempts, please delete the following folder before attempting to install: C:\Users\<username>\AppData\Local\NVIDIA\RAG
 
Installation Steps
  • Double-click on the setup.exe file to launch the installer. The installer will check system compatibility by verifying that your system has a compatible GPU.
  • You can either choose the default installation folder or choose a different folder by clicking the ‘Browse’ button and selecting a custom folder location.
  • After the installation is complete, a desktop icon will be created and the app will be launched.
  • A browser tab will open displaying the ChatRTX user interface, as seen in the image below. Concurrently, a Windows Command Prompt window will also be displayed, showing error logs.

Chatting with your Data
  • A quick walkthrough of the application covering the information below is also provided in this video.
  • The application will default to the Mistral (specifically, Mistral 7B int4) model and to the default dataset folder that contains a collection of GeForce news articles. You can chat and ask questions on this collection of news articles or point the app to your own data folder.
  • The app currently works with .txt, .pdf and .doc file formats.
  • You can select other TensorRT-LLM-compatible models that you have installed (e.g., Llama 2 7B int4) by clicking the selection box labeled “Select AI Model”.
  • You can add AI models to the application by clicking the “Add new models” option and selecting a model from the available list. This downloads the AI model to your local system.
  • You can point to your dataset of choice by clicking on the pen icon next to the row that shows the current data folder path and navigating to the desired folder. The default dataset (the one that is loaded at first startup) is a sampling of articles recently posted on GeForce news. Sample questions for this dataset are also provided as buttons on the UI.
  • When a new data folder is chosen, the app has to recreate the dataset vector embeddings using the documents contained in your chosen folder. The time taken to do this will vary depending on the size and number of files in the folder.
  • After the app has recreated the vector embeddings you can chat with this new dataset.
  • If you add new files to the folder you had selected, the folder’s vector embeddings have to be regenerated. After the files are added, regenerate the embeddings by clicking the ‘Refresh’ icon to the right of the ‘Dataset’ cell.
Note: The accuracy and relevance of the responses is determined by the specificity of the question being asked, the accuracy of the AI model used, and the quality of the dataset.
 
Using ChatRTX without a dataset
The application uses a technique called retrieval-augmented generation (RAG) to look up the local files you point it to and use that information as context when it submits your question to the LLM. Disabling RAG causes the LLM to generate responses based purely on the data with which it was originally trained. To see how the LLM responds without RAG, select “AI Model Default” from the pulldown menu on the right (see image below).
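Conceptually, the difference between the two modes comes down to what is placed in the prompt before it reaches the LLM. The sketch below is illustrative only; the function and variable names are made up and do not reflect ChatRTX’s actual code, which uses TensorRT-LLM and a vector store.

```python
# Toy sketch of how RAG changes the prompt sent to the LLM.
# All names here are illustrative, not ChatRTX's real pipeline.

def build_prompt(question, retrieved_chunks=None):
    """With RAG, retrieved document chunks are prepended as context;
    without RAG ("AI Model Default"), the question goes to the LLM alone."""
    if retrieved_chunks:
        context = "\n".join(retrieved_chunks)
        return f"Answer using this context:\n{context}\n\nQuestion: {question}"
    return question  # model answers from its training data only

chunks = ["The RTX 4080 SUPER launched at $999."]
question = "What does the RTX 4080 SUPER cost?"
with_rag = build_prompt(question, chunks)
without_rag = build_prompt(question)
```

With RAG enabled, the model can quote facts from your documents; with it disabled, it can only draw on whatever it learned during training.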
 
Using the CLIP Vision and Language Model
In addition to the pre-installed Mistral LLM, you can download and install the CLIP vision-and-language model from the ‘Add new models’ option. After the model is installed, you can point the app to your folder of JPEG images and chat with your images. These pictures don’t have to be tagged. You can ask questions such as “Show me images that have cats in them”, “Show me pictures taken outdoors”, “Show me images that have flowers”, and other such questions. The accuracy of the responses is determined by the CLIP model’s training and accuracy.
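This works because CLIP maps both text and images into a shared embedding space, so an untagged photo can be matched to a text query by vector similarity. The following toy sketch uses made-up 3-dimensional vectors to illustrate the idea; real CLIP embeddings have hundreds of dimensions and are produced by the trained model, not hand-written.

```python
# Toy illustration of CLIP-style retrieval: text and images share one
# embedding space, so "show me cats" becomes nearest-neighbour search.
# The vectors below are invented for illustration only.
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Pretend image embeddings (in reality: CLIP's image encoder output).
image_embeddings = {
    "cat_on_sofa.jpg":  [0.9, 0.1, 0.0],
    "garden_roses.jpg": [0.1, 0.9, 0.1],
    "city_night.jpg":   [0.0, 0.2, 0.9],
}

# Pretend text embedding of the query "a photo of a cat".
query_embedding = [0.8, 0.2, 0.1]

# The best match is the image whose embedding is closest to the query's.
best = max(image_embeddings,
           key=lambda name: cosine(query_embedding, image_embeddings[name]))
```

No tags or filenames are consulted, only the learned embeddings, which is why untagged photos can still be found.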
 
Using voice to input your questions
This version of ChatRTX also integrates the Whisper model, which performs audio-to-text transcription. To use this feature, make sure that the microphone on your system is enabled, click on the ‘mic’ icon, and ask your question. When you are done asking your question, click on the ‘stop’ icon to stop the recording. The application will recognize your question and output it into the chat window. You can then click ‘Send’ to submit the text to the LLM for a response. The Whisper model supports multiple languages, including French, Spanish, and Mandarin.
 
Guidelines on query results
The data that ChatRTX loads into the vector library is broken into chunks (you can think of a chunk as roughly a paragraph of a document), which are selected based on their relevance to formulate a response to a query. This method of storing the data makes ChatRTX good for queries that request information covered in a few chunks across the dataset, but not good for queries that involve reasoning about the entire dataset at once. For example, asking for facts covered in a couple of documents is likely to give better results than asking for a summary of a document or set of documents.
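The chunk-and-retrieve behaviour described above can be sketched in a few lines. This is a deliberately simplified stand-in: ChatRTX uses learned vector embeddings for similarity, whereas the sketch below scores chunks by plain word overlap, and all function names are invented for illustration.

```python
# Toy sketch of chunk-based lookup. Real ChatRTX embeds chunks as vectors;
# word overlap stands in for vector similarity here.

def chunk(text, size=20):
    """Split a document into fixed-size word chunks (~a paragraph each)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def top_k(query, chunks, k=2):
    """Return the k chunks sharing the most words with the query."""
    q = set(query.lower().split())
    return sorted(chunks,
                  key=lambda c: len(q & set(c.lower().split())),
                  reverse=True)[:k]

docs = [
    "The RTX 4080 SUPER has 10240 CUDA cores and 16 GB of GDDR6X memory.",
    "DLSS 3.5 adds Ray Reconstruction for better ray-traced image quality.",
]
all_chunks = [c for d in docs for c in chunk(d)]
best = top_k("How many CUDA cores does the RTX 4080 SUPER have?",
             all_chunks, k=1)
# Only the top-scoring chunks reach the LLM. A narrow factual question
# maps cleanly onto one chunk; "summarize everything" does not, which is
# why whole-dataset summaries tend to do poorly.
```

This also explains the guidance above: the model never sees the whole dataset at once, only the handful of chunks judged most relevant to the question.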
 
As with most AI use cases, the response quality tends to improve with more data. Pointing ChatRTX at more content about a specific subject will tend to result in better responses. 
 
Closing the Application
  • To close the application, click the power button icon in the upper-right corner of the application. Then, in the Command Prompt window, press any key on your keyboard to close the application backend.
 
Known Issues and Limitations
The following known issues exist in the current build:
  • The app currently works with the Microsoft Edge and Google Chrome browsers. Due to a bug, the application does not work with the Firefox browser. This will be fixed in a future release.
  • The app does not remember context, so follow-up questions will not be answered based on the context of previous questions. For example, if you previously asked “What is the price of the RTX 4080 Super?” and follow that up with “What are its hardware specifications?”, the app will not know that you are asking about the RTX 4080 Super.
  • The source file attribution in the response is not always correct. This will be improved in a later release.
  • We have observed some instances where the app gets stuck in an unusable state that cannot be resolved by restarting. This can often be fixed by deleting the preferences.json file (by default located at C:\Users\<user>\AppData\Local\NVIDIA\ChatWithRTX\RAG\trt-llm-rag-windows-main\config\preferences.json) 
  • In the rare case that a reinstallation fails, try removing the install directory (by default located at C:\Users\<user>\AppData\Local\NVIDIA\ChatWithRTX)
  • If you choose to install the app in a folder other than the default install location, please make sure that there are no spaces in the folder path or folder name.
