LLM Chat UI

This is a chat demo app for running various LLMs with ONNX Runtime GenAI.

The app supports CPU, CUDA, and DirectML execution; CUDA is used in the examples below.

Contents:

  * Setup
  * Get the model
  * Launch the app

Setup

  1. Install onnxruntime-genai-cuda

    If you want to use a DirectML model, install the onnxruntime-genai-directml package instead (see the alternative install commands after this list).

    pip install numpy
    pip install --pre onnxruntime-genai-cuda
    
  2. Get this example

    git clone -n --depth=1 --filter=tree:0  https://github.com/microsoft/onnxruntime-genai.git
    cd onnxruntime-genai
    git sparse-checkout set --no-cone examples/chat_app
    git checkout
    cd examples/chat_app
  3. Install the requirements

    pip install -r requirements.txt
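
If you prefer DirectML or CPU-only execution, install the corresponding package in step 1 instead of onnxruntime-genai-cuda. This is a sketch: the DirectML package name comes from the note in step 1, while the CPU-only package name onnxruntime-genai is an assumption based on the usual naming of these wheels, so check PyPI if it is not found.

pip install --pre onnxruntime-genai-directml
pip install --pre onnxruntime-genai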

Get the model

If you have already downloaded your model, you can skip this part and pass --model_path (shown with its short form -m below) when launching the app. For example:

python chat_app/app.py -m "/mnt/onnx/Phi-3-vision"

cd ..
huggingface-cli download microsoft/Phi-3-vision-128k-instruct-onnx-cuda --include cuda-int4-rtn-block-32/* --local-dir .
mkdir -p models/cuda-int4
mv cuda-int4-rtn-block-32 models/cuda-int4/Phi-3-vision
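
You can sanity-check the move with a quick listing; the expected contents (the .onnx model data, genai_config.json, and tokenizer files) are noted here as an assumption based on typical onnxruntime-genai model folders.

ls models/cuda-int4/Phi-3-vision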

The folder structure should look like the following:

--chat_app
--models
   --directml
      --phi-3-vision-directml-int4-awq-block-128
      --meta-llama_Llama-2-7b-chat-hf
      --mistralai_Mistral-7B-Instruct-v0.1
            ...
   --cuda-int4
      --Phi-3-vision

Launch the app

python chat_app/app.py

or run python app.py from inside the chat_app directory.

You can also point the app at a model outside the models folder by passing the --model_name and --model_path arguments:

python chat_app/app.py --model_name "Phi-3-vision" --model_path "/mnt/onnx/Phi-3-vision"

You should see console output like the following:

Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.

Then open the local URL in a browser to see the chat UI.

For vision models, the UI looks slightly different and additionally accepts image input.