It took less than a week for OpenAI’s ChatGPT to reach a million users, and it crossed the 100 million user mark in under two months. The interest and excitement around this technology have been remarkable. Users around the world are seeing potential for applying these large language models to a broad range of scenarios.
In the context of enterprise applications, the question we hear most often is “how do I build something like ChatGPT that uses my own data as the basis for its responses?”
The combination of Azure Cognitive Search and Azure OpenAI Service yields an effective solution for this scenario. It integrates the enterprise-grade characteristics of Azure, the ability of Cognitive Search to index, understand and retrieve the right pieces of your own data across large knowledge bases, and ChatGPT’s impressive capability for interacting in natural language to answer questions or take turns in a conversation.
In this blog post we’ll describe the above solution pattern, from the internals of orchestrating conversation and knowledge bases to the considerations in user experience necessary to help end users judge responses and their supporting facts appropriately. Our goal is to give you the tools necessary to build ChatGPT-powered applications starting today, using the "gpt-35-turbo" model that's now in preview. We’re also releasing a GitHub repo with examples, including UX, orchestration, prompts, etc., that you can use to learn more or as a starting point for your own application.
You interact with large language models like ChatGPT in natural language: you give the model a “prompt” and ask it to complete it. The prompt could be a question, a conversation turn, a pattern to extend, etc. Used this way, the responses you get are based on what the model learned during training, which can be useful for general-knowledge questions or an informal chat, but is not what you want if you’re building an application where users should see responses based on your own data. As this example (asking about employee healthcare plans) with the base ChatGPT model shows, the response (in green) is not useful since it doesn’t account for the company’s own benefits policy:
<|im_start|>system
Assistant helps Contoso Inc employees with their internal healthcare plan questions.
<|im_end|>
<|im_start|>user
Does my plan cover annual eye exams?
<|im_end|>
<|im_start|>assistant
I'm sorry, I'm not able to access your specific healthcare plan information. However, many healthcare plans do cover annual eye exams. I recommend checking your plan's summary of benefits or contacting your plan administrator for more information.
One approach to have ChatGPT generate responses based on your own data is simple: inject this information into the prompt. ChatGPT can read the information along with any instructions, context or questions, and respond accordingly. This approach doesn’t need retraining or fine-tuning of the model, and the responses can reflect any changes in the underlying data immediately.
This presents a new challenge though: these models have a limit on the “context length” they support (the current ChatGPT model can take up to 4000 tokens in a prompt), and even if they didn’t have those limits, it wouldn’t be practical to inject GBs worth of data into a text prompt in each interaction. The alternative is to keep all the data in an external knowledge base that can retrieve pieces quickly and with good relevance, exactly what Cognitive Search was designed for. Here’s what a starting point prompt and response could look like:
<|im_start|>system
Assistant helps Contoso Inc employees with their internal healthcare plan questions. Only answer questions based on the facts listed below. If the facts below don't answer the question, say you don't know. Each fact sentence is numbered. Always include a reference to the fact number in the form [number] when you answer.
1. Employees have a choice of two plans, PlanA and PlanB, and they have different benefits.
2. PlanA includes coverage for annual eye and hearing exams.
3. Those with PlanB have a co-payment for routine eye exams and vision-related purchases such as glasses.
4. Both PlanA and PlanB cover preventive care visits.
<|im_end|>
<|im_start|>user
Does my plan cover annual eye exams?
<|im_end|>
<|im_start|>assistant
PlanA includes coverage for annual eye and hearing exams. [2]
In the example the blue parts were retrieved from Cognitive Search based on the user’s question and any accumulated context that might be applicable, the green part is the response generated by the model, and the rest is the prompt template we used to give the model instructions.
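Concretely, the retrieve-then-read flow above can be sketched as follows. This is a minimal illustration, not the sample’s actual code: the search and completion calls are stubbed out (in the sample they go through the azure-search-documents SDK and the Azure OpenAI completion API), and the function names are ours. The template text mirrors the example prompt.

```python
# Minimal sketch of retrieve-then-read: retrieved facts are numbered and
# injected into the system section of a ChatML prompt, and the model is
# instructed to cite fact numbers in its answer.

SYSTEM_TEMPLATE = (
    "Assistant helps Contoso Inc employees with their internal healthcare "
    "plan questions. Only answer questions based on the facts listed below. "
    "If the facts below don't answer the question, say you don't know. "
    "Each fact sentence is numbered. Always include a reference to the fact "
    "number in the form [number] when you answer.\n{facts}"
)

def build_prompt(facts: list, question: str) -> str:
    """Assemble a ChatML prompt from retrieved facts and the user question."""
    numbered = "\n".join(f"{i}. {fact}" for i, fact in enumerate(facts, 1))
    system = SYSTEM_TEMPLATE.format(facts=numbered)
    return (
        f"<|im_start|>system\n{system}\n<|im_end|>\n"
        f"<|im_start|>user\n{question}\n<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

# In the real flow, `facts` would come from a Cognitive Search query on the
# user's question, and `prompt` would be sent to the completion endpoint
# with "<|im_end|>" as a stop sequence.
facts = [
    "Employees have a choice of two plans, PlanA and PlanB.",
    "PlanA includes coverage for annual eye and hearing exams.",
]
prompt = build_prompt(facts, "Does my plan cover annual eye exams?")
```

Because the prompt is rebuilt from search results on every turn, any change in the indexed data is reflected immediately, with no retraining involved.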
This retrieval-augmented generation approach opens the door for starting simple and getting more sophisticated as needed. There are many options for how to construct prompts, how to formulate queries for effective retrieval from the knowledge base, and how to orchestrate back-and-forth interaction between ChatGPT and the knowledge base. Before we dig into those, let’s talk about one more requirement: helping users validate that responses are trustworthy.
We assume these large language models, prompts, and orchestration systems aren’t perfect, and treat each generated response as a candidate that should include the right information for an end user to validate. As part of exploring this topic we implemented three simple experiences as starting points. These aren’t the only options; we welcome ideas and feedback on the best ways to give users better tools to validate that results from the system are factually correct.
As you can see in the picture below, when we produce a response in our examples, we also offer the user three “drill down” tools:
Each of these options may or may not be useful for users depending on the audience. There are other options to offer transparency and validation tools for users to have confidence in responses. In particular, in this blog post and initial version of the example code we don’t tackle the critical topic of methods that can be implemented within the application to evaluate quality of responses and possibly reject or retry cases that don’t meet certain criteria. We encourage application developers to explicitly explore this topic in the context of each application experience.
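As a taste of what an in-application quality check might look like, here is one simple, hypothetical heuristic in the spirit described above: verify that every citation in a candidate answer actually refers to a fact that was supplied in the prompt, and flag the response for rejection or retry otherwise. This is our own illustrative sketch, not part of the sample code.

```python
import re

def citations_are_grounded(answer: str, num_facts: int) -> bool:
    """Return True only if the answer cites at least one fact and every
    cited [number] falls within the range of facts actually supplied."""
    cited = [int(n) for n in re.findall(r"\[(\d+)\]", answer)]
    return bool(cited) and all(1 <= n <= num_facts for n in cited)
```

An application could run a check like this after each completion and, on failure, retry with a stricter prompt or surface a fallback message instead of an ungrounded answer.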
Approaches for more effective prompt design, retrieval query construction, and interaction models between components are emerging quickly. This is a nascent space where we expect to see lots of rapid progress. Here’s a small sampling of starting points for prompt and query generation, with references to literature for those interested in more detail:
The samples that accompany this blog post implement some of these, either directly or through open-source libraries such as LangChain. To cherry-pick a particular example, the user chat turn for “I have the plus plan” in the screenshot below wouldn’t yield a good answer using a naïve retrieve-then-read approach, but works well with a slightly more sophisticated implementation that carries the context of the conversation:
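One way to carry conversation context, sketched below under our own naming: before retrieval, condense the chat history and the new turn into a standalone search query (in the sample this rewriting step is itself performed by the model; here only the prompt construction is shown, with the model call left out). A literal search for “I have the plus plan” would miss the intent; the rewritten query would not.

```python
# Sketch of conversation-aware query generation: the history plus the new
# turn are formatted into a prompt asking the model to produce a
# self-contained search query for the knowledge base.

QUERY_PROMPT = (
    "Below is a conversation and a new question from the user. "
    "Generate a search query that captures the full intent of the new "
    "question, including any needed context from the conversation.\n\n"
    "Conversation:\n{history}\n\n"
    "New question: {question}\n"
    "Search query:"
)

def build_query_prompt(history: list, question: str) -> str:
    lines = "\n".join(f"{role}: {text}" for role, text in history)
    return QUERY_PROMPT.format(history=lines, question=question)

history = [
    ("user", "Does my plan cover annual eye exams?"),
    ("assistant", "PlanA includes coverage for annual eye exams. [2]"),
]
query_prompt = build_query_prompt(history, "I have the plus plan")
# The model's completion (e.g. a query about eye exam coverage under the
# plus plan) is what gets sent to Cognitive Search, not the raw turn.
```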
Since responses will ultimately be based on what we’re able to retrieve from the knowledge base, quality of retrieval becomes a significant aspect of these solutions. Here are a few considerations:
The accompanying sample code includes functionality to easily experiment with some of the options above (click the settings icon at the top right of the window).
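One of those retrieval considerations is how documents are split into passages before indexing: chunks should be small enough that several fit into the prompt, with some overlap so an answer isn’t cut off at a boundary. A minimal fixed-size chunker is sketched below as an illustration only; sizes are in characters for simplicity, whereas a real implementation would typically work on tokens or document sections.

```python
def chunk_text(text: str, size: int = 500, overlap: int = 100) -> list:
    """Split text into fixed-size chunks, each overlapping the previous
    one by `overlap` characters. Assumes overlap < size."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # last chunk reached the end of the text
        start += size - overlap
    return chunks
```

Each chunk would then be indexed as its own searchable document in Cognitive Search, so retrieval returns prompt-sized passages rather than whole files.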
In this blog post we focused on conversation and question answering scenarios that combine ChatGPT from Azure OpenAI with Azure Cognitive Search as a knowledge base and retrieval system. There are other ways in which Azure OpenAI Service and Cognitive Search can be combined to improve existing scenarios or enable new ones. Examples include using natural language for query formulation, powering catalog browsing experiences, and using Azure OpenAI at indexing time to enrich data. We plan on continuing to publish guidance and examples to illustrate how to accomplish many of these.
We posted a few examples, including the complete UX shown in this blog post, in this GitHub repo. We plan on continuously expanding that repo with a focus on covering more scenarios.
You can clone this repo and either use the included sample data or adapt it to use your own. We encourage you to take an iterative approach. Data preparation will take a few tries. Start by uploading what you have and try out the experience.
We’re excited about the prospect of improved and brand-new scenarios powered by the availability of large language models combined with information retrieval technology. We look forward to seeing what you will build with Azure OpenAI and Azure Cognitive Search.