Integrate Azure OpenAI into your app

Azure OpenAI offers both language-specific SDKs and a REST API that developers can use to add AI functionality to their applications. Generative AI capabilities in Azure OpenAI are provided through models. The models available in the Azure OpenAI service belong to different families, each with its own focus. To use one of these models, you need to deploy it through the Azure OpenAI Service.

Create an Azure OpenAI resource

An Azure OpenAI resource can be deployed through both the Azure command line interface (CLI) and the Azure portal. Creating an Azure OpenAI resource through the Azure portal is similar to deploying individual Azure AI Services resources, as Azure OpenAI is part of Azure AI Services.

  1. Navigate to the Azure portal.
  2. Search for Azure OpenAI, select it, and click Create.
  3. Enter the appropriate values for the empty fields, and create the resource.

The possible regions for Azure OpenAI are currently limited. Choose the region closest to your physical location, or the closest one that has the availability for the model(s) you want to use.

Once the resource has been created, you'll have keys and an endpoint that you can use in your app.
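
If you'd rather script this step, the Azure CLI mentioned above can create the resource as well. The following is a rough sketch; the resource name, resource group, and region are placeholders, and the SKUs and regions available to you may differ:

# Create an Azure OpenAI resource (names and region are placeholders)
az cognitiveservices account create \
    --name MyOpenAIResource \
    --resource-group MyResourceGroup \
    --location eastus \
    --kind OpenAI \
    --sku S0

# Retrieve the keys and endpoint for use in your app
az cognitiveservices account keys list \
    --name MyOpenAIResource \
    --resource-group MyResourceGroup

az cognitiveservices account show \
    --name MyOpenAIResource \
    --resource-group MyResourceGroup \
    --query properties.endpoint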

Choose and deploy a model

Each model family excels at different tasks, and the models within each family have different capabilities. The models break down into three main families:

  • Generative Pre-trained Transformer (GPT) - Models that understand and generate natural language and some code. These models are best at general tasks, conversations, and chat formats.
  • Code (gpt-3 and earlier) - Code models are built on top of GPT models, and trained on millions of lines of code. These models can understand and generate code, including interpreting comments or natural language to generate code. gpt-35-turbo and later models have this code functionality included without the need for a separate code model.
  • Embeddings - These models can understand and use embeddings, which are a special format of data that can be used by machine learning models and algorithms.

This module focuses on general GPT models, with other models being covered in other modules.

For older models, the model family and capability are indicated in the name of the base model, such as text-davinci-003, which specifies that it's a text model, with davinci level capability, and identifier 3. Details on models, capability levels, and naming conventions can be found on the Azure OpenAI Models documentation page.

More recent models specify the GPT generation and whether they are the turbo version, such as gpt-35-turbo representing the GPT 3.5 Turbo model.

To deploy a model for you to use, navigate to the Azure AI Studio and go to the Deployments page. The lab later in this module covers exactly how to do that.
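
A model deployment can also be scripted with the Azure CLI. This sketch assumes the resource created earlier; the deployment name is your choice, and the model versions and SKU options actually available depend on your subscription and region:

# Deploy a model to an existing Azure OpenAI resource (names are placeholders,
# and the model version shown may not be available in every region)
az cognitiveservices account deployment create \
    --name MyOpenAIResource \
    --resource-group MyResourceGroup \
    --deployment-name MyModelDeployment \
    --model-name gpt-35-turbo \
    --model-version "0125" \
    --model-format OpenAI \
    --sku-capacity 1 \
    --sku-name Standard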

Authentication and specification of deployed model

When you deploy a model in Azure OpenAI, you choose a deployment name to give it. When configuring your app, you specify your resource endpoint, key, and deployment name to indicate which deployed model to send your request to. This enables you to deploy various models within the same resource, and make requests to the appropriate model depending on the task.
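
For example, here's a minimal sketch using version 1.x of the Python openai package, showing how the three values come together when creating the client and sending a request; the endpoint, key, API version, and deployment name are all placeholders:

from openai import AzureOpenAI

# Identify the resource by its endpoint and key, and pick an available API version
client = AzureOpenAI(
    azure_endpoint="https://my-openai-resource.openai.azure.com/",
    api_key="<your-api-key>",
    api_version="2024-02-01",
)

# The model parameter takes your deployment name, not the underlying model name
response = client.chat.completions.create(
    model="MyModelDeployment",
    messages=[{"role": "user", "content": "What is Azure OpenAI?"}],
)
print(response.choices[0].message.content)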

Prompt engineering

How the input prompt is written plays a large part in how the AI model responds. For example, if prompted with a simple request such as "What is Azure OpenAI?", you often get a generic answer similar to what you'd get from a search engine.

However, if you give it more details about what you want in your response, you get a more specific answer. For example, given this prompt:

Classify the following news headline into 1 of the following categories: Business, Tech, Politics, Sport, Entertainment

Headline 1: Donna Steffensen Is Cooking Up a New Kind of Perfection. The Internet’s most beloved cooking guru has a buzzy new book and a fresh new perspective
Category: Entertainment

Headline 2: Major Retailer Announces Plans to Close Over 100 Stores
Category:

You'll likely get the "Category:" under Headline 2 filled in with "Business".

Several examples similar to this one can be found in the Azure AI Studio Playground, under the Prompt samples dropdown. Try to be as specific as possible about what you want in the response from the model, and you may be surprised at how insightful it can be!

Note

It is never safe to assume that answers from an AI model are factual or correct. Teams or individuals tasked with developing and deploying AI systems should work to identify, measure, and mitigate harm. It is your responsibility to verify any responses from an AI model, and to use AI responsibly. Check out Microsoft's Transparency Notes on Azure OpenAI for further guidelines on how to use Azure OpenAI models responsibly.

Further details can be found at the Prompt engineering documentation page.

Available endpoints

Azure OpenAI can be accessed via a REST API or an SDK available for Python, C#, JavaScript, and more. The endpoints available for interacting with a deployed model are used differently, and certain endpoints can only be used with certain models. The available endpoints are:

  • Completion - model takes an input prompt, and generates one or more predicted completions. You'll see its playground in the studio, but this endpoint isn't covered in depth in this module.
  • ChatCompletion - model takes input in the form of a chat conversation (where roles are specified with the message they send), and the next chat completion is generated.
  • Embeddings - model takes input and returns a vector representation of that input, as sketched in the example after this list.
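
As a brief sketch reusing the hedged Python client from earlier, an Embeddings request takes text and returns a numeric vector (the embeddings deployment name is a placeholder):

# Request a vector representation of some input text
response = client.embeddings.create(
    model="MyEmbeddingsDeployment",  # placeholder: a deployed embeddings model
    input="Azure OpenAI supports several programming languages.",
)
vector = response.data[0].embedding  # a list of floats
print(len(vector))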

For example, the input for ChatCompletion is a conversation with clearly defined roles for each message:

{"role": "system", "content": "You are a helpful assistant, teaching people about AI."},
{"role": "user", "content": "Does Azure OpenAI support multiple languages?"},
{"role": "assistant", "content": "Yes, Azure OpenAI supports several languages, and can translate between them."},
{"role": "user", "content": "Do other Azure AI Services support translation too?"}

When you give the AI model a real conversation, it can generate a better response with more accurate tone, phrasing, and context. The ChatCompletion endpoint enables the ChatGPT model to have a more realistic conversation by sending the history of the chat with the next user message.

ChatCompletion also allows for non-chat scenarios, such as summarization or entity extraction. This can be accomplished by providing a short conversation, specifying the system information and what you want, along with the user input. For example, if you want to generate a job description, provide ChatCompletion with something like the following conversation input.

{"role": "system", "content": "You are an assistant designed to write intriguing job descriptions. "},
{"role": "user", "content": "Write a job description for the following job title: 'Business Intelligence Analyst'. It should include responsibilities, required qualifications, and highlight benefits like time off and flexible hours."}

Note

Completion is available for all gpt-3 generation models, while ChatCompletion is the only supported option for gpt-4 models and is the preferred endpoint when using the gpt-35-turbo model. The lab in this module uses gpt-35-turbo with the ChatCompletion endpoint.