
Azure OpenAI Service pricing

Azure OpenAI Service pricing overview

Unlock the power of Azure OpenAI Service's generative AI models with flexible Standard (On-Demand) and Provisioned Throughput Units (PTUs). The Standard model lets you pay only for tokens processed, while PTUs ensure consistent throughput and minimal latency variance for scalable solutions. Pricing includes costs per 1,000 tokens, and PTU rates provide a predictable cost structure. Azure OpenAI Service offers advanced capabilities like GPT-4o, fine-tuning for customization, DALL-E for image generation, and Whisper for speech-to-text. For personalized guidance on optimizing AI deployments, contact a sales specialist.

Explore pricing options

Apply filters to customise pricing options to your needs.

Prices are estimates only and are not intended as actual price quotes. Actual pricing may vary depending on the type of agreement entered into with Microsoft, date of purchase, and the currency exchange rate. Prices are calculated based on US dollars and converted using London closing spot rates that are captured in the two business days prior to the last business day of the previous month end. If the two business days prior to the end of the month fall on a bank holiday in major markets, the rate setting day is generally the day immediately preceding the two business days. This rate applies to all transactions during the forthcoming month. Sign in to the Azure pricing calculator to see pricing based on your current programme/offer with Microsoft. Contact an Azure sales specialist for more information on pricing or to request a price quote. See frequently asked questions about Azure pricing.

Pricing details:

Language models

Models Context Input (Per 1,000 tokens) Output (Per 1,000 tokens) Price per PTU per Hour Minimum Scaling Increment Monthly Reservation per PTU Yearly Reservation per PTU
GPT-4o Global Deployment 128K $- $- N/A N/A N/A N/A
GPT-4o Regional API 128K $- $- $- 50 PTUs $- $-
GPT-4o-mini Global Deployment 128K $- $- N/A N/A N/A N/A
GPT-4o-mini Regional API 128K $- $- $- 25 PTUs $- $-
GPT-3.5-Turbo-0125 16K $- $- $- 100 PTUs $- $-
GPT-3.5-Turbo-Instruct 4K $- $- N/A N/A N/A N/A
GPT-4-Turbo 128K $- $- $- 100 PTUs $- $-
GPT-4-Turbo-Vision 128K $- $- N/A N/A N/A N/A
GPT-4 8K $- $- $- 50 PTUs $- $-
GPT-4 32K $- $- $- 200 PTUs $- $-

This table compares Standard (On-Demand) and Provisioned (PTU) pricing for various language models. The 'Context' column specifies the maximum number of tokens each model can process in a single request (prompt and completion combined). Input and output token prices are listed per 1,000 tokens. The PTU pricing model includes an hourly rate and a minimum scaling increment, which is the smallest number of PTUs that can be deployed for each model. The 'Monthly Reservation per PTU' and 'Yearly Reservation per PTU' columns show the reservation cost per PTU. This comparison helps users understand the cost implications of each model under both Standard (On-Demand) and Provisioned (PTU) billing options, so they can make informed decisions based on their specific usage needs.
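
To make the trade-off between the two billing models concrete, here is a minimal cost-comparison sketch. All rates and volumes below are hypothetical placeholders (actual prices are not shown on this page), and 730 hours is used as an approximate month:

# Placeholder rates for illustration only -- not actual Azure OpenAI prices.
def standard_monthly_cost(input_tokens, output_tokens, input_rate_per_1k, output_rate_per_1k):
    """Standard (On-Demand): pay per 1,000 tokens processed."""
    return (input_tokens / 1000) * input_rate_per_1k + (output_tokens / 1000) * output_rate_per_1k

def provisioned_monthly_cost(ptus, hourly_rate_per_ptu, hours_per_month=730):
    """Provisioned (PTU): pay for reserved capacity, independent of tokens processed."""
    return ptus * hourly_rate_per_ptu * hours_per_month

standard = standard_monthly_cost(input_tokens=500_000_000, output_tokens=100_000_000,
                                 input_rate_per_1k=0.005, output_rate_per_1k=0.015)
provisioned = provisioned_monthly_cost(ptus=50, hourly_rate_per_ptu=1.00)
print(f"Standard (On-Demand): ${standard:,.2f} per month")
print(f"Provisioned (50 PTUs): ${provisioned:,.2f} per month")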

Language models are now also available through the Batch API, which returns completions within 24 hours at a 50% discount.

Legacy language models

Models Context Input (Per 1,000 tokens) Output (Per 1,000 tokens)
GPT-3.5-Turbo-0301 4K $- $-
GPT-3.5-Turbo-0613 4K $- $-
GPT-3.5-Turbo-0613 16K $- $-
GPT-3.5-Turbo-1106 16K $- $-

Assistants API

The Assistants API and its tools make it easy for developers to build AI Assistants in their applications.

Tokens used with the Assistants API are billed at the per-token input/output rates of the language model chosen for each Assistant. In addition, we charge the following fees for tool usage:

Tool Input
File Search* $-/GB of vector-storage per day (1 GB free)
Code Interpreter** $-/session

*GB refers to binary gigabytes, where 1 GB is 2^30 bytes.

**If your assistant calls Code Interpreter simultaneously in two different threads, this would create two Code Interpreter sessions (2 * $-). Each session is active by default for one hour, which means that you would only pay this fee once if your user keeps giving instructions to Code Interpreter in the same thread for up to one hour.

Inference cost (input and output) varies based on the GPT model used with each Assistant.
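
As a rough illustration of how these tool fees add up, the sketch below estimates one month of File Search storage plus two Code Interpreter sessions. The per-GB and per-session rates are hypothetical placeholders, and model inference (token) charges would be billed on top:

# Placeholder rates for illustration only -- not actual Azure OpenAI prices.
FILE_SEARCH_RATE_PER_GB_DAY = 0.10        # $ per GB of vector storage per day
FILE_SEARCH_FREE_GB = 1.0                 # first 1 GB is free
CODE_INTERPRETER_RATE_PER_SESSION = 0.03  # $ per session

def file_search_cost(storage_gb, days):
    # Only storage above the free 1 GB is billed, per day.
    billable_gb = max(storage_gb - FILE_SEARCH_FREE_GB, 0)
    return billable_gb * FILE_SEARCH_RATE_PER_GB_DAY * days

def code_interpreter_cost(sessions):
    # Each thread that invokes Code Interpreter opens its own session;
    # a session stays active for one hour by default.
    return sessions * CODE_INTERPRETER_RATE_PER_SESSION

print(file_search_cost(storage_gb=5, days=30))  # 4 billable GB for 30 days
print(code_interpreter_cost(sessions=2))        # two threads -> two sessions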

Base models

Models Usage per 1,000 tokens
Babbage-002 $-
Davinci-002 $-

Fine-tuning models

Models Training per 1,000 tokens Hosting per hour Input Usage per 1,000 tokens Output Usage per 1,000 tokens
Babbage-002 $- $- $- $-
Davinci-002 $- $- $- $-
GPT-3.5-Turbo (4K) $- $- $- $-
GPT-3.5-Turbo (16K) $- $- $- $-
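
For a back-of-the-envelope estimate of a fine-tuned deployment, the sketch below combines the billable components from the table above: training tokens, hosting hours, and input/output usage tokens. All rates and volumes are hypothetical placeholders:

# Placeholder rates for illustration only -- not actual Azure OpenAI prices.
def fine_tuning_monthly_cost(training_tokens, hosting_hours, input_tokens, output_tokens,
                             train_rate_per_1k, host_rate_per_hour,
                             input_rate_per_1k, output_rate_per_1k):
    training = (training_tokens / 1000) * train_rate_per_1k
    hosting = hosting_hours * host_rate_per_hour
    usage = (input_tokens / 1000) * input_rate_per_1k + (output_tokens / 1000) * output_rate_per_1k
    return training + hosting + usage

# Example: 2M training tokens, a full month (730 h) of hosting, 10M input / 2M output tokens.
total = fine_tuning_monthly_cost(2_000_000, 730, 10_000_000, 2_000_000,
                                 train_rate_per_1k=0.008, host_rate_per_hour=3.00,
                                 input_rate_per_1k=0.0015, output_rate_per_1k=0.002)
print(f"Estimated monthly fine-tuning cost: ${total:,.2f}")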

Image models

Models Quality Resolution Price (per 100 images)
Dall-E-3 Standard 1024 * 1024 $-
Dall-E-3 Standard 1024 * 1792, 1792 * 1024 $-
Dall-E-3 HD 1024 * 1024 $-
Dall-E-3 HD 1024 * 1792, 1792 * 1024 $-
Dall-E-2 Standard 1024 * 1024 $-

Embedding models

Models Per 1,000 tokens
Ada $-
text-embedding-3-large $-
text-embedding-3-small $-

Speech models

Models Price
Whisper $-/hour
TTS (Text to Speech) $-/1M characters
TTS HD $-/1M characters

Azure pricing and purchasing options

Connect with us directly

Get a walkthrough of Azure pricing. Understand pricing for your cloud solution, learn about cost optimisation and request a customised proposal.

Talk to a sales specialist

See ways to purchase

Purchase Azure services through the Azure website, a Microsoft representative or an Azure partner.

Explore your options

Additional resources

Azure OpenAI Service

Learn more about Azure OpenAI Service features and capabilities.

Pricing calculator

Estimate your expected monthly costs for using any combination of Azure products.

SLA

Review the Service Level Agreement for Azure OpenAI Service.

Documentation

Review technical tutorials, videos, and more Azure OpenAI Service resources.

  • Azure OpenAI Service offers pricing based on both Pay-As-You-Go and Provisioned Throughput Units (PTUs). Pay-As-You-Go allows you to pay only for the resources you consume, making it flexible for variable workloads. PTUs offer a predictable pricing model in which you reserve and deploy a specific amount of model processing capacity. This model is ideal for workloads with consistent or predictable usage patterns, providing stability and cost control.
  • To learn more about PTUs and Azure OpenAI pricing, please read the PTU documentation or contact our sales specialist.

Talk to a sales specialist for a walk-through of Azure pricing. Understand pricing for your cloud solution.

Get free cloud services and a $200 credit to explore Azure for 30 days.
