Save Big on Hosting Your Fine-Tuned Models on Azure OpenAI Service
Published Jul 18, 2024

We've heard your feedback loud and clear: folks want to fine-tune their models, but the pricing can make experimentation too expensive. Following last month's update switching training to token-based billing, we're reducing the hosting charges for many of your favorite models!

 

Starting July 1, we have reduced the hosting charges for many Azure OpenAI Service fine-tuned models, including our most popular models: the GPT-35-Turbo family. For folks less familiar with our service, models need to be deployed before they can be used for inferencing, and while a model is deployed, we charge an hourly rate for hosting it. Don't need to use your model right away? We store up to 100 non-deployed fine-tuned models per resource, for free!
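If you're deploying from code, here's a minimal sketch using the Azure management SDK for Python (`azure-mgmt-cognitiveservices`). The subscription ID, resource group, account name, and fine-tuned model ID below are placeholders you'd replace with your own values:

```python
# Minimal sketch: deploy a fine-tuned model so it can serve inference.
# Requires: pip install azure-identity azure-mgmt-cognitiveservices
# All <...> values and the fine-tuned model ID are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.cognitiveservices import CognitiveServicesManagementClient
from azure.mgmt.cognitiveservices.models import (
    Deployment, DeploymentModel, DeploymentProperties, Sku,
)

client = CognitiveServicesManagementClient(
    credential=DefaultAzureCredential(),
    subscription_id="<your-subscription-id>",
)

# Hourly hosting charges apply while this deployment exists; deleting the
# deployment (not the model) stops the charge, and the model stays stored.
poller = client.deployments.begin_create_or_update(
    resource_group_name="<your-resource-group>",
    account_name="<your-azure-openai-resource>",
    deployment_name="my-finetuned-gpt35",
    deployment=Deployment(
        sku=Sku(name="Standard", capacity=1),
        properties=DeploymentProperties(
            model=DeploymentModel(
                format="OpenAI",
                # Example fine-tuned model ID format; yours comes from the
                # fine-tuning job output.
                name="gpt-35-turbo-0613.ft-<your-fine-tune-job-id>",
                version="1",
            )
        ),
    ),
)
print(poller.result())
```

Since you're only billed for hosting while the deployment exists, a common pattern is to deploy for testing, delete the deployment when you're done, and redeploy the stored model later.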

 

The new prices are published on the Azure OpenAI Service Pricing page and are listed below:

| Base Model | Previous Price | New Price (Effective July 1, 2024) |
|---|---|---|
| Babbage-002 | $1.70 / hour | $1.70 / hour |
| Davinci-002 | $2.00 / hour | $1.70 / hour (15% off) |
| GPT-35-Turbo (4K) | $3.00 / hour | $1.70 / hour (43% off) |
| GPT-35-Turbo (16K) | $3.00 / hour | $1.70 / hour (43% off) |

 

Why do we charge for hosting? When you deploy a fine-tuned model, you're covered by the same Azure OpenAI SLAs as our base models, with 99.9% uptime, and your model is hosted continuously on Azure infrastructure rather than being loaded on demand. That means once your model is deployed, there's no cold-start wait before inferencing. And because you're already paying for the deployment, we charge a relatively low price for inferencing (the same as the equivalent base model).

When comparing services, consider the tradeoff between a fixed hosting price and a higher per-token inferencing rate. Because Azure OpenAI has a fixed hosting cost and low inferencing charges, heavier inferencing workloads can be much cheaper than on services that only charge a premium on tokens. For example, assuming a standard 8:1 ratio of input to output tokens and comparing the costs of a fine-tuned GPT-35-Turbo model, once your workload surpasses ~700K tokens/hour (~12K TPM), Azure OpenAI becomes the cheaper option.
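As a back-of-the-envelope sketch of that break-even point (the blended per-token premium below is an illustrative assumption, not a published price; plug in real rates to compare for your workload):

```python
# Break-even between a fixed hourly hosting charge and a pure per-token
# premium. PREMIUM_PER_MILLION_TOKENS is an assumed, illustrative blended
# premium (extra cost per 1M tokens vs. the base model), not a published price.
HOSTING_PER_HOUR = 1.70            # new fine-tuned GPT-35-Turbo hosting price
PREMIUM_PER_MILLION_TOKENS = 2.40  # assumed blended premium at an 8:1 input:output mix

# Hosting is worthwhile once (tokens/hour * premium) exceeds the hourly charge.
break_even_tokens_per_hour = HOSTING_PER_HOUR / (PREMIUM_PER_MILLION_TOKENS / 1_000_000)
break_even_tpm = break_even_tokens_per_hour / 60

print(f"Break-even: ~{break_even_tokens_per_hour:,.0f} tokens/hour "
      f"(~{break_even_tpm:,.0f} TPM)")
# -> ~708,333 tokens/hour (~11,806 TPM), consistent with the ~700K tokens/hour
#    (~12K TPM) figure above.
```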

[Figure: cost comparison of fixed hourly hosting plus low per-token rates vs. premium per-token pricing, as hourly token volume grows]

 

We hope this makes it easier for you to use these models and explore their capabilities. Thank you for choosing Azure OpenAI Service. Happy fine-tuning!
