Deploying Hugging Face Hub models in Azure Machine Learning
Published May 23, 2023

We’re excited to share that Microsoft has partnered with Hugging Face to bring open-source models to Azure Machine Learning. Hugging Face is the creator of Transformers, a widely used library for working with over 200,000 open-source models hosted on the Hugging Face Hub. Thanks to this partnership, you can now find thousands of Transformers models in the new Hugging Face collection in the Azure ML model catalog and deploy them in just a few clicks on managed endpoints running on secure and scalable Azure infrastructure.

 

Hugging Face models in Azure Machine Learning

 

This interoperability expands the partnership we announced last year when we launched Azure ML-powered Hugging Face endpoints in Azure Marketplace. It further simplifies the experience of deploying large language models on Azure by featuring the models in Azure ML Studio alongside your notebooks, models, and pipelines. While Transformers models are easy to explore and try on the Hugging Face Hub, they can be challenging to deploy and scale for production-grade inference endpoints, especially if you work with IaaS options like virtual machines or PaaS options such as Kubernetes. For example, you need to secure endpoints with enterprise-grade authentication, safely test and roll out new versions without disrupting production applications, and scale dynamically as production workloads change.

 

AzureML managed online endpoints are purpose-built for secure and scalable inference

Managed online endpoints in Azure ML help you deploy models to powerful CPU and GPU machines in a turnkey manner. They take care of serving, scaling, securing, and monitoring your models, freeing you from the overhead of setting up and managing the underlying infrastructure, and they automatically provision the required virtual machines.

To simplify testing and rolling out new model versions, you can split or mirror endpoint traffic. Mirroring sends a copy of production traffic to a new model version so you can test it without affecting production responses; splitting lets you gradually shift production traffic to a new version while observing its performance. Autoscaling ramps infrastructure resources up or down based on traffic: you can scale on utilization metrics (for example, add nodes when CPU utilization exceeds 70%), on a schedule (for example, add nodes during peak business hours), or on a combination of both. Learn more about safe rollout and autoscaling.
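As an illustration of what safe rollout looks like in code, here is a minimal sketch using the Azure ML Python SDK v2. The endpoint and deployment names ("my-endpoint", "blue", "green") and the workspace identifiers are placeholders, not values from this article; the calls shown are the standard SDK v2 operations for updating endpoint traffic.

```python
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Connect to the workspace (replace the placeholders with your own values).
ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace>",
)

# Fetch an existing managed online endpoint that already has two deployments.
endpoint = ml_client.online_endpoints.get("my-endpoint")

# Safe rollout: keep 90% of live traffic on the current deployment ("blue")
# and send 10% to the new one ("green") while you observe its performance.
endpoint.traffic = {"blue": 90, "green": 10}

# Alternatively, mirror a share of live traffic to the new deployment;
# mirrored requests are scored but their responses are not returned to callers.
# endpoint.mirror_traffic = {"green": 20}

ml_client.online_endpoints.begin_create_or_update(endpoint).result()
```

Autoscaling itself is configured through Azure Monitor autoscale settings on the deployment rather than through this traffic configuration.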

 

Deploying Hugging Face models in AzureML is easy

Log in to your workspace in Azure ML Studio, open the model catalog, and follow these simple steps:

  1. Open the Hugging Face registry in Azure ML Studio.
  2. Click the Hugging Face collection.
  3. Filter by task or license and search for models.
  4. Click a model tile to open the model page and choose the real-time deployment option to deploy the model.
  5. Once deployed, the REST endpoint you can use to score the model is visible on the Endpoints page.
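If you prefer code to the studio UI, the same deployment can be sketched with the Azure ML Python SDK v2. The model asset path, endpoint name, and instance type below are illustrative assumptions; copy the exact model path and a supported instance type from the model page in the catalog.

```python
from azure.ai.ml import MLClient
from azure.ai.ml.entities import ManagedOnlineEndpoint, ManagedOnlineDeployment
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace>",
)

# Placeholder asset path for a model in the Hugging Face collection of the
# Azure ML registry; take the real path from the model page in the catalog.
model_id = "azureml://registries/HuggingFace/models/bert-base-uncased/labels/latest"

# Create the managed online endpoint.
endpoint = ManagedOnlineEndpoint(name="hf-text-endpoint", auth_mode="key")
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

# Create a deployment for the catalog model (no scoring script or environment
# is specified here, assuming the catalog model supports no-code deployment).
deployment = ManagedOnlineDeployment(
    name="default",
    endpoint_name=endpoint.name,
    model=model_id,
    instance_type="Standard_DS3_v2",  # pick a size listed on the model page
    instance_count=1,
)
ml_client.online_deployments.begin_create_or_update(deployment).result()

# Route all traffic to the new deployment.
endpoint.traffic = {"default": 100}
ml_client.online_endpoints.begin_create_or_update(endpoint).result()
```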

While exploring models on the Hugging Face Hub, you can also deploy a model to AzureML directly from its model page by clicking “Deploy” and picking “AzureML” from the menu.

 

Watch Jeff Boudier, Product Director at Hugging Face, introduce the experience:



"The integration of Hugging Face's open-source models into Azure Machine Learning represents our commitment to empowering developers with industry-leading AI tools," said John Montgomery, Corporate Vice President, Azure AI Platform at Microsoft. "This collaboration not only simplifies the deployment process of large language models but also provides a secure and scalable environment for real-time inferencing. It's an exciting milestone in our mission to accelerate AI initiatives and bring innovative solutions to the market swiftly and securely."

 

“With over 200,000 open source models now available on the Hugging Face Hub, it’s never been easier to start your machine learning journey using open source models and libraries - but deploying these models to production remains a challenge today,” said Julien Simon, Chief Evangelist at Hugging Face. “With the new Hugging Face Hub model catalog, natively integrated within Azure Machine Learning, we are opening a new page in our partnership with Microsoft, offering a super easy way for enterprise customers to deploy Hugging Face models for real-time inference, all within their secure Azure environment.”

 

We are launching this experience in preview for PyTorch Transformers models. The preview supports eleven natural language processing tasks, such as text classification and summarization. Diffusers models and additional tasks such as computer vision are on the roadmap.
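As an example of what scoring looks like once a text model is deployed, here is a hypothetical request against the REST endpoint from step 5 above. The scoring URI and key are placeholders, and the {"inputs": ...} payload follows the common Hugging Face convention; the exact request schema for each task is shown with the deployed endpoint in the studio, so check it there before scoring.

```python
import json
import urllib.request

# Copy these from the Endpoints page of your workspace (placeholders here).
scoring_uri = "<REST endpoint URL from the Endpoints page>"
api_key = "<primary key from the Endpoints page>"

# Example payload for a text task; adjust to the schema shown for your model.
body = json.dumps({"inputs": "Azure Machine Learning makes deployment easy."}).encode("utf-8")

req = urllib.request.Request(
    scoring_uri,
    data=body,
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",  # key-based auth on managed endpoints
    },
)

with urllib.request.urlopen(req) as resp:
    print(resp.read().decode("utf-8"))
```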

 

Resources to get started

Want to learn more about other exciting features we are announcing at Build 2023?

Watch technical breakout sessions featuring AzureML:

Read blogs about new features in AzureML:

Happy deploying and scoring,

Manoj Bableshwar and Vaidyaraman Sambasivam

Azure Machine Learning

 

 
