GenAI Gateway Toolkit using API Management (APIM)

GenAI Gateway Toolkit using API Management (APIM)

Introduction

The aim of this toolkit is to provide a quick start for deploying a GenAI Gateway using Azure API Management (APIM), and to demonstrate some of the GenAI capabilities in a controlled environment.

The APIM gateway that's provisioned by this toolkit contains policies that demonstrate different GenAI Gateway capabilities and the end-to-end tests allows to simulate different scenarios and demonstrate the capabilities by adjusting the configuration of the OpenAI API simulator that's used as a backend.

GenAI Gateway

A "GenAI Gateway" serves as an intelligent interface/middleware that dynamically balances incoming traffic across backend resources to achieve optimizing resource utilization. In addition to load balancing, GenAI Gateway can be equipped with extra capabilities to address the challenges around billing, monitoring etc.

To read more about considerations when implementing a GenAI Gateway, see this article.

Architecture

At a high level the toolkit contains 3 main components,

APIM Gateway - The API Management Gateway that will host the GenAI Gateway policies.
OpenAI API Simulator - A simple API that simulates the OpenAI API. The simulator will allow to control the latency, and response to simulate different scenarios.
End-to-End Tests - A set of tests that will demonstrate the GenAI Gateway capabilities in action. These are python scripts written on top of locust.io to simulate the traffic and demonstrate the capabilities.

Getting Started

To see the policies in action you need to set up your environment (you will need an Azure Subscription to deploy into).

For this you can either install the pre-requisites on your local machine or use the Visual Studio Code Dev Containers to set up the environment.

Using Visual Studio Code Dev Containers

Follow the Dev Containers Getting Started Guide to set up Visual Studio Code for using Dev Containers.

Once that is done, open the repository in Visual Studio Code and select Dev Containers: Reopen in Container from the command palette. This will create an environment with all the pre-requisites installed.

NOTE: When the container is built Visual Studio Code will automatically install the python dependencies required for the end-to-end capability tests. If you pull a later version of the code, you make need to run pip install -r end_to_end_tests/requirements.txt to install the dependencies (or rebuild the dev container).

Prerequisites for non Dev Container setup

If you are manually installing the pre-requisites, you will need the following:

Azure CLI
- including the application-insights extension (az extension add --name application-insights)
Docker (if using the OpenAI API simulator)
Python 3 (to run end-to-end tests)
jq (to parse JSON responses in bash scripts)
a bash terminal (see Windows Subsystem for Linux if you are on Windows)
Install python dependencies for the end-to-end tests by running pip install -r end_to_end_tests/requirements.txt

Deploying the Accelerator

To see the GenAI Gateway capabilities in action, you can deploy the infrastructure using the provided Bicep templates.

The templates require parameters set via an .env file and the project contains a sample.env with the required environment variables. Rename sample.env to .env and set the values accordingly.
Sign in with the Azure CLI:

az login

Deploy the Bicep infrastructure:

./scripts/deploy.sh

NOTE: A known KV/ACA bug requires the deploy.sh script to be run twice, if using the OpenAI API simulator.

Gateway Capabilities

This repo currently contains the policies showing how to implement these GenAI Gateway capabilities:

Capability	Description
Latency based routing	Route traffic to the endpoint with the lowest latency.
Load balancing (round-robin)	Load balance traffic across PAYG endpoints using round-robin algorithm.
Managing spikes with PAYG	Manage spikes in traffic by routing traffic to PAYG endpoints when a PTU is out of capacity.
Adaptive rate limiting	Dynamically adjust rate-limits applied to different workloads
Tracking token usage	Record the token consumption for usage tracking and attribution

Testing Gateway Capabilities

The easiest way to see the gateway capabilities in action is to deploy the gateway along with the OpenAI API Simualtor (set the USE_SIMULATOR option in your .env file to true).

Once you have the gateway and simulator deployed, see the README.md in the relevant capability folder for instructions on how to test the capability. (NOTE: currently not all capabilities have tests implemented)

Name		Name	Last commit message	Last commit date
Latest commit History 179 Commits
.devcontainer		.devcontainer
.github		.github
.vscode		.vscode
capabilities		capabilities
docs/assets		docs/assets
end_to_end_tests		end_to_end_tests
infra		infra
scripts		scripts
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
LICENSE.md		LICENSE.md
README.md		README.md
sample.env		sample.env

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

GenAI Gateway Toolkit using API Management (APIM)

Introduction

GenAI Gateway

Architecture

Getting Started

Using Visual Studio Code Dev Containers

Prerequisites for non Dev Container setup

Deploying the Accelerator

Gateway Capabilities

Testing Gateway Capabilities

About

Releases

Packages

Languages

License

kshitijcode/apim-genai-gateway-toolkit

Folders and files

Latest commit

History

Repository files navigation

GenAI Gateway Toolkit using API Management (APIM)

Introduction

GenAI Gateway

Architecture

Getting Started

Using Visual Studio Code Dev Containers

Prerequisites for non Dev Container setup

Deploying the Accelerator

Gateway Capabilities

Testing Gateway Capabilities

About

Resources

License

Code of conduct

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages