Introducing in-database embedding generation for Azure Database for PostgreSQL
Published May 21, 2024

Introducing in-database embedding generation via the azure_local_ai extension for Azure Database for PostgreSQL

 

We are excited to announce the public preview release of azure_local_ai, a new extension for Azure Database for PostgreSQL that enables you to create text embeddings from a model deployed within the same VM as your PostgreSQL database.

Vector embeddings enable AI models to better understand relationships and similarities between data, which is key for intelligent apps. Azure Database for PostgreSQL is proud to be the industry’s first to offer in-database embedding generation, with a text embedding model deployed within the PostgreSQL boundary. Embeddings can be generated right within the database, offering:

 

  • Single-digit millisecond latency.
  • Predictable costs.
  • Confidence that data will remain compliant for confidential workloads.

 

In this release, the extension deploys a single model, multilingual-e5-small, to your Azure Database for PostgreSQL Flexible Server instance. The first time an embedding is created, the model is loaded into memory. Review the preview terms for the azure_local_ai extension.

azure_local_ai extension – Preview

 

  • Generate embeddings from within the database with a single line of SQL code invoking a UDF.
  • Harness the power of a text embedding model alongside your operational data without leaving your PostgreSQL database boundary.

 

During this public preview, the azure_local_ai extension will be available in these Azure regions:

 

  • Australia East
  • East US
  • France Central
  • Japan East
  • UK South
  • West Europe
  • West US

During the preview, this feature is available only on newly deployed Azure Database for PostgreSQL Flexible Server instances.

How does the azure_local_ai extension work?

In-database embedding architecture

 

azure_local_ai extension - Azure Database for PostgreSQL architecture diagram

ONNX Runtime Configuration

 

The azure_local_ai extension lets you review the configuration parameters of the ONNX Runtime thread pool within the ONNX Runtime Service. Changing these settings is not currently allowed. See ONNX Runtime performance tuning.

 

Valid values for the key parameter are:

 

  • intra_op_parallelism: Sets the total number of threads the ONNX Runtime thread pool uses to parallelize a single operator. By default, the number of intra-op threads is maximized because it significantly improves overall throughput (half of the available CPUs by default).
  • inter_op_parallelism: Sets the total number of threads the ONNX Runtime thread pool uses to compute multiple operators in parallel. By default, it is set to the minimum possible value, 1. Increasing it often hurts performance due to frequent context switches between threads.
  • spin_control: Toggles the ONNX Runtime thread pool's spinning for requests. When disabled, it uses less CPU and therefore incurs more latency. By default, it is set to true (enabled).

 

 

SELECT azure_local_ai.get_setting(key TEXT);
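
For example, the following calls (a minimal sketch using the keys listed above) return the current values; the settings are read-only during the preview:

-- Inspect the current ONNX Runtime thread-pool configuration
SELECT azure_local_ai.get_setting('intra_op_parallelism');
SELECT azure_local_ai.get_setting('inter_op_parallelism');
SELECT azure_local_ai.get_setting('spin_control');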

 

 

Generate embeddings

 

The azure_local_ai extension for Azure Database for PostgreSQL makes it easy to generate an embedding with a simple inline UDF call in your SQL statement, passing the model name and the input text.

 

 

 

-- Single embedding

SELECT azure_local_ai.create_embeddings('multilingual-e5-small:v1', 'Vector embeddings power GenAI applications');

 

 

 

-- Simple array embedding

SELECT azure_local_ai.create_embeddings('multilingual-e5-small:v1', array['Recommendation System with Azure Database for PostgreSQL - Flexible Server and Azure OpenAI.', 'Generative AI with Azure Database for PostgreSQL - Flexible Server.']);
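
As a quick sanity check (a minimal sketch, assuming the single-input variant returns a standard PostgreSQL array, consistent with the ::vector casts in the example below), the multilingual-e5-small embedding should have 384 dimensions:

-- Sanity check: multilingual-e5-small embeddings have 384 dimensions
SELECT array_length(azure_local_ai.create_embeddings('multilingual-e5-small:v1', 'Vector embeddings power GenAI applications'), 1);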

 

 

Here’s a quick example that demonstrates:

 

  • Adding a generated vector column to a table so that an embedding is created and stored automatically when data is inserted.
  • Creating an HNSW index.
  • Completing a semantic search by generating an embedding for a search string and comparing it against the stored vectors with an inner-product similarity search (the <#> operator, matching the vector_ip_ops index).

 

--Create docs table
CREATE TABLE docs(doc_id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, doc TEXT NOT NULL, last_update TIMESTAMPTZ DEFAULT NOW());

-- Add a vector column and generate vector embeddings from locally deployed model
ALTER TABLE docs
ADD COLUMN doc_vector vector(384) -- multilingual-e5 embeddings are 384 dimensions
GENERATED ALWAYS AS -- Generated on inserts
(azure_local_ai.create_embeddings('multilingual-e5-small:v1', doc)::vector) STORED; -- TEXT string sent to local model

-- Create an HNSW index
CREATE INDEX ON docs USING hnsw (doc_vector vector_ip_ops);

--Insert data into the docs table
INSERT INTO docs(doc) VALUES
('Create in-database embeddings with azure_local_ai extension.'),
('Enable RAG patterns with in-database embeddings and vectors on Azure Database for PostgreSQL - Flexible server.'),
('Generate vector embeddings in PostgreSQL with azure_local_ai extension.'),
('Generate text embeddings in PostgreSQL for retrieval augmented generation (RAG) patterns with azure_local_ai extension and locally deployed LLM.'),
('Use vector indexes and Azure OpenAI embeddings in PostgreSQL for retrieval augmented generation.');

-- Semantic search using vector similarity match
SELECT doc_id, doc, doc_vector
FROM docs d
ORDER BY d.doc_vector <#> azure_local_ai.create_embeddings('multilingual-e5-small:v1', 'Generate text embeddings in PostgreSQL.')::vector
LIMIT 1;

-- Add a single record to the docs table; the vector embedding will be generated automatically by azure_local_ai using the locally deployed model
INSERT INTO docs(doc) VALUES ('Semantic Search with Azure Database for PostgreSQL - Flexible Server and Azure OpenAI');

-- View all doc entries and their doc_vector column. A vector embedding will have been generated for the single record added above.
SELECT doc, doc_vector, last_update FROM docs;

 

Getting Started

 

To get started, review the azure_local_ai extension documentation, enable the extension, and begin creating embeddings from your text data without leaving the Azure Database for PostgreSQL boundary, as sketched below.
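
As a minimal sketch of those first steps (assuming azure_local_ai has already been allow-listed for your Flexible Server instance):

-- Enable the extension in the current database
CREATE EXTENSION IF NOT EXISTS azure_local_ai;

-- Generate a first embedding; the model is loaded into memory on first use
SELECT azure_local_ai.create_embeddings('multilingual-e5-small:v1', 'Hello, in-database embeddings!');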

 
