Concrete-ML is an open-source set of Privacy-Preserving Machine Learning (PPML) tools built on top of the Concrete Framework by Zama. It aims to simplify the use of fully homomorphic encryption (FHE) for data scientists by automatically turning machine learning models into their homomorphic equivalents. Concrete-ML was designed with ease of use in mind, so that data scientists can use it without any knowledge of cryptography. Notably, the Concrete-ML model classes are similar to those in scikit-learn, and it is also possible to convert PyTorch models to FHE.
Data scientists can use models with APIs that are close to the frameworks they already use, with additional options to run inference in FHE.
Concrete-ML features:
- built-in models, which are ready-to-use FHE-friendly models with a user interface equivalent to their scikit-learn and XGBoost counterparts
- support for custom models that can use quantization-aware training. These are developed by the user using PyTorch or Keras/TensorFlow and are imported into Concrete-ML through ONNX (see the sketch after this list)
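As a rough illustration of this custom-model path, here is a minimal sketch using Concrete-ML's `compile_torch_model` conversion API. The network architecture, the calibration input set, and the `n_bits` value are illustrative assumptions, not recommended settings:

```python
import torch
from torch import nn
from concrete.ml.torch.compile import compile_torch_model

# A tiny fully-connected network; the architecture is an arbitrary example
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(4, 8)
        self.fc2 = nn.Linear(8, 2)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

# A representative input set is needed to calibrate quantization
inputset = torch.randn(100, 4)

# Convert the floating-point network to its quantized, FHE-ready equivalent
quantized_module = compile_torch_model(TinyNet(), inputset, n_bits=3)
```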
Depending on your OS, Concrete-ML may be installed with Docker or with pip:
| OS / HW | Available on Docker | Available on pip |
| --- | --- | --- |
| Linux | Yes | Yes |
| Windows | Yes | Coming soon |
| Windows Subsystem for Linux | Yes | Yes |
| macOS (Intel) | Yes | Yes |
| macOS (Apple Silicon, i.e., M1, M2, etc.) | Yes | Coming soon |
Note: Concrete-ML only supports Python 3.7 (Linux only), 3.8, and 3.9.
To install with Docker, pull the `concrete-ml` image as follows:
```shell
docker pull zamafhe/concrete-ml:latest
```
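To start a container from the image, a typical invocation looks like the following; the port mapping is an assumption for a Jupyter-based workflow, so adjust it to your setup:

```shell
# Start an interactive container, exposing Jupyter's default port
docker run --rm -it -p 8888:8888 zamafhe/concrete-ml:latest
```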
To install Concrete-ML from PyPI, run the following:
```shell
pip install -U pip wheel setuptools
pip install concrete-ml
```
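After installation, a quick way to check that the package imports correctly is to instantiate one of the built-in models (a minimal sanity check, not part of the official instructions):

```python
# Minimal sanity check: import and instantiate a built-in model
from concrete.ml.sklearn import LogisticRegression

model = LogisticRegression(n_bits=2)
print("Concrete-ML imported successfully")
```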
You can find more detailed installation instructions in this part of the documentation.
A simple example, very close to scikit-learn, is given below for a logistic regression:
```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from concrete.ml.sklearn import LogisticRegression

# Create the data for classification
x, y = make_classification(n_samples=100, class_sep=2, n_features=4, random_state=42)

# Retrieve train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    x, y, test_size=10, random_state=42
)

# Fix the number of bits to use for quantization
model = LogisticRegression(n_bits=2)

# Fit the model
model.fit(X_train, y_train)

# Run the predictions on non-encrypted data as a reference
y_pred_clear = model.predict(X_test, execute_in_fhe=False)

# Compile into an FHE model
model.compile(x)

# Run the inference in FHE
y_pred_fhe = model.predict(X_test, execute_in_fhe=True)

print("In clear  :", y_pred_clear)
print("In FHE    :", y_pred_fhe)
print(f"Comparison: {int((y_pred_fhe == y_pred_clear).sum() / len(y_pred_fhe) * 100)}% similar")

# Output:
#   In clear  : [0 0 0 1 0 1 0 1 1 1]
#   In FHE    : [0 0 0 1 0 1 0 1 1 1]
#   Comparison: 100% similar
```
This example is explained in more detail in the linear model documentation. Concrete-ML's built-in models have APIs that are almost identical to their scikit-learn counterparts; the same fit/compile/predict pattern applies to the tree-based models, as sketched below. It is also possible to convert PyTorch networks to FHE with the Concrete-ML conversion APIs. Please refer to the linear models, tree-based models, and neural networks documentation for more examples showing the scikit-learn-like API of the built-in models.
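As a quick sketch of that pattern for a tree-based model (the hyper-parameter values here are illustrative assumptions, not tuned settings):

```python
from sklearn.datasets import make_classification
from concrete.ml.sklearn import XGBClassifier

# Same fit/compile/predict pattern as the linear example above
x, y = make_classification(n_samples=100, class_sep=2, n_features=4, random_state=42)

model = XGBClassifier(n_bits=3, n_estimators=10, max_depth=3)
model.fit(x, y)

# Compile with a representative input set, then predict in FHE
model.compile(x)
y_pred_fhe = model.predict(x[:5], execute_in_fhe=True)
```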
Full, comprehensive documentation is available here: https://docs.zama.ai/concrete-ml.
Various tutorials are provided for the built-in models and for deep learning. In addition, several complete use cases are explored:
- MNIST: a Python script and notebook showing quantization-aware training under FHE constraints. The model is implemented with Brevitas and converted to FHE with Concrete-ML.
- Titanic: a notebook giving a solution to the Kaggle Titanic competition, built with XGBoost from Concrete-ML. It comes as a companion to the Kaggle notebook and was the subject of a blog post on KDnuggets.
- Encrypted sentiment analysis: a Gradio demo which predicts whether a tweet / short message is positive, negative, or neutral, in FHE of course! The live interactive demo is available on Hugging Face, and the official blog post explains how it works.
More generally, if you have built awesome projects using Concrete-ML, feel free to let us know and we'll link to them!
To cite Concrete-ML, notably in academic papers, please use the following entry, which lists authors in order of first commit:
```bibtex
@Misc{ConcreteML,
  title = {Concrete-{ML}: a Privacy-Preserving Machine Learning Library using Fully Homomorphic Encryption for Data Scientists},
  author = {Arthur Meyre and Benoit Chevallier-Mames and Jordan Frery and Andrei Stoian and Roman Bredehoft and Luis Montero and Celia Kherfallah},
  year = {2022-*},
  note = {\url{https://github.com/zama-ai/concrete-ml}},
}
```
This software is distributed under the BSD-3-Clause-Clear license. If you have any questions, please contact us at hello@zama.ai.