SynapseML
Simple and Distributed Machine Learning
- Cognitive Services
- Deep Learning
- Responsible AI
- LightGBM
- OpenCV
Cognitive Services:

```python
from synapse.ml.cognitive import *

sentiment_df = (
    TextSentiment()
    .setTextCol("text")
    .setLocation("eastus")
    .setSubscriptionKey(key)
    .setOutputCol("sentiment")
    .setErrorCol("error")
    .setLanguageCol("language")
    .transform(input_df)
)
```
Deep Learning:

```python
from synapse.ml.onnx import *

model_prediction_df = (
    ONNXModel()
    .setModelPayload(model_payload_ml)
    .setDeviceType("CPU")
    .setFeedDict({"input": "features"})
    .setFetchDict({"probability": "probabilities", "prediction": "label"})
    .setMiniBatchSize(64)
    .transform(input_df)
)
```
Responsible AI:

```python
from synapse.ml.explainers import *

interpretation_df = (
    TabularSHAP()
    .setInputCols(features)
    .setOutputCol("shapValues")
    .setTargetCol("probability")
    .setTargetClasses([1])
    .setNumSamples(5000)
    .setModel(model)
    .transform(input_df)
)
```
LightGBM:

```python
from synapse.ml.lightgbm import *

quantile_df = (
    LightGBMRegressor()
    .setApplication('quantile')
    .setAlpha(0.3)
    .setLearningRate(0.3)
    .setNumIterations(100)
    .setNumLeaves(31)
    .fit(train_df)
    .transform(test_df)
)
```
OpenCV:

```python
from synapse.ml.opencv import *

image_df = (
    ImageTransformer()
    .setInputCol("images")
    .setOutputCol("transformed_images")
    .resize(224, True)
    .centerCrop(224, 224)
    .normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225], color_scale_factor=1/255)
    .transform(input_df)
)
```
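The snippets above assume an existing Spark DataFrame named `input_df` (and, where relevant, a trained `model`, a `features` list, and a Cognitive Services `key`). As a minimal sketch with hypothetical example rows, the input for the Cognitive Services snippet might be built like this:

```python
# Minimal sketch of the `input_df` assumed by the TextSentiment example above.
# The rows and the `key` value are illustrative placeholders, not real data.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

key = "YOUR_COGNITIVE_SERVICES_KEY"  # placeholder: supply your own key
input_df = spark.createDataFrame(
    [("I love SynapseML!", "en"), ("C'est formidable !", "fr")],
    ["text", "language"],
)
```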
Simple
Quickly create, train, and use distributed machine learning tools in only a few lines of code.
Scalable
Scale ML workloads to hundreds of machines on your Apache Spark cluster.
Multilingual
Use SynapseML from any Spark-compatible language, including Python, Scala, R, Java, .NET, and C#.
Open
SynapseML is Open Source and can be installed and used on any Spark 3 infrastructure including your local machine, Databricks, Synapse Analytics, and others.
Installation
Written in Scala, SynapseML supports multiple languages. It is open source and cloud native.
- Synapse
- Fabric
- Spark Packages
- Databricks
- Docker
- Python
- SBT
SynapseML can be installed on Synapse by adding the following to the first cell of a notebook:
For Spark 3.4 pools:

```
%%configure -f
{
  "name": "synapseml",
  "conf": {
    "spark.jars.packages": "com.microsoft.azure:synapseml_2.12:1.0.8",
    "spark.jars.repositories": "https://mmlspark.azureedge.net/maven",
    "spark.jars.excludes": "org.scala-lang:scala-reflect,org.apache.spark:spark-tags_2.12,org.scalactic:scalactic_2.12,org.scalatest:scalatest_2.12,com.fasterxml.jackson.core:jackson-databind",
    "spark.yarn.user.classpath.first": "true",
    "spark.sql.parquet.enableVectorizedReader": "false"
  }
}
```

For Spark 3.3 pools:

```
%%configure -f
{
  "name": "synapseml",
  "conf": {
    "spark.jars.packages": "com.microsoft.azure:synapseml_2.12:0.11.4-spark3.3",
    "spark.jars.repositories": "https://mmlspark.azureedge.net/maven",
    "spark.jars.excludes": "org.scala-lang:scala-reflect,org.apache.spark:spark-tags_2.12,org.scalactic:scalactic_2.12,org.scalatest:scalatest_2.12,com.fasterxml.jackson.core:jackson-databind",
    "spark.yarn.user.classpath.first": "true",
    "spark.sql.parquet.enableVectorizedReader": "false"
  }
}
```
SynapseML is preinstalled on Fabric. To install a different version, add the following to the first cell of a notebook:
```
%%configure -f
{
  "name": "synapseml",
  "conf": {
    "spark.jars.packages": "com.microsoft.azure:synapseml_2.12:[THE_SYNAPSEML_VERSION_YOU_WANT]",
    "spark.jars.repositories": "https://mmlspark.azureedge.net/maven",
    "spark.jars.excludes": "org.scala-lang:scala-reflect,org.apache.spark:spark-tags_2.12,org.scalactic:scalactic_2.12,org.scalatest:scalatest_2.12,com.fasterxml.jackson.core:jackson-databind",
    "spark.yarn.user.classpath.first": "true",
    "spark.sql.parquet.enableVectorizedReader": "false"
  }
}
```
SynapseML can also be installed on an existing Spark cluster via the `--packages` option:

```bash
spark-shell --packages com.microsoft.azure:synapseml_2.12:1.0.8  # use 1.0.8 for Spark 3.4 and 0.11.4-spark3.3 for Spark 3.3
pyspark --packages com.microsoft.azure:synapseml_2.12:1.0.8
spark-submit --packages com.microsoft.azure:synapseml_2.12:1.0.8 MyApp.jar
```

This can be used in other Spark contexts too. For example, you can use SynapseML in AZTK by adding it to the `.aztk/spark-defaults.conf` file.
To install SynapseML on the Databricks cloud, create a new library from Maven coordinates in your workspace.
For the coordinates, use:

Spark 3.4 cluster:

```
com.microsoft.azure:synapseml_2.12:1.0.8
```

Spark 3.3 cluster:

```
com.microsoft.azure:synapseml_2.12:0.11.4-spark3.3
```

with the resolver:

```
https://mmlspark.azureedge.net/maven
```

Ensure this library is attached to your target cluster(s).
Finally, ensure that your Spark cluster has at least Spark 3.3 and Scala 2.12.
You can use SynapseML in both your Scala and PySpark notebooks. To get started with our example notebooks, import the following Databricks archive:

```
https://mmlspark.blob.core.windows.net/dbcs/SynapseMLExamplesv1.0.8.dbc
```
The easiest way to try SynapseML is via our pre-built Docker container. To do so, run the following command:

```bash
docker run -it -p 8888:8888 -e ACCEPT_EULA=yes mcr.microsoft.com/mmlspark/release
```
Navigate to http://localhost:8888 in your web browser to run the sample notebooks. See the documentation for more on Docker use.
To read the EULA for using the Docker image, run:

```bash
docker run -it -p 8888:8888 mcr.microsoft.com/mmlspark/release eula
```
To try SynapseML from a plain Python (or Conda) installation, first get Spark via pip:

```bash
pip install pyspark
```

You can then use pyspark as in the above example, or from Python:
```python
import pyspark

spark = (
    pyspark.sql.SparkSession.builder.appName("MyApp")
    # Use 1.0.8 for Spark 3.4 and 0.11.4-spark3.3 for Spark 3.3
    .config("spark.jars.packages", "com.microsoft.azure:synapseml_2.12:1.0.8")
    .config("spark.jars.repositories", "https://mmlspark.azureedge.net/maven")
    .getOrCreate()
)

import synapse.ml
```
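As a quick check that the package resolved correctly, a minimal sketch along these lines can be run against the session created above; the toy data, column names, and variable names here are illustrative, not from the official docs:

```python
# Hypothetical smoke test: fit a tiny LightGBM model on synthetic data.
# Assumes the `spark` session above, configured with the SynapseML package.
from pyspark.ml.feature import VectorAssembler
from synapse.ml.lightgbm import LightGBMRegressor

df = spark.createDataFrame(
    [(float(i), float(2 * i), float(3 * i)) for i in range(100)],
    ["x1", "x2", "label"],
)
# Spark ML estimators expect the features packed into a single vector column.
train_df = VectorAssembler(inputCols=["x1", "x2"], outputCol="features").transform(df)

model = LightGBMRegressor(labelCol="label", featuresCol="features").fit(train_df)
model.transform(train_df).select("label", "prediction").show()
```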
resolvers += "SynapseML" at "https://mmlspark.azureedge.net/maven"libraryDependencies += "com.microsoft.azure" %% "synapseml_2.12" % "1.0.8" // Please use 1.0.8 version for Spark3.2 and 1.0.8-spark3.3 version for Spark3.3