SynapseML
Simple and Distributed Machine Learning
- Cognitive Services
- Deep Learning
- Responsible AI
- LightGBM
- OpenCV
Cognitive Services:

```python
from synapse.ml.cognitive import *

sentiment_df = (
    TextSentiment()
    .setTextCol("text")
    .setLocation("eastus")
    .setSubscriptionKey(key)
    .setOutputCol("sentiment")
    .setErrorCol("error")
    .setLanguageCol("language")
    .transform(input_df)
)
```
Deep Learning:

```python
from synapse.ml.onnx import *

model_prediction_df = (
    ONNXModel()
    .setModelPayload(model_payload_ml)
    .setDeviceType("CPU")
    .setFeedDict({"input": "features"})
    .setFetchDict({"probability": "probabilities", "prediction": "label"})
    .setMiniBatchSize(64)
    .transform(input_df)
)
```
Responsible AI:

```python
from synapse.ml.explainers import *

interpretation_df = (
    TabularSHAP()
    .setInputCols(features)
    .setOutputCol("shapValues")
    .setTargetCol("probability")
    .setTargetClasses([1])
    .setNumSamples(5000)
    .setModel(model)
    .transform(input_df)
)
```
LightGBM:

```python
from synapse.ml.lightgbm import *

quantile_df = (
    LightGBMRegressor()
    .setApplication('quantile')
    .setAlpha(0.3)
    .setLearningRate(0.3)
    .setNumIterations(100)
    .setNumLeaves(31)
    .fit(train_df)
    .transform(test_df)
)
```
OpenCV:

```python
from synapse.ml.opencv import *

image_df = (
    ImageTransformer()
    .setInputCol("images")
    .setOutputCol("transformed_images")
    .resize(224, True)
    .centerCrop(224, 224)
    .normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225], color_scale_factor=1/255)
    .transform(input_df)
)
```
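The snippets above assume an existing Spark DataFrame named `input_df` (and, where relevant, a trained `model`, a `features` list, and a Cognitive Services `key`). As a minimal sketch with hypothetical example rows, the input for the Cognitive Services snippet might be built like this:

```python
# Minimal sketch of the `input_df` assumed by the TextSentiment example above.
# The rows and the `key` value are illustrative placeholders, not real data.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

key = "YOUR_COGNITIVE_SERVICES_KEY"  # placeholder: supply your own key
input_df = spark.createDataFrame(
    [("I love SynapseML!", "en"), ("C'est formidable !", "fr")],
    ["text", "language"],
)
```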
Simple
Quickly create, train, and use distributed machine learning tools in only a few lines of code.
Scalable
Scale ML workloads to hundreds of machines on your Apache Spark cluster.
Multilingual
Use SynapseML from any Spark-compatible language, including Python, Scala, R, Java, .NET, and C#.
Open
SynapseML is Open Source and can be installed and used on any Spark 3 infrastructure including your local machine, Databricks, Synapse Analytics, and others.
Installation
Written in Scala, SynapseML supports multiple languages. It is open source and cloud native.
- Synapse
- Fabric
- Spark Packages
- Databricks
- Docker
- Python
- SBT
SynapseML can be installed on Synapse by adding the following to the first cell of a notebook:
For Spark 3.4 pools:

```
%%configure -f
{
  "name": "synapseml",
  "conf": {
    "spark.jars.packages": "com.microsoft.azure:synapseml_2.12:1.0.8",
    "spark.jars.repositories": "https://mmlspark.azureedge.net/maven",
    "spark.jars.excludes": "org.scala-lang:scala-reflect,org.apache.spark:spark-tags_2.12,org.scalactic:scalactic_2.12,org.scalatest:scalatest_2.12,com.fasterxml.jackson.core:jackson-databind",
    "spark.yarn.user.classpath.first": "true",
    "spark.sql.parquet.enableVectorizedReader": "false"
  }
}
```

For Spark 3.3 pools:

```
%%configure -f
{
  "name": "synapseml",
  "conf": {
    "spark.jars.packages": "com.microsoft.azure:synapseml_2.12:0.11.4-spark3.3",
    "spark.jars.repositories": "https://mmlspark.azureedge.net/maven",
    "spark.jars.excludes": "org.scala-lang:scala-reflect,org.apache.spark:spark-tags_2.12,org.scalactic:scalactic_2.12,org.scalatest:scalatest_2.12,com.fasterxml.jackson.core:jackson-databind",
    "spark.yarn.user.classpath.first": "true",
    "spark.sql.parquet.enableVectorizedReader": "false"
  }
}
```
SynapseML is preinstalled on Fabric. To install a different version, add the following to the first cell of a notebook:
```
%%configure -f
{
  "name": "synapseml",
  "conf": {
    "spark.jars.packages": "com.microsoft.azure:synapseml_2.12:[THE_SYNAPSEML_VERSION_YOU_WANT]",
    "spark.jars.repositories": "https://mmlspark.azureedge.net/maven",
    "spark.jars.excludes": "org.scala-lang:scala-reflect,org.apache.spark:spark-tags_2.12,org.scalactic:scalactic_2.12,org.scalatest:scalatest_2.12,com.fasterxml.jackson.core:jackson-databind",
    "spark.yarn.user.classpath.first": "true",
    "spark.sql.parquet.enableVectorizedReader": "false"
  }
}
```
SynapseML can also be installed on an existing Spark cluster via the `--packages` option:

```bash
spark-shell --packages com.microsoft.azure:synapseml_2.12:1.0.8  # use 1.0.8 for Spark 3.4 and 0.11.4-spark3.3 for Spark 3.3
pyspark --packages com.microsoft.azure:synapseml_2.12:1.0.8
spark-submit --packages com.microsoft.azure:synapseml_2.12:1.0.8 MyApp.jar
```

This can be used in other Spark contexts too. For example, you can use SynapseML in AZTK by adding it to the `.aztk/spark-defaults.conf` file.
To install SynapseML on the Databricks cloud, create a new library from Maven coordinates in your workspace.
For the coordinates, use:

Spark 3.4 cluster:

```
com.microsoft.azure:synapseml_2.12:1.0.8
```

Spark 3.3 cluster:

```
com.microsoft.azure:synapseml_2.12:0.11.4-spark3.3
```

with the resolver:

```
https://mmlspark.azureedge.net/maven
```

Ensure this library is attached to your target cluster(s).
Finally, ensure that your Spark cluster has at least Spark 3.3 and Scala 2.12.
You can use SynapseML in both your Scala and PySpark notebooks. To get started with our example notebooks, import the following Databricks archive:

```
https://mmlspark.blob.core.windows.net/dbcs/SynapseMLExamplesv1.0.8.dbc
```
The easiest way to try SynapseML is via our pre-built Docker container. To do so, run the following command:

```bash
docker run -it -p 8888:8888 -e ACCEPT_EULA=yes mcr.microsoft.com/mmlspark/release
```
Navigate to http://localhost:8888 in your web browser to run the sample notebooks. See the documentation for more on Docker use.
To read the EULA for using the Docker image, run:

```bash
docker run -it -p 8888:8888 mcr.microsoft.com/mmlspark/release eula
```
To try SynapseML from a plain Python (or Conda) installation, first get Spark via pip:

```bash
pip install pyspark
```

You can then use pyspark as in the above example, or from Python:
```python
import pyspark

spark = (
    pyspark.sql.SparkSession.builder.appName("MyApp")
    # Use 1.0.8 for Spark 3.4 and 0.11.4-spark3.3 for Spark 3.3
    .config("spark.jars.packages", "com.microsoft.azure:synapseml_2.12:1.0.8")
    .config("spark.jars.repositories", "https://mmlspark.azureedge.net/maven")
    .getOrCreate()
)

import synapse.ml
```
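As a quick check that the package resolved correctly, a minimal sketch along these lines can be run against the session created above; the toy data, column names, and variable names here are illustrative, not from the official docs:

```python
# Hypothetical smoke test: fit a tiny LightGBM model on synthetic data.
# Assumes the `spark` session above, configured with the SynapseML package.
from pyspark.ml.feature import VectorAssembler
from synapse.ml.lightgbm import LightGBMRegressor

df = spark.createDataFrame(
    [(float(i), float(2 * i), float(3 * i)) for i in range(100)],
    ["x1", "x2", "label"],
)
# Spark ML estimators expect the features packed into a single vector column.
train_df = VectorAssembler(inputCols=["x1", "x2"], outputCol="features").transform(df)

model = LightGBMRegressor(labelCol="label", featuresCol="features").fit(train_df)
model.transform(train_df).select("label", "prediction").show()
```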
resolvers += "SynapseML" at "https://mmlspark.azureedge.net/maven"libraryDependencies += "com.microsoft.azure" %% "synapseml_2.12" % "1.0.8" // Please use 1.0.8 version for Spark3.2 and 1.0.8-spark3.3 version for Spark3.3