Preprocess data for machine learning and deep learning
You can use Databricks Feature Store to create new features, explore and re-use existing features, select features for training and scoring machine learning models, and publish features to low-latency online stores for real-time inference.
On large datasets, you can use Spark SQL and MLlib for feature engineering. Third-party libraries included in Databricks Runtime ML, such as scikit-learn, also provide useful helper methods. For examples, see the following machine learning notebooks for scikit-learn and MLlib:
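As a rough illustration of the kind of scikit-learn helper methods mentioned above, the following minimal sketch imputes and scales numeric columns and one-hot encodes a categorical column. The DataFrame, its column names (`age`, `income`, `plan`), and its values are hypothetical and are not taken from the Databricks notebooks.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data; in practice this might come from spark_df.toPandas().
df = pd.DataFrame({
    "age": [25.0, 32.0, np.nan, 41.0],
    "income": [40000.0, 52000.0, 61000.0, 75000.0],
    "plan": ["basic", "premium", "basic", "free"],
})

numeric_cols = ["age", "income"]
categorical_cols = ["plan"]

# Numeric columns: fill missing values with the median, then standardize.
numeric_pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Apply each transformer to its own column subset and concatenate the results.
# sparse_threshold=0 forces a dense NumPy array as output.
preprocessor = ColumnTransformer(
    [
        ("num", numeric_pipeline, numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ],
    sparse_threshold=0,
)

features = preprocessor.fit_transform(df)
print(features.shape)
```

The fitted `preprocessor` can be reused with `preprocessor.transform(new_df)` at scoring time, so that training and inference apply the same transformations.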
For more complex deep learning feature processing, this example notebook illustrates how to use transfer learning for featurization: