
Data Processing Engine

Data Platform automation service: transform data, run and orchestrate production-grade ETL/ELT workflows.

Process

Execute batch data processing jobs to extract, transform and load data from any source to any destination.

Automate

Create workflows with a low-code visual builder and schedule them to automate data-intensive tasks.

Develop

Code and run any custom Python or PySpark script and leverage a complete SDK with over 40 connectors.

Iterate

Organise and version your code via the native versioning system or Git integration.


Create and customise data processing tasks

Connect to any data source and process data to any destination. A rich catalogue of pre-built job templates allows you to create actions for data extraction, loading, aggregation, cleaning, and metadata updates. Code and run any custom script in Python or PySpark to tackle specific use cases, while benefitting from a complete SDK with over 40 connectors. If you have existing data processing scripts in Python, simply import them to centralise and orchestrate them within Data Platform.
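As an illustration, a custom Python action could look like the sketch below. It uses plain pandas and local files; in a real job, the SDK's connectors would handle the source and destination, so the file names and structure here are illustrative assumptions, not the actual SDK API.

```python
# Illustrative custom Python action: extract, transform, load.
# Local files stand in for the SDK's source and destination connectors.
import pandas as pd

def run() -> None:
    # Extract: read raw records (stand-in for a source connector)
    orders = pd.read_csv("orders.csv", parse_dates=["order_date"])

    # Transform: clean, then aggregate revenue per day
    orders = orders.dropna(subset=["customer_id"])
    daily = (
        orders.groupby(orders["order_date"].dt.date)["amount"]
        .sum()
        .reset_index(name="daily_revenue")
    )

    # Load: write the result (stand-in for a destination connector)
    daily.to_parquet("daily_revenue.parquet", index=False)

if __name__ == "__main__":
    run()
```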

Custom actions let you manage packages and dependencies, including your own custom libraries, which you can reuse across projects. Data Processing Engine comes with two version control options to ensure production-critical workloads are never impacted: Data Platform's native version control lets you track versions directly on the platform, while developers can synchronise with any external Git repository.

Define and orchestrate workflows

Benefit from the drag-and-drop experience in Data Processing Engine’s orchestrator. Use it to seamlessly define, sequence, and schedule jobs, and to manage resources so they scale appropriately, with workers you control as needed. A user-friendly visual builder enables you to visualise and execute your plan on the cloud, whether or not you have deep technical knowledge or the know-how to manage cloud infrastructure. Set up triggers, including CRON-based schedules, to automate job executions.
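For illustration, a workflow with a CRON-based trigger might be described as in the sketch below. The orchestrator is configured through the visual builder, so the field names here are assumptions rather than the platform's actual schema; only the CRON syntax itself is standard.

```python
# Hypothetical workflow specification with a CRON trigger.
# Field names are illustrative; the real orchestrator uses the visual builder.
workflow = {
    "name": "nightly-sales-etl",
    "actions": [
        {"name": "extract-orders", "type": "extract"},
        {"name": "aggregate-revenue", "type": "custom-python"},
        {"name": "load-warehouse", "type": "load"},
    ],
    "trigger": {
        "type": "cron",
        # Standard CRON syntax: minute hour day-of-month month day-of-week.
        # "0 2 * * *" runs the workflow every day at 02:00.
        "schedule": "0 2 * * *",
    },
}
```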


Run and scale data processing pipelines on the cloud

Execute single actions or whole workflows as jobs, in one API call. Data Processing Engine integrates two engines for you to choose from: a Pandas engine (in Python 3) optimised for smaller data processing tasks, and a Spark engine (in PySpark) for data-intensive workloads.

Scale your jobs horizontally and vertically for faster execution, using OVHcloud compute resources. Take advantage of the power of segmentation to parallelise tasks and accelerate processing. Use the perimeter option to include or exclude data points beyond a given boundary.
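As a sketch of what launching a job in one API call could look like: the endpoint URL, payload fields, and engine/worker parameters below are illustrative assumptions, not the documented Data Platform API.

```python
# Minimal sketch of starting a workflow as a job in one API call.
# Endpoint, payload fields, and token handling are placeholders.
import requests

API_URL = "https://dataplatform.example.com/api/jobs"  # placeholder URL
TOKEN = "..."  # your API token

payload = {
    "workflow": "nightly-sales-etl",
    "engine": "spark",         # "pandas" for smaller tasks, "spark" for heavy ones
    "workers": 4,              # horizontal scaling: more workers in parallel
    "worker_profile": "large", # vertical scaling: bigger CPU/RAM per worker
}

response = requests.post(
    API_URL,
    json=payload,
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=30,
)
response.raise_for_status()
print("Job started:", response.json())
```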

Monitor job executions and performance

View comprehensive, detailed reports of completed jobs, including workers’ CPU and RAM usage over time, as well as the logs of completed jobs. Troubleshoot your jobs and optimise resource consumption by pinpointing checkpoints in your workflows.

Get notified about job completions, failures, duration, or RAM usage by integrating with Data Platform Control Center and setting up job execution alerts. Manage fine-grained access control with Data Platform IAM.
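A monitoring integration could, for example, poll a job's status and metrics as sketched below. The endpoint path and response fields are assumptions for illustration, not the documented Control Center interface.

```python
# Hypothetical polling loop for job monitoring.
# Endpoint paths and response fields are placeholders.
import time
import requests

API_URL = "https://dataplatform.example.com/api/jobs"  # placeholder URL
TOKEN = "..."  # your API token
job_id = "1234"

while True:
    job = requests.get(
        f"{API_URL}/{job_id}",
        headers={"Authorization": f"Bearer {TOKEN}"},
        timeout=30,
    ).json()
    # "status" and "metrics" are assumed response fields
    print(job["status"], "CPU:", job["metrics"]["cpu"], "RAM:", job["metrics"]["ram"])
    if job["status"] in ("SUCCEEDED", "FAILED"):
        break
    time.sleep(10)
```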

Start building your data-driven solutions today

Join the Data Platform Beta and try it for free

Who is this for?

Data Engineers

Create pipelines to extract data from enterprise data sources and aggregate it into data warehouse tables on any schedule.
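For instance, a minimal extract-join-aggregate pipeline in pandas might look like this sketch, with SQLite standing in for the warehouse destination and all file and column names invented for illustration.

```python
# Illustrative data-engineering pipeline: extract from two sources,
# join, aggregate, and load into a warehouse table.
import sqlite3
import pandas as pd

customers = pd.read_csv("customers.csv")  # source 1
orders = pd.read_csv("orders.csv")        # source 2

# Join and aggregate into a reporting table
revenue = (
    orders.merge(customers, on="customer_id")
    .groupby("region")["amount"]
    .sum()
    .reset_index(name="total_revenue")
)

with sqlite3.connect("warehouse.db") as conn:  # stand-in warehouse
    revenue.to_sql("revenue_by_region", conn, if_exists="replace", index=False)
```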

MLOps Engineers

Carry out all the data cleaning and feature engineering needed to train machine learning (ML) models.
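A typical cleaning and feature-engineering pass in pandas could look like the following sketch; the dataset and column names are illustrative.

```python
# Sketch of cleaning and feature engineering ahead of model training.
# Column names ("session_length", "device_type", ...) are invented.
import pandas as pd

df = pd.read_csv("events.csv", parse_dates=["timestamp"])

# Cleaning: drop duplicates and fill missing numeric values
df = df.drop_duplicates()
df["session_length"] = df["session_length"].fillna(df["session_length"].median())

# Feature engineering: derive model inputs from raw columns
df["hour_of_day"] = df["timestamp"].dt.hour
df["is_weekend"] = df["timestamp"].dt.dayofweek >= 5
df = pd.get_dummies(df, columns=["device_type"])  # one-hot encode a category

df.to_parquet("training_features.parquet", index=False)
```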

Software Engineers

Deploy any data-intensive code, such as custom Python optimisation solvers that compute the optima of equations.
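For example, a small optimisation solver of this kind could use SciPy to find a function's minimum, as in this sketch (the cost function is a standard test case, not a customer workload):

```python
# Example optimisation solver: minimise a function with SciPy.
import numpy as np
from scipy.optimize import minimize

def cost(x: np.ndarray) -> float:
    # Rosenbrock function, a classic optimisation test case
    return (1 - x[0]) ** 2 + 100 * (x[1] - x[0] ** 2) ** 2

result = minimize(cost, x0=np.array([0.0, 0.0]), method="Nelder-Mead")
print("Optimum found at:", result.x)
```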

Pricing Public Cloud

Simple, transparent, pay-as-you-go pricing

Data Platform’s all-inclusive, usage-based pricing means you only pay for what you use:

  • High-performance storage for the data you store on the platform, billed per GB per month.
  • Serverless computing for the lakehouse service, with queries billed per TB of scanned data.
  • Reserved computing power billed per hour or per month, available for all Data Platform services.

No additional user licence or traffic costs, so you can scale data projects without blowing up your budget.
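As a back-of-the-envelope illustration of how these three dimensions combine, the sketch below computes a monthly estimate; the rates and usage figures are placeholders, not published prices.

```python
# Toy cost model for the three billing dimensions above.
# All rates are placeholders, not actual Data Platform prices.
STORAGE_RATE_PER_GB_MONTH = 0.02  # placeholder price
QUERY_RATE_PER_TB_SCANNED = 5.00  # placeholder price
COMPUTE_RATE_PER_HOUR = 1.50      # placeholder price

monthly_cost = (
    500 * STORAGE_RATE_PER_GB_MONTH  # 500 GB stored
    + 2 * QUERY_RATE_PER_TB_SCANNED  # 2 TB scanned by lakehouse queries
    + 40 * COMPUTE_RATE_PER_HOUR     # 40 hours of reserved compute
)
print(f"Estimated monthly cost: {monthly_cost:.2f}")
```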