Databricks is a unified, cloud-native data intelligence platform.
It combines data ingestion, storage, processing, analytics, and even machine learning, so you don't
have to stitch together multiple disconnected tools.
Under the hood, Databricks runs on Apache Spark, the industry-standard engine for large-scale data
processing.
Spark gives you fast, parallel execution for both batch and streaming workloads.
Databricks runs on AWS, Azure, and GCP, and works seamlessly with each cloud's core services,
including storage, compute, security, and access control.
At its heart sits the lakehouse, a relatively new open architecture that combines the best of data
lakes and data warehouses.
You get modern data management features, like reliable transactions, schema enforcement, and
built-in version history, directly on top of low-cost cloud storage in open formats.
Databricks provides five core engines to turn your data into value.
Business intelligence.
Build live dashboards and run reports directly on your data lake.
Analysts can use standard BI tools against fresh data without exporting or moving it.
Data warehousing.
Run high-performance, large-scale SQL queries with Databricks SQL, which offers the elasticity of
the cloud for warehousing workloads.
AI and data science.
Explore data interactively and train machine learning models in the same environment, with built-in
experiment tracking and model management via MLflow.
ETL and real-time analytics.
Create both batch ETL pipelines and streaming jobs in one platform, leveraging Delta Live Tables or
Spark Structured Streaming for reliable, scalable data processing.
Orchestration.
Automate end-to-end workflows with scheduling, automatic retries, and integrated alerting, so your
pipelines run consistently without manual intervention.
That's Databricks in a nutshell: a unified, cloud-native lakehouse powered by Spark that streamlines
every stage of the data life cycle.