By the end of this course you should be able to:
Understand how to use the Databricks Lakehouse Platform and its tools, and the benefits they provide (a short sketch follows this list), including:
- Data Lakehouse (architecture, descriptions, benefits)
- Data Science and Engineering workspace (clusters, notebooks, data storage)
- Delta Lake (general concepts, table management and manipulation, optimizations)
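To give a flavor of the Delta Lake material, here is a minimal PySpark sketch of table creation, DML, optimization, and time travel. It assumes a Databricks notebook (or any Spark session with Delta Lake enabled) where `spark` is predefined; the `sales` table and its columns are hypothetical.

```python
# A minimal Delta Lake sketch; the `sales` table and its columns are
# hypothetical. Assumes `spark` is already defined, as in a Databricks notebook.

# Create a managed Delta table (Delta is the default format on Databricks).
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales (
        order_id BIGINT,
        amount   DOUBLE,
        order_ts TIMESTAMP
    )
""")

# Manipulate the table with standard DML.
spark.sql("INSERT INTO sales VALUES (1, 19.99, current_timestamp())")
spark.sql("UPDATE sales SET amount = 24.99 WHERE order_id = 1")

# Optimizations: compact small files and co-locate related data.
spark.sql("OPTIMIZE sales ZORDER BY (order_ts)")

# Time travel: query an earlier version of the table.
spark.sql("SELECT * FROM sales VERSION AS OF 0").show()
```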
Build ETL pipelines using Apache Spark SQL and Python (a combined sketch follows this list), including:
- Relational entities (databases, tables, views)
- ELT (creating tables, writing data to tables, cleaning data, combining and reshaping tables, SQL UDFs)
- Python (facilitating Spark SQL with string manipulation and control flow, passing data between PySpark and Spark SQL)
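As a taste of the ELT content, the sketch below combines Spark SQL and Python: a CTAS from raw JSON files, a SQL UDF for cleaning data, and passing a DataFrame back into SQL through a temp view. The database, table, path, and function names are all hypothetical.

```python
# A hedged ELT sketch; database, table, path, and function names are
# hypothetical. Assumes a Databricks notebook where `spark` is defined.

spark.sql("CREATE DATABASE IF NOT EXISTS demo_db")
spark.sql("USE demo_db")

# CTAS: create a table directly from raw JSON files (path is an assumption).
spark.sql("""
    CREATE TABLE IF NOT EXISTS customers_bronze AS
    SELECT * FROM json.`/mnt/raw/customers/`
""")

# A SQL UDF to standardize email addresses.
spark.sql("""
    CREATE OR REPLACE FUNCTION clean_email(email STRING)
    RETURNS STRING
    RETURN lower(trim(email))
""")

# Python string manipulation / control flow to build a query dynamically,
# then pass the result back and forth between PySpark and Spark SQL.
table = "customers_bronze"
df = spark.sql(f"SELECT clean_email(email) AS email FROM {table}")
df.createOrReplaceTempView("customers_clean")   # expose the DataFrame to SQL
spark.sql("SELECT count(DISTINCT email) FROM customers_clean").show()
```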
Incrementally process data (a streaming sketch follows this list), including:
- Structured Streaming (general concepts, triggers, watermarks)
- Auto Loader (streaming reads)
- Multi-hop Architecture (bronze-silver-gold, streaming applications)
- Delta Live Tables (benefits and features)
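The multi-hop pattern above can be sketched with Auto Loader and Structured Streaming, as below; Delta Live Tables packages the same pattern declaratively. Every path, table name, and column here (`order_ts`, `amount`) is an assumption, and it presumes a Databricks notebook where `spark` is available.

```python
# A bronze-to-silver streaming sketch using Auto Loader; all paths, table
# names, and columns are hypothetical.

# Bronze: incrementally ingest raw JSON files with Auto Loader (cloudFiles).
bronze_stream = (
    spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.schemaLocation", "/mnt/schemas/orders")
        .load("/mnt/raw/orders/")
)

(bronze_stream.writeStream
    .option("checkpointLocation", "/mnt/checkpoints/orders_bronze")
    .trigger(availableNow=True)  # process all available files, then stop
    .toTable("orders_bronze"))

# Silver: stream from the bronze table, drop bad rows, bound late data.
silver_stream = (
    spark.readStream.table("orders_bronze")
        .withWatermark("order_ts", "10 minutes")  # tolerate 10 min of lateness
        .where("amount > 0")
)

(silver_stream.writeStream
    .option("checkpointLocation", "/mnt/checkpoints/orders_silver")
    .trigger(availableNow=True)
    .toTable("orders_silver"))
```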
Build production pipelines for data engineering applications, as well as Databricks SQL queries and dashboards (a Jobs API sketch follows this list), including:
- Jobs (scheduling, task orchestration, UI)
- Dashboards (endpoints, scheduling, alerting, refreshing)
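Beyond the UI, jobs can also be defined programmatically; below is a hedged sketch using the Databricks Jobs API 2.1 to create a scheduled two-task job. The workspace URL, token, notebook paths, and cluster ID are placeholders, not values from this course.

```python
# A hedged sketch of creating a scheduled two-task job via the Databricks
# Jobs API 2.1; the URL, token, notebook paths, and cluster ID are placeholders.
import requests

HOST = "https://<your-workspace>.cloud.databricks.com"   # placeholder
TOKEN = "<personal-access-token>"                        # placeholder

job_spec = {
    "name": "nightly-etl",
    "tasks": [
        {
            "task_key": "bronze",
            "notebook_task": {"notebook_path": "/Repos/etl/bronze"},
            "existing_cluster_id": "<cluster-id>",
        },
        {
            "task_key": "silver",
            "depends_on": [{"task_key": "bronze"}],      # task orchestration
            "notebook_task": {"notebook_path": "/Repos/etl/silver"},
            "existing_cluster_id": "<cluster-id>",
        },
    ],
    # Run every day at 02:00 UTC (Quartz cron syntax).
    "schedule": {
        "quartz_cron_expression": "0 0 2 * * ?",
        "timezone_id": "UTC",
    },
}

resp = requests.post(
    f"{HOST}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=job_spec,
)
resp.raise_for_status()
print("Created job:", resp.json()["job_id"])
```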
Understand and follow best security practices (a permissions sketch follows this list), including:
- Unity Catalog (benefits and features)
- Entity Permissions (data object privileges)
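Entity permissions are granted with SQL; the sketch below shows typical Unity Catalog grants run from PySpark. The catalog, schema, table, and `data_analysts` group are hypothetical, and it assumes a Unity Catalog-enabled workspace where the current user may grant privileges.

```python
# A minimal permissions sketch; catalog, schema, table, and group names are
# hypothetical. Assumes a Unity Catalog-enabled workspace.

# Grant a group the right to use the catalog/schema and read one table.
spark.sql("GRANT USE CATALOG ON CATALOG main TO `data_analysts`")
spark.sql("GRANT USE SCHEMA ON SCHEMA main.sales TO `data_analysts`")
spark.sql("GRANT SELECT ON TABLE main.sales.orders TO `data_analysts`")

# Inspect the resulting privileges on the table.
spark.sql("SHOW GRANTS ON TABLE main.sales.orders").show(truncate=False)
```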
Who this course is for:
- Anyone aiming to pass the Databricks Data Engineer Associate certification exam
- University students pursuing a career in Data Engineering
- Data Engineers moving from other technologies and aiming to apply their skills to Databricks
- Data Engineers / Data Warehouse Developers currently working with on-premises technologies
- Anyone new to Databricks who wants to save time by learning the Databricks fundamentals