Microsoft Fabric Explained – Lakehouse vs Warehouse vs Eventhouse

Microsoft Fabric is transforming the modern data platform landscape by bringing together: … all inside a single, unified SaaS platform. One of the biggest strengths of Microsoft Fabric is flexibility: it gives organizations multiple ways to store, process, and analyze data depending on: … At the center of everything is OneLake, Microsoft Fabric's unified data lake. …

Materialized Gold Table in Databricks

We are going to give a quick overview of how stored views and materialized views can be created in Databricks. For this demo, we will create our two gold layer entities. We will start by creating a view in the gold layer against our silver table customers_orders. Our view will simply contain some aggregations for …
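The practical difference between the two kinds of view can be sketched in plain Python. This is a conceptual model, not Databricks code: a stored view keeps only its query and recomputes it on every read, while a materialized view persists the result, which stays stale until it is refreshed. The rows and the columns `customer_id` and `amount` are hypothetical stand-ins for the silver table customers_orders.

```python
from collections import defaultdict

# Hypothetical silver rows standing in for customers_orders.
customers_orders = [
    {"customer_id": "c1", "amount": 10.0},
    {"customer_id": "c2", "amount": 5.0},
    {"customer_id": "c1", "amount": 7.5},
]

def total_per_customer(rows):
    """Aggregation shared by both views: sum of amount per customer."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["customer_id"]] += row["amount"]
    return dict(totals)

def stored_view():
    # A stored view keeps only the query, so it is recomputed on every read.
    return total_per_customer(customers_orders)

class MaterializedView:
    # A materialized view persists the query result and must be refreshed.
    def __init__(self, source):
        self.source = source
        self.refresh()

    def refresh(self):
        self.result = total_per_customer(self.source)

    def read(self):
        return self.result

mv = MaterializedView(customers_orders)
customers_orders.append({"customer_id": "c2", "amount": 2.5})

print(stored_view())  # {'c1': 17.5, 'c2': 7.5}
print(mv.read())      # {'c1': 17.5, 'c2': 5.0}  (stale until refreshed)
mv.refresh()
print(mv.read())      # {'c1': 17.5, 'c2': 7.5}
```

The trade-off the sketch illustrates: the stored view is always current but pays the aggregation cost on every read, while the materialized result is cheap to read but only as fresh as its last refresh.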

Delta Lake CDF in Databricks

Change Data Feed (CDF) is a feature built into Delta Lake that automatically generates CDC feeds for Delta Lake tables. CDF records row-level changes for all data written into a Delta table. These include the raw data along with metadata indicating whether the specified row was inserted, deleted …
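To show what such row-level change records look like, here is a minimal pure-Python sketch that diffs two snapshots of a table keyed by `id`. The `_change_type` values (`insert`, `delete`, `update_preimage`, `update_postimage`) and the `_commit_version` column mirror the metadata columns Delta CDF reports. The diffing function itself is a hypothetical illustration: Delta records changes at write time rather than by comparing snapshots.

```python
def change_feed(old, new, version):
    """Diff two snapshots keyed by primary key into CDF-style change rows."""
    changes = []
    for key, row in new.items():
        if key not in old:
            changes.append({**row, "_change_type": "insert", "_commit_version": version})
        elif old[key] != row:
            # An update is reported as the row before and after the change.
            changes.append({**old[key], "_change_type": "update_preimage", "_commit_version": version})
            changes.append({**row, "_change_type": "update_postimage", "_commit_version": version})
    for key, row in old.items():
        if key not in new:
            changes.append({**row, "_change_type": "delete", "_commit_version": version})
    return changes

v1 = {1: {"id": 1, "name": "Ana"}, 2: {"id": 2, "name": "Ben"}}
v2 = {1: {"id": 1, "name": "Anna"}, 3: {"id": 3, "name": "Cy"}}

for change in change_feed(v1, v2, version=2):
    print(change["id"], change["_change_type"])
# 1 update_preimage
# 1 update_postimage
# 3 insert
# 2 delete
```

On Delta tables, CDF is enabled per table through the `delta.enableChangeDataFeed` table property, and the resulting feed is read with the `readChangeFeed` option.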

Slowly Changing Dimensions in Databricks

Slowly Changing Dimensions (SCD) is a data management concept that determines how tables handle data that changes over time: for example, whether you overwrite existing values in the table or retain their history. This is determined by the SCD type you implement. There are three commonly used types of SCD, which are …
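The contrast between overwriting values and retaining history can be sketched in plain Python, assuming a hypothetical customer dimension with `customer_id` and `address` columns. Type 1 simply overwrites the value; Type 2 expires the current row (setting `end_date` and `is_current`) and appends a new version. The column names are illustrative assumptions, not a fixed schema.

```python
def scd_type1(table, updates):
    """Type 1: overwrite the attribute in place; no history is kept."""
    table.update(updates)
    return table

def scd_type2(rows, updates, effective_date):
    """Type 2: expire the current version of a changed row and append a new one."""
    for customer_id, address in updates.items():
        current = next(
            (r for r in rows if r["customer_id"] == customer_id and r["is_current"]),
            None,
        )
        if current is not None and current["address"] == address:
            continue  # value unchanged: no new version needed
        if current is not None:
            current["is_current"] = False
            current["end_date"] = effective_date
        rows.append({
            "customer_id": customer_id,
            "address": address,
            "start_date": effective_date,
            "end_date": None,
            "is_current": True,
        })
    return rows

dim = {"c1": "Paris"}
print(scd_type1(dim, {"c1": "Berlin"}))  # {'c1': 'Berlin'}

history = [{"customer_id": "c1", "address": "Paris",
            "start_date": "2024-01-01", "end_date": None, "is_current": True}]
scd_type2(history, {"c1": "Berlin"}, "2024-06-01")
for row in history:
    print(row["address"], row["start_date"], row["end_date"], row["is_current"])
# Paris 2024-01-01 2024-06-01 False
# Berlin 2024-06-01 None True
```

On Delta tables, the same Type 2 logic is typically expressed as a `MERGE INTO` statement rather than a row-by-row loop.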

Streaming from Multiplex Bronze in Databricks

In this notebook, we are going to parse raw data from a single topic in our multiplex bronze table. As shown in our architecture diagram, we will create the orders silver table. Let us start by copying our dataset files. Before starting, we need to cast our Kafka binary fields as strings. Here …
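The parsing step can be sketched in plain Python under a few assumptions: each hypothetical bronze row carries a `topic` and a binary Kafka `value` holding a JSON payload, and the `order_id` and `quantity` fields are illustrative only. The sketch keeps one topic, casts the bytes to a string, and parses the JSON, which is the same sequence the notebook performs on the bronze table.

```python
import json

# Hypothetical multiplex bronze rows: every Kafka topic lands in one table,
# with the payload kept as raw bytes, as Kafka delivers it.
bronze = [
    {"topic": "orders", "key": b"o1", "value": b'{"order_id": "o1", "quantity": 2}'},
    {"topic": "customers", "key": b"c1", "value": b'{"customer_id": "c1"}'},
    {"topic": "orders", "key": b"o2", "value": b'{"order_id": "o2", "quantity": 5}'},
]

def parse_topic(rows, topic):
    """Keep one topic, cast the binary value to a string, then parse the JSON."""
    parsed = []
    for row in rows:
        if row["topic"] != topic:
            continue
        payload = row["value"].decode("utf-8")  # binary -> string cast
        parsed.append(json.loads(payload))
    return parsed

orders_silver = parse_topic(bronze, "orders")
print(orders_silver)
# [{'order_id': 'o1', 'quantity': 2}, {'order_id': 'o2', 'quantity': 5}]
```

In PySpark, the equivalent cast is usually written as `col("value").cast("string")`, followed by `from_json` with an explicit schema.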

Delta Lake Check Constraints Explained (Ensure Data Quality in Data Pipelines)

In modern data engineering, ensuring data quality is just as important as building scalable pipelines. No matter how advanced your architecture is, poor data quality can lead to incorrect analytics, broken dashboards, and unreliable machine learning models. This is where Delta Lake check constraints come into play. Check constraints allow you to enforce rules …
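In Delta Lake, a constraint is declared with `ALTER TABLE <table> ADD CONSTRAINT <name> CHECK (<condition>)`. The toy Python class below models the behavior that matters for pipelines. Adding a constraint fails if existing rows already violate it, and a write containing even one violating row is rejected as a whole. The class and the `valid_quantity` rule are hypothetical illustrations, not the Delta implementation.

```python
class CheckConstraintViolation(Exception):
    pass

class ConstrainedTable:
    """Toy table that enforces named CHECK-style rules on every write."""

    def __init__(self):
        self.rows = []
        self.constraints = {}

    def add_constraint(self, name, predicate):
        # Existing rows must satisfy the rule before it can be added.
        bad = [r for r in self.rows if not predicate(r)]
        if bad:
            raise CheckConstraintViolation(f"{name}: {len(bad)} existing row(s) violate it")
        self.constraints[name] = predicate

    def write(self, batch):
        # Validate the whole batch first so one bad row rejects the entire write.
        for row in batch:
            for name, predicate in self.constraints.items():
                if not predicate(row):
                    raise CheckConstraintViolation(f"{name} violated by {row}")
        self.rows.extend(batch)

orders = ConstrainedTable()
orders.add_constraint("valid_quantity", lambda r: r["quantity"] > 0)
orders.write([{"order_id": "o1", "quantity": 3}])
try:
    orders.write([{"order_id": "o2", "quantity": 1}, {"order_id": "o3", "quantity": -1}])
except CheckConstraintViolation as err:
    print("rejected:", err)
print(len(orders.rows))  # 1
```

Validating before appending keeps the write all-or-nothing: the valid row `o2` is not stored either, because it arrived in the same batch as the violating row.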

Configuring Auto Loader for Reliable Data Ingestion (Complete Guide 2026)

In modern data engineering, ingesting data reliably and efficiently is one of the most critical challenges. With the rise of cloud-based data platforms, tools such as Databricks Auto Loader have become essential for building scalable, fault-tolerant data pipelines. Auto Loader is designed to automatically detect and process new files as they …
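The core idea of incremental file discovery can be sketched in plain Python. Each run processes only the files not yet recorded in a checkpoint, then records them, so re-running does not ingest the same file twice. This is a simplified model, not Auto Loader itself (on Databricks you use the `cloudFiles` streaming source, which manages its own checkpointed file state). The directory layout and the JSON-lines checkpoint file are assumptions made for the example.

```python
import json
import os
import tempfile

def ingest_new_files(landing_dir, checkpoint_path):
    """Process only files not yet recorded in the checkpoint, then record them."""
    seen = set()
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            seen = set(json.load(f))
    new_files = sorted(n for n in os.listdir(landing_dir) if n not in seen)
    records = []
    for name in new_files:
        with open(os.path.join(landing_dir, name)) as f:
            records.extend(json.loads(line) for line in f if line.strip())
    # Persist the checkpoint only after the batch has been read successfully.
    with open(checkpoint_path, "w") as f:
        json.dump(sorted(seen | set(new_files)), f)
    return new_files, records

with tempfile.TemporaryDirectory() as root:
    landing = os.path.join(root, "landing")
    os.mkdir(landing)
    checkpoint = os.path.join(root, "checkpoint.json")

    with open(os.path.join(landing, "batch1.json"), "w") as f:
        f.write('{"id": 1}\n{"id": 2}\n')
    first_run = ingest_new_files(landing, checkpoint)
    print(first_run)   # (['batch1.json'], [{'id': 1}, {'id': 2}])

    with open(os.path.join(landing, "batch2.json"), "w") as f:
        f.write('{"id": 3}\n')
    second_run = ingest_new_files(landing, checkpoint)
    print(second_run)  # (['batch2.json'], [{'id': 3}])

    third_run = ingest_new_files(landing, checkpoint)
    print(third_run)   # ([], [])
```

Keeping the checkpoint outside the landing directory matters here: otherwise the checkpoint file itself would be discovered as a new input file.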