Databricks – Datatorials

Microsoft Fabric Explained – Lakehouse vs Warehouse vs Eventhouse

May 7, 2026 (updated May 11, 2026) by Abdus Sattar

Microsoft Fabric is transforming the modern data platform landscape by bringing together … all inside a single, unified SaaS platform. One of the biggest strengths of Microsoft Fabric is flexibility: Fabric gives organizations multiple ways to store, process, and analyze data depending on: … At the center of everything is OneLake, Microsoft Fabric’s unified data lake. …

Materialized Gold Table in Databricks

April 29, 2026 (updated May 11, 2026) by Abdus Sattar

We are going to give a quick overview of how to create stored views and materialized views in Databricks. For this demo, we will create our two gold-layer entities. We will start by creating a view in the gold layer on top of our silver table customers_orders. Our view will simply contain some aggregations for …
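The practical difference between the two gold entities is when the aggregation runs. The sketch below is a plain-Python model, not Databricks code: a stored view keeps only its query and re-runs it on every read, while a materialized view persists the result and goes stale until it is refreshed. The column names (`country`, `quantity`) and the aggregation itself are hypothetical stand-ins, since the excerpt is cut off before naming them.

```python
from collections import defaultdict

# Hypothetical rows standing in for the silver table customers_orders
# (column names are illustrative assumptions, not from the post).
silver_customers_orders = [
    {"country": "FR", "quantity": 2},
    {"country": "FR", "quantity": 3},
    {"country": "US", "quantity": 1},
]


def total_quantity_by_country(rows):
    """The aggregation query both gold entities are built from."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["country"]] += row["quantity"]
    return dict(totals)


class StoredView:
    """A stored view keeps only the query; it re-runs on every read."""

    def __init__(self, source, query):
        self.source, self.query = source, query

    def read(self):
        return self.query(self.source)


class MaterializedView:
    """A materialized view stores the query result until it is refreshed."""

    def __init__(self, source, query):
        self.source, self.query = source, query
        self.refresh()

    def refresh(self):
        self._result = self.query(self.source)

    def read(self):
        return self._result


view = StoredView(silver_customers_orders, total_quantity_by_country)
mat_view = MaterializedView(silver_customers_orders, total_quantity_by_country)

silver_customers_orders.append({"country": "US", "quantity": 4})
print(view.read())      # {'FR': 5, 'US': 5}  (sees the new row immediately)
print(mat_view.read())  # {'FR': 5, 'US': 1}  (stale until refreshed)
mat_view.refresh()
print(mat_view.read())  # {'FR': 5, 'US': 5}
```

The trade-off is the usual one: the view is always current but pays the aggregation cost on every query, while the materialized result is cheap to read but only as fresh as its last refresh.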

Stream-Stream & Stream-Static Joins

April 28, 2026 (updated May 11, 2026) by Abdus Sattar

We are going to see how to use CDF data to propagate changes to downstream tables. For this demo, we will create a new silver table called customers_orders by joining two streams: the orders table and the CDF data of the customers table. We will start by creating a function to upsert ranked updates into …
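A "ranked upsert" means that within one micro-batch of CDF rows, only the latest change per key is written, and update pre-images are discarded. The plain-Python model below illustrates that idea under stated assumptions: ranking by `_commit_version` (the post may rank by `_commit_timestamp` instead), ignoring deletes, and a hypothetical dict as the target table. In PySpark the same logic is typically expressed with a window function ranked by commit order plus a `MERGE` inside `foreachBatch`; this sketch does not reproduce the post's actual function.

```python
# Hypothetical micro-batch of customer CDF rows. _change_type and
# _commit_version mirror Delta CDF metadata columns; other fields are made up.
cdf_batch = [
    {"customer_id": "C1", "email": "a@old.com",
     "_change_type": "update_preimage", "_commit_version": 2},
    {"customer_id": "C1", "email": "a@new.com",
     "_change_type": "update_postimage", "_commit_version": 2},
    {"customer_id": "C1", "email": "a@newest.com",
     "_change_type": "update_postimage", "_commit_version": 3},
    {"customer_id": "C2", "email": "b@x.com",
     "_change_type": "insert", "_commit_version": 3},
]


def rank_latest_changes(rows, key="customer_id"):
    """Keep, per key, only the most recent post-change image (rank 1)."""
    latest = {}
    for row in rows:
        if row["_change_type"] not in ("insert", "update_postimage"):
            continue  # pre-images (and, in this sketch, deletes) are skipped
        current = latest.get(row[key])
        if current is None or row["_commit_version"] > current["_commit_version"]:
            latest[row[key]] = row
    return list(latest.values())


def upsert_ranked_updates(target, rows, key="customer_id"):
    """MERGE-like upsert: update matching keys, insert new ones."""
    for row in rank_latest_changes(rows, key):
        # Drop the CDF metadata columns before writing to the target.
        target[row[key]] = {k: v for k, v in row.items() if not k.startswith("_")}
    return target


customers_silver = {"C1": {"customer_id": "C1", "email": "a@old.com"}}
upsert_ranked_updates(customers_silver, cdf_batch)
print(customers_silver["C1"]["email"])  # a@newest.com
print(sorted(customers_silver))         # ['C1', 'C2']
```

Ranking matters because a single micro-batch can contain several changes to the same customer; upserting them in arbitrary order could leave an older value in the target.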

Change Data Capture (CDC) in Databricks

April 26, 2026 (updated May 11, 2026) by Abdus Sattar

Change Data Capture, or CDC, refers to the process of identifying and capturing changes made to data in a source and then delivering those changes to a target. Those changes can be new records to be inserted from the source into the target, or updated records in the source that need to be reflected …
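To make the definition concrete, here is a minimal plain-Python sketch of the delivery side of CDC: captured insert, update, and delete events are replayed against a target so it mirrors the source. The event shape (`seq`, `op`, `id`, `data`) is a hypothetical format chosen for illustration, not one used by Databricks or any specific CDC tool.

```python
# Hypothetical CDC events captured from a source system.
# The field names (seq, op, id, data) are illustrative assumptions.
cdc_events = [
    {"seq": 1, "op": "insert", "id": 1, "data": {"name": "Alice", "city": "Paris"}},
    {"seq": 2, "op": "insert", "id": 2, "data": {"name": "Bob", "city": "Lyon"}},
    {"seq": 4, "op": "delete", "id": 2, "data": None},
    {"seq": 3, "op": "update", "id": 1, "data": {"name": "Alice", "city": "Nice"}},
]


def apply_cdc(target, events):
    """Replay captured changes, in source order, against the target."""
    for event in sorted(events, key=lambda e: e["seq"]):
        if event["op"] in ("insert", "update"):
            target[event["id"]] = dict(event["data"])
        elif event["op"] == "delete":
            target.pop(event["id"], None)
        else:
            raise ValueError(f"unknown operation: {event['op']}")
    return target


target_table = apply_cdc({}, cdc_events)
print(target_table)  # {1: {'name': 'Alice', 'city': 'Nice'}}
```

Sorting by the sequence number before applying is the key design choice: events can arrive out of order, and applying an update after a later delete (or vice versa) would leave the target out of sync with the source.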

Delta Lake CDF in Databricks

April 26, 2026 (updated May 11, 2026) by Abdus Sattar

Learning Objectives

Change Data Feed, or CDF, is a new feature built into Delta Lake that automatically generates CDC feeds for Delta Lake tables. CDF records row-level changes for all the data written into a Delta table. These records include the row data along with metadata indicating whether the specified row was inserted, deleted …
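The sketch below models only the *shape* of what CDF records: given two versions of a small table keyed by `id`, it emits rows tagged with the `_change_type` values Delta Lake uses (`insert`, `delete`, `update_preimage`, `update_postimage`) plus a `_commit_version`. It is an assumption-laden illustration, not how Delta computes the feed, and it omits the real `_commit_timestamp` column. In Databricks, the actual feed is enabled with the `delta.enableChangeDataFeed` table property and read with the `readChangeFeed` option.

```python
def change_feed(before, after, commit_version):
    """Derive CDF-style change rows from two versions of a table keyed by id."""
    rows = []
    for key in sorted(before.keys() | after.keys()):
        old, new = before.get(key), after.get(key)
        if old is None:
            rows.append({**new, "_change_type": "insert"})
        elif new is None:
            rows.append({**old, "_change_type": "delete"})
        elif old != new:
            # An update is recorded as both the old and the new image.
            rows.append({**old, "_change_type": "update_preimage"})
            rows.append({**new, "_change_type": "update_postimage"})
    for row in rows:
        row["_commit_version"] = commit_version
    return rows


# Hypothetical table versions: id 1 is updated, id 2 deleted, id 3 inserted.
version_1 = {1: {"id": 1, "qty": 5}, 2: {"id": 2, "qty": 7}}
version_2 = {1: {"id": 1, "qty": 6}, 3: {"id": 3, "qty": 1}}

for row in change_feed(version_1, version_2, commit_version=2):
    print(row)
```

Recording both images of an update is what lets downstream consumers see the before and after values, and also why consumers that only want the latest state (such as an upsert) filter out the pre-images.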

Slowly Changing Dimensions in Databricks

April 24, 2026 (updated May 11, 2026) by Abdus Sattar

Introduction

Slowly Changing Dimensions, or SCD, is a data management concept that determines how tables handle data that changes over time: for example, whether you want to overwrite values in the table or retain their history. This is determined by the SCD type you implement. There are three main types of SCD, which are …
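The excerpt is cut off before it names the three types, so the sketch below covers only the two behaviours it describes: overwriting values (commonly called Type 1) and retaining history (commonly called Type 2). It is a plain-Python illustration with hypothetical column names (`valid_from`, `valid_to`, `is_current`), not the post's Databricks implementation.

```python
from datetime import date


def scd1_upsert(table, key, attrs):
    """Type 1: overwrite the stored values in place; no history is kept."""
    table[key] = {**table.get(key, {}), **attrs}
    return table


def scd2_upsert(history, key, attrs, as_of):
    """Type 2: close the current version and append a new one on change."""
    current = next(
        (r for r in history if r["key"] == key and r["is_current"]), None
    )
    if current is not None:
        if all(current.get(k) == v for k, v in attrs.items()):
            return history  # nothing changed, keep the open version
        current["valid_to"] = as_of
        current["is_current"] = False
    history.append(
        {"key": key, **attrs, "valid_from": as_of, "valid_to": None, "is_current": True}
    )
    return history


dim_customers = []
scd2_upsert(dim_customers, "C1", {"city": "Paris"}, date(2026, 1, 1))
scd2_upsert(dim_customers, "C1", {"city": "Paris"}, date(2026, 2, 1))  # no-op
scd2_upsert(dim_customers, "C1", {"city": "Nice"}, date(2026, 3, 1))
for version in dim_customers:
    print(version)
```

With Type 1 a customer who moves simply has their city replaced; with Type 2 the old row is closed with a `valid_to` date and a new current row is added, so queries can still answer "where did this customer live in January?".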

Streaming from Multiplex Bronze in Databricks

April 24, 2026 (updated May 11, 2026) by Abdus Sattar

In this notebook, we are going to parse raw data from a single topic in our multiplex bronze table. As shown in our architecture diagram, we will create the orders silver table. Let us start by copying our dataset files. Before starting, we need to cast our Kafka binary fields to strings. Here …
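The parsing step reduces to three operations: filter the multiplex table to one topic, cast the binary Kafka `key` and `value` to strings, and parse the JSON payload. The plain-Python sketch below models those steps on hypothetical bronze rows; in PySpark the equivalents would be a filter on the topic column, `col("value").cast("string")`, and `from_json` with an explicit schema. The field names and payloads are assumptions, not the post's dataset.

```python
import json

# Hypothetical multiplex bronze rows: several topics share one table and
# the Kafka key/value columns arrive as raw bytes.
bronze = [
    {"topic": "orders", "key": b"o1", "value": b'{"order_id": "o1", "quantity": 2}'},
    {"topic": "customers", "key": b"c1", "value": b'{"customer_id": "c1"}'},
    {"topic": "orders", "key": b"o2", "value": b'{"order_id": "o2", "quantity": 5}'},
]


def parse_topic(rows, topic):
    """Select one topic, cast binary fields to strings, then parse the JSON value."""
    parsed = []
    for row in rows:
        if row["topic"] != topic:
            continue
        key = row["key"].decode("utf-8")      # analogous to CAST(key AS STRING)
        value = row["value"].decode("utf-8")  # analogous to CAST(value AS STRING)
        parsed.append({"key": key, **json.loads(value)})
    return parsed


orders_silver = parse_topic(bronze, "orders")
print(orders_silver)
```

Filtering on the topic first means each silver table only pays the parsing cost for its own records, which is the main reason a multiplex bronze table keeps the topic as a column.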

Bronze Ingestion Patterns in Databricks

April 24, 2026 (updated May 11, 2026) by Abdus Sattar

Learning Objectives

You will understand the available ingestion models in the bronze layer and the differences between them, and you will learn how the ingested data is promoted to the silver layer. When setting up ingestion into the bronze layer, we need to decide how input datasets should be mapped to bronze tables. …
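Two widely used ways to map input datasets to bronze tables are singleplex (one bronze table per dataset) and multiplex (many datasets, such as Kafka topics, sharing one bronze table). The excerpt does not name the models it compares, so treat these two as an assumption. The plain-Python sketch below only contrasts where records land under each model; the `bronze_<topic>` naming and the payloads are hypothetical.

```python
from collections import defaultdict

# Hypothetical records from two source datasets (topics).
incoming = [
    {"topic": "orders", "payload": '{"order_id": 1}'},
    {"topic": "customers", "payload": '{"customer_id": 7}'},
    {"topic": "orders", "payload": '{"order_id": 2}'},
]


def ingest_singleplex(records):
    """Singleplex: each dataset lands in its own bronze table."""
    tables = defaultdict(list)
    for record in records:
        tables[f"bronze_{record['topic']}"].append(record["payload"])
    return dict(tables)


def ingest_multiplex(records):
    """Multiplex: all datasets share one bronze table; the topic column is
    kept so downstream silver jobs can filter on it."""
    return {"bronze": [dict(record) for record in records]}


print(ingest_singleplex(incoming))
print(ingest_multiplex(incoming))
```

Singleplex keeps each table simple but multiplies the number of tables and streams as sources grow; multiplex keeps ingestion to a single stream, at the cost of every silver job having to filter and parse its own topic.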

Delta Lake Check Constraints Explained (Ensure Data Quality in Data Pipelines)

April 21, 2026 (updated May 11, 2026) by Abdus Sattar

Introduction

In modern data engineering, ensuring data quality is just as important as building scalable pipelines. No matter how advanced your architecture is, poor data quality can lead to incorrect analytics, broken dashboards, and unreliable machine learning models. This is where Delta Lake check constraints come into play. Check constraints allow you to enforce rules …
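In Delta Lake a constraint is added with SQL such as `ALTER TABLE orders ADD CONSTRAINT valid_quantity CHECK (quantity > 0)` (table and constraint names here are hypothetical). Two behaviours are worth modelling: existing rows are verified when the constraint is added, and a write containing any violating row fails as a whole. The plain-Python toy below sketches those two rules only; it is not Delta's implementation, and predicates are Python lambdas rather than SQL expressions.

```python
class ConstraintViolation(Exception):
    """Raised when a write or a new constraint conflicts with a CHECK rule."""


class CheckedTable:
    """Toy table that enforces named CHECK-style rules on every write."""

    def __init__(self):
        self.rows = []
        self.constraints = {}

    def add_constraint(self, name, predicate):
        # Refuse to add a constraint that existing rows already violate.
        bad = [row for row in self.rows if not predicate(row)]
        if bad:
            raise ConstraintViolation(f"{len(bad)} existing rows violate {name}")
        self.constraints[name] = predicate

    def append(self, batch):
        # All-or-nothing: validate the whole batch before committing any row.
        for name, predicate in self.constraints.items():
            for row in batch:
                if not predicate(row):
                    raise ConstraintViolation(f"CHECK constraint {name} violated by {row}")
        self.rows.extend(batch)


orders = CheckedTable()
orders.add_constraint("valid_quantity", lambda r: r["quantity"] > 0)
orders.append([{"order_id": 1, "quantity": 3}])
try:
    orders.append([{"order_id": 2, "quantity": 1}, {"order_id": 3, "quantity": 0}])
except ConstraintViolation as err:
    print(err)
print(len(orders.rows))  # 1: the valid row in the failed batch was not written either
```

Because the whole failing batch is rejected, a pipeline that needs to keep good rows flowing usually filters or quarantines bad records before the write rather than relying on the constraint alone.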

Configuring Auto Loader for Reliable Data Ingestion (Complete Guide 2026)

April 20, 2026 by Abdus Sattar

Introduction

In modern data engineering, ingesting data reliably and efficiently is one of the most critical challenges. With the rise of cloud-based data platforms, tools like Databricks Auto Loader have become essential for building scalable and fault-tolerant data pipelines. Auto Loader is designed to automatically detect and process new files as they …
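A minimal PySpark sketch of an Auto Loader stream follows, assuming a Databricks runtime where the `cloudFiles` source is available and a `spark` session is passed in. The options shown (`cloudFiles.format`, `cloudFiles.schemaLocation`, `cloudFiles.inferColumnTypes`) are real Auto Loader options; the helper names, paths, and table name are hypothetical, and the post's own configuration is not shown in the excerpt. Reusing the checkpoint path as the schema location follows a common pattern in the Databricks examples.

```python
def autoloader_options(file_format, schema_location):
    """Reader options for a cloudFiles (Auto Loader) stream."""
    return {
        "cloudFiles.format": file_format,                 # e.g. json, csv, parquet
        "cloudFiles.schemaLocation": schema_location,     # where the inferred schema is tracked
        "cloudFiles.inferColumnTypes": "true",            # infer types instead of all strings
    }


def start_ingestion(spark, source_path, target_table, checkpoint_path, file_format="json"):
    """Incrementally load new files from source_path into target_table.

    Hypothetical helper: requires a Databricks runtime for the cloudFiles source.
    """
    stream = (
        spark.readStream.format("cloudFiles")
        .options(**autoloader_options(file_format, checkpoint_path))
        .load(source_path)
    )
    return (
        stream.writeStream
        .option("checkpointLocation", checkpoint_path)  # remembers which files were processed
        .trigger(availableNow=True)                     # process all pending files, then stop
        .toTable(target_table)
    )
```

The checkpoint is what makes the ingestion fault-tolerant: after a restart, Auto Loader resumes from the files it has already recorded instead of reprocessing or skipping data, and `availableNow` lets the same code run as a scheduled incremental batch.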