Microsoft Fabric Explained – Lakehouse vs Warehouse vs Eventhouse

Microsoft Fabric is transforming the modern data platform landscape by bringing together … all inside a single unified SaaS platform. One of Microsoft Fabric's biggest strengths is its flexibility: Fabric gives organizations multiple ways to store, process, and analyze data depending on … At the center of everything is OneLake — Microsoft Fabric's unified data lake. … Read more

100 Azure Data Factory (ADF) Scenarios Explained – Complete Practical Guide for Data Engineers

Azure Data Factory (ADF) is one of the most important cloud ETL and orchestration tools in the modern Azure ecosystem. In real-world enterprise projects, ADF is used for … This guide explains 100 practical ADF scenarios in a tutorial/blog format that is useful for … Section 1 – Core Azure Data Factory (ADF) Scenarios. Scenario 1 – Incremental … Read more
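The incremental-load scenario the guide opens with usually rests on a watermark: remember the latest modification timestamp you loaded, and on the next run copy only rows newer than it. A minimal plain-Python sketch of that pattern, with hypothetical row and column names (`modified_at`, `id`) standing in for real source metadata:

```python
# Watermark-based incremental load, sketched in plain Python.
# All names (modified_at, id) are illustrative, not an ADF API.

def incremental_load(source_rows, watermark):
    """Return rows modified after the last watermark, plus the new watermark."""
    new_rows = [r for r in source_rows if r["modified_at"] > watermark]
    # Advance the watermark to the newest timestamp we just loaded.
    new_watermark = max((r["modified_at"] for r in new_rows), default=watermark)
    return new_rows, new_watermark

source = [
    {"id": 1, "modified_at": "2024-01-01"},
    {"id": 2, "modified_at": "2024-02-15"},
    {"id": 3, "modified_at": "2024-03-10"},
]
changed, wm = incremental_load(source, "2024-01-31")
# changed contains ids 2 and 3; wm is "2024-03-10"
```

In ADF itself the same idea is expressed with a Lookup activity reading the stored watermark and a Copy activity filtered on it; the sketch only shows the logic.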

What is Data Modeling?

Introduction Data modeling is one of the most important steps in designing and building a database. Before storing information in any system, businesses need a clear structure for how that data will be organized, connected, and maintained. This is where data modeling becomes essential. Think of data modeling as creating a blueprint for a database. … Read more
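The "blueprint" idea can be made concrete: a conceptual model of two related entities becomes two tables with keys linking them. A tiny illustration using SQLite, with purely hypothetical table and column names:

```python
# A data model as a "blueprint": customers and orders as related tables.
# Table and column names are illustrative only.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    amount      REAL NOT NULL
);
""")
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
```

The foreign key from `orders` to `customers` is exactly the kind of relationship a data model pins down before any data is stored.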

Materialized Gold Table in Databricks

We are going to give a quick overview of how stored views and materialized views can be created in Databricks. For this demo, we will create our two gold layer entities. We will start by creating a view in the gold layer against our silver table customers_orders. Our view will simply contain some aggregations for … Read more
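As a rough stand-in for the demo, a gold-layer view with aggregations over a silver table can be sketched with SQLite. The `customers_orders` name matches the silver table the post mentions; the columns and sample data here are assumptions:

```python
# Gold-layer view aggregating a silver table, sketched in SQLite.
# Column names and data are illustrative, not the post's actual schema.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers_orders (customer_id INTEGER, amount REAL);
INSERT INTO customers_orders VALUES (1, 10.0), (1, 15.0), (2, 20.0);
CREATE VIEW gold_customer_totals AS
SELECT customer_id, COUNT(*) AS order_count, SUM(amount) AS total_amount
FROM customers_orders
GROUP BY customer_id;
""")
rows = list(conn.execute(
    "SELECT * FROM gold_customer_totals ORDER BY customer_id"))
```

In Databricks the difference is that `CREATE MATERIALIZED VIEW` persists and incrementally refreshes the result, whereas a stored view like the one above recomputes on every query.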

Stream-Stream & Stream-Static Joins

We are going to see how to use CDF data to propagate changes to downstream tables. For this demo, we will create a new silver table called customers_orders by joining two streams: the orders table with the CDF data of the customers table. We will start by creating a function to upsert ranked updates into … Read more
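The "upsert ranked updates" step means: when several change records arrive for the same key, rank them and apply only the most recent one. A plain-Python sketch of that logic, standing in for the Databricks `MERGE`; the `id`/`version` field names are assumptions:

```python
# Upserting ranked updates: keep only the latest change per key,
# then merge into the target. Field names are hypothetical.

def upsert_ranked(target, updates):
    """target: dict key -> row; updates: rows with 'id' and 'version'."""
    latest = {}
    for row in updates:
        key = row["id"]
        # Rank by version: keep only the most recent update per key.
        if key not in latest or row["version"] > latest[key]["version"]:
            latest[key] = row
    target.update(latest)
    return target

target = {1: {"id": 1, "version": 1, "name": "a"}}
updates = [
    {"id": 1, "version": 2, "name": "b"},
    {"id": 1, "version": 3, "name": "c"},
    {"id": 2, "version": 1, "name": "d"},
]
result = upsert_ranked(target, updates)
```

Deduplicating first matters because a streaming `MERGE` fails if two source rows match the same target row.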

Change Data Capture (CDC) in Databricks

Change Data Capture, or CDC, refers to the process of identifying and capturing changes made to data in the data source and then delivering those changes to the target. Those changes could be new records to be inserted from the source into the target, or updated records in the source that need to be reflected … Read more
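To make the three kinds of changes concrete, here is a minimal sketch that diffs a source snapshot against a target to find inserts, updates, and deletes. This is purely conceptual; real CDC systems read the source's change log rather than comparing full snapshots:

```python
# Conceptual CDC: classify differences between source and target
# into inserts, updates, and deletes.

def capture_changes(source, target):
    inserts = {k: v for k, v in source.items() if k not in target}
    updates = {k: v for k, v in source.items()
               if k in target and target[k] != v}
    deletes = [k for k in target if k not in source]
    return inserts, updates, deletes

source = {1: "alice", 2: "bob-updated", 4: "dave"}
target = {1: "alice", 2: "bob", 3: "carol"}
inserts, updates, deletes = capture_changes(source, target)
```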

Delta Lake CDF in Databricks

Learning Objectives Change Data Feed, or CDF, is a feature built into Delta Lake that automatically generates CDC feeds for Delta Lake tables. CDF records row-level changes for all the data written into a Delta table. These include the row data along with metadata indicating whether the specified row was inserted, deleted … Read more
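The shape of what CDF emits can be sketched in plain Python: each record is the row data plus change metadata. The `_change_type` and `_commit_version` field names mirror the columns Delta Lake's CDF actually produces, but this is an illustration, not the Delta API:

```python
# Sketch of CDF-style row-level change records: row data plus
# _change_type / _commit_version metadata. Plain Python, not Delta Lake.

def change_feed(old_rows, new_rows, version):
    feed = []
    for key, row in new_rows.items():
        if key not in old_rows:
            feed.append({**row, "_change_type": "insert",
                         "_commit_version": version})
        elif old_rows[key] != row:
            feed.append({**row, "_change_type": "update_postimage",
                         "_commit_version": version})
    for key, row in old_rows.items():
        if key not in new_rows:
            feed.append({**row, "_change_type": "delete",
                         "_commit_version": version})
    return feed

old = {1: {"id": 1, "name": "a"}}
new = {1: {"id": 1, "name": "b"}, 2: {"id": 2, "name": "c"}}
feed = change_feed(old, new, version=2)
```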

300 Real SQL Interview Medium to Advanced SQL Questions

Find the second highest salary from the Employee table. Find duplicate records in a table. Retrieve employees who earn more than their manager. Count employees in each department having more than 5 employees. Find employees who joined in the last 6 months. Get departments with no employees. Write a query to find the median salary. … Read more
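As a worked example for the first question in the list, here is the second-highest-salary query run against a small SQLite table; the sample salaries are made up:

```python
# Second highest salary from an Employee table, with sample data.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (id INTEGER PRIMARY KEY, salary INTEGER);
INSERT INTO Employee (salary) VALUES (100), (200), (300), (300);
""")
# DISTINCT handles ties on the top salary; OFFSET 1 skips the highest.
(second_highest,) = conn.execute("""
SELECT DISTINCT salary FROM Employee
ORDER BY salary DESC LIMIT 1 OFFSET 1
""").fetchone()
# second_highest is 200
```

A common alternative is `SELECT MAX(salary) FROM Employee WHERE salary < (SELECT MAX(salary) FROM Employee)`, which also returns 200 here.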

Data Engineering Terms You Need to Know

Data Pipeline — Data pipeline: mapping data flow from collection to analysis for organizational insights; source, process, deliver for analytics efficiency. Database vs Schema vs Table — Database: organized collection of related data, like a file cabinet for storing information. Schema: blueprint defining structure and organization within a database. Table: grid-like structure within a schema, where data … Read more

Slowly Changing Dimensions in Databricks

Introduction Slowly Changing Dimensions, or SCD, is a data management concept that determines how tables handle data that changes over time. For example, whether you want to overwrite values in the table or retain their history. This is determined by the SCD type you implement. There are three types of SCD, which are … Read more
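The history-retaining option corresponds to SCD Type 2: instead of overwriting a row, you close the current version and append a new one. A minimal sketch of that bookkeeping; the `is_current`/`valid_from`/`valid_to` fields are a common convention assumed here, not fields from the post:

```python
# SCD Type 2 upsert: close the current row, append a new version.
# is_current / valid_from / valid_to are a common convention, assumed here.

def scd2_upsert(history, key, new_value, effective_date):
    for row in history:
        if row["key"] == key and row["is_current"]:
            # Close out the old version instead of overwriting it.
            row["is_current"] = False
            row["valid_to"] = effective_date
    history.append({
        "key": key, "value": new_value,
        "valid_from": effective_date, "valid_to": None,
        "is_current": True,
    })
    return history

history = [{"key": 1, "value": "NY", "valid_from": "2023-01-01",
            "valid_to": None, "is_current": True}]
scd2_upsert(history, 1, "LA", "2024-06-01")
```

By contrast, SCD Type 1 would simply overwrite `"NY"` with `"LA"` and lose the old value.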