Materialized Gold Table in Databricks

We are going to give a quick overview of how stored views and materialized views can be created in Databricks. For this demo, we will create our two gold layer entities. We will start by creating a view in the gold layer against our silver table customers_orders. Our view will simply contain some aggregations for … Read more
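
The difference between a stored view and a materialized result can be sketched outside Databricks. The example below is a minimal illustration using Python's built-in sqlite3 module, not the post's own code: SQLite has no materialized views, so a `CREATE TABLE ... AS` snapshot stands in for one, and the column names, rows, and the object names `orders_view` and `orders_materialized` are assumptions made up for the demo.

```python
import sqlite3

# In-memory stand-in for the silver table; columns and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers_orders (customer_id TEXT, quantity INTEGER)")
conn.executemany(
    "INSERT INTO customers_orders VALUES (?, ?)",
    [("c1", 2), ("c1", 3), ("c2", 1)],
)

# The same aggregation backs both gold-layer objects.
agg = (
    "SELECT customer_id, SUM(quantity) AS total_quantity "
    "FROM customers_orders GROUP BY customer_id"
)

# A stored view keeps only the query and recomputes it on every read.
conn.execute(f"CREATE VIEW orders_view AS {agg}")

# A snapshot table stores the computed result (emulating materialization).
conn.execute(f"CREATE TABLE orders_materialized AS {agg}")

# New data arrives in the silver table after both objects exist.
conn.execute("INSERT INTO customers_orders VALUES ('c2', 4)")

view_rows = dict(conn.execute("SELECT * FROM orders_view"))
table_rows = dict(conn.execute("SELECT * FROM orders_materialized"))
```

After the late insert, `view_rows` reflects the new order while `table_rows` still holds the totals from when the snapshot was taken. That is the trade-off in miniature: a view is always current but pays the query cost on each read, while a stored result is cheap to read but must be refreshed.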

Delta Lake CDF in Databricks

Learning Objectives: Change Data Feed, or CDF, is a new feature built into Delta Lake that lets you automatically generate CDC feeds for Delta Lake tables. CDF records row-level changes for all the data written into a Delta table. These include the raw data along with metadata indicating whether the specified row was inserted, deleted … Read more

300 Real SQL Interview Medium to Advanced SQL Questions

Find the second highest salary from the Employee table. Find duplicate records in a table. Retrieve employees who earn more than their manager. Count employees in each department having more than 5 employees. Find employees who joined in the last 6 months. Get departments with no employees. Write a query to find the median salary. … Read more
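
Two of the questions above can be tried out with Python's built-in sqlite3 module. This is a rough sketch rather than a reference answer: the `Employee` schema (`id`, `name`, `salary`, `manager_id`) and the sample rows are assumptions chosen for illustration, and other SQL dialects offer alternative solutions such as window functions.

```python
import sqlite3

# Hypothetical Employee table; column names and rows are assumed for the demo.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee ("
    "id INTEGER PRIMARY KEY, name TEXT, salary INTEGER, manager_id INTEGER)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?)",
    [
        (1, "Ana", 300, None),
        (2, "Ben", 200, 1),
        (3, "Cy", 350, 1),
        (4, "Dee", 200, 2),
    ],
)

# Second highest distinct salary: the largest salary strictly below the maximum.
second_highest = conn.execute(
    "SELECT MAX(salary) FROM Employee "
    "WHERE salary < (SELECT MAX(salary) FROM Employee)"
).fetchone()[0]

# Employees who earn more than their manager: self-join on manager_id.
earn_more = [
    row[0]
    for row in conn.execute(
        "SELECT e.name FROM Employee e "
        "JOIN Employee m ON e.manager_id = m.id "
        "WHERE e.salary > m.salary ORDER BY e.name"
    )
]
```

With these sample rows, `second_highest` is 300 and `earn_more` contains only Cy, whose salary exceeds that of manager Ana.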

Data Engineering Terms You Need to Know

Data Pipeline: Mapping data flow from collection to analysis for organizational insights. Source, process, deliver for analytics efficiency. Database vs Schema vs Table: Database: Organized collection of related data, like a file cabinet for storing information. Schema: Blueprint defining structure and organization within a database. Table: Grid-like structure within a schema, where data … Read more

Slowly Changing Dimensions in Databricks

Introduction: Slowly Changing Dimensions, or SCD, is a data management concept that determines how tables handle data that changes over time: for example, whether you want to overwrite values in the table or retain their history. The SCD type you implement determines this behavior. There are three types of SCD which are … Read more
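
The overwrite-versus-history choice can be sketched in plain Python. The two functions below follow the patterns commonly called SCD Type 1 (overwrite) and SCD Type 2 (keep history). This is an illustration only, not Databricks or Delta Lake code, and the function names and the `key`, `start_date`, `end_date`, and `is_current` fields are assumptions.

```python
from datetime import date


def apply_type1(table, key, new_row):
    """Overwrite pattern (commonly SCD Type 1): replace values, keep no history."""
    table[key] = dict(new_row)


def apply_type2(rows, key, new_row, effective):
    """History pattern (commonly SCD Type 2): close the current version, append a new one."""
    for row in rows:
        if row["key"] == key and row["is_current"]:
            row["is_current"] = False
            row["end_date"] = effective
    rows.append(
        {
            "key": key,
            **new_row,
            "start_date": effective,
            "end_date": None,
            "is_current": True,
        }
    )


# Hypothetical customer who moves from Paris to Rome.
type1 = {"c1": {"city": "Paris"}}
apply_type1(type1, "c1", {"city": "Rome"})

type2 = [
    {
        "key": "c1",
        "city": "Paris",
        "start_date": date(2023, 1, 1),
        "end_date": None,
        "is_current": True,
    }
]
apply_type2(type2, "c1", {"city": "Rome"}, date(2024, 1, 1))
```

After the update, `type1` holds only the Rome value, while `type2` holds two rows: the closed Paris version and the current Rome version.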

Streaming from Multiplex Bronze in Databricks

In this notebook, we are going to parse raw data from a single topic in our multiplex bronze table. As you can see in our architecture diagram, we will create the orders silver table. Let us start by copying our dataset files. Before starting, we need to cast our Kafka binary fields as strings. Here … Read more
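
The idea of casting Kafka binary fields to strings can be shown in plain Python, without Spark. This is a minimal sketch: the record layout (`topic`, `key`, `value`), the sample payload, and the assumption that the bytes are UTF-8 encoded JSON are all hypothetical, and the notebook itself would do the equivalent cast on DataFrame columns.

```python
import json

# Hypothetical record shaped like one row of a multiplex bronze table:
# Kafka delivers key and value as raw bytes.
record = {
    "topic": "orders",
    "key": b"order-1",
    "value": b'{"order_id": "order-1", "quantity": 2}',
}

# Cast the binary fields to strings (assumes UTF-8 encoded payloads).
decoded = {
    "topic": record["topic"],
    "key": record["key"].decode("utf-8"),
    "value": record["value"].decode("utf-8"),
}

# Once the value is a string, it can be parsed as JSON into typed fields.
order = json.loads(decoded["value"])
```

Only after decoding can the value be parsed into named fields such as `order["quantity"]`, which is why the cast comes first.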

Databricks Certified Data Engineer Professional

The Databricks Certified Data Engineer Professional exam topics fall into 6 broad categories. Having the Data Engineer Associate certification from Databricks is not a prerequisite for taking the Professional-level certification exam. However, this exam assumes that you have the skills of a data engineer associate on the Databricks platform. The source code … Read more