Time-series calculations — running totals, moving averages, period-over-period comparisons — are among the most common requirements in data engineering. Analytics teams need them in dashboard tables, finance teams need them for revenue tracking, and ML pipelines use them as features.
This tutorial covers the full range of time-series SQL patterns with real data, including how to handle gaps in dates that trip up most implementations.
The Dataset
Monthly revenue data per product category, with some months missing (gaps).
CREATE TABLE daily_revenue (
revenue_date DATE,
category VARCHAR(30),
revenue DECIMAL(12,2),
order_count INT
);
INSERT INTO daily_revenue VALUES
('2024-01-01', 'Hardware', 12000.00, 45),
('2024-01-01', 'Software', 3000.00, 30),
('2024-02-01', 'Hardware', 15000.00, 52),
('2024-02-01', 'Software', 4500.00, 40),
('2024-03-01', 'Hardware', 9000.00, 35),
('2024-03-01', 'Software', 5000.00, 45),
('2024-04-01', 'Hardware', 18000.00, 65),
-- April Software is missing (gap)
('2024-05-01', 'Hardware', 16000.00, 60),
('2024-05-01', 'Software', 6000.00, 55),
('2024-06-01', 'Hardware', 20000.00, 72),
('2024-06-01', 'Software', 7500.00, 62);
Running Total (Cumulative Sum)
A running total accumulates the sum from the first row up to the current row, ordered by date.
SELECT
revenue_date,
category,
revenue,
SUM(revenue) OVER (
PARTITION BY category
ORDER BY revenue_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_total
FROM daily_revenue
ORDER BY category, revenue_date;
Output:
| revenue_date | category | revenue | running_total |
|---|---|---|---|
| 2024-01-01 | Hardware | 12000.00 | 12000.00 |
| 2024-02-01 | Hardware | 15000.00 | 27000.00 |
| 2024-03-01 | Hardware | 9000.00 | 36000.00 |
| 2024-04-01 | Hardware | 18000.00 | 54000.00 |
| 2024-05-01 | Hardware | 16000.00 | 70000.00 |
| 2024-06-01 | Hardware | 20000.00 | 90000.00 |
| 2024-01-01 | Software | 3000.00 | 3000.00 |
| 2024-02-01 | Software | 4500.00 | 7500.00 |
| 2024-03-01 | Software | 5000.00 | 12500.00 |
| 2024-05-01 | Software | 6000.00 | 18500.00 |
| 2024-06-01 | Software | 7500.00 | 26000.00 |
PARTITION BY category means each category has its own independent running total that resets at the start.
Pipeline use case: Year-to-date revenue tables, quota attainment tracking, cumulative event counts in clickstream analysis.
Running Total Percentage (Running % of Total)
Shows what percentage of the full-period total has been reached at each point in time.
WITH totals AS (
SELECT
category,
SUM(revenue) AS total_revenue
FROM daily_revenue
GROUP BY category
)
SELECT
d.revenue_date,
d.category,
d.revenue,
SUM(d.revenue) OVER (
PARTITION BY d.category
ORDER BY d.revenue_date
) AS running_total,
ROUND(
SUM(d.revenue) OVER (
PARTITION BY d.category
ORDER BY d.revenue_date
) * 100.0 / t.total_revenue,
1
) AS pct_of_total
FROM daily_revenue d
JOIN totals t ON d.category = t.category
ORDER BY d.category, d.revenue_date;
Output (Hardware only, abbreviated):
| revenue_date | category | revenue | running_total | pct_of_total |
|---|---|---|---|---|
| 2024-01-01 | Hardware | 12000.00 | 12000.00 | 13.3% |
| 2024-02-01 | Hardware | 15000.00 | 27000.00 | 30.0% |
| 2024-03-01 | Hardware | 9000.00 | 36000.00 | 40.0% |
| 2024-04-01 | Hardware | 18000.00 | 54000.00 | 60.0% |
| 2024-05-01 | Hardware | 16000.00 | 70000.00 | 77.8% |
| 2024-06-01 | Hardware | 20000.00 | 90000.00 | 100.0% |
Moving Average — Fixed Window
A moving average calculates the average over a sliding window of N rows. The window slides forward one row at a time.
SELECT
revenue_date,
category,
revenue,
ROUND(
AVG(revenue) OVER (
PARTITION BY category
ORDER BY revenue_date
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
), 2
) AS moving_avg_3mo
FROM daily_revenue
ORDER BY category, revenue_date;
Output (Hardware):
| revenue_date | category | revenue | moving_avg_3mo |
|---|---|---|---|
| 2024-01-01 | Hardware | 12000.00 | 12000.00 |
| 2024-02-01 | Hardware | 15000.00 | 13500.00 |
| 2024-03-01 | Hardware | 9000.00 | 12000.00 |
| 2024-04-01 | Hardware | 18000.00 | 14000.00 |
| 2024-05-01 | Hardware | 16000.00 | 14333.33 |
| 2024-06-01 | Hardware | 20000.00 | 18000.00 |
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW = current row + 2 rows before = 3-month window.
Important: This window is based on physical rows, not calendar months. If there is a gap in your data (April Software is missing), the window will span non-consecutive months. To fix this, fill the gaps first — see the gap-filling section below.
Period-Over-Period Comparison (Month-Over-Month, Year-Over-Year)
SELECT
revenue_date,
category,
revenue,
LAG(revenue, 1) OVER (PARTITION BY category ORDER BY revenue_date) AS prev_month_revenue,
revenue - LAG(revenue, 1) OVER (PARTITION BY category ORDER BY revenue_date) AS mom_change,
ROUND(
(revenue - LAG(revenue, 1) OVER (PARTITION BY category ORDER BY revenue_date))
* 100.0
/ NULLIF(LAG(revenue, 1) OVER (PARTITION BY category ORDER BY revenue_date), 0),
1
) AS mom_pct_change
FROM daily_revenue
ORDER BY category, revenue_date;
Output (Hardware):
| revenue_date | revenue | prev_month_revenue | mom_change | mom_pct_change |
|---|---|---|---|---|
| 2024-01-01 | 12000.00 | NULL | NULL | NULL |
| 2024-02-01 | 15000.00 | 12000.00 | +3000.00 | +25.0% |
| 2024-03-01 | 9000.00 | 15000.00 | -6000.00 | -40.0% |
| 2024-04-01 | 18000.00 | 9000.00 | +9000.00 | +100.0% |
| 2024-05-01 | 16000.00 | 18000.00 | -2000.00 | -11.1% |
| 2024-06-01 | 20000.00 | 16000.00 | +4000.00 | +25.0% |
NULLIF(..., 0) prevents division by zero when prev_month_revenue = 0.
For year-over-year, change LAG(revenue, 1) to LAG(revenue, 12) (for monthly data spanning multiple years).
Handling Date Gaps — The Biggest Pitfall
The Software category is missing April data. If you calculate a 3-month moving average without filling this gap, May’s average incorrectly includes March and February as the “previous 2 months” — skipping the gap entirely.
Step 1: Generate a complete date spine
-- PostgreSQL / most databases
WITH date_spine AS (
SELECT
generate_series(
'2024-01-01'::DATE,
'2024-06-01'::DATE,
'1 month'::INTERVAL
)::DATE AS revenue_date
),
categories AS (
SELECT DISTINCT category FROM daily_revenue
),
full_grid AS (
SELECT d.revenue_date, c.category
FROM date_spine d
CROSS JOIN categories c
)
SELECT
g.revenue_date,
g.category,
COALESCE(r.revenue, 0) AS revenue, -- fill missing months with 0
r.revenue IS NULL AS is_gap -- flag for reference
FROM full_grid g
LEFT JOIN daily_revenue r
ON g.revenue_date = r.revenue_date
AND g.category = r.category
ORDER BY g.category, g.revenue_date;
Output (Software with gap filled):
| revenue_date | category | revenue | is_gap |
|---|---|---|---|
| 2024-01-01 | Software | 3000.00 | false |
| 2024-02-01 | Software | 4500.00 | false |
| 2024-03-01 | Software | 5000.00 | false |
| 2024-04-01 | Software | 0.00 | true |
| 2024-05-01 | Software | 6000.00 | false |
| 2024-06-01 | Software | 7500.00 | false |
Step 2: Apply moving average to the gap-filled data
WITH date_spine AS (
SELECT generate_series('2024-01-01'::DATE, '2024-06-01'::DATE, '1 month'::INTERVAL)::DATE AS d
),
full_grid AS (
SELECT d.d AS revenue_date, c.category
FROM date_spine d
CROSS JOIN (SELECT DISTINCT category FROM daily_revenue) c
),
gap_filled AS (
SELECT
g.revenue_date,
g.category,
COALESCE(r.revenue, 0) AS revenue
FROM full_grid g
LEFT JOIN daily_revenue r ON g.revenue_date = r.revenue_date AND g.category = r.category
)
SELECT
revenue_date,
category,
revenue,
ROUND(
AVG(revenue) OVER (
PARTITION BY category
ORDER BY revenue_date
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
), 2
) AS moving_avg_3mo
FROM gap_filled
ORDER BY category, revenue_date;
Now April’s gap (0) is properly included in the 3-month window, giving an accurate moving average.
Databricks/Spark SQL date spine:
-- Spark SQL: use SEQUENCE instead of generate_series
SELECT EXPLODE(SEQUENCE(DATE '2024-01-01', DATE '2024-06-01', INTERVAL 1 MONTH)) AS revenue_date
Percentile and Distribution Analysis
For anomaly detection in pipelines, you need to understand the distribution of your values.
SELECT
category,
COUNT(*) AS months,
ROUND(AVG(revenue), 2) AS avg_revenue,
ROUND(STDDEV(revenue), 2) AS stddev_revenue,
PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY revenue) AS p25,
PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY revenue) AS median,
PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY revenue) AS p75,
PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY revenue) AS p95,
MIN(revenue) AS min_revenue,
MAX(revenue) AS max_revenue
FROM daily_revenue
GROUP BY category;
Output:
| category | avg_revenue | stddev_revenue | p25 | median | p75 | p95 |
|---|---|---|---|---|---|---|
| Hardware | 15000.00 | 3899.24 | 12750.00 | 15500.00 | 17500.00 | 19600.00 |
| Software | 5200.00 | 1597.92 | 3750.00 | 5000.00 | 6375.00 | 7350.00 |
Pipeline use case: Run this against your incoming batch and compare to historical percentiles. If today’s average is below the historical p25 or above the p95, flag it as a potential anomaly before loading.
Common Mistakes
Mistake 1 — Forgetting PARTITION BY on running totals
Without PARTITION BY, the running total accumulates across all categories combined — not independently per category.
Mistake 2 — Using RANGE instead of ROWS for moving averages
RANGE BETWEEN 2 PRECEDING AND CURRENT ROW works on value ranges, not row counts. On a date column with no ties this behaves like ROWS, but it produces unexpected results when dates repeat. Always use ROWS BETWEEN for moving windows.
Mistake 3 — Not handling gaps before applying window functions
Moving averages and period comparisons give incorrect results when your time series has missing periods. Always build a date spine and LEFT JOIN your data to it before applying window functions.
Mistake 4 — NULLIF not applied to denominator in pct change
If a previous period had zero revenue, (current - 0) / 0 causes a division by zero error. Always wrap the denominator in NULLIF(value, 0).
What to Learn Next
Time-series aggregations are typically the final step in your Gold layer. The next tutorial covers how to detect and handle common SQL mistakes that silently break data pipelines — covering NULL propagation, implicit type casting, and timezone traps.