SQL Running Totals and Moving Averages for Time-Series Pipeline Data

Time-series calculations — running totals, moving averages, period-over-period comparisons — are among the most common requirements in data engineering. Analytics teams need them in dashboard tables, finance teams need them for revenue tracking, and ML pipelines use them as features.

This tutorial covers the full range of time-series SQL patterns with real data, including how to handle gaps in dates that trip up most implementations.

The Dataset

Monthly revenue data per product category, with some months missing (gaps).

CREATE TABLE daily_revenue (
    revenue_date  DATE,
    category      VARCHAR(30),
    revenue       DECIMAL(12,2),
    order_count   INT
);

INSERT INTO daily_revenue VALUES
('2024-01-01', 'Hardware', 12000.00, 45),
('2024-01-01', 'Software',  3000.00, 30),
('2024-02-01', 'Hardware', 15000.00, 52),
('2024-02-01', 'Software',  4500.00, 40),
('2024-03-01', 'Hardware',  9000.00, 35),
('2024-03-01', 'Software',  5000.00, 45),
('2024-04-01', 'Hardware', 18000.00, 65),
-- April Software is missing (gap)
('2024-05-01', 'Hardware', 16000.00, 60),
('2024-05-01', 'Software',  6000.00, 55),
('2024-06-01', 'Hardware', 20000.00, 72),
('2024-06-01', 'Software',  7500.00, 62);

Running Total (Cumulative Sum)

A running total accumulates the sum from the first row up to the current row, ordered by date.

SELECT
    revenue_date,
    category,
    revenue,
    SUM(revenue) OVER (
        PARTITION BY category
        ORDER BY revenue_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_total
FROM daily_revenue
ORDER BY category, revenue_date;

Output:

revenue_date	category	revenue	running_total
2024-01-01	Hardware	12000.00	12000.00
2024-02-01	Hardware	15000.00	27000.00
2024-03-01	Hardware	9000.00	36000.00
2024-04-01	Hardware	18000.00	54000.00
2024-05-01	Hardware	16000.00	70000.00
2024-06-01	Hardware	20000.00	90000.00
2024-01-01	Software	3000.00	3000.00
2024-02-01	Software	4500.00	7500.00
2024-03-01	Software	5000.00	12500.00
2024-05-01	Software	6000.00	18500.00
2024-06-01	Software	7500.00	26000.00

PARTITION BY category means each category has its own independent running total that resets at the start.

Pipeline use case: Year-to-date revenue tables, quota attainment tracking, cumulative event counts in clickstream analysis.

Running Total Percentage (Running % of Total)

Shows what percentage of the full-period total has been reached at each point in time.

WITH totals AS (
    SELECT
        category,
        SUM(revenue) AS total_revenue
    FROM daily_revenue
    GROUP BY category
)
SELECT
    d.revenue_date,
    d.category,
    d.revenue,
    SUM(d.revenue) OVER (
        PARTITION BY d.category
        ORDER BY d.revenue_date
    ) AS running_total,
    ROUND(
        SUM(d.revenue) OVER (
            PARTITION BY d.category
            ORDER BY d.revenue_date
        ) * 100.0 / t.total_revenue,
        1
    ) AS pct_of_total
FROM daily_revenue d
JOIN totals t ON d.category = t.category
ORDER BY d.category, d.revenue_date;

Output (Hardware only, abbreviated):

revenue_date	category	revenue	running_total	pct_of_total
2024-01-01	Hardware	12000.00	12000.00	13.3%
2024-02-01	Hardware	15000.00	27000.00	30.0%
2024-03-01	Hardware	9000.00	36000.00	40.0%
2024-04-01	Hardware	18000.00	54000.00	60.0%
2024-05-01	Hardware	16000.00	70000.00	77.8%
2024-06-01	Hardware	20000.00	90000.00	100.0%

Moving Average — Fixed Window

A moving average calculates the average over a sliding window of N rows. The window slides forward one row at a time.

SELECT
    revenue_date,
    category,
    revenue,
    ROUND(
        AVG(revenue) OVER (
            PARTITION BY category
            ORDER BY revenue_date
            ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
        ), 2
    ) AS moving_avg_3mo
FROM daily_revenue
ORDER BY category, revenue_date;

Output (Hardware):

revenue_date	category	revenue	moving_avg_3mo
2024-01-01	Hardware	12000.00	12000.00
2024-02-01	Hardware	15000.00	13500.00
2024-03-01	Hardware	9000.00	12000.00
2024-04-01	Hardware	18000.00	14000.00
2024-05-01	Hardware	16000.00	14333.33
2024-06-01	Hardware	20000.00	18000.00

ROWS BETWEEN 2 PRECEDING AND CURRENT ROW = current row + 2 rows before = 3-month window.

Important: This window is based on physical rows, not calendar months. If there is a gap in your data (April Software is missing), the window will span non-consecutive months. To fix this, fill the gaps first — see the gap-filling section below.

Period-Over-Period Comparison (Month-Over-Month, Year-Over-Year)

SELECT
    revenue_date,
    category,
    revenue,
    LAG(revenue, 1) OVER (PARTITION BY category ORDER BY revenue_date) AS prev_month_revenue,
    revenue - LAG(revenue, 1) OVER (PARTITION BY category ORDER BY revenue_date) AS mom_change,
    ROUND(
        (revenue - LAG(revenue, 1) OVER (PARTITION BY category ORDER BY revenue_date))
        * 100.0
        / NULLIF(LAG(revenue, 1) OVER (PARTITION BY category ORDER BY revenue_date), 0),
        1
    ) AS mom_pct_change
FROM daily_revenue
ORDER BY category, revenue_date;

Output (Hardware):

revenue_date	revenue	prev_month_revenue	mom_change	mom_pct_change
2024-01-01	12000.00	NULL	NULL	NULL
2024-02-01	15000.00	12000.00	+3000.00	+25.0%
2024-03-01	9000.00	15000.00	-6000.00	-40.0%
2024-04-01	18000.00	9000.00	+9000.00	+100.0%
2024-05-01	16000.00	18000.00	-2000.00	-11.1%
2024-06-01	20000.00	16000.00	+4000.00	+25.0%

NULLIF(..., 0) prevents division by zero when prev_month_revenue = 0.

For year-over-year, change LAG(revenue, 1) to LAG(revenue, 12) (for monthly data spanning multiple years).

Handling Date Gaps — The Biggest Pitfall

The Software category is missing April data. If you calculate a 3-month moving average without filling this gap, May’s average incorrectly includes March and February as the “previous 2 months” — skipping the gap entirely.

Step 1: Generate a complete date spine

-- PostgreSQL / most databases
WITH date_spine AS (
    SELECT
        generate_series(
            '2024-01-01'::DATE,
            '2024-06-01'::DATE,
            '1 month'::INTERVAL
        )::DATE AS revenue_date
),
categories AS (
    SELECT DISTINCT category FROM daily_revenue
),
full_grid AS (
    SELECT d.revenue_date, c.category
    FROM date_spine d
    CROSS JOIN categories c
)
SELECT
    g.revenue_date,
    g.category,
    COALESCE(r.revenue, 0) AS revenue,  -- fill missing months with 0
    r.revenue IS NULL AS is_gap         -- flag for reference
FROM full_grid g
LEFT JOIN daily_revenue r
    ON g.revenue_date = r.revenue_date
    AND g.category = r.category
ORDER BY g.category, g.revenue_date;

Output (Software with gap filled):

revenue_date	category	revenue	is_gap
2024-01-01	Software	3000.00	false
2024-02-01	Software	4500.00	false
2024-03-01	Software	5000.00	false
2024-04-01	Software	0.00	true
2024-05-01	Software	6000.00	false
2024-06-01	Software	7500.00	false

Step 2: Apply moving average to the gap-filled data

WITH date_spine AS (
    SELECT generate_series('2024-01-01'::DATE, '2024-06-01'::DATE, '1 month'::INTERVAL)::DATE AS d
),
full_grid AS (
    SELECT d.d AS revenue_date, c.category
    FROM date_spine d
    CROSS JOIN (SELECT DISTINCT category FROM daily_revenue) c
),
gap_filled AS (
    SELECT
        g.revenue_date,
        g.category,
        COALESCE(r.revenue, 0) AS revenue
    FROM full_grid g
    LEFT JOIN daily_revenue r ON g.revenue_date = r.revenue_date AND g.category = r.category
)
SELECT
    revenue_date,
    category,
    revenue,
    ROUND(
        AVG(revenue) OVER (
            PARTITION BY category
            ORDER BY revenue_date
            ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
        ), 2
    ) AS moving_avg_3mo
FROM gap_filled
ORDER BY category, revenue_date;

Now April’s gap (0) is properly included in the 3-month window, giving an accurate moving average.

Databricks/Spark SQL date spine:

-- Spark SQL: use SEQUENCE instead of generate_series
SELECT EXPLODE(SEQUENCE(DATE '2024-01-01', DATE '2024-06-01', INTERVAL 1 MONTH)) AS revenue_date

Percentile and Distribution Analysis

For anomaly detection in pipelines, you need to understand the distribution of your values.

SELECT
    category,
    COUNT(*)                                              AS months,
    ROUND(AVG(revenue), 2)                                AS avg_revenue,
    ROUND(STDDEV(revenue), 2)                             AS stddev_revenue,
    PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY revenue) AS p25,
    PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY revenue) AS median,
    PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY revenue) AS p75,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY revenue) AS p95,
    MIN(revenue)                                          AS min_revenue,
    MAX(revenue)                                          AS max_revenue
FROM daily_revenue
GROUP BY category;

Output:

category	avg_revenue	stddev_revenue	p25	median	p75	p95
Hardware	15000.00	3899.24	12750.00	15500.00	17500.00	19600.00
Software	5200.00	1597.92	3750.00	5000.00	6375.00	7350.00

Pipeline use case: Run this against your incoming batch and compare to historical percentiles. If today’s average is below the historical p25 or above the p95, flag it as a potential anomaly before loading.

Common Mistakes

Mistake 1 — Forgetting PARTITION BY on running totals

Without PARTITION BY, the running total accumulates across all categories combined — not independently per category.

Mistake 2 — Using RANGE instead of ROWS for moving averages

RANGE BETWEEN 2 PRECEDING AND CURRENT ROW works on value ranges, not row counts. On a date column with no ties this behaves like ROWS, but it produces unexpected results when dates repeat. Always use ROWS BETWEEN for moving windows.

Mistake 3 — Not handling gaps before applying window functions

Moving averages and period comparisons give incorrect results when your time series has missing periods. Always build a date spine and LEFT JOIN your data to it before applying window functions.

Mistake 4 — NULLIF not applied to denominator in pct change

If a previous period had zero revenue, (current - 0) / 0 causes a division by zero error. Always wrap the denominator in NULLIF(value, 0).

What to Learn Next

Time-series aggregations are typically the final step in your Gold layer. The next tutorial covers how to detect and handle common SQL mistakes that silently break data pipelines — covering NULL propagation, implicit type casting, and timezone traps.