SQL Running Totals and Moving Averages for Time-Series Pipeline Data

Time-series calculations — running totals, moving averages, period-over-period comparisons — are among the most common requirements in data engineering. Analytics teams need them in dashboard tables, finance teams need them for revenue tracking, and ML pipelines use them as features.

This tutorial covers the full range of time-series SQL patterns with real data, including how to handle gaps in dates that trip up most implementations.

The Dataset

Monthly revenue data per product category, with some months missing (gaps).

CREATE TABLE daily_revenue (
    revenue_date  DATE,
    category      VARCHAR(30),
    revenue       DECIMAL(12,2),
    order_count   INT
);

INSERT INTO daily_revenue VALUES
('2024-01-01', 'Hardware', 12000.00, 45),
('2024-01-01', 'Software',  3000.00, 30),
('2024-02-01', 'Hardware', 15000.00, 52),
('2024-02-01', 'Software',  4500.00, 40),
('2024-03-01', 'Hardware',  9000.00, 35),
('2024-03-01', 'Software',  5000.00, 45),
('2024-04-01', 'Hardware', 18000.00, 65),
-- April Software is missing (gap)
('2024-05-01', 'Hardware', 16000.00, 60),
('2024-05-01', 'Software',  6000.00, 55),
('2024-06-01', 'Hardware', 20000.00, 72),
('2024-06-01', 'Software',  7500.00, 62);

Running Total (Cumulative Sum)

A running total accumulates the sum from the first row up to the current row, ordered by date.

SELECT
    revenue_date,
    category,
    revenue,
    SUM(revenue) OVER (
        PARTITION BY category
        ORDER BY revenue_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_total
FROM daily_revenue
ORDER BY category, revenue_date;

Output:

revenue_datecategoryrevenuerunning_total
2024-01-01Hardware12000.0012000.00
2024-02-01Hardware15000.0027000.00
2024-03-01Hardware9000.0036000.00
2024-04-01Hardware18000.0054000.00
2024-05-01Hardware16000.0070000.00
2024-06-01Hardware20000.0090000.00
2024-01-01Software3000.003000.00
2024-02-01Software4500.007500.00
2024-03-01Software5000.0012500.00
2024-05-01Software6000.0018500.00
2024-06-01Software7500.0026000.00

PARTITION BY category means each category has its own independent running total that resets at the start.

Pipeline use case: Year-to-date revenue tables, quota attainment tracking, cumulative event counts in clickstream analysis.

Running Total Percentage (Running % of Total)

Shows what percentage of the full-period total has been reached at each point in time.

WITH totals AS (
    SELECT
        category,
        SUM(revenue) AS total_revenue
    FROM daily_revenue
    GROUP BY category
)
SELECT
    d.revenue_date,
    d.category,
    d.revenue,
    SUM(d.revenue) OVER (
        PARTITION BY d.category
        ORDER BY d.revenue_date
    ) AS running_total,
    ROUND(
        SUM(d.revenue) OVER (
            PARTITION BY d.category
            ORDER BY d.revenue_date
        ) * 100.0 / t.total_revenue,
        1
    ) AS pct_of_total
FROM daily_revenue d
JOIN totals t ON d.category = t.category
ORDER BY d.category, d.revenue_date;

Output (Hardware only, abbreviated):

revenue_datecategoryrevenuerunning_totalpct_of_total
2024-01-01Hardware12000.0012000.0013.3%
2024-02-01Hardware15000.0027000.0030.0%
2024-03-01Hardware9000.0036000.0040.0%
2024-04-01Hardware18000.0054000.0060.0%
2024-05-01Hardware16000.0070000.0077.8%
2024-06-01Hardware20000.0090000.00100.0%

Moving Average — Fixed Window

A moving average calculates the average over a sliding window of N rows. The window slides forward one row at a time.

SELECT
    revenue_date,
    category,
    revenue,
    ROUND(
        AVG(revenue) OVER (
            PARTITION BY category
            ORDER BY revenue_date
            ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
        ), 2
    ) AS moving_avg_3mo
FROM daily_revenue
ORDER BY category, revenue_date;

Output (Hardware):

revenue_datecategoryrevenuemoving_avg_3mo
2024-01-01Hardware12000.0012000.00
2024-02-01Hardware15000.0013500.00
2024-03-01Hardware9000.0012000.00
2024-04-01Hardware18000.0014000.00
2024-05-01Hardware16000.0014333.33
2024-06-01Hardware20000.0018000.00

ROWS BETWEEN 2 PRECEDING AND CURRENT ROW = current row + 2 rows before = 3-month window.

Important: This window is based on physical rows, not calendar months. If there is a gap in your data (April Software is missing), the window will span non-consecutive months. To fix this, fill the gaps first — see the gap-filling section below.

Period-Over-Period Comparison (Month-Over-Month, Year-Over-Year)

SELECT
    revenue_date,
    category,
    revenue,
    LAG(revenue, 1) OVER (PARTITION BY category ORDER BY revenue_date) AS prev_month_revenue,
    revenue - LAG(revenue, 1) OVER (PARTITION BY category ORDER BY revenue_date) AS mom_change,
    ROUND(
        (revenue - LAG(revenue, 1) OVER (PARTITION BY category ORDER BY revenue_date))
        * 100.0
        / NULLIF(LAG(revenue, 1) OVER (PARTITION BY category ORDER BY revenue_date), 0),
        1
    ) AS mom_pct_change
FROM daily_revenue
ORDER BY category, revenue_date;

Output (Hardware):

revenue_daterevenueprev_month_revenuemom_changemom_pct_change
2024-01-0112000.00NULLNULLNULL
2024-02-0115000.0012000.00+3000.00+25.0%
2024-03-019000.0015000.00-6000.00-40.0%
2024-04-0118000.009000.00+9000.00+100.0%
2024-05-0116000.0018000.00-2000.00-11.1%
2024-06-0120000.0016000.00+4000.00+25.0%

NULLIF(..., 0) prevents division by zero when prev_month_revenue = 0.

For year-over-year, change LAG(revenue, 1) to LAG(revenue, 12) (for monthly data spanning multiple years).

Handling Date Gaps — The Biggest Pitfall

The Software category is missing April data. If you calculate a 3-month moving average without filling this gap, May’s average incorrectly includes March and February as the “previous 2 months” — skipping the gap entirely.

Step 1: Generate a complete date spine

-- PostgreSQL / most databases
WITH date_spine AS (
    SELECT
        generate_series(
            '2024-01-01'::DATE,
            '2024-06-01'::DATE,
            '1 month'::INTERVAL
        )::DATE AS revenue_date
),
categories AS (
    SELECT DISTINCT category FROM daily_revenue
),
full_grid AS (
    SELECT d.revenue_date, c.category
    FROM date_spine d
    CROSS JOIN categories c
)
SELECT
    g.revenue_date,
    g.category,
    COALESCE(r.revenue, 0) AS revenue,  -- fill missing months with 0
    r.revenue IS NULL AS is_gap         -- flag for reference
FROM full_grid g
LEFT JOIN daily_revenue r
    ON g.revenue_date = r.revenue_date
    AND g.category = r.category
ORDER BY g.category, g.revenue_date;

Output (Software with gap filled):

revenue_datecategoryrevenueis_gap
2024-01-01Software3000.00false
2024-02-01Software4500.00false
2024-03-01Software5000.00false
2024-04-01Software0.00true
2024-05-01Software6000.00false
2024-06-01Software7500.00false

Step 2: Apply moving average to the gap-filled data

WITH date_spine AS (
    SELECT generate_series('2024-01-01'::DATE, '2024-06-01'::DATE, '1 month'::INTERVAL)::DATE AS d
),
full_grid AS (
    SELECT d.d AS revenue_date, c.category
    FROM date_spine d
    CROSS JOIN (SELECT DISTINCT category FROM daily_revenue) c
),
gap_filled AS (
    SELECT
        g.revenue_date,
        g.category,
        COALESCE(r.revenue, 0) AS revenue
    FROM full_grid g
    LEFT JOIN daily_revenue r ON g.revenue_date = r.revenue_date AND g.category = r.category
)
SELECT
    revenue_date,
    category,
    revenue,
    ROUND(
        AVG(revenue) OVER (
            PARTITION BY category
            ORDER BY revenue_date
            ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
        ), 2
    ) AS moving_avg_3mo
FROM gap_filled
ORDER BY category, revenue_date;

Now April’s gap (0) is properly included in the 3-month window, giving an accurate moving average.

Databricks/Spark SQL date spine:

-- Spark SQL: use SEQUENCE instead of generate_series
SELECT EXPLODE(SEQUENCE(DATE '2024-01-01', DATE '2024-06-01', INTERVAL 1 MONTH)) AS revenue_date

Percentile and Distribution Analysis

For anomaly detection in pipelines, you need to understand the distribution of your values.

SELECT
    category,
    COUNT(*)                                              AS months,
    ROUND(AVG(revenue), 2)                                AS avg_revenue,
    ROUND(STDDEV(revenue), 2)                             AS stddev_revenue,
    PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY revenue) AS p25,
    PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY revenue) AS median,
    PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY revenue) AS p75,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY revenue) AS p95,
    MIN(revenue)                                          AS min_revenue,
    MAX(revenue)                                          AS max_revenue
FROM daily_revenue
GROUP BY category;

Output:

categoryavg_revenuestddev_revenuep25medianp75p95
Hardware15000.003899.2412750.0015500.0017500.0019600.00
Software5200.001597.923750.005000.006375.007350.00

Pipeline use case: Run this against your incoming batch and compare to historical percentiles. If today’s average is below the historical p25 or above the p95, flag it as a potential anomaly before loading.

Common Mistakes

Mistake 1 — Forgetting PARTITION BY on running totals

Without PARTITION BY, the running total accumulates across all categories combined — not independently per category.

Mistake 2 — Using RANGE instead of ROWS for moving averages

RANGE BETWEEN 2 PRECEDING AND CURRENT ROW works on value ranges, not row counts. On a date column with no ties this behaves like ROWS, but it produces unexpected results when dates repeat. Always use ROWS BETWEEN for moving windows.

Mistake 3 — Not handling gaps before applying window functions

Moving averages and period comparisons give incorrect results when your time series has missing periods. Always build a date spine and LEFT JOIN your data to it before applying window functions.

Mistake 4 — NULLIF not applied to denominator in pct change

If a previous period had zero revenue, (current - 0) / 0 causes a division by zero error. Always wrap the denominator in NULLIF(value, 0).

What to Learn Next

Time-series aggregations are typically the final step in your Gold layer. The next tutorial covers how to detect and handle common SQL mistakes that silently break data pipelines — covering NULL propagation, implicit type casting, and timezone traps.

Leave a Comment