SQL for Customer Cohort Analysis — Retention, Churn, and Lifetime Value by Acquisition Month

Cohort analysis is one of the most valuable analytical patterns in any business — it answers “are customers we acquired last quarter sticking around better than customers from two years ago?” Without cohort analysis, retention metrics are meaningless averages that hide whether your business is improving or decaying.

This tutorial builds a complete cohort analysis pipeline in SQL, from raw order data to a retention matrix ready for a dashboard.


The Dataset

CREATE TABLE orders (
    order_id     INT,
    customer_id  INT,
    amount       DECIMAL(10,2),
    order_date   DATE
);

INSERT INTO orders VALUES
-- Customer 101: first order Jan, orders in Feb and Apr
(1001, 101, 120.00, '2024-01-05'),
(1002, 101,  85.00, '2024-02-12'),
(1003, 101, 200.00, '2024-04-03'),
-- Customer 102: first order Jan, only one order
(1004, 102,  95.00, '2024-01-18'),
-- Customer 103: first order Feb, orders in Feb and Mar
(1005, 103, 150.00, '2024-02-07'),
(1006, 103,  60.00, '2024-03-15'),
-- Customer 104: first order Feb, orders in Feb, Mar, Apr
(1007, 104, 300.00, '2024-02-20'),
(1008, 104, 210.00, '2024-03-22'),
(1009, 104, 175.00, '2024-04-18'),
-- Customer 105: first order Mar, only one order
(1010, 105,  80.00, '2024-03-10'),
-- Customer 106: first order Mar, orders in Mar and May
(1011, 106, 140.00, '2024-03-25'),
(1012, 106,  90.00, '2024-05-10'),
-- Customer 107: first order Apr, orders in Apr and May
(1013, 107, 250.00, '2024-04-08'),
(1014, 107, 180.00, '2024-05-20');

Step 1 — Assign Each Customer to a Cohort (First Order Month)

WITH first_orders AS (
    SELECT
        customer_id,
        MIN(order_date)                                   AS first_order_date,
        DATE_TRUNC('month', MIN(order_date))::DATE        AS cohort_month
    FROM orders
    GROUP BY customer_id
)
SELECT * FROM first_orders ORDER BY customer_id;

Output:

customer_idfirst_order_datecohort_month
1012024-01-052024-01-01
1022024-01-182024-01-01
1032024-02-072024-02-01
1042024-02-202024-02-01
1052024-03-102024-03-01
1062024-03-252024-03-01
1072024-04-082024-04-01

January cohort: customers 101 and 102. February cohort: 103 and 104. And so on.


Step 2 — Calculate Month Index for Each Order

The month index tells you how many months after the first order a customer made each subsequent purchase. Month 0 = the acquisition month, Month 1 = one month later, etc.

WITH first_orders AS (
    SELECT
        customer_id,
        DATE_TRUNC('month', MIN(order_date))::DATE AS cohort_month
    FROM orders
    GROUP BY customer_id
),
order_months AS (
    SELECT
        o.customer_id,
        fo.cohort_month,
        DATE_TRUNC('month', o.order_date)::DATE AS order_month,
        -- Calculate how many months since first order
        (EXTRACT(YEAR FROM o.order_date)  - EXTRACT(YEAR FROM fo.cohort_month)) * 12
        + (EXTRACT(MONTH FROM o.order_date) - EXTRACT(MONTH FROM fo.cohort_month))
            AS month_index
    FROM orders o
    JOIN first_orders fo ON o.customer_id = fo.customer_id
)
SELECT * FROM order_months ORDER BY customer_id, month_index;

Output:

customer_idcohort_monthorder_monthmonth_index
1012024-01-012024-01-010
1012024-01-012024-02-011
1012024-01-012024-04-013
1022024-01-012024-01-010
1032024-02-012024-02-010
1032024-02-012024-03-011
1042024-02-012024-02-010
1042024-02-012024-03-011
1042024-02-012024-04-012

Customer 101 ordered in Month 0 (Jan), Month 1 (Feb), and Month 3 (Apr). No order in Month 2 (Mar) — they churned temporarily.


Step 3 — Build the Retention Matrix

Count unique customers retained in each cohort at each month index.

WITH first_orders AS (
    SELECT
        customer_id,
        DATE_TRUNC('month', MIN(order_date))::DATE AS cohort_month
    FROM orders
    GROUP BY customer_id
),
order_months AS (
    SELECT
        o.customer_id,
        fo.cohort_month,
        (EXTRACT(YEAR FROM o.order_date)  - EXTRACT(YEAR FROM fo.cohort_month)) * 12
        + (EXTRACT(MONTH FROM o.order_date) - EXTRACT(MONTH FROM fo.cohort_month))
            AS month_index
    FROM orders o
    JOIN first_orders fo ON o.customer_id = fo.customer_id
),
cohort_sizes AS (
    SELECT cohort_month, COUNT(DISTINCT customer_id) AS cohort_size
    FROM first_orders
    GROUP BY cohort_month
),
retention_raw AS (
    SELECT
        cohort_month,
        month_index,
        COUNT(DISTINCT customer_id) AS active_customers
    FROM order_months
    GROUP BY cohort_month, month_index
)
SELECT
    r.cohort_month,
    cs.cohort_size,
    r.month_index,
    r.active_customers,
    ROUND(r.active_customers * 100.0 / cs.cohort_size, 1) AS retention_pct
FROM retention_raw r
JOIN cohort_sizes cs ON r.cohort_month = cs.cohort_month
ORDER BY r.cohort_month, r.month_index;

Output:

cohort_monthcohort_sizemonth_indexactive_customersretention_pct
2024-01-01202100.0%
2024-01-0121150.0%
2024-01-0123150.0%
2024-02-01202100.0%
2024-02-01212100.0%
2024-02-0122150.0%
2024-03-01202100.0%
2024-03-0122150.0%
2024-04-01101100.0%
2024-04-01111100.0%

January cohort lost 50% by Month 1. February cohort retained 100% through Month 1 before dropping. This is the insight cohort analysis provides.


Step 4 — Pivot Into a Retention Grid

WITH first_orders AS (
    SELECT customer_id,
           DATE_TRUNC('month', MIN(order_date))::DATE AS cohort_month
    FROM orders GROUP BY customer_id
),
order_months AS (
    SELECT o.customer_id, fo.cohort_month,
        (EXTRACT(YEAR FROM o.order_date) - EXTRACT(YEAR FROM fo.cohort_month)) * 12
        + (EXTRACT(MONTH FROM o.order_date) - EXTRACT(MONTH FROM fo.cohort_month)) AS month_index
    FROM orders o JOIN first_orders fo ON o.customer_id = fo.customer_id
),
cohort_sizes AS (
    SELECT cohort_month, COUNT(DISTINCT customer_id) AS cohort_size FROM first_orders GROUP BY cohort_month
),
retention_raw AS (
    SELECT cohort_month, month_index, COUNT(DISTINCT customer_id) AS retained
    FROM order_months GROUP BY cohort_month, month_index
)
SELECT
    r.cohort_month,
    cs.cohort_size                                                                AS "Size",
    MAX(CASE WHEN month_index = 0 THEN ROUND(retained*100.0/cs.cohort_size,0) END) || '%' AS "M0",
    MAX(CASE WHEN month_index = 1 THEN ROUND(retained*100.0/cs.cohort_size,0) END) || '%' AS "M1",
    MAX(CASE WHEN month_index = 2 THEN ROUND(retained*100.0/cs.cohort_size,0) END) || '%' AS "M2",
    MAX(CASE WHEN month_index = 3 THEN ROUND(retained*100.0/cs.cohort_size,0) END) || '%' AS "M3",
    MAX(CASE WHEN month_index = 4 THEN ROUND(retained*100.0/cs.cohort_size,0) END) || '%' AS "M4"
FROM retention_raw r
JOIN cohort_sizes cs ON r.cohort_month = cs.cohort_month
GROUP BY r.cohort_month, cs.cohort_size
ORDER BY r.cohort_month;

Output — the classic retention grid:

cohort_monthSizeM0M1M2M3M4
2024-01-012100%50%NULL50%NULL
2024-02-012100%100%50%NULLNULL
2024-03-012100%NULL50%NULLNULL
2024-04-011100%100%NULLNULLNULL

This is the exact format used in every SaaS retention dashboard. NULL means no customers from that cohort ordered in that month.


Step 5 — Revenue Cohort Analysis (LTV by Cohort)

Retention tells you if customers came back. Revenue cohort analysis tells you how much they spent.

WITH first_orders AS (
    SELECT customer_id,
           DATE_TRUNC('month', MIN(order_date))::DATE AS cohort_month
    FROM orders GROUP BY customer_id
),
order_months AS (
    SELECT
        o.customer_id,
        o.amount,
        fo.cohort_month,
        (EXTRACT(YEAR FROM o.order_date) - EXTRACT(YEAR FROM fo.cohort_month)) * 12
        + (EXTRACT(MONTH FROM o.order_date) - EXTRACT(MONTH FROM fo.cohort_month)) AS month_index
    FROM orders o JOIN first_orders fo ON o.customer_id = fo.customer_id
),
cohort_sizes AS (
    SELECT cohort_month, COUNT(DISTINCT customer_id) AS cohort_size FROM first_orders GROUP BY cohort_month
)
SELECT
    om.cohort_month,
    cs.cohort_size,
    om.month_index,
    ROUND(SUM(om.amount), 2)                              AS cohort_revenue,
    ROUND(SUM(om.amount) / cs.cohort_size, 2)             AS revenue_per_customer,
    ROUND(SUM(SUM(om.amount)) OVER (
        PARTITION BY om.cohort_month
        ORDER BY om.month_index
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) / cs.cohort_size, 2)                                AS cumulative_ltv
FROM order_months om
JOIN cohort_sizes cs ON om.cohort_month = cs.cohort_month
GROUP BY om.cohort_month, om.month_index, cs.cohort_size
ORDER BY om.cohort_month, om.month_index;

Output:

cohort_monthcohort_sizemonth_indexcohort_revenuerevenue_per_customercumulative_ltv
2024-01-0120215.00107.50107.50
2024-01-012185.0042.50150.00
2024-01-0123200.00100.00250.00
2024-02-0120450.00225.00225.00
2024-02-0121270.00135.00360.00
2024-02-0122175.0087.50447.50

February cohort has significantly higher LTV than January — even with the same size. This is the kind of insight that drives acquisition strategy decisions.


Common Mistakes

Mistake 1 — Using calendar months instead of month index

If you group by the actual order month (Jan, Feb, Mar) instead of the month index (0, 1, 2), all cohorts collapse into the same columns and you cannot compare retention across cohorts.

Mistake 2 — Counting orders instead of customers

A customer who ordered 3 times in Month 1 should count as 1 retained customer, not 3. Always use COUNT(DISTINCT customer_id) for retention metrics.

Mistake 3 — Including trial or test accounts in cohorts

If your data has internal test users or free trial accounts, filter them out before building cohorts. They inflate Month 0 size and distort all downstream percentages.


What to Learn Next

Cohort analysis shows how existing customers behave over time. The next tutorial covers funnel analysis — measuring how potential customers progress through acquisition steps, and where they drop off.

Leave a Comment