SQL for RFM Customer Segmentation — Recency, Frequency, Monetary Analysis

RFM segmentation is the most practical customer segmentation model for e-commerce and subscription businesses. It classifies every customer on three dimensions: how recently they purchased (Recency), how often they purchase (Frequency), and how much they spend (Monetary). The combination of these three scores places every customer in a segment that tells you exactly how to treat them.

This tutorial builds the complete RFM pipeline in SQL from raw orders to labeled segments.


The Dataset

CREATE TABLE orders (
    order_id    INT,
    customer_id INT,
    amount      DECIMAL(10,2),
    order_date  DATE
);

INSERT INTO orders VALUES
(1001, 201, 250.00, '2024-01-05'),
(1002, 201, 180.00, '2024-02-10'),
(1003, 201,  90.00, '2024-03-20'),
(1004, 202, 500.00, '2023-06-15'),
(1005, 203, 120.00, '2024-03-01'),
(1006, 203,  85.00, '2024-03-25'),
(1007, 203,  95.00, '2024-04-10'),
(1008, 204, 750.00, '2024-04-05'),
(1009, 204, 620.00, '2024-04-18'),
(1010, 205,  30.00, '2022-11-20'),
(1011, 206, 200.00, '2024-04-01'),
(1012, 206, 175.00, '2024-04-15'),
(1013, 207,  45.00, '2024-01-30'),
(1014, 208, 900.00, '2024-04-20'),
(1015, 209,  60.00, '2023-12-01'),
(1016, 210, 310.00, '2024-02-28'),
(1017, 210, 420.00, '2024-04-12');

Analysis date: 2024-04-30 (the “today” for our recency calculation).


Step 1 — Calculate Raw RFM Values Per Customer

WITH rfm_raw AS (
    SELECT
        customer_id,
        MAX(order_date)                          AS last_order_date,
        -- Recency: days since last order (lower = more recent = better)
        DATE '2024-04-30' - MAX(order_date)      AS recency_days,
        COUNT(*)                                  AS frequency,
        SUM(amount)                               AS monetary
    FROM orders
    GROUP BY customer_id
)
SELECT * FROM rfm_raw ORDER BY recency_days;

Output:

customer_idlast_order_daterecency_daysfrequencymonetary
2082024-04-20101900.00
2042024-04-181221370.00
2062024-04-15152375.00
2102024-04-12182730.00
2072024-04-1020145.00
2032024-04-10203300.00
2012024-03-20413520.00
2052022-11-20527130.00

Step 2 — Score Each Dimension 1–5 Using NTILE

We divide customers into 5 equal-sized buckets per dimension. Score 5 = best, Score 1 = worst.

WITH rfm_raw AS (
    SELECT
        customer_id,
        DATE '2024-04-30' - MAX(order_date) AS recency_days,
        COUNT(*)                             AS frequency,
        SUM(amount)                          AS monetary
    FROM orders
    GROUP BY customer_id
),
rfm_scores AS (
    SELECT
        customer_id,
        recency_days,
        frequency,
        monetary,
        -- Recency: lower days = better, so reverse the order (ASC = 5)
        NTILE(5) OVER (ORDER BY recency_days ASC)  AS r_score,
        -- Frequency: higher = better
        NTILE(5) OVER (ORDER BY frequency DESC)    AS f_score,
        -- Monetary: higher = better
        NTILE(5) OVER (ORDER BY monetary DESC)     AS m_score
    FROM rfm_raw
)
SELECT * FROM rfm_scores ORDER BY r_score DESC, f_score DESC;

Output:

customer_idrecency_daysfrequencymonetaryr_scoref_scorem_score
208101900.00535
2041221370.00555
206152375.00553
210182730.00554
20720145.00431
203203300.00452
201413520.00353
209150160.00231
2023191500.00233
205527130.00131

Step 3 — Assign Segment Labels

With R, F, M scores in hand, use business rules to label each customer into a meaningful segment.

WITH rfm_raw AS (
    SELECT customer_id,
           DATE '2024-04-30' - MAX(order_date) AS recency_days,
           COUNT(*) AS frequency, SUM(amount) AS monetary
    FROM orders GROUP BY customer_id
),
rfm_scores AS (
    SELECT customer_id, recency_days, frequency, monetary,
           NTILE(5) OVER (ORDER BY recency_days ASC)  AS r_score,
           NTILE(5) OVER (ORDER BY frequency DESC)    AS f_score,
           NTILE(5) OVER (ORDER BY monetary DESC)     AS m_score
    FROM rfm_raw
),
rfm_labeled AS (
    SELECT *,
        r_score || f_score || m_score AS rfm_cell,
        CASE
            WHEN r_score >= 4 AND f_score >= 4 AND m_score >= 4 THEN 'Champions'
            WHEN r_score >= 3 AND f_score >= 3 AND m_score >= 3 THEN 'Loyal Customers'
            WHEN r_score >= 4 AND f_score <= 2                  THEN 'New Customers'
            WHEN r_score >= 3 AND f_score >= 3 AND m_score <= 2 THEN 'Potential Loyalists'
            WHEN r_score = 5 AND f_score = 1                    THEN 'New Customers'
            WHEN r_score <= 2 AND f_score >= 4 AND m_score >= 4 THEN 'At Risk'
            WHEN r_score = 1 AND f_score >= 4 AND m_score >= 4  THEN 'Cannot Lose Them'
            WHEN r_score <= 2 AND f_score <= 2                  THEN 'Lost'
            WHEN r_score >= 3 AND f_score <= 2 AND m_score >= 3 THEN 'Promising'
            ELSE 'Needs Attention'
        END AS segment
    FROM rfm_scores
)
SELECT
    customer_id,
    recency_days,
    frequency,
    ROUND(monetary, 2) AS monetary,
    r_score,
    f_score,
    m_score,
    rfm_cell,
    segment
FROM rfm_labeled
ORDER BY segment, m_score DESC;

Output:

customer_idrecency_daysfrequencymonetaryr_scoref_scorem_scorerfm_cellsegment
2041221370.00555555Champions
210182730.00554554Champions
201413520.00353353Loyal Customers
206152375.00553553Loyal Customers
208101900.00535535New Customers
203203300.00452452Potential Loyalists
2023191500.00233233Needs Attention
209150160.00231231Lost
205527130.00131131Lost

Step 4 — Segment Summary for Business Actions

WITH rfm_raw AS (
    SELECT customer_id,
           DATE '2024-04-30' - MAX(order_date) AS recency_days,
           COUNT(*) AS frequency, SUM(amount) AS monetary
    FROM orders GROUP BY customer_id
),
rfm_scores AS (
    SELECT customer_id, recency_days, frequency, monetary,
           NTILE(5) OVER (ORDER BY recency_days ASC)  AS r_score,
           NTILE(5) OVER (ORDER BY frequency DESC)    AS f_score,
           NTILE(5) OVER (ORDER BY monetary DESC)     AS m_score
    FROM rfm_raw
),
segments AS (
    SELECT *,
        CASE
            WHEN r_score >= 4 AND f_score >= 4 AND m_score >= 4 THEN 'Champions'
            WHEN r_score >= 3 AND f_score >= 3 AND m_score >= 3 THEN 'Loyal Customers'
            WHEN r_score >= 4 AND f_score <= 2                  THEN 'New Customers'
            WHEN r_score >= 3 AND f_score >= 3 AND m_score <= 2 THEN 'Potential Loyalists'
            WHEN r_score <= 2 AND f_score >= 4 AND m_score >= 4 THEN 'At Risk'
            WHEN r_score = 1 AND f_score >= 4 AND m_score >= 4  THEN 'Cannot Lose Them'
            WHEN r_score <= 2 AND f_score <= 2                  THEN 'Lost'
            ELSE 'Needs Attention'
        END AS segment
    FROM rfm_scores
)
SELECT
    segment,
    COUNT(*)                        AS customer_count,
    ROUND(AVG(recency_days))        AS avg_recency_days,
    ROUND(AVG(frequency), 1)        AS avg_orders,
    ROUND(AVG(monetary), 2)         AS avg_monetary,
    ROUND(SUM(monetary), 2)         AS total_monetary
FROM segments
GROUP BY segment
ORDER BY total_monetary DESC;

Output:

segmentcustomer_countavg_recency_daysavg_ordersavg_monetarytotal_monetary
Champions2152.01050.002100.00
New Customers1101.0900.00900.00
Loyal Customers2282.5447.50895.00
Potential Loyalists1203.0300.00300.00
Needs Attention13191.0500.00500.00
Lost23391.045.0090.00

This summary table feeds directly into your CRM or marketing automation system. Champions get VIP rewards. Lost customers get a win-back campaign. Potential Loyalists get a loyalty program offer.


Step 5 — Load RFM Scores to Gold Layer

-- Refresh the Gold RFM table daily
DELETE FROM gold.customer_rfm WHERE score_date = '2024-04-30';

INSERT INTO gold.customer_rfm (
    score_date, customer_id, recency_days, frequency, monetary,
    r_score, f_score, m_score, rfm_cell, segment
)
WITH rfm_raw AS (
    SELECT customer_id,
           CURRENT_DATE - MAX(order_date) AS recency_days,
           COUNT(*) AS frequency, SUM(amount) AS monetary
    FROM orders GROUP BY customer_id
),
rfm_scores AS (
    SELECT customer_id, recency_days, frequency, monetary,
           NTILE(5) OVER (ORDER BY recency_days ASC)  AS r_score,
           NTILE(5) OVER (ORDER BY frequency DESC)    AS f_score,
           NTILE(5) OVER (ORDER BY monetary DESC)     AS m_score
    FROM rfm_raw
)
SELECT
    CURRENT_DATE,
    customer_id, recency_days, frequency, monetary,
    r_score, f_score, m_score,
    r_score::VARCHAR || f_score::VARCHAR || m_score::VARCHAR,
    CASE
        WHEN r_score >= 4 AND f_score >= 4 AND m_score >= 4 THEN 'Champions'
        WHEN r_score >= 3 AND f_score >= 3 AND m_score >= 3 THEN 'Loyal Customers'
        WHEN r_score >= 4 AND f_score <= 2                  THEN 'New Customers'
        WHEN r_score >= 3 AND f_score >= 3 AND m_score <= 2 THEN 'Potential Loyalists'
        WHEN r_score <= 2 AND f_score >= 4 AND m_score >= 4 THEN 'At Risk'
        WHEN r_score = 1 AND f_score >= 4 AND m_score >= 4  THEN 'Cannot Lose Them'
        WHEN r_score <= 2 AND f_score <= 2                  THEN 'Lost'
        ELSE 'Needs Attention'
    END
FROM rfm_scores;

Common Mistakes

Mistake 1 — Using fixed score thresholds instead of NTILE

Hardcoding WHERE recency_days < 30 as “score 5” breaks when your customer base shifts — during peak seasons all customers may have recency under 30 days. NTILE always distributes scores relative to your current customer population.

Mistake 2 — Running RFM on all-time data without a lookback window

A customer who bought 50 times in 2019 but nothing in 5 years will score extremely high on Frequency. Use a lookback window (last 12–24 months) for Frequency and Monetary to reflect recent behavior.

Mistake 3 — Refreshing RFM scores without keeping history

Segment migrations are valuable — a Champion becoming At Risk is a business alert. Store the score_date and keep all historical scores so you can track how segments change over time.


What to Learn Next

RFM gives you a snapshot of current customer value. The next tutorial builds on this with SQL for A/B test analysis — measuring whether a campaign targeting your At Risk segment actually improved their purchasing behavior.

Leave a Comment