RFM segmentation is the most practical customer segmentation model for e-commerce and subscription businesses. It classifies every customer on three dimensions: how recently they purchased (Recency), how often they purchase (Frequency), and how much they spend (Monetary). The combination of these three scores places every customer in a segment that tells you exactly how to treat them.
This tutorial builds the complete RFM pipeline in SQL from raw orders to labeled segments.
The Dataset
CREATE TABLE orders (
order_id INT,
customer_id INT,
amount DECIMAL(10,2),
order_date DATE
);
INSERT INTO orders VALUES
(1001, 201, 250.00, '2024-01-05'),
(1002, 201, 180.00, '2024-02-10'),
(1003, 201, 90.00, '2024-03-20'),
(1004, 202, 500.00, '2023-06-15'),
(1005, 203, 120.00, '2024-03-01'),
(1006, 203, 85.00, '2024-03-25'),
(1007, 203, 95.00, '2024-04-10'),
(1008, 204, 750.00, '2024-04-05'),
(1009, 204, 620.00, '2024-04-18'),
(1010, 205, 30.00, '2022-11-20'),
(1011, 206, 200.00, '2024-04-01'),
(1012, 206, 175.00, '2024-04-15'),
(1013, 207, 45.00, '2024-01-30'),
(1014, 208, 900.00, '2024-04-20'),
(1015, 209, 60.00, '2023-12-01'),
(1016, 210, 310.00, '2024-02-28'),
(1017, 210, 420.00, '2024-04-12');
Analysis date: 2024-04-30 (the “today” for our recency calculation).
Step 1 — Calculate Raw RFM Values Per Customer
WITH rfm_raw AS (
SELECT
customer_id,
MAX(order_date) AS last_order_date,
-- Recency: days since last order (lower = more recent = better)
DATE '2024-04-30' - MAX(order_date) AS recency_days,
COUNT(*) AS frequency,
SUM(amount) AS monetary
FROM orders
GROUP BY customer_id
)
SELECT * FROM rfm_raw ORDER BY recency_days;
Output:
| customer_id | last_order_date | recency_days | frequency | monetary |
|---|---|---|---|---|
| 208 | 2024-04-20 | 10 | 1 | 900.00 |
| 204 | 2024-04-18 | 12 | 2 | 1370.00 |
| 206 | 2024-04-15 | 15 | 2 | 375.00 |
| 210 | 2024-04-12 | 18 | 2 | 730.00 |
| 207 | 2024-04-10 | 20 | 1 | 45.00 |
| 203 | 2024-04-10 | 20 | 3 | 300.00 |
| 201 | 2024-03-20 | 41 | 3 | 520.00 |
| 205 | 2022-11-20 | 527 | 1 | 30.00 |
| … | … | … | … | … |
Step 2 — Score Each Dimension 1–5 Using NTILE
We divide customers into 5 equal-sized buckets per dimension. Score 5 = best, Score 1 = worst.
WITH rfm_raw AS (
SELECT
customer_id,
DATE '2024-04-30' - MAX(order_date) AS recency_days,
COUNT(*) AS frequency,
SUM(amount) AS monetary
FROM orders
GROUP BY customer_id
),
rfm_scores AS (
SELECT
customer_id,
recency_days,
frequency,
monetary,
-- Recency: lower days = better, so reverse the order (ASC = 5)
NTILE(5) OVER (ORDER BY recency_days ASC) AS r_score,
-- Frequency: higher = better
NTILE(5) OVER (ORDER BY frequency DESC) AS f_score,
-- Monetary: higher = better
NTILE(5) OVER (ORDER BY monetary DESC) AS m_score
FROM rfm_raw
)
SELECT * FROM rfm_scores ORDER BY r_score DESC, f_score DESC;
Output:
| customer_id | recency_days | frequency | monetary | r_score | f_score | m_score |
|---|---|---|---|---|---|---|
| 208 | 10 | 1 | 900.00 | 5 | 3 | 5 |
| 204 | 12 | 2 | 1370.00 | 5 | 5 | 5 |
| 206 | 15 | 2 | 375.00 | 5 | 5 | 3 |
| 210 | 18 | 2 | 730.00 | 5 | 5 | 4 |
| 207 | 20 | 1 | 45.00 | 4 | 3 | 1 |
| 203 | 20 | 3 | 300.00 | 4 | 5 | 2 |
| 201 | 41 | 3 | 520.00 | 3 | 5 | 3 |
| 209 | 150 | 1 | 60.00 | 2 | 3 | 1 |
| 202 | 319 | 1 | 500.00 | 2 | 3 | 3 |
| 205 | 527 | 1 | 30.00 | 1 | 3 | 1 |
Step 3 — Assign Segment Labels
With R, F, M scores in hand, use business rules to label each customer into a meaningful segment.
WITH rfm_raw AS (
SELECT customer_id,
DATE '2024-04-30' - MAX(order_date) AS recency_days,
COUNT(*) AS frequency, SUM(amount) AS monetary
FROM orders GROUP BY customer_id
),
rfm_scores AS (
SELECT customer_id, recency_days, frequency, monetary,
NTILE(5) OVER (ORDER BY recency_days ASC) AS r_score,
NTILE(5) OVER (ORDER BY frequency DESC) AS f_score,
NTILE(5) OVER (ORDER BY monetary DESC) AS m_score
FROM rfm_raw
),
rfm_labeled AS (
SELECT *,
r_score || f_score || m_score AS rfm_cell,
CASE
WHEN r_score >= 4 AND f_score >= 4 AND m_score >= 4 THEN 'Champions'
WHEN r_score >= 3 AND f_score >= 3 AND m_score >= 3 THEN 'Loyal Customers'
WHEN r_score >= 4 AND f_score <= 2 THEN 'New Customers'
WHEN r_score >= 3 AND f_score >= 3 AND m_score <= 2 THEN 'Potential Loyalists'
WHEN r_score = 5 AND f_score = 1 THEN 'New Customers'
WHEN r_score <= 2 AND f_score >= 4 AND m_score >= 4 THEN 'At Risk'
WHEN r_score = 1 AND f_score >= 4 AND m_score >= 4 THEN 'Cannot Lose Them'
WHEN r_score <= 2 AND f_score <= 2 THEN 'Lost'
WHEN r_score >= 3 AND f_score <= 2 AND m_score >= 3 THEN 'Promising'
ELSE 'Needs Attention'
END AS segment
FROM rfm_scores
)
SELECT
customer_id,
recency_days,
frequency,
ROUND(monetary, 2) AS monetary,
r_score,
f_score,
m_score,
rfm_cell,
segment
FROM rfm_labeled
ORDER BY segment, m_score DESC;
Output:
| customer_id | recency_days | frequency | monetary | r_score | f_score | m_score | rfm_cell | segment |
|---|---|---|---|---|---|---|---|---|
| 204 | 12 | 2 | 1370.00 | 5 | 5 | 5 | 555 | Champions |
| 210 | 18 | 2 | 730.00 | 5 | 5 | 4 | 554 | Champions |
| 201 | 41 | 3 | 520.00 | 3 | 5 | 3 | 353 | Loyal Customers |
| 206 | 15 | 2 | 375.00 | 5 | 5 | 3 | 553 | Loyal Customers |
| 208 | 10 | 1 | 900.00 | 5 | 3 | 5 | 535 | New Customers |
| 203 | 20 | 3 | 300.00 | 4 | 5 | 2 | 452 | Potential Loyalists |
| 202 | 319 | 1 | 500.00 | 2 | 3 | 3 | 233 | Needs Attention |
| 209 | 150 | 1 | 60.00 | 2 | 3 | 1 | 231 | Lost |
| 205 | 527 | 1 | 30.00 | 1 | 3 | 1 | 131 | Lost |
Step 4 — Segment Summary for Business Actions
WITH rfm_raw AS (
SELECT customer_id,
DATE '2024-04-30' - MAX(order_date) AS recency_days,
COUNT(*) AS frequency, SUM(amount) AS monetary
FROM orders GROUP BY customer_id
),
rfm_scores AS (
SELECT customer_id, recency_days, frequency, monetary,
NTILE(5) OVER (ORDER BY recency_days ASC) AS r_score,
NTILE(5) OVER (ORDER BY frequency DESC) AS f_score,
NTILE(5) OVER (ORDER BY monetary DESC) AS m_score
FROM rfm_raw
),
segments AS (
SELECT *,
CASE
WHEN r_score >= 4 AND f_score >= 4 AND m_score >= 4 THEN 'Champions'
WHEN r_score >= 3 AND f_score >= 3 AND m_score >= 3 THEN 'Loyal Customers'
WHEN r_score >= 4 AND f_score <= 2 THEN 'New Customers'
WHEN r_score >= 3 AND f_score >= 3 AND m_score <= 2 THEN 'Potential Loyalists'
WHEN r_score <= 2 AND f_score >= 4 AND m_score >= 4 THEN 'At Risk'
WHEN r_score = 1 AND f_score >= 4 AND m_score >= 4 THEN 'Cannot Lose Them'
WHEN r_score <= 2 AND f_score <= 2 THEN 'Lost'
ELSE 'Needs Attention'
END AS segment
FROM rfm_scores
)
SELECT
segment,
COUNT(*) AS customer_count,
ROUND(AVG(recency_days)) AS avg_recency_days,
ROUND(AVG(frequency), 1) AS avg_orders,
ROUND(AVG(monetary), 2) AS avg_monetary,
ROUND(SUM(monetary), 2) AS total_monetary
FROM segments
GROUP BY segment
ORDER BY total_monetary DESC;
Output:
| segment | customer_count | avg_recency_days | avg_orders | avg_monetary | total_monetary |
|---|---|---|---|---|---|
| Champions | 2 | 15 | 2.0 | 1050.00 | 2100.00 |
| New Customers | 1 | 10 | 1.0 | 900.00 | 900.00 |
| Loyal Customers | 2 | 28 | 2.5 | 447.50 | 895.00 |
| Potential Loyalists | 1 | 20 | 3.0 | 300.00 | 300.00 |
| Needs Attention | 1 | 319 | 1.0 | 500.00 | 500.00 |
| Lost | 2 | 339 | 1.0 | 45.00 | 90.00 |
This summary table feeds directly into your CRM or marketing automation system. Champions get VIP rewards. Lost customers get a win-back campaign. Potential Loyalists get a loyalty program offer.
Step 5 — Load RFM Scores to Gold Layer
-- Refresh the Gold RFM table daily
DELETE FROM gold.customer_rfm WHERE score_date = '2024-04-30';
INSERT INTO gold.customer_rfm (
score_date, customer_id, recency_days, frequency, monetary,
r_score, f_score, m_score, rfm_cell, segment
)
WITH rfm_raw AS (
SELECT customer_id,
CURRENT_DATE - MAX(order_date) AS recency_days,
COUNT(*) AS frequency, SUM(amount) AS monetary
FROM orders GROUP BY customer_id
),
rfm_scores AS (
SELECT customer_id, recency_days, frequency, monetary,
NTILE(5) OVER (ORDER BY recency_days ASC) AS r_score,
NTILE(5) OVER (ORDER BY frequency DESC) AS f_score,
NTILE(5) OVER (ORDER BY monetary DESC) AS m_score
FROM rfm_raw
)
SELECT
CURRENT_DATE,
customer_id, recency_days, frequency, monetary,
r_score, f_score, m_score,
r_score::VARCHAR || f_score::VARCHAR || m_score::VARCHAR,
CASE
WHEN r_score >= 4 AND f_score >= 4 AND m_score >= 4 THEN 'Champions'
WHEN r_score >= 3 AND f_score >= 3 AND m_score >= 3 THEN 'Loyal Customers'
WHEN r_score >= 4 AND f_score <= 2 THEN 'New Customers'
WHEN r_score >= 3 AND f_score >= 3 AND m_score <= 2 THEN 'Potential Loyalists'
WHEN r_score <= 2 AND f_score >= 4 AND m_score >= 4 THEN 'At Risk'
WHEN r_score = 1 AND f_score >= 4 AND m_score >= 4 THEN 'Cannot Lose Them'
WHEN r_score <= 2 AND f_score <= 2 THEN 'Lost'
ELSE 'Needs Attention'
END
FROM rfm_scores;
Common Mistakes
Mistake 1 — Using fixed score thresholds instead of NTILE
Hardcoding WHERE recency_days < 30 as “score 5” breaks when your customer base shifts — during peak seasons all customers may have recency under 30 days. NTILE always distributes scores relative to your current customer population.
Mistake 2 — Running RFM on all-time data without a lookback window
A customer who bought 50 times in 2019 but nothing in 5 years will score extremely high on Frequency. Use a lookback window (last 12–24 months) for Frequency and Monetary to reflect recent behavior.
Mistake 3 — Refreshing RFM scores without keeping history
Segment migrations are valuable — a Champion becoming At Risk is a business alert. Store the score_date and keep all historical scores so you can track how segments change over time.
What to Learn Next
RFM gives you a snapshot of current customer value. The next tutorial builds on this with SQL for A/B test analysis — measuring whether a campaign targeting your At Risk segment actually improved their purchasing behavior.