Logistic regression
The simplest possible approach. It multiplies each feature by a fixed weight (learned once during training), adds them up, and passes the total through a sigmoid to convert it to a probability. The same weights apply to every customer, every time.
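As a minimal sketch of those mechanics (the feature names, weights, and customers below are invented for illustration, not learned from real data):

```python
import math

# Illustrative weights, one per feature, fixed once "training" is done.
# Feature names and values are made up for this sketch.
WEIGHTS = {"monthly_spend": 0.8, "support_tickets": -1.2, "tenure_years": 0.5}
BIAS = -0.3

def predict_proba(customer):
    """Weighted sum of features plus bias, squashed through a sigmoid."""
    z = BIAS + sum(w * customer[f] for f, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))

# The same weights score every customer, every time.
loyal = {"monthly_spend": 1.0, "support_tickets": 0, "tenure_years": 2.0}
at_risk = {"monthly_spend": 0.2, "support_tickets": 3, "tenure_years": 0.5}
```

This is also why its explainability is so high: each weight is a plain number you can read straight off the model.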
Random forest
Grows 500 independent trees, each trained on a random sample of customers and a random subset of features. They never talk to each other. When a new customer arrives, all 500 vote simultaneously — majority wins.
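The sample-then-vote idea can be sketched end to end. Everything here is invented for illustration, and the "trees" are one-split stumps rather than the deeper trees a real forest would grow:

```python
import random

random.seed(0)  # reproducible sketch

# Invented toy data: high-spend, low-ticket customers are labelled 1.
TRAIN = [({"spend": s, "tickets": t}, y) for s, t, y in [
    (2.0, 0, 1), (1.8, 0, 1), (1.6, 1, 1), (1.4, 1, 1),
    (0.8, 2, 0), (0.6, 2, 0), (0.4, 3, 0), (0.2, 3, 0),
]]
FEATURES = ["spend", "tickets"]

def train_stump(rows):
    """A one-split 'tree': bootstrap sample of customers, one random feature."""
    sample = [random.choice(rows) for _ in rows]   # random sample of customers
    feature = random.choice(FEATURES)              # random feature subset (here: size 1)
    best, best_acc = None, -1.0
    for thresh in sorted({f[feature] for f, _ in sample}):
        for above_is_positive in (True, False):
            acc = sum(((f[feature] > thresh) == above_is_positive) == bool(y)
                      for f, y in sample) / len(sample)
            if acc > best_acc:
                best, best_acc = (feature, thresh, above_is_positive), acc
    feature, thresh, above_is_positive = best
    return lambda c: int((c[feature] > thresh) == above_is_positive)

# Grow 500 independent trees; they never see each other during training.
forest = [train_stump(TRAIN) for _ in range(500)]

def predict(customer):
    votes = sum(tree(customer) for tree in forest)  # all 500 vote at once
    return int(votes * 2 > len(forest))             # majority wins
```

Because each tree sees a different random slice of the data, their individual mistakes tend to cancel out in the vote, which is where the robustness to noise comes from.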
How all three compare
| | Logistic regression | Random forest | Gradient boosting |
|---|---|---|---|
| How it decides | Weighted sum of features → sigmoid | 500 independent trees vote | Trees correct each other sequentially |
| Explainability | Very high — each weight is a plain number | Medium — feature importance but no single path | High — SHAP shows each tree's contribution |
| Edge cases | Poor — a straight line misses curved patterns | Good — averaging smooths unusual customers | Best — each tree specifically targets hard cases |
| Overfitting risk | Low — hard to overfit a simple model | Low — 500-tree average reduces variance | Medium — needs careful depth tuning |
| Typical accuracy | ~74% | ~78% | ~80% |
| Best when | You need to explain every decision to a non-technical audience | Data is noisy or robustness matters more than precision | You want maximum accuracy with clean, well-chosen features |
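The "trees correct each other sequentially" row is the key contrast with the forest: gradient boosting trains each new tree on the residual errors of the ensemble built so far, so later trees specifically target the cases earlier ones got wrong. A minimal sketch, assuming squared-error loss and one-split regression trees, on invented one-dimensional data:

```python
# Invented 1-D toy data: learn y = x^2 on [0, 1].
DATA = [(x / 10.0, (x / 10.0) ** 2) for x in range(11)]

def fit_stump(rows):
    """Regression stump: split at the threshold minimizing squared error."""
    best = None
    for thresh in sorted({x for x, _ in rows}):
        left = [y for x, y in rows if x <= thresh]
        right = [y for x, y in rows if x > thresh]
        if not right:
            continue
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        err = sum((y - (lmean if x <= thresh else rmean)) ** 2 for x, y in rows)
        if best is None or err < best[0]:
            best = (err, thresh, lmean, rmean)
    _, thresh, lmean, rmean = best
    return lambda x: lmean if x <= thresh else rmean

LEARNING_RATE = 0.3  # shrinks each correction; part of the depth/rate tuning
trees = []

def predict(x):
    """Ensemble prediction: the sum of every tree's scaled correction."""
    return sum(LEARNING_RATE * t(x) for t in trees)

for _ in range(100):
    # Each new tree is fitted to what the ensemble so far still gets wrong.
    residuals = [(x, y - predict(x)) for x, y in DATA]
    trees.append(fit_stump(residuals))
```

The sequential dependence is also the source of the "medium" overfitting risk in the table: unlike the forest's independent averaging, every new tree chases remaining errors, so depth and learning rate need tuning to stop it chasing noise.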