How indie game sales estimates actually work — the white-box Boxleiter method
Every game page on indielist shows a sales estimate. Click "How is this calculated?" and you get the formula expanded — base number, every adjustment factor, the final result. We call this the white-box Boxleiter method. This article is the long-form explainer of what's behind that expansion and why we built it that way.
The original Boxleiter idea
In 2014, indie developer Mike Boxleiter posted a back-of-the-envelope rule: multiply a Steam game's review count by ~50 and you get a usable estimate of unit sales. The "NB number" was 50.
The intuition is simple — on Steam, a roughly stable fraction of buyers eventually leave a review. If you assume that fraction is constant, review count is a proxy for sales.
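In code, the original rule is a one-liner. A minimal sketch (the function name and signature are ours for illustration, not from any published implementation):

```typescript
// Original Boxleiter rule: estimated units = review count × NB.
// NB defaults to the classic 50; nothing else is adjusted.
function naiveBoxleiter(reviewCount: number, nb: number = 50): number {
  return reviewCount * nb;
}

// A game with 1,200 reviews → roughly 60,000 units under NB = 50.
const units = naiveBoxleiter(1_200); // 60_000
```

The entire method is that one multiplication — which is exactly why its error behavior matters so much.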
Why a single NB number is misleading
The fraction isn't constant. It varies systematically by:
- Year. Steam's review prompt changed in 2018 and again in 2022; older games have higher review-per-sale ratios because users had more time to review.
- Price. Cheap games get more impulse buys without reviews. Expensive games get more deliberate reviewers.
- Quality / sentiment. Games with very high or very low positive ratings provoke more reviews per buyer than middling ones.
- Studio scale. Solo-dev games tend to under-review (smaller audience overlap with reviewers); larger studios with marketing budgets tend to over-review.
- Genre. Hyper-casual games barely get reviewed. Deep RPGs get reviewed heavily.
Treating these as a single multiplier gives you the famous ±60% error band SteamSpy used to publish. That's not useful for any decision worth making.
What we do instead — multi-factor NB
We start from NB_base = 50 and add or subtract per-factor
adjustments. The adjustments are public and version-controlled — see
src/lib/sales-estimate.ts in the indielist source.
For example, here's how Hades (~240,000 reviews, $25 launch, 2020 release, medium studio, action-RPG genre) gets computed:
- base: +50
- year_2020: +15 (release-era adjustment for the pre-2022 review prompt)
- price_$25: +5 (mid-price tier adjustment)
- positive_98%: +10 (sentiment adjustment for a very high positive rating)
- team_medium: +10 (studio-scale adjustment; Supergiant has marketing reach)
- genre_RPG/Adventure: +10 (genre adjustment for deep RPGs)
Final NB = 100. Median estimate = 240K × 100 = 24M units, with a confidence range of [median × 0.6, median × 1.4] = 14.4M – 33.6M.
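The walkthrough above is just a base value plus a sum over an adjustment list. Here is an illustrative reimplementation — not the actual `src/lib/sales-estimate.ts`; the type and factor names are hypothetical:

```typescript
// One named adjustment to the NB multiplier, as in the Hades walkthrough.
type Adjustment = { factor: string; delta: number };

interface Estimate {
  nb: number;     // final units-per-review multiplier
  lower: number;  // median × 0.6
  median: number; // reviews × NB
  upper: number;  // median × 1.4
}

const NB_BASE = 50;

function estimate(reviewCount: number, adjustments: Adjustment[]): Estimate {
  // Sum the per-factor deltas on top of the base multiplier.
  const nb = NB_BASE + adjustments.reduce((sum, a) => sum + a.delta, 0);
  const median = reviewCount * nb;
  // Round to whole units so the bounds stay clean integers.
  return {
    nb,
    lower: Math.round(median * 0.6),
    median,
    upper: Math.round(median * 1.4),
  };
}

// The Hades example from above: 240K reviews, five adjustments.
const hades = estimate(240_000, [
  { factor: "year_2020", delta: 15 },
  { factor: "price_$25", delta: 5 },
  { factor: "positive_98%", delta: 10 },
  { factor: "team_medium", delta: 10 },
  { factor: "genre_RPG/Adventure", delta: 10 },
]);
// → { nb: 100, lower: 14_400_000, median: 24_000_000, upper: 33_600_000 }
```

Because the adjustments are a flat list of named deltas, the "How is this calculated?" panel can render them verbatim — that is the whole white-box trick.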
Confidence ranges, not point estimates
Every estimate ships as a triple: [lower, median, upper]. The
lower and upper are median × 0.6 and median × 1.4.
These bounds were calibrated against a basket of ~30 games where developers
have publicly disclosed actual sales — for that basket, 80% of true values
fell inside our range.
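The calibration check itself is simple to state in code: for each game with publicly disclosed sales, ask whether the true figure lands inside [median × 0.6, median × 1.4], then report the hit rate. A sketch with invented sample data (the real basket is ~30 games):

```typescript
// A disclosed-sales data point: our median estimate vs. the true figure.
interface CalibrationPair {
  median: number; // our median estimate
  actual: number; // developer-disclosed units
}

// Fraction of true values that fall inside [median × 0.6, median × 1.4].
function coverage(pairs: CalibrationPair[]): number {
  const hits = pairs.filter(
    (p) => p.actual >= p.median * 0.6 && p.actual <= p.median * 1.4,
  ).length;
  return hits / pairs.length;
}
```

For the stated bounds to be honest, this number should sit near the advertised coverage (80% on the disclosed basket); if it drifts, the 0.6/1.4 factors need re-tuning.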
Other tools (Gamalytic, VG Insights) ship a single number. We don't. A single number with no confidence interval is statistical malpractice.
What's still wrong
- Free-to-play is broken. Reviews-per-buyer breaks down for F2P. We flag F2P games in the data and don't ship an estimate.
- Bundles distort badly. A game heavily distributed via Humble Bundle has artificially low review counts because bundle buyers don't review at the same rate. We can't detect this from public data yet.
- Single-platform. The estimate is Steam-only. For multi-platform titles you have to mentally adjust upward.
- Pre-release games. Demos and very-recent releases have noisy review counts. We won't show an estimate for games < 30 days old.
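Two of those failure modes are handled by simply refusing to estimate. A sketch of that guard, with hypothetical field names (indielist's actual schema may differ):

```typescript
// Minimal metadata needed to decide whether an estimate is shown at all.
interface GameMeta {
  isFreeToPlay: boolean;
  daysSinceRelease: number;
}

function hasEstimate(game: GameMeta): boolean {
  // Reviews-per-buyer breaks down when there is no purchase.
  if (game.isFreeToPlay) return false;
  // Very recent releases have noisy review counts.
  if (game.daysSinceRelease < 30) return false;
  return true;
}
```

Bundle distortion and multi-platform sales have no such clean flag in public data, which is why they remain open problems rather than guard clauses.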
How this compares to the alternatives
Gamalytic uses a similar Boxleiter base layered with a proprietary regression. They publish point estimates with no formula; their backtest claims errors of roughly 30%. Our advantage is transparency: you can see the formula and disagree with our adjustments.
VG Insights (now Sensor Tower) doesn't disclose method. Used heavily by enterprise but inaccessible to indies.
SteamSpy uses public-profile sampling. After Steam made profiles private by default in 2018, accuracy collapsed.
What's next for the algorithm
- v1.1 (2026 H2): linear regression against the disclosed-sales basket to fit per-factor coefficients instead of hand-tuned values.
- v2.0 (2027): cross-validated bootstrap confidence intervals plus multi-platform extrapolation.
Every version gets a new formula_version string and old
versions are kept in sales_estimates_history. The white-box
promise extends to history — you can always reproduce what an estimate
looked like at any past point.
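Reproducibility falls out of storing the formula's inputs alongside its version. A hypothetical sketch of what a history row might look like and how an old estimate would be re-derived (field names are illustrative, not indielist's actual schema):

```typescript
// One archived estimate: the inputs the formula saw at computation time.
interface HistoryRow {
  gameId: string;
  formulaVersion: string;  // e.g. "v1.0"
  nb: number;              // final multiplier under that formula version
  reviewCountAtTime: number;
}

// Re-derive the archived median from stored inputs; no live data needed.
function reproduceMedian(row: HistoryRow): number {
  return row.reviewCountAtTime * row.nb;
}
```

Because every row carries its own `formulaVersion` and inputs, a later change to the adjustment table can never silently rewrite what a past estimate said.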
See it in action
Pick any game and click "How is this calculated?": Stardew Valley, Hades, Manor Lords.