Abstract
Most published evaluations of cross-sectional ML on cryptocurrency markets report gross returns — the per-bar PnL of the unconstrained predicted positions, without subtracting the cost of moving the portfolio from one bar to the next. On a typical h = 24 hourly panel with modest portfolio concentration, the gap between gross and net at realistic per-leg costs is not a minor correction. It frequently flips a +2 Sharpe candidate to deeply negative. We document the cost sensitivity of one such candidate from our own work, lay out the cost model that produces it, and argue that this decomposition belongs in every backtest reported as live-deployable.
1 · The cost model
We use a deliberately simple, conservative cost model: at each bar, the per-coin transaction cost is the absolute change in portfolio weight, scaled by a constant per-leg cost. Stacked across the universe, the total cost charged against the next bar's return is:
cost_t = sum over i in universe of |w_{i,t} − w_{i,t−1}| × leg_cost
where leg_cost = 4 bps = 0.0004 (decimal)
The 4 bps per-leg figure is the program's operational estimate of the all-in cost of crossing a Hyperliquid book at our intended size, including taker fees and average slippage at typical depth. It is constant across coins for simplicity; in production the cost is somewhat tighter on majors (BTC, ETH) and somewhat wider on smaller perpetuals. The constant 4 bps is the version that lives in `src/strategy/metrics.py` and produces every published Sharpe number in the research base.
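A minimal sketch of the per-bar cost formula, in plain Python. It is not the code in `src/strategy/metrics.py`; the function name `bar_costs`, the list-of-lists weight layout, and the choice to cost the first bar against a flat (all-zero) portfolio are assumptions made for illustration.

```python
LEG_COST = 0.0004  # 4 bps per leg as a decimal, the figure quoted in the text


def bar_costs(weights, leg_cost=LEG_COST):
    """Per-bar cost: sum over coins of |w[i,t] - w[i,t-1]| * leg_cost.

    `weights` holds one weight vector per bar, with coins in a fixed order.
    Assumption of this sketch: the first bar is entered from a flat book.
    """
    costs = []
    prev = [0.0] * len(weights[0])
    for w in weights:
        turnover = sum(abs(a - b) for a, b in zip(w, prev))
        costs.append(turnover * leg_cost)
        prev = w
    return costs


# Toy three-coin book: enter a long/short pair, hold it, then flip it.
weights = [
    [0.5, -0.5, 0.0],
    [0.5, -0.5, 0.0],
    [-0.5, 0.5, 0.0],
]
costs = bar_costs(weights)
print(costs)
```

Holding the book costs nothing, while flipping the pair moves two full units of weight and so costs twice as much as the initial entry.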
2 · The case study
The candidate we audit here is a LightGBM lambdarank model on the 26-coin panel at h = 2 (two-hour forecast horizon), with cross-sectional K = 5 long / 5 short selection per bar. It was once the top of our scoreboard under the now-retracted evaluation pipeline (see Paper № 03); after correction to raw returns, it produces +1.99 Sharpe at zero cost — a real and reproducible gross signal — and the chart below shows what happens when realistic costs are applied at different portfolio concentrations.
The shaded region indicates the operational cost band (3.5 – 4.5 bps per leg). At zero cost every concentration is positive. At our central 4 bps estimate, every concentration is solidly net negative. The slope is steeper at higher concentration (smaller K) because each name carries a larger weight, so every change in the selected set produces a larger absolute weight delta per bar.
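A cost sweep of this kind can be sketched as follows. The return and turnover series are synthetic and hand-made, not results from the study. The function name `net_sharpe`, the alignment of each bar's cost with the same bar's return, and the annualisation factor (hourly bars on a market that trades around the clock) are assumptions of the sketch.

```python
import math
import statistics


def net_sharpe(gross, turnover, leg_cost_bps, bars_per_year=24 * 365):
    """Annualised Sharpe of per-bar returns after a constant per-leg cost."""
    leg_cost = leg_cost_bps / 10_000.0
    # Net return per bar = gross return minus turnover times per-leg cost.
    net = [g - t * leg_cost for g, t in zip(gross, turnover)]
    return statistics.mean(net) / statistics.stdev(net) * math.sqrt(bars_per_year)


# Synthetic series for illustration only.
gross = [0.0010, -0.0005, 0.0008, -0.0002, 0.0006] * 20
turnover = [0.8] * len(gross)  # 80% of the book traded every bar

sweep = {bps: net_sharpe(gross, turnover, bps) for bps in (0, 2, 4, 6, 8)}
for bps, sharpe in sweep.items():
    print(f"{bps} bps per leg -> net Sharpe {sharpe:+.2f}")
```

With constant turnover the cost shifts the mean return without changing its volatility, so net Sharpe falls linearly in the per-leg cost and crosses zero where cost per bar equals the mean gross return per bar.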
3 · The decomposition: turnover is the lever
Two strategies with the same gross Sharpe at zero cost will diverge sharply once cost is applied if their turnover differs. The chart below decomposes three candidates we have run head-to-head: the K = 5 / 5 LightGBM ranker from above, the seven-architecture predictive blend (STRAT-08), and the static eight-coin consensus basket (STRAT-48).
The static consensus basket has by far the lowest turnover and almost no gross-to-net gap. The predictive blend rebalances every week and has a meaningful cost drag. The K = 5 / 5 ranker rebalances every two hours and is the most cost-exposed of the three.
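One way to make the turnover lever concrete is a break-even cost. Under the cost model of Section 1, mean net return per bar is mean gross return minus mean turnover times the per-leg cost, so it reaches zero at a per-leg cost of mean gross divided by mean turnover. The sketch below uses hypothetical numbers, not measurements of the three candidates, and `break_even_bps` is an illustrative name.

```python
def break_even_bps(mean_gross_per_bar, mean_turnover_per_bar):
    """Per-leg cost, in bps, at which mean net return per bar is zero.

    Follows from net = gross - turnover * leg_cost by setting the mean
    net return to zero and solving for leg_cost.
    """
    if mean_turnover_per_bar <= 0:
        return float("inf")  # a book that never trades pays no cost
    return mean_gross_per_bar / mean_turnover_per_bar * 10_000.0


# Hypothetical: identical gross edge per bar, very different turnover.
high_turnover = break_even_bps(0.0002, 1.00)  # trades its full book each bar
low_turnover = break_even_bps(0.0002, 0.05)   # trades 5% of its book each bar
print(round(high_turnover, 2), round(low_turnover, 2))
```

In this toy case the high-turnover book breaks even near 2 bps per leg, below the 4 bps central estimate, while the low-turnover book tolerates roughly twenty times that cost.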
4 · Why this is the most common failure mode in crypto-ML papers
Three patterns we see repeatedly, both in our own early scoreboards and in published work:
- Cost model not stated. The paper reports Sharpe but the “Methods” section does not specify a per-leg cost or a turnover treatment. The Sharpe is therefore implicitly gross.
- Cost model stated but artificially low. Per-leg costs in the 1–2 bps range exclude meaningful slippage. The result is a Sharpe that survives the paper but does not survive live.
- Concentration not disclosed. “Top-decile vs bottom-decile” on a 26-coin universe is 2 or 3 names per side. The portfolio weight delta per bar is large; the cost is correspondingly large. The reported Sharpe rarely reflects this.
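The concentration point can be checked with a little arithmetic. The sketch below assumes each side of the book carries unit gross weight split equally across its K names, which is a convention chosen for illustration and not necessarily the weighting used in any particular paper; both function names are hypothetical.

```python
import math


def decile_names(universe_size):
    """Smallest and largest whole-name count for a 10% bucket."""
    exact = universe_size / 10
    return math.floor(exact), math.ceil(exact)


def swap_cost_bps(k, names_swapped, leg_cost_bps=4.0):
    """Cost in bps of replacing `names_swapped` names on one K-name side.

    Assumption: equal weights of 1/K per name, so dropping one name and
    adding another moves 2/K of weight in total.
    """
    turnover = 2.0 * names_swapped / k
    return turnover * leg_cost_bps


print(decile_names(26))
print(swap_cost_bps(k=2, names_swapped=1))
print(swap_cost_bps(k=5, names_swapped=1))
```

Under these assumptions, a single name change on a two-name side costs as much as trading that entire side once, which is why an undisclosed decile portfolio on a small universe can hide a large cost drag.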
5 · The cost gate
Our promotion gate (the “Mistake #3” gate, per the internal mistakes catalogue[1]) is:
- Reported Sharpe is always net of 4 bps per leg.
- Net Sharpe at 4 bps must exceed +1.0 in Pass B.
- Cost sensitivity is reported: net Sharpe at 6 bps and 8 bps must accompany the headline number, and net Sharpe must remain positive at 6 bps (the “stressed cost” threshold).
The K = 5 / 5 LightGBM ranker fails this gate: it is positive only at 0 bps and 2 bps. STRAT-48 and STRAT-04b both pass the gate at 4 bps and stay positive at 8 bps because their turnover is low.
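The three gate rules can be expressed as a small check. This is a sketch, not the program's actual gate code: the function name `passes_cost_gate`, the dictionary layout, and the candidate numbers are all hypothetical, and the Sharpe values are assumed to come from the Pass B evaluation already.

```python
def passes_cost_gate(net_sharpe_by_bps, min_sharpe=1.0):
    """Apply the gate to net Sharpe values keyed by per-leg cost in bps."""
    # Rule 3 (reporting): net Sharpe must be present at 4, 6 and 8 bps.
    missing = [bps for bps in (4, 6, 8) if bps not in net_sharpe_by_bps]
    if missing:
        raise ValueError(f"net Sharpe not reported at {missing} bps")
    headline_ok = net_sharpe_by_bps[4] > min_sharpe  # must exceed +1.0 at 4 bps
    stressed_ok = net_sharpe_by_bps[6] > 0.0         # must stay positive at 6 bps
    return headline_ok and stressed_ok


# Hypothetical candidates, not measured results.
cost_fragile = {0: 1.9, 2: 0.6, 4: -0.7, 6: -2.0, 8: -3.3}
low_turnover = {0: 1.6, 4: 1.4, 6: 1.3, 8: 1.2}

print(passes_cost_gate(cost_fragile))
print(passes_cost_gate(low_turnover))
```

Raising an error on a missing stress point, instead of silently passing, mirrors the requirement that the 6 bps and 8 bps numbers be reported at all.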
6 · A note on cost models
The cost model in `src/strategy/metrics.py` uses 4 bps in the LIVE configuration. Some older model cards in `research/scoreboards/consolidated_arms_table.md` use 8 bps, which is a double-counting artefact from a previous internal convention and overstates costs by 2×. Current canonical: 4 bps live, full stop. The 4 bps figure is our central estimate; the 6 bps and 8 bps numbers exist only as stress tests.
We do not consider a backtest credible until it has been costed at 4 bps per leg using the standard cost model and is still positive. Gross Sharpe numbers are useful diagnostically — they tell you whether the underlying signal has any predictive content. They are not deployable.
7 · What this is not
The 4 bps figure is a constant approximation. Real costs are state-dependent: they widen in stress, they narrow on quiet days, and they vary by coin. A full execution-cost model would be coin-by-coin and time-of-day. We treat 4 bps as our central operating estimate — sufficient for ranking candidates and rejecting clearly cost-fragile strategies — and revisit it as the live execution stack accumulates a sample of actual fill prices vs mid.
Sources & references
- Axon Ridge internal — `research/scoreboards/06_top20_mistakes.md` #3 (no transaction costs)
- Axon Ridge internal — `research/experiments/results/STRATEGY_COMPARISON_RAW_2026-05-05.md`
- Axon Ridge internal — `docs/CALCULATION_GLOSSARY.md` §3 (cost conventions)
- Axon Ridge internal — `research/backtesting/cpcv_dsr_practical.md`
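- Bailey, D. H., & López de Prado, M. (2014). The Deflated Sharpe Ratio: Correcting for Selection Bias, Backtest Overfitting, and Non-Normality. The Journal of Portfolio Management, 40(5), 94–107.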