Abstract

Most published evaluations of cross-sectional ML on cryptocurrency markets report gross returns — the per-bar PnL of the unconstrained predicted positions, without subtracting the cost of moving the portfolio from one bar to the next. On a typical h = 24 hourly panel with modest portfolio concentration, the gap between gross and net at realistic per-leg costs is not a minor correction. It frequently flips a +2 Sharpe candidate to deeply negative. We document the cost sensitivity of one such candidate from our own work, lay out the cost model that produces it, and argue that this decomposition belongs in every backtest reported as live-deployable.

+1.99
Gross Sharpe @ 0 bps
−3.53
Net Sharpe @ 4 bps
0 / 27
Seeds positive after cost

1 · The cost model

We use a deliberately simple, conservative cost model: at each bar, the per-coin transaction cost is the absolute change in portfolio weight, scaled by a constant per-leg cost. Stacked across the universe, the total cost charged against the next bar's return is:

cost_t = sum over i in universe of  |w_{i,t} − w_{i,t−1}|  × leg_cost
where leg_cost = 4 bps = 0.0004 (decimal)

The 4 bps per-leg figure is the program's operational estimate of the all-in cost of crossing a Hyperliquid book at our intended size, including taker fees and average slippage at typical depth. It is constant across coins for simplicity; in production the cost is somewhat tighter on majors (BTC, ETH) and somewhat wider on smaller perpetuals. The constant 4 bps is the version that lives in src/strategy/metrics.py and produces every published Sharpe number in the research base.

2 · The case study

The candidate we audit here is a LightGBM lambdarank model on the 26-coin panel at h = 2 (two-hour forecast horizon), with cross-sectional K = 5 long / 5 short selection per bar. It was once the top of our scoreboard under the now-retracted evaluation pipeline (see Paper № 03); after correction to raw returns, it produces +1.99 Sharpe at zero cost — a real and reproducible gross signal — and the chart below shows what happens when realistic costs are applied at different portfolio concentrations.

Net Sharpe as a function of per-leg cost
Same model · varied portfolio concentration K · raw returns · 27 seeds median

The shaded region indicates the operational cost band (3.5 – 4.5 bps per leg). At zero cost every concentration is positive. At our central 4 bps estimate, every concentration is solidly net negative. The slope is steeper for higher concentration because the absolute weight delta per bar is larger.

3 · The decomposition: turnover is the lever

Two strategies with the same gross Sharpe at zero cost will diverge sharply once cost is applied if their turnover differs. The chart below decomposes three candidates we have run head-to-head: the K = 5 / 5 LightGBM ranker from above, the seven-architecture predictive blend (STRAT-08), and the static eight-coin consensus basket (STRAT-48).

Gross vs net by turnover
Mean turnover per bar (sum |Δw|) on the x-axis · Pass B

The static consensus basket has by far the lowest turnover and almost no gross-to-net gap. The predictive blend rebalances every week and has a meaningful cost drag. The K = 5 / 5 ranker rebalances every two hours and is the most cost-exposed of the three.

4 · Why this is the most common failure mode in crypto-ML papers

Three patterns we see repeatedly, both in our own early scoreboards and in published work:

5 · The cost gate

Our promotion gate (the “Mistake #3” gate, per the internal mistakes catalogue[1]) is:

  1. Reported Sharpe is always net of 4 bps per leg.
  2. Net Sharpe at 4 bps must exceed +1.0 in Pass B.
  3. Cost sensitivity is reported: net Sharpe at 6 bps and 8 bps must also be reported and must remain positive at 6 bps (the “stressed cost” threshold).

The K = 5 / 5 LightGBM ranker fails this gate — it is positive only at 0 bps and 2 bps. STRAT-48 and STRAT-04b both pass through 4 bps and stay positive at 8 bps because their turnover is low.

6 · A note on cost models

The cost model in src/strategy/metrics.py uses 4 bps in the LIVE configuration. Some older model cards in research/scoreboards/consolidated_arms_table.md use 8 bps — which is a double-counting artefact from a previous internal convention and overstates costs by 2×. Current canonical: 4 bps live, full stop. The 4 bps figure is our central estimate; the 6 bps and 8 bps numbers exist only as stress tests.

A working position

We do not consider a backtest credible until it has been costed at 4 bps per leg using the standard cost model and is still positive. Gross Sharpe numbers are useful diagnostically — they tell you whether the underlying signal has any predictive content. They are not deployable.

7 · What this is not

The 4 bps figure is a constant approximation. Real costs are state-dependent: they widen in stress, they narrow on quiet days, and they vary by coin. A full execution-cost model would be coin-by-coin and time-of-day. We treat 4 bps as our central operating estimate — sufficient for ranking candidates and rejecting clearly cost-fragile strategies — and revisit it as the live execution stack accumulates a sample of actual fill prices vs mid.

Sources & references

  1. Axon Ridge internal — `research/scoreboards/06_top20_mistakes.md` #3 (no transaction costs)
  2. Axon Ridge internal — `research/experiments/results/STRATEGY_COMPARISON_RAW_2026-05-05.md`
  3. Axon Ridge internal — `docs/CALCULATION_GLOSSARY.md` §3 (cost conventions)
  4. Axon Ridge internal — `research/backtesting/cpcv_dsr_practical.md`
  5. Bailey, D. H., & López de Prado, M. (2014). The Deflated Sharpe Ratio.