Regime-routed experts don't beat always-on

Abstract

A mixture-of-experts (MoE) router conditions which sub-models trade on an observed market state. We test the strongest version available to us: route the validated Phase 1.7 per-coin directional specialists through a freshly fit, strictly-causal three-state market-regime detector. Under an honest, leakage-controlled evaluation, the routed book underperforms an always-on book of the same experts by −4.36 Sharpe on the well-powered discovery window and is mildly negative on the walk-forward window. The regime-conditional skill that exists in-sample does not persist out-of-sample, and routing additionally throws away the diversification that makes the 175-expert always-on book strong. This replicates two earlier regime-detector findings on a much stronger expert set and with three balanced regimes — so the failure is not a regime-coverage artefact. We did not build the live router.

1 · The idea, and why it's tempting

By the end of the Phase 1.7 funnel we hold a roster of per-coin specialists: (architecture, coin) pairs whose directional signal is sign-consistent across three out-of-sample regimes and clears its noise floor. The natural hypothesis is that these specialists are conditionally good — a coin's short specialist should fire in a risk-off regime, a long specialist in a risk-on grind. If true, a regime router would lift the book by activating the right experts at the right time and silencing the rest. This is the classic mixture-of-experts gating argument.

The hypothesis is testable, and the honest test is unkind to it.

2 · The detector we built

The detector is a Gaussian mixture model (GMM) over ten causal, market-wide hourly features: BTC realised volatility at 24h and 168h, cross-sectional mean realised vol and return dispersion, average coin-to-market correlation, BTC trend at 24h and 168h, funding mean and dispersion, and open-interest velocity. Every feature is backward-looking. The standardiser and the GMM are fit on the training window only (≤ 2025-12-31, 11,361 bars), then frozen and applied by argmax posterior to all later bars, so labelling is strictly causal across both out-of-sample windows. The number of states is chosen by a knee rule over BIC, because raw min-BIC degenerately fragments the mid-vol range into thin near-duplicate states; the knee selects k = 3.
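The fit-freeze-apply pattern and the knee rule can be sketched in a few lines. This is an illustration under stated assumptions, not the code in `src/signals/regime_gmm.py`: the function names are hypothetical, the knee is taken as the largest second difference of the BIC curve (the exact rule used is not specified here), and the mixture is assumed to have diagonal covariances to keep the posterior short.

```python
import math
from statistics import fmean, pstdev


def fit_standardiser(train_rows):
    """Per-feature mean and std, estimated on the training window only."""
    cols = list(zip(*train_rows))
    means = [fmean(c) for c in cols]
    stds = [pstdev(c) or 1.0 for c in cols]  # guard against constant features
    return means, stds


def standardise(row, means, stds):
    """Apply the frozen training statistics to any bar, in or out of sample."""
    return [(x - m) / s for x, m, s in zip(row, means, stds)]


def knee_k(bic_by_k):
    """Assumed knee rule: the k with the largest second difference in BIC."""
    ks = sorted(bic_by_k)
    best_k, best_curv = ks[0], -math.inf
    for prev, k, nxt in zip(ks, ks[1:], ks[2:]):
        curv = bic_by_k[prev] - 2 * bic_by_k[k] + bic_by_k[nxt]
        if curv > best_curv:
            best_k, best_curv = k, curv
    return best_k


def assign_regime(z, weights, means, variances):
    """Argmax posterior under a frozen mixture (diagonal covariance assumed)."""
    scores = []
    for w, mu, var in zip(weights, means, variances):
        log_lik = sum(
            -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
            for x, m, v in zip(z, mu, var)
        )
        scores.append(math.log(w) + log_lik)
    return scores.index(max(scores))
```

Because `fit_standardiser` and the mixture parameters only ever see training bars, a label assigned to a later bar depends on nothing after that bar. On a BIC curve that keeps drifting down after the elbow, `knee_k` returns the elbow while the raw minimum picks the largest k.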

Three interpretable regimes · centroids in original units

Fit on train only, frozen, applied causally · 14,528 labelled hours · mean dwell ~34h

Regime	BTC RV 168h	BTC trend 168h	XS corr	OI velocity	Reading
LOWVOL_UP	0.0041 (low)	+3.2%	+0.81	+4.2%	risk-on grind, OI building
MEDVOL_DN	0.0044	−1.1%	+0.87	−1.5%	choppy mid-vol drift-down
HIVOL_DN	0.0075 (high)	−4.8%	+0.88	+1.3%	high-vol stress / sell-off

Both out-of-sample windows contain all three regimes with non-trivial mass (the discovery window is near-balanced at 25 / 45 / 30 percent), so the negative verdict below is not caused by a missing regime.
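Regime mass and mean dwell are both simple functions of the hourly label sequence. A minimal sketch, with hypothetical function names and assuming one label per hourly bar so that run length in bars equals dwell in hours:

```python
from collections import Counter
from itertools import groupby


def regime_shares(labels):
    """Fraction of bars spent in each regime."""
    counts = Counter(labels)
    return {regime: n / len(labels) for regime, n in counts.items()}


def mean_dwell(labels):
    """Average length of a run of consecutive identical labels, in bars."""
    runs = [sum(1 for _ in group) for _, group in groupby(labels)]
    return sum(runs) / len(runs)
```

Checking shares on each out-of-sample window separately is what rules out the coverage objection: a regime with near-zero mass in a window cannot contribute evidence in either direction.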

3 · The decisive numbers

The book is an equal-weight portfolio of the per-coin specialist directional trades: honest rolling-median centring, sign-corrected, non-overlapping 24-hour holds, 4 bps per leg, phase-averaged over 24 entry offsets. We fit the static regime → specialist mapping on a train half and evaluate it on a disjoint validate half. Lift = routed book Sharpe − always-on book Sharpe on the same bars.
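The train/validate protocol can be sketched as follows. The selection rule is an assumption made for illustration (an expert is switched on in a regime if its mean net return in that regime is positive on the train half); the rule actually used is not spelled out in this note, and all function names are hypothetical. Inputs are per-expert lists of net per-hold returns aligned with one regime label per hold.

```python
import math
from statistics import fmean, stdev


def sharpe(returns, periods_per_year=365):
    """Annualised Sharpe, assuming one 24-hour hold per day."""
    sd = stdev(returns)
    return 0.0 if sd == 0 else fmean(returns) / sd * math.sqrt(periods_per_year)


def fit_static_map(returns, regimes):
    """Train half only: regime -> set of experts with positive mean return there."""
    mapping = {}
    for reg in set(regimes):
        bars = [t for t, g in enumerate(regimes) if g == reg]
        mapping[reg] = {
            e for e, r in returns.items() if fmean([r[t] for t in bars]) > 0
        }
    return mapping


def book_returns(returns, regimes, mapping=None):
    """Equal-weight book; mapping=None means every expert is always on."""
    book = []
    for t, reg in enumerate(regimes):
        active = list(returns) if mapping is None else sorted(mapping.get(reg, ()))
        book.append(fmean([returns[e][t] for e in active]) if active else 0.0)
    return book


def lift(returns, regimes, mapping):
    """Routed Sharpe minus always-on Sharpe on the same bars."""
    routed = sharpe(book_returns(returns, regimes, mapping))
    always_on = sharpe(book_returns(returns, regimes))
    return routed - always_on
```

The honest number is `lift(validate_returns, validate_regimes, fit_static_map(train_returns, train_regimes))`. Calling `lift` on the same half the map was fit on will look good almost by construction, which is exactly the in-sample skill that fails to persist.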

Routed vs always-on — annualised net Sharpe

Static regime→specialist map, fit on train half, evaluated on disjoint validate half · 4 bps / leg

On the well-powered discovery window the always-on book earns +4.98 Sharpe and the routed book just +0.63, a −4.36 deficit. On the walk-forward window both books lose money and the routed book is slightly worse (−0.76 routed vs −0.52 always-on). The conditional skill found in-sample does not generalise, and routing discards diversification.

4 · The "adaptive" variant is a trap, not a green light

A more flexible router — a causal expanding-window gate that activates an expert once its cumulative signed return in the current regime has been positive — looks spectacular: +6.20 Sharpe on discovery, +5.86 on walk-forward. It is not deployable evidence, for two reasons.

Every plausibility flag fires. Both adaptive books trip EXTREME_SR (Sharpe > 5) and UNUSUALLY_CLEAN (per-bar mean/std > 0.3); the discovery book, for example, averages +47 bps/bar against +13.6 bps/bar for always-on. A clean, high mean is the signature of a selection / survivorship artefact: a gate that keeps recently-winning experts manufactures a smooth equity curve regardless of whether any regime structure exists.
The regime label barely contributes. A regime-agnostic control — the same "keep what's winning" gate, ignoring the regime entirely — already captures +4.97 / +4.02 Sharpe. The incremental regime contribution is only +1.23 / +1.84 Sharpe, sitting on top of a flagged, selection-biased base. Most of the apparent adaptive edge is generic momentum plus diversification, not regime alpha.
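The adaptive gate and its regime-agnostic control differ only in how the running total is keyed. A minimal sketch, assuming "has been positive" means the running cumulative return is currently above zero (one possible reading) and using a hypothetical function name:

```python
from collections import defaultdict


def adaptive_gate_book(returns, regimes, use_regime=True):
    """Causal expanding-window gate over per-expert signed returns.

    An expert trades at bar t only if its cumulative return over earlier bars
    is positive. With use_regime=True the total is kept per regime; with
    use_regime=False it is the regime-agnostic "keep what's winning" control.
    """
    cum = defaultdict(float)  # (expert, regime or None) -> return so far
    book = []
    for t, reg in enumerate(regimes):
        key = reg if use_regime else None
        active = [e for e in returns if cum[(e, key)] > 0]
        if active:
            book.append(sum(returns[e][t] for e in active) / len(active))
        else:
            book.append(0.0)
        # Update only after the decision, so the gate never sees bar t.
        for e in returns:
            cum[(e, key)] += returns[e][t]
    return book
```

Running both variants on the same bars and differencing the Sharpes gives the incremental regime contribution quoted above: 6.20 − 4.97 = 1.23 on discovery and 5.86 − 4.02 = 1.84 on walk-forward.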

Why we don't publish the +6.2

Our numerical-plausibility contract flags any Sharpe above 5 and any per-bar mean/std above 0.3 as "verify before you believe." Both adaptive books trip both flags. We treat them as confounded and do not report them as deployable alpha — the same discipline that caught two earlier too-clean headline Sharpes that turned out to be metric bugs.
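The two checks are cheap to state in code. The flag names and thresholds below are the ones quoted in this note; the function name and signature are hypothetical, and whether the real contract applies the thresholds to absolute values is not stated here, so this sketch tests the positive side only.

```python
from statistics import fmean, stdev


def plausibility_flags(per_bar_returns, annualised_sharpe,
                       sr_limit=5.0, clean_limit=0.3):
    """Return the 'verify before you believe' flags a result trips."""
    flags = []
    if annualised_sharpe > sr_limit:
        flags.append("EXTREME_SR")
    sd = stdev(per_bar_returns)
    if sd > 0 and fmean(per_bar_returns) / sd > clean_limit:
        flags.append("UNUSUALLY_CLEAN")
    return flags
```

A non-empty result does not prove a bug; it only moves the number out of the "reportable" pile until the selection or metric issue has been ruled out.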

5 · This is the third time

Two earlier internal experiments reached the same verdict with weaker tooling: a rule-baseline regime MoE and a hidden-Markov regime detector both failed to beat an always-on baseline. The standard objection to those was regime coverage: one of them was evaluated on a window in which a single regime covered 89 percent of the bars. This run removes that objection, with three balanced, well-populated regimes over both out-of-sample windows and the much stronger Phase 1.7 IC specialists as the routed experts. The conclusion holds. Market-wide regime routing of a per-coin specialist book does not generalise.

6 · What would change the conclusion

A negative result is a verdict on a specific construction, not on the whole idea. We would revisit regime conditioning if:

the detector were tied to a per-coin or per-cluster state rather than a single market-wide label — specialists may switch on a coin-specific regime the market index can't see;
we had a longer out-of-sample window with more independent regime cycles (the current windows are ~2.5 months each, and thin per-regime samples already trigger low-sample warnings);
the object being routed were the cross-sectional ensemble — which is genuinely deployable — rather than the per-coin specialist book.

None of these were pursued here. The gate's only job was to decide whether to build the live router, and the answer is no. The capital stays in the always-on book and the cross-sectional ensemble, both of which we can defend.

Units

Book Sharpe is the annualised Sharpe of non-overlapping 24-hour-hold net per-trade returns on the 24-hour forward log-return series, sign-corrected, 4 bps per leg, from the canonical metrics implementation. Lift is routed minus always-on on identical validate-half bars. Targets were verified to be raw forward log returns, not a rank transform, before any Sharpe was computed.
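For readers without access to the canonical metrics implementation, the definition above can be approximated as follows. This is a sketch under assumptions, not that implementation: a round trip is charged two legs, a flat signal earns zero with no cost, annualisation uses 365 holds per year, and phase-averaging is taken to mean the average of the 24 per-offset Sharpes. All names are hypothetical.

```python
import math
from statistics import fmean, stdev

COST_PER_LEG = 4e-4  # 4 bps per leg
HOLD = 24            # bars per hold (hourly bars, 24-hour holds)


def net_trade_returns(fwd_log_ret_24h, signal_sign, offset):
    """Non-overlapping holds entered every HOLD bars, starting at `offset`.

    fwd_log_ret_24h[t] is the raw log return from bar t to t + 24;
    signal_sign[t] is +1, -1 or 0 after sign correction.
    """
    out = []
    for t in range(offset, len(fwd_log_ret_24h), HOLD):
        s = signal_sign[t]
        gross = s * fwd_log_ret_24h[t]
        cost = 2 * COST_PER_LEG if s != 0 else 0.0  # entry leg + exit leg
        out.append(gross - cost)
    return out


def annualised_sharpe(trade_returns, trades_per_year=365):
    """Mean over sample std of per-trade returns, scaled by sqrt(trades/year)."""
    if len(trade_returns) < 2:
        return float("nan")
    sd = stdev(trade_returns)
    if sd == 0:
        return float("nan")
    return fmean(trade_returns) / sd * math.sqrt(trades_per_year)


def phase_averaged_sharpe(fwd_log_ret_24h, signal_sign):
    """Average the Sharpe over all HOLD possible entry offsets."""
    sharpes = [
        annualised_sharpe(net_trade_returns(fwd_log_ret_24h, signal_sign, k))
        for k in range(HOLD)
    ]
    valid = [s for s in sharpes if not math.isnan(s)]
    return fmean(valid) if valid else float("nan")
```

Stepping by `HOLD` is what keeps the trades non-overlapping; computing a Sharpe on every hourly bar of a 24-hour forward return would reuse the same price move up to 24 times and understate the variance.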

Sources & references

Jacobs, R. A., Jordan, M. I., Nowlan, S. J., & Hinton, G. E. (1991). Adaptive Mixtures of Local Experts. Neural Computation.
Hamilton, J. D. (1989). A New Approach to the Economic Analysis of Nonstationary Time Series and the Business Cycle. Econometrica. (regime-switching)
López de Prado, M. (2018). Advances in Financial Machine Learning. Wiley. (selection bias under online gating)
Axon Ridge internal — `research/experiments/results/Phase1_7_regime_routing_gate_2026-05-28.md`
Axon Ridge internal — `src/signals/regime_gmm.py`, `scripts/build_gmm_regime_labels.py`