A Funnel Through 28 Architectures

Abstract

Most published evaluations of time-series forecasting architectures use RMSE on a handful of common benchmarks. For a directional cross-sectional trader on a 24-hour-ahead horizon, what matters is the rank correlation between predicted and realised forward returns, not squared error. We ran 32 candidate arms (28 neural networks, 2 rule-based deciles, plus failed runs) through a single canonical screening protocol on a 26-coin Hyperliquid perpetual universe and found that the architecture-family ranking differs substantially from the RMSE-based literature ranking. The strongest arm on our universe, TimeMixer, is not the strongest on the long-horizon literature leaderboards, and the literature's top picks (ModernTCN, PatchTST) rank in the middle or bottom of our roster.

1 · The funnel

The full pipeline has three sequential gates:

1. Universe IC screen. Each arm is trained on the 26-coin panel and produces predictions for every coin at every bar in the test window. We compute the Spearman rank correlation between predictions and raw forward returns, per coin and per architecture, under two splits (described below). One canonical script, scripts/funnel/audit_universe_ic.py, produces every IC number we publish.
2. Strategy simulation. Surviving arms enter strategy-level (STRAT) simulations that translate predictions into positions, apply transaction costs (4 bps per leg), and report annualised dollar Sharpe over 30 seeds.
3. Paper validation. Strategy survivors run on live data in simulated execution before any live capital is risked.
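The gate-1 computation can be sketched as follows. This is a minimal illustration, not the contents of scripts/funnel/audit_universe_ic.py: the function names and the `panel` layout (coin mapped to a pair of prediction and forward-return lists over the test bars) are assumptions made for the example, and ties are handled with average ranks.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; NaN when either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / (vx * vy) ** 0.5


def spearman_ic(preds, fwd_returns):
    """Spearman IC: Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(preds), average_ranks(fwd_returns))


def per_coin_ic(panel):
    """panel (hypothetical layout): coin -> (predictions, forward_returns)."""
    return {coin: spearman_ic(p, r) for coin, (p, r) in panel.items()}
```

Because the statistic depends only on ranks, any monotone rescaling of an arm's predictions leaves its IC unchanged, which is the property that makes it comparable across architectures with different output scales.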

This paper is about gate 1. The other two are covered in Paper № 04 (dual-pass OOS) and the strategy-specific writeups.

2 · The metric: HiConv-6-inverse

Reporting 26 separate per-coin IC values per architecture does not yield a usable ranking. We aggregate with a deliberately stress-tested statistic: the HiConv-6-inverse IC is the mean of the negated Spearman IC over a fixed six-coin cluster (TON, JUP, POPCAT, LINK, kPEPE, TIA) that the early arch-funnel work identified as the cleanest short-side cluster on this universe.

An arm with a positive HiConv-6-inverse is one whose raw Spearman IC on these six names is negative on average over the Pass-B test window: used as a short signal (that is, with the sign inverted), its predictions would have shown positive rank correlation with realised returns. It is not a Sharpe; it is a screening metric. We use it consistently because it ranks architectures by their cross-sectional discrimination on the historically harder side of the universe.
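As a sketch of the aggregation just described, assuming per-coin ICs are already available as a mapping from coin symbol to Spearman IC (the function name and input layout are illustrative, not taken from the audit script):

```python
from statistics import mean

# The fixed six-coin cluster named in the text.
HICONV_6 = ("TON", "JUP", "POPCAT", "LINK", "kPEPE", "TIA")


def hiconv6_inverse(ic_by_coin):
    """Mean of the negated per-coin Spearman IC over the six-coin cluster."""
    missing = [c for c in HICONV_6 if c not in ic_by_coin]
    if missing:
        raise KeyError(f"missing IC for: {missing}")
    # Coins outside the cluster are ignored even if present in the input.
    return mean(-ic_by_coin[c] for c in HICONV_6)
```

An arm whose raw IC is -0.1 on each of the six coins would score +0.1 under this definition; ICs on the other 20 coins do not enter the statistic.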

3 · Results

All 28 arms ranked by Pass-B HiConv-6-inverse IC

Audited 2026-05-15 · positive = stronger cross-sectional short signal on the 6-coin cluster

The top cluster (TimeMixer through TFT, ranks 1-15) is dominated by hybrid mixers and decomposition Transformers. The bottom of the table contains ModernTCN and the linear floor NLinear-RevIN. xs_funding_z and xs_oi_velocity are non-neural rule deciles kept on the screen as orthogonal baselines.

4 · Literature transfer is uneven

The long-horizon time-series literature ranks architectures on RMSE. The 918-experiment study^[1] places ModernTCN at the very top of the crypto leaderboard. Our HiConv-6-inverse IC places it at rank 24. The chart below shows the disagreement for five canonical “literature top-tier” arms:

Literature rank vs in-program empirical rank

Lower bar = stronger rank · 918-study RMSE rank (left) vs Hyperliquid HiConv-6-inverse IC rank (right)

PatchTST and TimesNet transfer reasonably (slightly worse on our universe than in the literature). ModernTCN and DLinear do not transfer at all by this metric. Autoformer transfers up — it is mid-tier in the literature and a top arm on our screen.

5 · The redundancy problem

Many arms produce almost identical IC vectors. A pairwise correlation audit of the IC-per-coin profile across the top-15 arms found pairs with correlation above +0.95 — meaning that despite having different mechanisms, they predict the same coins in the same direction with the same intensity. The most extreme case is BITCN and StemGNN with a pairwise correlation of +0.999.
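A redundancy audit of this kind can be sketched as below. The sketch assumes Pearson correlation over the per-coin IC vectors and a 0.95 cut-off; the text does not state which correlation the audit used, and the function name and the `ic_profiles` layout (arm mapped to a coin-to-IC dictionary) are hypothetical.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation; NaN when either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / (vx * vy) ** 0.5


def redundant_pairs(ic_profiles, threshold=0.95):
    """Return (arm_a, arm_b, corr) for every pair whose IC profiles correlate above threshold."""
    # Compare only on coins that every arm has an IC for.
    coins = sorted(set.intersection(*(set(p) for p in ic_profiles.values())))
    flagged = []
    for a, b in combinations(sorted(ic_profiles), 2):
        r = pearson([ic_profiles[a][c] for c in coins],
                    [ic_profiles[b][c] for c in coins])
        if r > threshold:
            flagged.append((a, b, r))
    # Most redundant pairs first.
    return sorted(flagged, key=lambda t: -t[2])
```

Note that correlation is invariant to scale and offset, so two arms are flagged when their per-coin IC pattern has the same shape, even if one arm's ICs are uniformly larger.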

Distinctness scores: how independent is each arm's signal?

1.0 = fully orthogonal · 0.0 = identical to ensemble mean · per IC pattern analysis 2026-05-15

The rule-based xs_oi_velocity and xs_funding_z are the most independent. lgbm_rank is the most independent neural-adjacent arm and is our default diversifier. The TimeMixer / TSMixer / iTransformer / TSMixerx cluster is highly redundant — stacking three of them buys nothing.
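The exact distinctness formula is not given here. One definition consistent with the chart's endpoints (1.0 = orthogonal, 0.0 = identical to the ensemble mean) is one minus the absolute correlation between an arm's per-coin IC profile and the ensemble-mean profile. The sketch below implements that assumed definition; the names and the list-based input layout are illustrative, and the ensemble mean here includes the arm being scored.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation; NaN when either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / (vx * vy) ** 0.5


def distinctness_scores(ic_profiles):
    """ic_profiles: arm -> list of per-coin ICs, same coin order for every arm.

    Assumed definition: 1 - |corr(arm profile, ensemble-mean profile)|.
    """
    arms = list(ic_profiles)
    n_coins = len(ic_profiles[arms[0]])
    # Ensemble-mean IC for each coin, averaged across all arms.
    ensemble = [mean(ic_profiles[a][i] for a in arms) for i in range(n_coins)]
    return {a: 1.0 - abs(pearson(ic_profiles[a], ensemble)) for a in arms}
```

Under this definition a cluster of near-identical arms pulls the ensemble mean toward itself, so its members score near 0 while an arm with a different per-coin pattern scores closer to 1.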

Practical consequence

“Seven architectures in an ensemble” sounds diverse. On this universe, it is one signal repeated seven times. The diversification properties of the roster need to be measured on the predictions, not on the architecture taxonomy.

6 · What this is and isn't

The funnel ranks architectures by their cross-sectional discrimination on a specific universe, on a specific horizon, under a specific cost-aware OOS protocol. It is not a universal claim that TimeMixer is “better than” ModernTCN — on a different benchmark with different metrics that ranking can flip, as the 918 paper demonstrates. The funnel is the input to our strategy work; the validation of any actual edge happens downstream in the dual-pass OOS work covered separately.

The other constraint worth being explicit about: IC ≠ Sharpe. A high HiConv-6-inverse IC is a necessary condition for an architecture to be worth simulating as a strategy. It is not a sufficient condition for that strategy to be profitable at full cost. The follow-up paper on cost-aware backtesting covers cases where a +0.15 IC becomes a deeply negative net Sharpe^[2].
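A toy illustration of why a positive signal can still produce a negative net Sharpe: the sketch below charges a per-leg cost on every unit of position change. The 4 bps default matches the cost quoted for the strategy gate; everything else (function names, the position and return series, 365 periods per year) is assumed for the example and is not the strategy simulator.

```python
from statistics import mean, stdev


def net_pnl(positions, returns, cost_bps_per_leg=4.0):
    """Per-bar PnL after costs; positions[t] is held over returns[t].

    Cost is charged on |change in position|, so flipping +1 -> -1 pays two legs.
    """
    cost = cost_bps_per_leg / 1e4
    pnl, prev = [], 0.0
    for pos, ret in zip(positions, returns):
        pnl.append(pos * ret - cost * abs(pos - prev))
        prev = pos
    return pnl


def annualised_sharpe(pnl, periods_per_year=365):
    """Mean over standard deviation of per-bar PnL, scaled by sqrt(periods per year)."""
    sd = stdev(pnl)
    if sd == 0:
        return float("nan")
    return mean(pnl) / sd * periods_per_year ** 0.5


# Hypothetical signal that is right on every bar but flips position every bar.
returns = [0.0007, -0.0003] * 50
positions = [1.0, -1.0] * 50
gross = annualised_sharpe(net_pnl(positions, returns, cost_bps_per_leg=0.0))
net = annualised_sharpe(net_pnl(positions, returns))
```

In this constructed case the signal has perfect directional accuracy, yet the average gross gain of 5 bps per bar is smaller than the 8 bps paid to flip the position, so `gross` is positive and `net` is negative.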

7 · Reproducibility

Every number on this page can be regenerated from the repository with a single command:

$ python3 -m scripts.funnel.audit_universe_ic
$ # writes research/experiments/results/audit_universe_ic_<date>.md

The 2026-05-15 audit caught seven cells in the pre-audit scoreboard that were hand-typed and wrong by more than 0.03 in IC. Those errors were corrected and the audit is now part of the standing automation.
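The scoreboard check can be sketched as a tolerance comparison between published and regenerated values. The function name, the (arm, split) key layout, and the example values are hypothetical; only the 0.03 threshold comes from the audit description above.

```python
def audit_scoreboard(published, regenerated, tol=0.03):
    """Return cells whose published IC differs from the regenerated IC by more than tol.

    Both inputs map a cell key, e.g. (arm, split), to an IC value.
    A published cell with no regenerated counterpart is reported with None.
    """
    mismatches = []
    for key in sorted(published):
        if key not in regenerated:
            mismatches.append((key, published[key], None))
        elif abs(published[key] - regenerated[key]) > tol:
            mismatches.append((key, published[key], regenerated[key]))
    return mismatches


# Hypothetical cells: arm_x is off by 0.04 (flagged), arm_y by 0.01 (within tolerance).
published = {("arm_x", "pass_b"): 0.10, ("arm_y", "pass_b"): 0.05}
regenerated = {("arm_x", "pass_b"): 0.14, ("arm_y", "pass_b"): 0.06}
bad_cells = audit_scoreboard(published, regenerated)
```

Running a check like this on every regeneration means a hand-edited cell cannot drift from the canonical script's output without being reported.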

Sources & references

[1] Saidd, M. et al. (2026). A 918-experiment empirical study of long-horizon forecasters. arXiv:2603.16886.
[2] Axon Ridge. Cost-Aware Backtesting (Paper № 06). /research/cost-aware-backtesting.html
[3] Axon Ridge internal. `research/experiments/results/audit_universe_ic_2026-05-15.md`
[4] Axon Ridge internal. `research/scoreboards/01_top20_architectures.md`
[5] Axon Ridge internal. `research/experiments/results/IC_pattern_analysis_2026-05-15.md`