TimeMixer — Axon Ridge Capital

Mechanism diagram · Hybrid / Mixer

How it works

TimeMixer decomposes the input series across multiple temporal scales (e.g. raw, downsampled by 2, by 4) before mixing them.
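A minimal sketch of the multi-scale step follows. It assumes downsampling is done by non-overlapping average pooling, which the description above does not confirm. The function name `multiscale_views` and the `(time, channels)` array layout are illustrative choices, not taken from any library.

```python
import numpy as np

def multiscale_views(x, factors=(1, 2, 4)):
    """Return one average-pooled copy of x per downsampling factor.

    x has shape (time, channels). Average pooling is an assumption here;
    other downsampling schemes (striding, convolution) would also fit the text.
    """
    views = []
    for f in factors:
        usable = (x.shape[0] // f) * f           # largest length divisible by f
        trimmed = x[x.shape[0] - usable:]        # drop the oldest leftover steps
        pooled = trimmed.reshape(usable // f, f, x.shape[1]).mean(axis=1)
        views.append(pooled)
    return views

# Toy lookback of 8 steps and 2 channels.
x = np.arange(16, dtype=float).reshape(8, 2)
views = multiscale_views(x)
shapes = [v.shape for v in views]
print(shapes)
```

Each coarser view keeps the channel count and shortens the time axis by its factor, so the later mixing layers operate on progressively shorter sequences.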

Each scale is processed by MLP-mixer-style blocks that alternate time-mixing and feature-mixing — exchanging information first along the time axis, then along the channel axis.
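The alternation of time-mixing and feature-mixing can be sketched as below. This is a generic MLP-mixer-style block written from the description above, not the Nixtla or reference implementation. The residual connections, ReLU activation, hidden width, and the helper names `mlp`, `init_mlp`, and `mixer_block` are all assumptions made for illustration.

```python
import numpy as np

def mlp(x, w1, b1, w2, b2):
    """Two-layer perceptron applied along the last axis of x."""
    return np.maximum(x @ w1 + b1, 0.0) @ w2 + b2

def init_mlp(rng, dim, hidden):
    """Small random weights for an MLP that maps dim -> hidden -> dim."""
    return (rng.normal(0.0, 0.1, (dim, hidden)), np.zeros(hidden),
            rng.normal(0.0, 0.1, (hidden, dim)), np.zeros(dim))

def mixer_block(x, time_params, feat_params):
    """x has shape (time, channels)."""
    # Time mixing: transpose so the time axis is last, mix, transpose back.
    x = x + mlp(x.T, *time_params).T
    # Feature mixing: the channel axis is already last.
    x = x + mlp(x, *feat_params)
    return x

rng = np.random.default_rng(0)
T, C = 8, 3
x = rng.normal(size=(T, C))
out = mixer_block(x, init_mlp(rng, T, 16), init_mlp(rng, C, 16))
print(out.shape)
```

Both mixing steps are plain matrix products, so the cost grows linearly with the number of time steps for a fixed hidden width, rather than quadratically as in full self-attention.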

The multi-scale representations are fused before a single linear head emits the forecast horizon. The design replaces full self-attention over the lookback with cheaper mixing operations.
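The fusion and head can be illustrated with a short sketch. The text above does not say how the scales are fused, so this example assumes the coarser scales are upsampled by repetition to the raw length and averaged. The name `fuse_and_forecast`, the averaging rule, and the requirement that the lookback be divisible by every factor are hypothetical.

```python
import numpy as np

def fuse_and_forecast(views, factors, w_head, b_head):
    """Fuse per-scale arrays and apply one linear head along the time axis.

    views[i] has shape (lookback // factors[i], channels).
    w_head has shape (lookback, horizon); b_head has shape (horizon,).
    Returns an array of shape (horizon, channels).
    """
    target_len = w_head.shape[0]
    upsampled = []
    for v, f in zip(views, factors):
        u = np.repeat(v, f, axis=0)              # nearest-neighbour upsampling
        if u.shape[0] != target_len:
            raise ValueError("lookback must be divisible by every factor")
        upsampled.append(u)
    fused = np.mean(upsampled, axis=0)           # assumed fusion rule: plain average
    return (fused.T @ w_head + b_head).T         # single linear map: lookback -> horizon

rng = np.random.default_rng(0)
lookback, horizon, channels = 8, 3, 2
factors = (1, 2, 4)
views = [rng.normal(size=(lookback // f, channels)) for f in factors]
w_head = rng.normal(0.0, 0.1, (lookback, horizon))
b_head = np.zeros(horizon)
forecast = fuse_and_forecast(views, factors, w_head, b_head)
print(forecast.shape)
```

The head is shared across channels in this sketch; a per-channel head would be an equally plausible reading of the description.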

Pros and cons on this universe

Pros

Top walk-forward cross-sectional IC (+0.2043) on the latest shared-metric architecture ranking.
Deepest TON short signal (per-coin IC −0.310 on the validation sweep) — the most informative single name.
Anchor architecture for the inverse-cluster STRAT family.
Cheaper than full Transformers at the same lookback length.
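For readers unfamiliar with the two IC figures quoted above, the sketch below shows one common way to compute them. It assumes IC means Spearman rank correlation between predictions and realized returns, which this page does not state. The function names are hypothetical, the data is synthetic, and ties are not handled.

```python
import numpy as np

def _ranks(a, axis):
    """Ordinal ranks along an axis (assumes no ties)."""
    return np.argsort(np.argsort(a, axis=axis), axis=axis).astype(float)

def _corr(a, b, axis):
    """Pearson correlation along an axis."""
    a = a - a.mean(axis=axis, keepdims=True)
    b = b - b.mean(axis=axis, keepdims=True)
    return (a * b).sum(axis=axis) / np.sqrt((a ** 2).sum(axis=axis) * (b ** 2).sum(axis=axis))

def cross_sectional_ic(pred, realized):
    """Mean over time of the across-asset rank correlation. Inputs: (time, assets)."""
    return _corr(_ranks(pred, 1), _ranks(realized, 1), axis=1).mean()

def per_asset_ic(pred, realized):
    """Rank correlation through time, one value per asset."""
    return _corr(_ranks(pred, 0), _ranks(realized, 0), axis=0)

rng = np.random.default_rng(0)
realized = rng.normal(size=(50, 5))              # synthetic returns, 50 periods, 5 assets
perfect = cross_sectional_ic(realized, realized)
inverted = cross_sectional_ic(-realized, realized)
by_asset = per_asset_ic(-realized, realized)
print(perfect, inverted, by_asset.shape)
```

A strongly negative per-asset IC, like the TON figure above, means the forecast ranks that asset's returns in reverse, which is what makes it usable as a short signal.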

Cons / failure modes

Pairwise IC correlation with other modern-mixer arms above +0.95 — adds little to an ensemble.
No dossier in the research base; the mechanism description here is taken from the Nixtla source.
When used inside STRAT-50w, the STRAT-level dollar Sharpe trips OVER_CAPITAL plausibility flags.
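The ensemble-redundancy concern in the first item can be checked with a sketch like the one below. It assumes "pairwise IC correlation" means the Pearson correlation between arms' per-period IC series. The arm names and data are synthetic placeholders, and only the 0.95 threshold comes from the text above.

```python
import numpy as np

def redundant_pairs(ic_series, names, threshold=0.95):
    """List pairs of arms whose per-period IC series correlate above threshold.

    ic_series has shape (arms, periods).
    """
    corr = np.corrcoef(ic_series)                # (arms, arms) Pearson matrix
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if corr[i, j] > threshold:
                pairs.append((names[i], names[j]))
    return pairs

rng = np.random.default_rng(0)
base = rng.normal(size=200)
ic_series = np.stack([
    base,                                        # arm_a
    base + 0.05 * rng.normal(size=200),          # arm_b: near-copy of arm_a
    rng.normal(size=200),                        # arm_c: independent
])
pairs = redundant_pairs(ic_series, ["arm_a", "arm_b", "arm_c"])
print(pairs)
```

Arms that land in the same pair tend to be right and wrong in the same periods, so averaging them removes little of the forecast error.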
