Mechanism diagram · Hybrid / Mixer
How it works
TimeMixer builds several views of the input series at progressively coarser temporal scales (e.g. raw, average-pooled by 2, by 4) before mixing them.
Inside each Past-Decomposable-Mixing block, every scale is split into seasonal and trend components. Seasonal parts are mixed bottom-up (fine to coarse) and trend parts top-down (coarse to fine) by small MLPs over the time axis, with an optional MLP that mixes across channels.
Each scale then gets its own linear head, and the per-scale forecasts are summed into the final horizon (Future-Multipredictor-Mixing). The design replaces full self-attention over the lookback with cheaper MLP mixing operations.
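A minimal NumPy sketch of a TimeMixer-style forward pass, following the structure of the published TimeMixer design: average-pooled scales, moving-average seasonal/trend decomposition, bottom-up seasonal mixing, top-down trend mixing, and one linear head per scale with the per-scale forecasts summed. It assumes channel-independent inputs, random untrained weights, and no embedding, normalization, or channel-mixing layer. `TimeMixerSketch`, `decompose`, `downsample` and `make_mlp` are hypothetical names; this is not Nixtla's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def make_mlp(n_in, n_out):
    # Linear(n_in -> n_out), GELU, Linear(n_out -> n_out) along the time axis.
    # Weights are random: this only illustrates shapes and data flow.
    w1 = rng.normal(scale=1 / np.sqrt(n_in), size=(n_in, n_out))
    w2 = rng.normal(scale=1 / np.sqrt(n_out), size=(n_out, n_out))
    return lambda z: gelu(z @ w1) @ w2

def downsample(x, window=2):
    # Average-pool the last (time) axis by `window`
    c, t = x.shape
    return x[:, : t - t % window].reshape(c, t // window, window).mean(axis=-1)

def decompose(x, kernel=5):
    # Moving-average trend (edge-padded so length is preserved); seasonal = remainder
    pad = kernel // 2
    xp = np.pad(x, ((0, 0), (pad, pad)), mode="edge")
    trend = np.stack([xp[:, i : i + x.shape[1]] for i in range(kernel)]).mean(axis=0)
    return x - trend, trend

class TimeMixerSketch:
    def __init__(self, lookback=96, horizon=24, n_scales=3, window=2):
        self.window = window
        self.lengths = [lookback // window**i for i in range(n_scales)]
        # Seasonal MLPs map fine -> coarse; trend MLPs map coarse -> fine
        self.season_mlps = [make_mlp(self.lengths[i], self.lengths[i + 1]) for i in range(n_scales - 1)]
        self.trend_mlps = [make_mlp(self.lengths[i + 1], self.lengths[i]) for i in range(n_scales - 1)]
        # One linear prediction head per scale
        self.heads = [rng.normal(scale=1 / np.sqrt(n), size=(n, horizon)) for n in self.lengths]

    def forward(self, x):
        # x: (channels, lookback); each channel is processed independently
        scales = [x]
        for _ in range(len(self.lengths) - 1):
            scales.append(downsample(scales[-1], self.window))
        seasons, trends = map(list, zip(*(decompose(s) for s in scales)))
        # Bottom-up: each finer seasonal component updates the next coarser one
        for i, mlp in enumerate(self.season_mlps):
            seasons[i + 1] = seasons[i + 1] + mlp(seasons[i])
        # Top-down: each coarser trend component updates the next finer one
        for i in reversed(range(len(self.trend_mlps))):
            trends[i] = trends[i] + self.trend_mlps[i](trends[i + 1])
        mixed = [s + t for s, t in zip(seasons, trends)]
        # Every scale predicts the full horizon; the predictions are summed
        return sum(m @ w for m, w in zip(mixed, self.heads))

model = TimeMixerSketch(lookback=96, horizon=24)
x = rng.normal(size=(5, 96))  # e.g. 5 coins, 96-step lookback
y_hat = model.forward(x)
print(y_hat.shape)  # (5, 24)
```

Summing the per-scale heads, rather than fusing scales before a single head, lets each resolution contribute its own forecast; a trained model learns how much weight each scale's head carries.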
Pros and cons on this universe
Pros
- Top walk-forward cross-sectional IC (+0.2043) on the latest shared-metric architecture ranking.
- Deepest short signal on TON: per-coin IC of −0.310 on the validation sweep, the most informative single-name result.
- Anchor architecture for the inverse-cluster STRAT family.
- Cheaper than full Transformers at the same lookback length.
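The two IC figures above can be computed in several ways; the card does not state its convention. A minimal sketch, assuming IC means Spearman rank correlation between out-of-sample (walk-forward) predictions and realized forward returns: the cross-sectional IC ranks coins against each other on each date and averages over dates, while the per-coin IC correlates one coin's predictions with its own returns through time. The function names and synthetic data are hypothetical.

```python
import numpy as np

def rank(a, axis=-1):
    # Ordinal ranks along `axis` (ties are not averaged; fine for continuous scores)
    return np.argsort(np.argsort(a, axis=axis), axis=axis).astype(float)

def pearson(a, b, axis=-1):
    # Pearson correlation along `axis`
    a = a - a.mean(axis=axis, keepdims=True)
    b = b - b.mean(axis=axis, keepdims=True)
    return (a * b).sum(axis=axis) / np.sqrt((a**2).sum(axis=axis) * (b**2).sum(axis=axis))

def cross_sectional_ic(pred, realized):
    # pred, realized: (n_dates, n_coins). Rank IC across coins on each date,
    # then the mean over dates; also returns the per-date series.
    daily = pearson(rank(pred, axis=1), rank(realized, axis=1), axis=1)
    return daily.mean(), daily

def per_coin_ic(pred, realized):
    # Rank IC through time for each coin separately -> shape (n_coins,)
    return pearson(rank(pred, axis=0), rank(realized, axis=0), axis=0)

rng = np.random.default_rng(1)
pred = rng.normal(size=(250, 20))  # synthetic: 250 walk-forward dates, 20 coins
realized = 0.2 * pred + rng.normal(size=pred.shape)
mean_ic, daily_ic = cross_sectional_ic(pred, realized)
coin_ic = per_coin_ic(pred, realized)
```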
Cons / failure modes
- Pairwise IC correlation above +0.95 with the other modern-mixer arms, so it adds little diversity to an ensemble.
- No dossier in the research base; the mechanism description here comes from the Nixtla (neuralforecast) source.
- STRAT-level dollar Sharpe (when used inside STRAT-50w) fires OVER_CAPITAL plausibility flags.
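The redundancy point about IC correlation can be made concrete with a standard two-signal result: if two signals share mean μ and volatility σ and have correlation ρ, their equal-weight blend keeps mean μ but has volatility σ√((1+ρ)/2), so its information ratio is higher by a factor √(2/(1+ρ)). A minimal sketch, assuming the arms' per-date IC series can be treated this way (equal mean and volatility is an assumption, not something the card reports); `blend_ir_gain` is a hypothetical helper.

```python
import math

def blend_ir_gain(rho):
    # Information-ratio multiplier of an equal-weight blend of two signals
    # with equal mean and volatility and pairwise correlation rho.
    return math.sqrt(2.0 / (1.0 + rho))

print(f"{blend_ir_gain(0.95):.4f}")  # 1.0127
print(f"{blend_ir_gain(0.0):.4f}")   # 1.4142
```

At ρ = 0.95 the blend gains only about 1.3% in information ratio, versus about 41% for two uncorrelated arms.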