Mechanism diagram · Hybrid / Mixer
[Diagram: MLP MIXER / HYBRID. Input series (lookback × N coins) → RevIN → Patching (optional) → Embed (→ d-model) → Time-mixing MLP (mix across time tokens) → Feature-mixing MLP (mix across channels) → Forecast head → Forecast]

How it works

TimeMixer decomposes the input series across multiple temporal scales (e.g. raw, downsampled by 2, by 4) before mixing them.
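
As a rough illustration of the multi-scale step, the sketch below builds one view of the series per downsampling factor. Average pooling, the hypothetical helper name `multiscale_views`, and the choice to drop the oldest steps when the lookback is not a multiple of the factor are all assumptions made for this example, not details confirmed by the text above.

```python
import numpy as np

def multiscale_views(x, factors=(1, 2, 4)):
    """Return one average-pooled copy of x per downsampling factor.

    x has shape (lookback, n_channels). Steps that do not fill a complete
    pooling window are dropped from the start (oldest end) of the series.
    """
    views = []
    for f in factors:
        usable = (x.shape[0] // f) * f        # largest multiple of f that fits
        trimmed = x[x.shape[0] - usable:]     # keep the most recent steps
        pooled = trimmed.reshape(usable // f, f, x.shape[1]).mean(axis=1)
        views.append(pooled)
    return views

x = np.arange(16, dtype=float).reshape(8, 2)  # 8 time steps, 2 channels
views = multiscale_views(x)
print([v.shape for v in views])
```

Each coarser view has fewer time steps but the same number of channels, so the same kind of mixing block can be applied at every scale.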

Each scale is processed by MLP-mixer-style blocks that alternate time-mixing and feature-mixing — exchanging information first along the time axis, then along the channel axis.
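
A minimal sketch of one such block follows, assuming two-layer MLPs with ReLU and residual connections. Normalisation layers and dropout are left out, and the helper names (`mlp`, `mixer_block`, `init_params`) and hidden widths are hypothetical choices for illustration.

```python
import numpy as np

def mlp(x, w1, b1, w2, b2):
    """Two-layer MLP applied along the last axis, with a ReLU in between."""
    return np.maximum(x @ w1 + b1, 0.0) @ w2 + b2

def mixer_block(x, time_params, feat_params):
    """x has shape (n_tokens, d_model). Time-mixing first, then feature-mixing."""
    # Time-mixing: transpose so the MLP acts across tokens, one channel at a time.
    x = x + mlp(x.T, *time_params).T
    # Feature-mixing: the MLP acts across channels, one token at a time.
    x = x + mlp(x, *feat_params)
    return x

def init_params(rng, dim, hidden):
    """Hypothetical small random initialisation for one MLP."""
    return (rng.normal(0, 0.1, (dim, hidden)), np.zeros(hidden),
            rng.normal(0, 0.1, (hidden, dim)), np.zeros(dim))

rng = np.random.default_rng(0)
n_tokens, d_model = 12, 8
x = rng.normal(size=(n_tokens, d_model))
time_params = init_params(rng, n_tokens, 16)
feat_params = init_params(rng, d_model, 16)
y = mixer_block(x, time_params, feat_params)
print(y.shape)  # same shape as the input
```

The only difference between the two halves is the axis the MLP sees: the transpose puts the time tokens on the last axis for time-mixing, and the untransposed input puts the channels there for feature-mixing.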

The multi-scale representations are fused, and a single linear head then maps the fused representation to the forecast horizon. The design replaces full self-attention over the lookback with cheaper mixing operations.
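
One way to picture the fusion and the linear head is sketched below. The fusion rule used here (repeat coarser views along time to the finest length, then average) is an assumption chosen for simplicity, as is the hypothetical function name `fuse_and_forecast`; the text above does not specify how the scales are combined.

```python
import numpy as np

def fuse_and_forecast(views, w_head, b_head):
    """views: list of (length_s, n_channels) arrays, finest scale first.

    Coarser views are repeated along time to match the finest length
    (lengths are assumed to divide evenly), the aligned views are averaged,
    and one linear map over the time axis produces a forecast of shape
    (horizon, n_channels).
    """
    target_len = views[0].shape[0]
    aligned = []
    for v in views:
        factor = target_len // v.shape[0]
        aligned.append(np.repeat(v, factor, axis=0))  # nearest-neighbour upsample
    fused = np.mean(aligned, axis=0)                  # (target_len, n_channels)
    return w_head @ fused + b_head[:, None]           # (horizon, n_channels)

views = [np.ones((8, 2)), np.ones((4, 2)), np.ones((2, 2))]
w_head = np.full((3, 8), 1.0 / 8)   # horizon of 3 steps, lookback of 8
b_head = np.zeros(3)
forecast = fuse_and_forecast(views, w_head, b_head)
print(forecast.shape)
```

The head is a single matrix of shape (horizon, lookback) shared across channels, which is what keeps the output stage linear in this sketch.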

Pros and cons on this universe

Pros

  • Top walk-forward cross-sectional IC (+0.2043) on the latest shared-metric architecture ranking.
  • Most negative per-coin IC on TON (−0.310 on the validation sweep), i.e. the strongest short signal and the most informative single name.
  • Anchor architecture for the inverse-cluster STRAT family.
  • Cheaper than full Transformers at the same lookback length.
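
The cost claim in the last bullet can be made concrete with a back-of-the-envelope count of multiply-accumulates per layer. The formulas below are a simplified approximation (single-head attention, no softmax or normalisation cost), and the lookback, model width, and hidden widths are illustrative assumptions rather than this universe's actual settings.

```python
def attention_macs(lookback, d_model):
    """Approximate multiply-accumulates for one single-head self-attention layer."""
    projections = 4 * lookback * d_model * d_model  # Q, K, V and output maps
    scores = lookback * lookback * d_model          # Q @ K^T
    weighted_sum = lookback * lookback * d_model    # attention weights @ V
    return projections + scores + weighted_sum

def mixer_macs(lookback, d_model, time_hidden, feat_hidden):
    """Approximate multiply-accumulates for one time-mixing plus feature-mixing block."""
    time_mixing = 2 * lookback * time_hidden * d_model  # two layers, per channel
    feat_mixing = 2 * lookback * d_model * feat_hidden  # two layers, per token
    return time_mixing + feat_mixing

lookback, d_model = 512, 64
print(attention_macs(lookback, d_model))
print(mixer_macs(lookback, d_model, time_hidden=64, feat_hidden=64))
```

The attention count contains a term that grows with the square of the lookback, while the mixer count grows linearly as long as the hidden widths stay fixed. If the time-mixing hidden width is scaled up with the lookback, the gap narrows.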

Cons / failure modes

  • Pairwise IC correlation with the other modern-mixer arms is above +0.95, so it adds little diversification to an ensemble.
  • No dossier in the research base; mechanism description here comes from Nixtla source.
  • STRAT-level dollar Sharpe (when used inside STRAT-50w) fires OVER_CAPITAL plausibility flags.
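
For readers unfamiliar with the IC figures quoted in these lists, the sketch below shows one plausible reading of them: a per-date cross-sectional rank IC for each model, and the Pearson correlation between two models' IC series. The exact definitions used in the ranking are not given here, so the function names, the tie handling, and the synthetic data are all assumptions.

```python
import numpy as np

def rank(values):
    """Ordinal ranks (0 = smallest); ties are broken by position."""
    return np.argsort(np.argsort(values))

def cross_sectional_ic(preds, realised):
    """Spearman-style IC per date. Inputs have shape (n_dates, n_coins)."""
    ics = []
    for p, r in zip(preds, realised):
        ics.append(np.corrcoef(rank(p), rank(r))[0, 1])
    return np.array(ics)

def pairwise_ic_correlation(preds_a, preds_b, realised):
    """Pearson correlation between two models' per-date IC series."""
    ic_a = cross_sectional_ic(preds_a, realised)
    ic_b = cross_sectional_ic(preds_b, realised)
    return np.corrcoef(ic_a, ic_b)[0, 1]

# Synthetic data only: model B is a lightly perturbed copy of model A.
rng = np.random.default_rng(0)
realised = rng.normal(size=(200, 10))
preds_a = 0.2 * realised + rng.normal(size=(200, 10))
preds_b = preds_a + 0.05 * rng.normal(size=(200, 10))
print(pairwise_ic_correlation(preds_a, preds_b, realised))
```

Two arms whose IC series move together this closely tend to be right and wrong on the same dates, which is why a high pairwise value limits what either adds to an ensemble.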

References