Mechanism diagram · Hybrid / Mixer
[Diagram: MLP Mixer / Hybrid. Stages in order: Input series (lookback × N coins); RevIN; Patching (optional); Embed → d-model; Time-mixing MLP (mix across time tokens); Feature-mixing MLP (mix across channels); Forecast head; Forecast.]

How it works

TSMixer alternates time-mixing MLPs (over time tokens) and feature-mixing MLPs (over channels).
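
The alternation described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation behind this page: it assumes two-layer ReLU MLPs with residual connections for both mixing steps and omits biases, normalisation layers and dropout. The function names, weight names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def time_mixing(x, w1, w2):
    """Mix across time tokens; the same MLP is shared by every channel.

    x: (lookback, channels), w1: (hidden, lookback), w2: (lookback, hidden).
    """
    return x + w2 @ relu(w1 @ x)  # residual connection

def feature_mixing(x, w1, w2):
    """Mix across channels; the same MLP is shared by every time step.

    x: (lookback, channels), w1: (channels, hidden), w2: (hidden, channels).
    """
    return x + relu(x @ w1) @ w2  # residual connection

def mixer_block(x, params):
    # One block = time mixing followed by feature mixing.
    x = time_mixing(x, params["t1"], params["t2"])
    return feature_mixing(x, params["f1"], params["f2"])

# Illustrative sizes only (assumed, not taken from this page).
lookback, channels, hidden = 16, 5, 8
params = {
    "t1": rng.normal(scale=0.1, size=(hidden, lookback)),
    "t2": rng.normal(scale=0.1, size=(lookback, hidden)),
    "f1": rng.normal(scale=0.1, size=(channels, hidden)),
    "f2": rng.normal(scale=0.1, size=(hidden, channels)),
}
x = rng.normal(size=(lookback, channels))
y = mixer_block(x, params)  # same shape as x, so blocks can be stacked
```

In this sketch the time-mixing step never moves information between channels, and the feature-mixing step never moves it between time steps; stacking blocks is what lets the two kinds of mixing interact.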

RevIN normalisation handles non-stationary statistics window-by-window.
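
As a rough sketch of the window-by-window idea: each lookback window is standardised per channel using its own mean and standard deviation, and the forecast is mapped back with the same statistics. The version below assumes that simplified form and leaves out RevIN's learnable affine parameters; the function names and the synthetic price-like window are hypothetical.

```python
import numpy as np

def revin_normalize(window, eps=1e-5):
    """Standardise one window per channel. window: (lookback, channels)."""
    mean = window.mean(axis=0, keepdims=True)
    std = np.sqrt(window.var(axis=0, keepdims=True) + eps)
    return (window - mean) / std, (mean, std)

def revin_denormalize(forecast, stats):
    """Map model output back to the original scale of that window."""
    mean, std = stats
    return forecast * std + mean

rng = np.random.default_rng(0)
# Synthetic random-walk series standing in for non-stationary prices.
window = 100.0 + 5.0 * rng.normal(size=(32, 3)).cumsum(axis=0)

normed, stats = revin_normalize(window)
restored = revin_denormalize(normed, stats)  # recovers the original window
```

Because the statistics are recomputed for every window, a level shift or volatility change between windows does not reach the mixer layers; it is removed on the way in and restored on the way out.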

No patching in the baseline variant; the model operates directly on the full lookback window.

Pros and cons on this universe

Pros

  • Pure-MLP — fast training, cheap inference.
  • Strong IC: ranked #7 on this universe.
  • Reference baseline for the mixer family.

Cons / failure modes

  • Pairwise IC correlation ≈ +0.995 with iTransformer and SoFTS — heavily redundant.
  • Feature mixing can overfit on tail-extreme values without careful regularisation.
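
One way to read the redundancy figure above is as the correlation between two models' per-period IC series. The sketch below assumes that reading, with IC taken as the Spearman rank correlation between predicted and realised cross-sectional returns; how this page actually computes it is not stated. All data is synthetic and the function names are hypothetical, so the numbers it produces say nothing about the models listed here.

```python
import numpy as np

def ranks(v):
    # Ordinal ranks; ties are not averaged, which is fine for continuous data.
    return np.argsort(np.argsort(v)).astype(float)

def ic_series(pred, realised):
    """Per-period Spearman IC. pred, realised: (periods, assets)."""
    return np.array([
        np.corrcoef(ranks(p), ranks(r))[0, 1]
        for p, r in zip(pred, realised)
    ])

def ic_correlation(pred_a, pred_b, realised):
    """Pearson correlation between two models' IC time series."""
    return np.corrcoef(ic_series(pred_a, realised),
                       ic_series(pred_b, realised))[0, 1]

rng = np.random.default_rng(0)
periods, assets = 200, 30
realised = rng.normal(size=(periods, assets))

# Two models sharing almost the same signal, plus one unrelated model.
signal = 0.1 * realised + rng.normal(size=(periods, assets))
model_a = signal + 0.02 * rng.normal(size=(periods, assets))
model_b = signal + 0.02 * rng.normal(size=(periods, assets))
unrelated = rng.normal(size=(periods, assets))

redundant = ic_correlation(model_a, model_b, realised)
independent = ic_correlation(model_a, unrelated, realised)
```

A value near +1, as for `redundant` here, means the two models succeed and fail in the same periods, so combining them adds little diversification.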

References