Mechanism diagram · Hybrid / Mixer
How it works
TSMixer alternates time-mixing MLPs (applied along the time axis, shared across channels) and feature-mixing MLPs (applied along the channel axis, shared across time steps).
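RevIN normalises each input window per channel using that window's own mean and standard deviation, then reverses the transform on the output, so non-stationary shifts in level and scale are handled window-by-window.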
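A minimal NumPy sketch of the per-window normalise/denormalise step may make this concrete. It is illustrative only: the function names, `eps` value, and synthetic window are assumptions, and RevIN's learnable affine parameters are omitted for brevity.

```python
import numpy as np

def revin_normalize(window, eps=1e-5):
    """Normalise one lookback window per channel using its own statistics.

    window: array of shape (lookback, channels).
    Returns the normalised window and the (mean, std) needed to undo it.
    """
    mean = window.mean(axis=0, keepdims=True)
    std = np.sqrt(window.var(axis=0, keepdims=True) + eps)
    return (window - mean) / std, (mean, std)

def revin_denormalize(values, stats):
    """Map values of shape (steps, channels) back to the original scale."""
    mean, std = stats
    return values * std + mean

rng = np.random.default_rng(0)
# Hypothetical window: 96 steps, 4 channels, each with a different level.
window = rng.normal(size=(96, 4)) * 3.0 + np.array([10.0, -5.0, 0.0, 100.0])
normed, stats = revin_normalize(window)
restored = revin_denormalize(normed, stats)
```

In a full model the denormalise step would be applied to the forecast rather than to the normalised input; the round trip here only shows that the statistics are stored per window and per channel.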
No patching in the baseline variant; the model operates directly on every time step of the lookback window.
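The alternation of time mixing and feature mixing on an unpatched window can be sketched as follows. This is a simplified illustration under stated assumptions, not the configuration used here: the layer sizes, random initialisation, and parameter names are hypothetical, and normalisation layers and dropout are left out.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def init_params(lookback, channels, hidden, rng, scale=0.1):
    """Random weights for one mixer block (hypothetical initialisation)."""
    return {
        "w_time": rng.normal(scale=scale, size=(lookback, lookback)),
        "b_time": np.zeros(lookback),
        "w_feat1": rng.normal(scale=scale, size=(channels, hidden)),
        "b_feat1": np.zeros(hidden),
        "w_feat2": rng.normal(scale=scale, size=(hidden, channels)),
        "b_feat2": np.zeros(channels),
    }

def mixer_block(x, p):
    """One mixer block: (lookback, channels) -> (lookback, channels)."""
    # Time mixing: acts along the time axis; the same weights serve every channel.
    # x.T has shape (channels, lookback), so each row is one channel's history.
    t = relu(x.T @ p["w_time"] + p["b_time"]).T
    x = x + t  # residual connection
    # Feature mixing: acts along the channel axis; the same weights serve every step.
    f = relu(x @ p["w_feat1"] + p["b_feat1"]) @ p["w_feat2"] + p["b_feat2"]
    return x + f  # residual connection

rng = np.random.default_rng(0)
lookback, channels, hidden, horizon = 96, 4, 16, 24  # assumed sizes
x = rng.normal(size=(lookback, channels))  # raw window, no patching
params = init_params(lookback, channels, hidden, rng)
out = mixer_block(x, params)

# Temporal projection from lookback to horizon, again shared across channels.
w_out = rng.normal(scale=0.1, size=(lookback, horizon))
forecast = (out.T @ w_out).T  # shape (horizon, channels)
```

Because the time-mixing weights have shape (lookback, lookback), their size grows quadratically with the lookback length when no patching is used.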
Pros and cons on this universe
Pros
- Pure-MLP — fast training, cheap inference.
- Strong IC: ranks #7 on this universe.
- Reference baseline for the mixer family.
Cons / failure modes
- Pairwise IC correlation ≈ +0.995 with iTransformer and SoFTS, so its signal is heavily redundant with theirs.
- Feature mixing can overfit on tail-extreme values without careful regularisation.
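One way a pairwise IC correlation like the one in the first bullet could be computed is sketched below. The source does not state the exact procedure, so this assumes that IC means the per-period cross-sectional rank (Spearman) correlation between predictions and realised returns, and that the pairwise figure is the Pearson correlation between two models' IC series. All data are synthetic and all names are hypothetical.

```python
import numpy as np

def rank(a):
    """Ordinal ranks along the last axis (ties broken by position)."""
    return np.argsort(np.argsort(a, axis=-1), axis=-1).astype(float)

def rank_ic_series(pred, realized):
    """Per-period rank IC across assets.

    pred, realized: arrays of shape (periods, assets).
    Returns an array of shape (periods,).
    """
    rp, rr = rank(pred), rank(realized)
    rp -= rp.mean(axis=1, keepdims=True)
    rr -= rr.mean(axis=1, keepdims=True)
    denom = np.sqrt((rp ** 2).sum(axis=1) * (rr ** 2).sum(axis=1))
    return (rp * rr).sum(axis=1) / denom

def pairwise_ic_correlation(ic_a, ic_b):
    """Pearson correlation between two models' IC time series."""
    return np.corrcoef(ic_a, ic_b)[0, 1]

rng = np.random.default_rng(0)
periods, assets = 250, 50  # assumed panel size
realized = rng.normal(size=(periods, assets))
# Two synthetic models that share most of their signal and differ by small noise.
shared = 0.1 * realized + rng.normal(size=(periods, assets))
model_a = shared + 0.05 * rng.normal(size=(periods, assets))
model_b = shared + 0.05 * rng.normal(size=(periods, assets))

ic_a = rank_ic_series(model_a, realized)
ic_b = rank_ic_series(model_b, realized)
rho = pairwise_ic_correlation(ic_a, ic_b)
```

Under these assumptions, a value of `rho` close to 1 means the two models tend to be right and wrong in the same periods, which is why combining them adds little diversification.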