Mechanism diagram · Transformer / Attention
How it works
Autoformer decomposes the input into trend and seasonal components inside the architecture, not just as preprocessing.
It replaces standard dot-product attention with an Auto-Correlation mechanism that discovers long-range temporal lags using FFT-based circular correlation.
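The lag discovery step can be illustrated with a small NumPy sketch. This is not the model's actual layer (which operates on learned query/key projections); it only shows how the inverse FFT of a power spectrum yields the circular autocorrelation, and how candidate lags can be ranked from it. The function names, the mean-centering step, and the choice of `k` are assumptions made for illustration.

```python
import numpy as np


def circular_autocorrelation(x):
    """Circular autocorrelation of a 1-D series, computed via FFT.

    Returns r where r[tau] = sum_t x[t] * x[(t + tau) % L] after mean-centering.
    Mean-centering is an illustrative choice, not taken from the source.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    spectrum = np.fft.rfft(x)
    power = spectrum * np.conj(spectrum)  # |X(f)|^2
    # Wiener-Khinchin: the inverse FFT of the power spectrum is the
    # circular autocorrelation, at O(L log L) cost instead of O(L^2).
    return np.fft.irfft(power, n=len(x))


def top_k_lags(x, k=3):
    """Return the k lags (excluding lag 0) with the highest autocorrelation."""
    r = circular_autocorrelation(x)
    # Circular autocorrelation of a real series is symmetric, so only
    # lags 1 .. L/2 carry distinct information.
    candidates = np.arange(1, len(r) // 2 + 1)
    order = np.argsort(r[candidates])[::-1][:k]
    return candidates[order], r[candidates][order]


if __name__ == "__main__":
    t = np.arange(96)
    series = np.sin(2 * np.pi * t / 24)  # hypothetical series with period 24
    lags, scores = top_k_lags(series, k=2)
    print(lags, scores)
```

On a clean periodic series the highest-scoring lags are multiples of the period, which is the signal the mechanism uses in place of pairwise attention scores.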
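The decoder carries the two components as separate streams: the seasonal part is refined by Auto-Correlation layers, while the trend extracted at each decomposition step is accumulated, and the two are summed at the head.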
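A minimal sketch of the decomposition step, assuming a moving-average trend with edge-replication padding. The `kernel_size` value and the function name are illustrative assumptions, not settings taken from this screen.

```python
import numpy as np


def decompose(x, kernel_size=25):
    """Split a 1-D series into (trend, seasonal) using a moving average.

    kernel_size must be odd so that the output has the same length as x.
    """
    if kernel_size % 2 == 0:
        raise ValueError("kernel_size must be odd")
    x = np.asarray(x, dtype=float)
    pad = (kernel_size - 1) // 2
    # Replicate the first and last values so the average is defined at the edges.
    padded = np.concatenate([np.repeat(x[0], pad), x, np.repeat(x[-1], pad)])
    kernel = np.ones(kernel_size) / kernel_size
    trend = np.convolve(padded, kernel, mode="valid")
    seasonal = x - trend  # the residual is treated as the seasonal part
    return trend, seasonal


if __name__ == "__main__":
    t = np.arange(200, dtype=float)
    series = 0.05 * t + np.sin(2 * np.pi * t / 24)  # hypothetical input
    trend, seasonal = decompose(series)
    # Summing the two streams recovers the input exactly.
    print(np.allclose(trend + seasonal, series))
```

Because the seasonal part is defined as the residual, adding the two streams back together is lossless, which is what makes it possible to process them separately and sum them at the end.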
Pros and cons on this universe
Pros
- Rank #2 on our audited screen, surprisingly strong given its mid-tier placing in the 918-paper benchmark.
- Native handling of non-stationary trend components.
- Strong on the major-long side (ETH +0.144 per-coin IC).
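For readers unfamiliar with the per-coin IC figure quoted above, the sketch below shows one common way such a number is computed: a rank correlation between predictions and realized returns, calculated separately for each coin. Whether this screen uses rank or linear correlation is an assumption here, and the data and function names are hypothetical.

```python
import numpy as np


def _ranks(values):
    """Ordinal ranks (0 = smallest). Ties are not averaged in this sketch."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values)
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(len(values))
    return ranks


def per_coin_ic(predictions, realized):
    """Rank correlation between predicted and realized returns, per coin.

    Both arguments map a coin symbol to a sequence of values over time.
    """
    ic = {}
    for coin, pred in predictions.items():
        corr = np.corrcoef(_ranks(pred), _ranks(realized[coin]))[0, 1]
        ic[coin] = float(corr)
    return ic


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    preds = {"ETH": [0.01, -0.02, 0.03, 0.00], "BTC": [0.02, 0.01, -0.01, 0.00]}
    rets = {"ETH": [0.02, -0.01, 0.01, 0.00], "BTC": [-0.01, 0.00, 0.02, 0.01]}
    print(per_coin_ic(preds, rets))
```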
Cons / failure modes
- Heavy redundancy with the TimeMixer family on this universe.
- Auto-correlation cost is O(L log L) in sequence length L, which is slower than patched attention.
- Public 918-paper rankings place it middle of the pack on RMSE.