Mechanism diagram · Transformer / Attention
[Figure: PATCH / ATTENTION TRANSFORMER. Input series (lookback × N coins) → RevIN (norm + reverse) → Patching (L → P patches) → Linear proj (→ d-model) → Transformer encoder (multi-head self-attention + FFN, × L) → Flatten + head (→ horizon h) → Forecast (h × N coins)]

How it works

Autoformer decomposes the input into trend and seasonal components inside the architecture, not just as preprocessing.
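
A minimal sketch of the decomposition step may help make this concrete. It assumes the usual moving-average formulation: the trend is a centred moving average with edge padding, and the seasonal part is the residual. The function name `series_decomp`, the kernel size, and the toy series are illustrative choices, not values taken from this screen.

```python
import numpy as np

def series_decomp(x: np.ndarray, kernel_size: int = 7):
    """Split a 1-D series into (seasonal, trend) using a centred moving average."""
    if kernel_size % 2 == 0:
        raise ValueError("kernel_size must be odd so the window is centred")
    pad = (kernel_size - 1) // 2
    # Repeat the edge values so the trend keeps the input length.
    padded = np.pad(x, pad, mode="edge")
    window = np.ones(kernel_size) / kernel_size
    trend = np.convolve(padded, window, mode="valid")
    seasonal = x - trend  # whatever the smooth trend does not explain
    return seasonal, trend

# Toy series (hypothetical): linear drift plus a short cycle.
t = np.arange(48, dtype=float)
series = 0.5 * t + np.sin(2 * np.pi * t / 6)
seasonal, trend = series_decomp(series, kernel_size=7)
```

In the model this split is applied repeatedly between layers rather than once on the raw input, which is what "inside the architecture" refers to.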

It replaces standard dot-product attention with an Auto-Correlation mechanism that discovers long-range temporal lags using FFT-based circular correlation.
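
The lag discovery can be sketched as follows, under simplifying assumptions: a single univariate series correlated with itself (the model correlates learned query and key projections), and a plain top-k pick over lags up to half the series length. The helper names `circular_autocorr` and `top_k_lags` are hypothetical.

```python
import numpy as np

def circular_autocorr(x: np.ndarray) -> np.ndarray:
    """Circular autocorrelation r[tau] = sum_t x[t] * x[(t + tau) % n], computed via FFT."""
    n = len(x)
    spectrum = np.fft.rfft(x)
    # Wiener-Khinchin: the inverse FFT of the power spectrum is the autocorrelation.
    return np.fft.irfft(spectrum * np.conj(spectrum), n=n)

def top_k_lags(x: np.ndarray, k: int) -> np.ndarray:
    """Return the k lags (excluding lag 0) with the highest autocorrelation."""
    centred = x - x.mean()
    r = circular_autocorr(centred)
    half = len(x) // 2
    # r is symmetric for real input, so only lags 1..half need checking.
    order = np.argsort(r[1:half + 1])[::-1]
    return order[:k] + 1

# Toy series (hypothetical) with a period of 12 steps.
t = np.arange(96)
series = np.sin(2 * np.pi * t / 12)
lags = top_k_lags(series, k=3)
```

On this toy input the selected lags are multiples of the 12-step period. The FFT route computes all lags in O(L log L) instead of the O(L²) needed to evaluate each lag directly.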

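The decoder handles the two streams differently: stacked Auto-Correlation layers refine the seasonal part, while the trend extracted at each decomposition step is accumulated. The two are summed to produce the forecast.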

Pros and cons on this universe

Pros

  • Rank #2 on our audited screen — surprisingly strong despite being mid-tier in the 918-paper benchmark.
  • Native handling of non-stationary trend components.
  • Strong on the major-long side (ETH +0.144 per-coin IC).

Cons / failure modes

  • Heavy redundancy with the TimeMixer family on this universe.
  • Auto-Correlation cost is O(L log L) in the lookback length L, which is slower than patched attention, where attention runs over P patches rather than L time steps.
  • Public 918-paper rankings place it middle of the pack on RMSE.

References