Autoformer — Axon Ridge Capital

Mechanism diagram · Transformer / Attention

How it works

Autoformer decomposes the input into trend and seasonal components inside the architecture, not just as preprocessing.
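A minimal sketch of the decomposition step, assuming the moving-average form described in Wu et al., 2021: the trend is a moving average over an edge-padded series and the seasonal part is the residual. The function name `series_decomp`, the kernel size of 25, and the synthetic series are illustrative assumptions, not values taken from our pipeline.

```python
import numpy as np

def series_decomp(x, kernel=25):
    """Split a 1-D series into (seasonal, trend) using a moving average.

    The ends are padded by repeating the first and last values so the
    trend has the same length as the input. kernel=25 is an assumed value.
    """
    x = np.asarray(x, dtype=float)
    front = np.repeat(x[:1], (kernel - 1) // 2)
    back = np.repeat(x[-1:], kernel // 2)
    padded = np.concatenate([front, x, back])
    trend = np.convolve(padded, np.ones(kernel) / kernel, mode="valid")
    seasonal = x - trend
    return seasonal, trend

# Synthetic example: linear drift plus a 24-step cycle.
t = np.arange(200)
x = 0.05 * t + np.sin(2 * np.pi * t / 24)
seasonal, trend = series_decomp(x, kernel=25)
```

In the model this block is applied repeatedly to intermediate hidden states rather than once to the raw input, which is what "inside the architecture" refers to.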

It replaces standard dot-product attention with an Auto-Correlation mechanism that discovers long-range temporal lags using FFT-based circular correlation.
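The lag discovery can be illustrated on a single series. This sketch computes the circular autocorrelation for all lags at once through the FFT (Wiener-Khinchin relation) and then picks the strongest lags. It is a simplification: the model correlates query and key projections rather than one raw series, and the helper names `autocorrelation_fft` and `top_lags` are hypothetical.

```python
import numpy as np

def autocorrelation_fft(x):
    """Circular autocorrelation of a 1-D series for every lag, via FFT."""
    x = np.asarray(x, dtype=float)
    spectrum = np.fft.rfft(x)
    power = spectrum * np.conj(spectrum)   # power spectrum
    return np.fft.irfft(power, n=len(x))   # back to the lag domain

def top_lags(corr, k):
    """Return the k lags with the highest correlation.

    Lag 0 is skipped, and only lags up to L/2 are searched because a
    circular autocorrelation is symmetric around L/2.
    """
    half = len(corr) // 2
    candidates = np.arange(1, half + 1)
    order = np.argsort(corr[candidates])[::-1][:k]
    return candidates[order]

# Synthetic example: a clean 24-step cycle over 192 steps.
t = np.arange(192)
x = np.sin(2 * np.pi * t / 24)
corr = autocorrelation_fft(x - x.mean())
lags = top_lags(corr, k=3)
```

On this synthetic input the selected lags are multiples of the 24-step period. In the full mechanism the top lags are then used to shift and weight the value series, which this sketch leaves out.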

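The decoder keeps the two streams apart: the seasonal stream is refined layer by layer, while the trend components split off at each decomposition step are accumulated in a separate path. The two are summed at the output head.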
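A toy sketch of the two-stream flow, under stated assumptions: each layer updates the seasonal stream, the result is decomposed again, the newly extracted trend is added to a running trend, and the forecast is the sum of both streams. The `toy_decoder` function and the stand-in `layers` callables are hypothetical. The real decoder uses Auto-Correlation and feed-forward sublayers and projects each trend piece before accumulating it, which is omitted here.

```python
import numpy as np

def moving_average(x, kernel):
    """Edge-padded moving average that preserves the series length."""
    front = np.repeat(x[:1], (kernel - 1) // 2)
    back = np.repeat(x[-1:], kernel // 2)
    padded = np.concatenate([front, x, back])
    return np.convolve(padded, np.ones(kernel) / kernel, mode="valid")

def toy_decoder(seasonal_init, trend_init, layers, kernel=25):
    """Refine the seasonal stream and accumulate trend, then sum them."""
    seasonal, trend = seasonal_init.copy(), trend_init.copy()
    for layer in layers:
        hidden = seasonal + layer(seasonal)          # residual update
        step_trend = moving_average(hidden, kernel)  # trend leaked into hidden
        seasonal = hidden - step_trend               # keep only the seasonal part
        trend = trend + step_trend                   # accumulate on the trend path
    return seasonal + trend, seasonal, trend

# Synthetic example with layers that change nothing, so the output
# should reproduce the input series exactly.
t = np.arange(96)
x = 0.02 * t + np.sin(2 * np.pi * t / 24)
trend0 = moving_average(x, 25)
seasonal0 = x - trend0
zero_layer = lambda s: np.zeros_like(s)
forecast, seasonal, trend = toy_decoder(seasonal0, trend0, [zero_layer, zero_layer])
```

With no-op layers the decomposition only moves mass between the two streams, so their sum is unchanged. That makes it a convenient sanity check for the bookkeeping.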

Pros and cons on this universe

Pros

Rank #2 on our audited screen — surprisingly strong despite being mid-tier in the 918-paper benchmark.
Native handling of non-stationary trend components.
Strong on the major-long side (ETH +0.144 per-coin IC).

Cons / failure modes

Heavy redundancy with the TimeMixer family on this universe.
Auto-Correlation cost is O(L log L) in sequence length L, which is still slower than patched attention because patching shrinks the number of tokens attended over.
Public 918-paper rankings place it middle of the pack on RMSE.

References

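Wu, H., Xu, J., Wang, J., Long, M. (2021). Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting. NeurIPS 2021.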
research/coin_universe/autoformer.md