iTransformer — Axon Ridge Capital

Mechanism diagram · Transformer / Attention

How it works

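iTransformer inverts the conventional axis assignment. A standard Transformer forecaster treats each time step (all variates at one timestamp) as a token; iTransformer instead embeds each variate's full lookback series as one token.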

Self-attention is computed across variates (across coins), not across time positions — explicitly modelling cross-variate structure.

Feed-forward (FFN) layers, applied to each variate token independently, then process the time-axis information.
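The three steps above can be sketched in a few lines of NumPy. This is a minimal single-head illustration, not the arm's actual implementation: layer normalisation, multiple heads and the forecasting head are omitted, and the names (`inverted_block`, `init_params`, `W_embed`, and so on) and the sizes `T`, `N`, `D`, `H` are assumptions chosen for the example.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def init_params(T, D, H, seed=0):
    # Hypothetical random weights; a real model would learn these.
    rng = np.random.default_rng(seed)
    shapes = {
        "W_embed": (T, D),  # maps a full lookback series to one token
        "W_q": (D, D),
        "W_k": (D, D),
        "W_v": (D, D),
        "W_1": (D, H),      # FFN expansion
        "W_2": (H, D),      # FFN projection
    }
    return {name: rng.normal(scale=0.1, size=shape) for name, shape in shapes.items()}


def inverted_block(x, params):
    """x has shape (T, N): T lookback steps, N variates (coins)."""
    # 1. Inverted embedding: each variate's whole series becomes one token.
    tokens = x.T @ params["W_embed"]              # (N, T) @ (T, D) -> (N, D)

    # 2. Self-attention across variates, not across time positions.
    q = tokens @ params["W_q"]
    k = tokens @ params["W_k"]
    v = tokens @ params["W_v"]
    scores = q @ k.T / np.sqrt(q.shape[-1])       # (N, N) coin-by-coin scores
    attn = softmax(scores, axis=-1)
    tokens = tokens + attn @ v                    # residual connection

    # 3. FFN applied to each variate token independently (row-wise).
    hidden = np.maximum(tokens @ params["W_1"], 0.0)
    tokens = tokens + hidden @ params["W_2"]
    return tokens, attn


T, N, D, H = 96, 15, 32, 64                       # illustrative sizes only
x = np.random.default_rng(1).normal(size=(T, N))  # synthetic lookback window
tokens, attn = inverted_block(x, init_params(T, D, H))
print(tokens.shape, attn.shape)
```

The attention matrix has one row and one column per coin, so each entry can be read as how strongly one coin's representation draws on another's.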

Pros and cons on this universe

Pros

The only arm in the top 15 built on a native cross-variate attention thesis, which fits a cross-sectional trader's mental model.
Strong literature track record (ICLR 2024).
Rank #9 IC on this universe.

Cons / failure modes

Attention across variates means the attention cost is O(N²) in the number of coins N, not in the lookback length, so it grows quadratically as the universe widens.
Pairwise IC correlation ≈ +0.96 with TSMixer — much of its signal is redundant.
918-paper benchmark places it mid-tier on RMSE.
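To make the O(N²) point in the list above concrete, the sketch below counts the entries in one attention score matrix under each tokenisation. The function name and the universe sizes and lookback length are hypothetical values chosen for illustration, not figures from this universe.

```python
def score_matrix_entries(num_tokens: int) -> int:
    # One attention head compares every token with every token.
    return num_tokens * num_tokens


lookback = 96                      # hypothetical lookback length
for num_coins in (15, 100, 500):   # hypothetical universe sizes
    variate_cost = score_matrix_entries(num_coins)  # iTransformer: one token per coin
    time_cost = score_matrix_entries(lookback)      # time-step tokens, for contrast
    print(f"N={num_coins}: variate tokens {variate_cost}, time tokens {time_cost}")
```

With these illustrative numbers, the variate-token matrix is far smaller than the time-token one for a 15-coin universe (225 entries against 9216), but overtakes it once the number of coins exceeds the lookback length.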

References