TFT — Axon Ridge Capital

Mechanism diagram · Transformer / Attention

How it works

TFT routes inputs through variable selection networks: gates that learn how much weight each input feature receives at each time step.
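
The gating idea can be sketched in a few lines. This is a minimal illustration, not the TFT implementation: a real variable selection network embeds each feature as a vector and produces the gate scores with a learned network, whereas the scalar features and fixed `gate_scores` below are hypothetical stand-ins.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def select_variables(features, gate_scores):
    """Weight each feature at one time step by its softmax gate.

    features:    feature values at a single time step
    gate_scores: hypothetical unnormalized scores, one per feature
                 (learned in a real model, fixed here for illustration)
    Returns (weights, gated_sum).
    """
    weights = softmax(gate_scores)
    gated_sum = sum(w * x for w, x in zip(weights, features))
    return weights, gated_sum

# Two time steps with different gate scores: the weights shift per step.
steps = [
    ([0.5, -1.2, 2.0], [2.0, 0.0, 0.0]),  # step 0 favors feature 0
    ([0.4, -1.0, 1.8], [0.0, 0.0, 3.0]),  # step 1 favors feature 2
]
selection = [select_variables(x, s) for x, s in steps]
```

Because the weights at each step are non-negative and sum to one, they can be read directly as per-step feature importances.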

An LSTM encodes local temporal patterns.

Multi-head self-attention captures longer-range dependencies.
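
To show why attention reaches across distance, here is a single-head sketch of scaled dot-product self-attention. It assumes identity projections (no learned query, key, or value matrices) and a toy three-step sequence; a multi-head layer would repeat this with separate learned projections per head.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(seq):
    """Scaled dot-product self-attention with identity projections.

    seq: list of T vectors, each a list of d floats.
    Returns (outputs, weights); weights[i][t] is how much step i
    attends to step t.
    """
    d = len(seq[0])
    outputs, weights = [], []
    for q in seq:
        # Similarity of this step's query to every step's key.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in seq]
        w = softmax(scores)
        # Output is the attention-weighted average of all value vectors.
        out = [sum(w[t] * seq[t][j] for t in range(len(seq))) for j in range(d)]
        weights.append(w)
        outputs.append(out)
    return outputs, weights

# Toy sequence: steps 0 and 2 are identical, step 1 differs.
seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
outputs, weights = self_attention(seq)
```

In this toy case step 0 attends to step 2 exactly as much as to itself and more than to its immediate neighbor, since the weights depend on content similarity rather than on distance in time.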

The output head emits forecasts at several quantiles and is trained with a quantile (pinball) loss, so the model produces a predictive range rather than a single point estimate.
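
A minimal sketch of the quantile (pinball) loss follows. The quantile levels, target, and forecast values are hypothetical, and averaging across quantiles is one common aggregation, assumed here rather than taken from the dossier.

```python
def pinball_loss(y_true, y_pred, q):
    """Pinball loss for one forecast at quantile level q (0 < q < 1)."""
    diff = y_true - y_pred
    # Under-forecasts (diff > 0) cost q per unit; over-forecasts cost 1 - q.
    return max(q * diff, (q - 1) * diff)

def quantile_loss(y_true, preds_by_quantile):
    """Mean pinball loss across a dict of {quantile level: forecast}."""
    losses = [pinball_loss(y_true, p, q) for q, p in preds_by_quantile.items()]
    return sum(losses) / len(losses)

# At q = 0.9, forecasting too low is penalized more than forecasting too high.
under = pinball_loss(10.0, 8.0, 0.9)   # forecast 2 below the target
over = pinball_loss(10.0, 12.0, 0.9)   # forecast 2 above the target

# Hypothetical three-quantile forecast for a target of 10.0.
forecast = {0.1: 8.0, 0.5: 10.5, 0.9: 12.0}
total = quantile_loss(10.0, forecast)
```

The asymmetry is what pushes each output toward its own quantile: the 0.9 output is driven upward until only about 10% of targets fall above it.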

Pros and cons on this universe

Pros

Interpretable feature gates: the variable selection weights show which inputs the model relies on.
Strong BNB long (+0.307 per-coin IC).
Rank #15 IC.
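
For readers unfamiliar with the metric, the sketch below assumes that IC means the information coefficient, computed as the Spearman rank correlation between a coin's forecasts and its realized returns. The dossier does not state its exact definition, and the numbers are hypothetical.

```python
import math

def ranks(values):
    """Rank values from 1..n (assumes no ties, for brevity)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def rank_ic(forecasts, realized):
    """Spearman rank correlation between forecasts and realized returns."""
    return pearson(ranks(forecasts), ranks(realized))

# Hypothetical per-coin series: forecasts and the returns that followed.
forecasts = [0.1, 0.3, 0.2, 0.5]
realized = [0.01, 0.04, 0.02, 0.09]
ic_same_order = rank_ic(forecasts, realized)
ic_reversed = rank_ic(forecasts, list(reversed(sorted(realized))))
```

Under this definition the value ranges from -1 to +1, with 0 meaning the forecasts carry no ordering information about realized returns.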

Cons / failure modes

The dossier currently recommends diagnostic-only use: the model has been hyperparameter-fragile in our experience.
Published “80% return” financial-application claims have been debunked in the wider literature.
Heavyweight architecture relative to mixer alternatives.

References