Mechanism diagram · Transformer / Attention
How it works
TFT (Temporal Fusion Transformer) routes inputs through variable selection networks: gates that learn how much weight each feature gets at each time step.
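The gating idea can be sketched in a few lines. This is a simplified illustration, not the dossier's implementation: it assumes each feature already has an embedding vector and a relevance score for the current time step (in a trained model both would come from learned networks), and it only shows the softmax weighting step. The feature names, `select_variables`, and all numbers are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_variables(embeddings, scores):
    """Combine per-feature embeddings using softmax selection weights.

    embeddings: one vector (list of floats) per feature, all the same length
    scores: one unnormalised relevance score per feature (hypothetical here;
            a trained model would produce these with a learned network)
    """
    weights = softmax(scores)
    dim = len(embeddings[0])
    combined = [
        sum(w * emb[d] for w, emb in zip(weights, embeddings))
        for d in range(dim)
    ]
    return combined, weights


# One time step with three illustrative features (names are made up).
features = ["return_1h", "volume", "funding_rate"]
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = [2.0, 0.0, 0.0]
combined, weights = select_variables(embeddings, scores)
for name, w in zip(features, weights):
    print(f"{name}: {w:.3f}")
```

Because the weights are non-negative and sum to one, they can be read directly as a per-step feature ranking.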
An LSTM encodes local temporal patterns.
Multi-head self-attention captures longer-range dependencies.
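As a rough illustration of how attention reaches across time steps, the sketch below computes single-head scaled dot-product attention for one query. It is a generic textbook form under simplifying assumptions (one head, no masking, no learned projections), not the exact attention variant used in the model. The function name `attend` and the toy vectors are hypothetical.

```python
import math


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector.

    query: list of floats of length d
    keys: one length-d vector per past time step
    values: one vector per past time step (same count as keys)
    Returns the weighted value vector and the attention weights.
    """
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        for key in keys
    ]
    # Softmax over time steps, shifted by the max for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    out = [
        sum(w * v[j] for w, v in zip(weights, values))
        for j in range(len(values[0]))
    ]
    return out, weights


# Three past time steps; the first key is most aligned with the query.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[1.0], [2.0], [3.0]]
out, weights = attend(query, keys, values)
```

Every time step gets a nonzero weight regardless of distance, which is what lets attention pick up dependencies that a recurrent encoder may have forgotten.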
The output head emits quantile forecasts and is trained with a quantile (pinball) loss, which gives probabilistic rather than point output.
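The quantile loss is commonly written as the pinball loss, which penalises under- and over-prediction asymmetrically depending on the target quantile. A minimal sketch follows, assuming a single observation and an unweighted mean across quantiles. The quantile levels 0.1, 0.5, and 0.9 and the helper names are illustrative choices, not values taken from the dossier.

```python
def pinball_loss(y_true, y_pred, q):
    """Pinball (quantile) loss for one observation at quantile level q."""
    diff = y_true - y_pred
    # Under-prediction (diff > 0) costs q per unit;
    # over-prediction (diff < 0) costs (1 - q) per unit.
    return max(q * diff, (q - 1) * diff)


def quantile_loss(y_true, preds_by_quantile):
    """Mean pinball loss across quantile levels for one observation.

    preds_by_quantile: dict mapping quantile level -> predicted value
    """
    losses = [
        pinball_loss(y_true, pred, q)
        for q, pred in preds_by_quantile.items()
    ]
    return sum(losses) / len(losses)


# Illustrative forecast: the realised value sits exactly at the median.
forecast = {0.1: 0.5, 0.5: 1.0, 0.9: 1.5}
loss = quantile_loss(1.0, forecast)
```

At q = 0.9 an under-prediction costs nine times as much as an equal over-prediction, which pushes that output toward the upper tail of the distribution.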
Pros and cons on this universe
Pros
- Interpretable feature gates: the variable selection weights show which inputs the model relies on.
- Strong on BNB long (per-coin IC +0.307).
- Rank #15 IC.
Cons / failure modes
- The dossier currently recommends diagnostic-only use; the model has been hyperparameter-fragile in our experience.
- Published claims of “80% return” in financial applications have been debunked in the wider literature.
- Heavyweight architecture relative to mixer alternatives.