Mechanism diagram · Transformer / Attention
How it works
Each coin's lookback series is split into fixed-length patches, which overlap or not depending on the stride. Each patch is linearly embedded and becomes one token in a Transformer.
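A minimal sketch of the patching step, assuming NumPy and hypothetical values for the lookback (96), patch length (16) and stride. The helper name `make_patches` is illustrative only and is not taken from any PatchTST codebase; trailing points that do not fill a whole patch are simply dropped here, whereas real implementations may pad instead.

```python
import numpy as np


def make_patches(series, patch_len, stride):
    """Split a 1-D lookback series into patches of length `patch_len`.

    stride == patch_len gives non-overlapping patches; stride < patch_len
    gives overlapping ones. Leftover points at the end are dropped
    (an assumption of this sketch).
    """
    series = np.asarray(series, dtype=float)
    if series.ndim != 1 or len(series) < patch_len:
        raise ValueError("series must be 1-D and at least patch_len long")
    # All length-`patch_len` windows, then keep every `stride`-th one.
    windows = np.lib.stride_tricks.sliding_window_view(series, patch_len)
    return windows[::stride].copy()


lookback = np.arange(96.0)  # stand-in for one coin's lookback window
non_overlapping = make_patches(lookback, patch_len=16, stride=16)
overlapping = make_patches(lookback, patch_len=16, stride=8)

print(non_overlapping.shape)  # (6, 16): six tokens instead of 96 time steps
print(overlapping.shape)      # (11, 16)
```

Each row of the result is one patch; an embedding layer would then map each row to a token vector.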
PatchTST is channel-independent: each variate (here, each coin) passes separately through the same shared Transformer and head, and the per-variate forecasts are then concatenated into the panel output.
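Channel independence can be sketched as folding the coin axis into the batch axis. In the example below a single random linear map stands in for the shared Transformer and head; that stand-in, the shapes, and all names are assumptions made for illustration, not the actual model.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, n_coins, lookback, horizon = 4, 3, 96, 24

# Stand-in for the shared backbone and head: one weight matrix reused by every coin.
shared_weights = rng.normal(size=(lookback, horizon)) / np.sqrt(lookback)


def shared_model(x):
    """Map (n_series, lookback) -> (n_series, horizon) with identical weights per row."""
    return x @ shared_weights


def channel_independent_forecast(panel):
    """panel: (batch, n_coins, lookback) -> forecasts of shape (batch, n_coins, horizon)."""
    b, c, l = panel.shape
    flat = panel.reshape(b * c, l)   # every coin becomes its own univariate sample
    out = shared_model(flat)         # no information flows between rows
    return out.reshape(b, c, -1)     # reassemble per-coin forecasts into a panel


panel = rng.normal(size=(batch, n_coins, lookback))
forecast = channel_independent_forecast(panel)

# Shifting coin 0's history changes coin 0's forecast only.
perturbed = panel.copy()
perturbed[:, 0, :] += 1.0
forecast_perturbed = channel_independent_forecast(perturbed)
print(np.allclose(forecast[:, 1:], forecast_perturbed[:, 1:]))  # True
```

The final check makes the trade-off concrete: no coin's forecast can react to another coin's history, which is what sharing weights without mixing channels implies.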
RevIN normalises each input window by its own mean and variance and restores them on the output, which handles non-stationary level and scale from window to window.
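The normalise-then-restore idea can be sketched as below. This is a simplified version under stated assumptions: RevIN's learnable affine parameters are omitted, the function names are hypothetical, and the two example series are synthetic.

```python
import numpy as np


def revin_normalize(window, eps=1e-5):
    """Normalise each series in a window by its own statistics.

    window: (n_coins, lookback). Returns the normalised window and the
    per-series (mean, std) needed to undo the transform later.
    """
    mean = window.mean(axis=-1, keepdims=True)
    std = np.sqrt(window.var(axis=-1, keepdims=True) + eps)
    return (window - mean) / std, (mean, std)


def revin_denormalize(values, stats):
    """Map values from normalised space back to the original level and scale."""
    mean, std = stats
    return values * std + mean


# Two synthetic series at very different levels and scales.
t = np.linspace(0.0, 1.0, 96)
window = np.stack([30000.0 + 500.0 * np.sin(6.0 * t), 0.5 + 0.01 * t])

normed, stats = revin_normalize(window)
restored = revin_denormalize(normed, stats)

print(np.allclose(normed.mean(axis=-1), 0.0))  # True: level removed per series
print(np.allclose(restored, window))           # True: transform is reversible
```

In a forecasting model the denormalisation would be applied to the model's output rather than to the input, using the statistics of the input window.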
Pros and cons on this universe
Pros
- Strong literature track record on LTSF benchmarks.
- Patching cuts the token count by roughly the stride (equal to the patch length when patches do not overlap), so quadratic attention cost falls by roughly the square of that factor.
- Rank #6 on the latest shared-metric walk-forward screen, which is still a positive signal.
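To make the attention-cost point in the patching bullet concrete, here is a small worked calculation. The lookback of 336 and patch length of 16 are assumed values for illustration, and the count below uses no end padding, so an implementation that pads the series would produce slightly more patches.

```python
def num_patches(lookback, patch_len, stride):
    """Number of full patches that fit in the lookback (no padding assumed)."""
    return (lookback - patch_len) // stride + 1


def attention_pairs(n_tokens):
    """Self-attention scores one value per token pair, so cost grows as n^2."""
    return n_tokens ** 2


lookback, patch_len = 336, 16

pointwise_cost = attention_pairs(lookback)               # one token per time step
n_tokens = num_patches(lookback, patch_len, stride=16)   # non-overlapping patches
patched_cost = attention_pairs(n_tokens)

print(n_tokens)                       # 21
print(pointwise_cost / patched_cost)  # 256.0, i.e. 16 squared
```

With overlapping patches (stride smaller than the patch length) the token count is higher, so the saving is smaller than in this non-overlapping case.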
Cons / failure modes
- Dropped from the active CPCV sweep: dollar Sharpe was negative in repeated folds despite positive IC.
- Channel independence ignores cross-coin structure, which is the central object on a perpetual panel.
- Top-tier in the literature but only sixth on our IC screen, so benchmark performance transfers only partially.