PatchTST — Axon Ridge Capital

Mechanism diagram · Transformer / Attention

How it works

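Each coin's lookback series is split into fixed-length patches. Patches can be non-overlapping (stride equal to the patch length) or overlapping (stride shorter than the patch length, as in the original paper's default configuration). Each patch is linearly embedded and becomes one token in a Transformer encoder.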
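The patching step can be sketched in a few lines of plain Python. This is a minimal illustration, not our production code: the function name `make_patches` and the parameters `patch_len` and `stride` are assumptions made for the example, and the end padding used in the reference PatchTST implementation is left out.

```python
def make_patches(series, patch_len, stride):
    """Split a 1-D lookback series into patches (one token each).

    patch_len and stride are illustrative names; stride == patch_len
    gives non-overlapping patches, stride < patch_len gives overlap.
    """
    if patch_len <= 0 or stride <= 0:
        raise ValueError("patch_len and stride must be positive")
    if len(series) < patch_len:
        raise ValueError("series is shorter than one patch")
    last_start = len(series) - patch_len
    # Trailing points that do not fill a whole patch are dropped here.
    return [list(series[i:i + patch_len])
            for i in range(0, last_start + 1, stride)]


lookback = list(range(12))  # toy series standing in for one coin
non_overlapping = make_patches(lookback, patch_len=4, stride=4)
overlapping = make_patches(lookback, patch_len=4, stride=2)
print(len(non_overlapping), len(overlapping))  # prints: 3 5
```

With a 12-point toy series, a stride of 4 yields 3 tokens and a stride of 2 yields 5, so the stride rather than the patch length sets the token count.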

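PatchTST is channel-independent: each variate (here, each coin) is passed separately through the same shared Transformer weights, and the per-variate forecasts are then stacked to form the multivariate output. No attention is computed across variates.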
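Channel independence can be illustrated with a toy stand-in for the backbone. In the sketch below a single linear map replaces the Transformer, and the coin names, weights, and function names are all hypothetical; the only point is that one set of parameters is applied to each series on its own.

```python
def forecast_one(series, weights, bias=0.0):
    """Stand-in for the shared backbone: one linear map over a single series."""
    if len(series) != len(weights):
        raise ValueError("series and weights must have the same length")
    return sum(w * x for w, x in zip(weights, series)) + bias


def forecast_panel(panel, weights):
    """Apply the same weights to every coin separately, then stack the outputs."""
    return {coin: forecast_one(series, weights) for coin, series in panel.items()}


shared_weights = [0.1, 0.2, 0.3, 0.4]  # hypothetical shared parameters
panel = {"BTC": [1.0, 2.0, 3.0, 4.0], "ETH": [4.0, 3.0, 2.0, 1.0]}
before = forecast_panel(panel, shared_weights)

# Perturb ETH only: the BTC forecast cannot change, because no
# computation ever mixes the two series.
panel["ETH"] = [9.0, 9.0, 9.0, 9.0]
after = forecast_panel(panel, shared_weights)
print(before["BTC"] == after["BTC"], before["ETH"] == after["ETH"])  # True False
```

The same property is what prevents the model from using one coin's history to forecast another.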

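RevIN normalisation addresses non-stationarity: each input window is standardised by its own mean and standard deviation, and those statistics are restored on the model output.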
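A minimal sketch of the normalise-then-restore idea follows, assuming plain per-window standardisation. The learnable affine parameters of full RevIN are omitted, and the function names and the `eps` value are illustrative choices rather than anything taken from our pipeline.

```python
from statistics import fmean, pstdev


def revin_normalize(window, eps=1e-5):
    """Standardise one lookback window by its own mean and std."""
    mean = fmean(window)
    std = (pstdev(window) ** 2 + eps) ** 0.5  # eps guards a flat window
    return [(x - mean) / std for x in window], (mean, std)


def revin_denormalize(values, stats):
    """Put model outputs back on the window's original scale."""
    mean, std = stats
    return [v * std + mean for v in values]


window = [100.0, 102.0, 101.0, 105.0, 107.0]  # toy price-like window
normed, stats = revin_normalize(window)
restored = revin_denormalize(normed, stats)  # round trip back to the input
```

In a real model the denormalisation step is applied to the forecast rather than to the normalised input; the round trip here only checks that the two functions are inverses.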

Pros and cons on this universe

Pros

Strong literature track record on LTSF benchmarks.
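Patching cuts the token count by roughly a factor of the stride (equal to the patch size when patches do not overlap); because self-attention cost is quadratic in token count, it falls by roughly the square of that factor.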
Ranked #6 on the latest shared-metric walk-forward screen, which is still a positive signal.
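The attention-cost point can be made concrete with a little arithmetic. The lookback length of 512 and the patch settings below are illustrative values only, not our configuration, and the token count assumes no end padding.

```python
def num_patches(lookback_len, patch_len, stride):
    """Token count after patching, with no end padding."""
    if lookback_len < patch_len:
        raise ValueError("lookback is shorter than one patch")
    return (lookback_len - patch_len) // stride + 1


def attention_pairs(num_tokens):
    """Query-key pairs scored by one full self-attention head."""
    return num_tokens * num_tokens


lookback_len = 512  # illustrative, not our setting
per_point = attention_pairs(lookback_len)
for patch_len, stride in [(16, 16), (16, 8)]:
    tokens = num_patches(lookback_len, patch_len, stride)
    ratio = per_point / attention_pairs(tokens)
    print(f"P={patch_len} S={stride}: {tokens} tokens, {ratio:.0f}x fewer pairs")
# P=16 S=16: 32 tokens, 256x fewer pairs
# P=16 S=8: 63 tokens, 66x fewer pairs
```

Overlapping patches give back part of the saving, since halving the stride roughly doubles the token count.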

Cons / failure modes

Dropped from the active CPCV sweep: dollar Sharpe was negative in repeated folds despite positive IC.
Channel independence ignores cross-coin structure, which is the central object on a perpetual panel.
Top-tier in the literature but only sixth on our IC screen, so the benchmark results transfer only partially.

References