Mechanism diagram · Transformer / Attention
[Figure: patch/attention transformer. Input series (lookback L × N coins) → RevIN normalization (reversed on output) → patching (L → P patches) → linear projection to d_model → Transformer encoder (multi-head self-attention + FFN, stacked layers) → flatten + linear head → forecast (horizon h × N coins).]
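
The RevIN and patching stages from the diagram can be sketched in pure Python as below. This assumes RevIN without its learnable affine parameters (plain per-series mean/std normalization, reversed on the forecast) and patching without end padding. `revin_normalize`, `revin_denormalize`, and `make_patches` are hypothetical helper names, not a library API.

```python
import math
from typing import List, Tuple

def revin_normalize(series: List[float], eps: float = 1e-5) -> Tuple[List[float], Tuple[float, float]]:
    """Normalize one coin's lookback window; return values plus the stats needed to reverse it."""
    mean = sum(series) / len(series)
    var = sum((x - mean) ** 2 for x in series) / len(series)
    std = math.sqrt(var + eps)
    return [(x - mean) / std for x in series], (mean, std)

def revin_denormalize(values: List[float], stats: Tuple[float, float]) -> List[float]:
    """Map model outputs back to the original scale of the series."""
    mean, std = stats
    return [v * std + mean for v in values]

def make_patches(series: List[float], patch_len: int, stride: int) -> List[List[float]]:
    """Cut a length-L window into P = (L - patch_len) // stride + 1 patches."""
    return [series[i:i + patch_len] for i in range(0, len(series) - patch_len + 1, stride)]

lookback = [float(t) for t in range(16)]  # toy series, L = 16
normed, stats = revin_normalize(lookback)
patches = make_patches(normed, patch_len=4, stride=2)
```

With L = 16, patch length 4 and stride 2 give P = (16 − 4)/2 + 1 = 7 overlapping patches; stride 4 gives 4 non-overlapping ones.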

How it works

TimeXer splits its inputs into endogenous variables (the target series) and exogenous variables (in our setup, funding rates).
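
Per coin, that split can be pictured as one target channel plus a stack of exogenous channels sharing the same lookback. A minimal sketch, assuming the target is a per-coin return series; the column names (`log_return`, `funding_rate`, `funding_z`), the values, and `split_channels` are all hypothetical.

```python
from typing import Dict, List, Tuple

def split_channels(panel: Dict[str, List[float]], target: str) -> Tuple[List[float], Dict[str, List[float]]]:
    """Split one coin's lookback panel into the endogenous target and exogenous channels."""
    if target not in panel:
        raise KeyError(f"target column {target!r} not in panel")
    if len({len(v) for v in panel.values()}) != 1:
        raise ValueError("all channels must share the same lookback length")
    endo = panel[target]
    exo = {name: series for name, series in panel.items() if name != target}
    return endo, exo

# Synthetic toy values; column names are hypothetical.
panel = {
    "log_return": [0.010, -0.004, 0.007, 0.002],
    "funding_rate": [0.0001, 0.0002, 0.0001, 0.0003],
    "funding_z": [0.3, 1.1, 0.2, 1.8],
}
endo, exo = split_channels(panel, target="log_return")
```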

The endogenous series is cut into patches, which go through self-attention together with a learnable global endogenous token that summarizes the entire series.
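
A minimal single-head sketch of the endogenous branch, assuming toy dimensions and random weights in place of trained ones: each patch is projected to d_model, a global token is appended to the sequence, and scaled dot-product self-attention mixes all of them. Multi-head attention, positional embeddings, residual connections, layer norm, and the FFN are omitted, and every helper name here is hypothetical.

```python
import math
import random

def matvec(W, x):
    """Multiply matrix W (rows x cols) by vector x (length cols)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def self_attention(tokens, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over a list of tokens."""
    q = [matvec(Wq, t) for t in tokens]
    k = [matvec(Wk, t) for t in tokens]
    v = [matvec(Wv, t) for t in tokens]
    scale = math.sqrt(len(k[0]))
    out = []
    for qi in q:
        weights = softmax([sum(a * b for a, b in zip(qi, kj)) / scale for kj in k])
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out

rng = random.Random(0)
patch_len, d_model = 4, 8
patches = [[0.1 * (p * patch_len + i) for i in range(patch_len)] for p in range(4)]  # toy patches
W_embed = rand_matrix(d_model, patch_len, rng)              # patch -> d_model projection
patch_tokens = [matvec(W_embed, p) for p in patches]
global_token = [rng.gauss(0.0, 0.1) for _ in range(d_model)]  # learned in a trained model
tokens = patch_tokens + [global_token]                      # global token joins the sequence
Wq, Wk, Wv = (rand_matrix(d_model, d_model, rng) for _ in range(3))
encoded = self_attention(tokens, Wq, Wk, Wv)
endo_patch_repr, endo_global_repr = encoded[:-1], encoded[-1]
```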

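Variate-wise cross-attention then injects the exogenous signals (the funding stack) into the endogenous representation: each exogenous series is embedded as a single variate token, and the global endogenous token attends over those tokens.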
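
The cross-attention step, as we read the TimeXer paper, can be sketched as below: each exogenous series is embedded whole into one variate token, and only the endogenous global token acts as the query over those tokens, so the patch tokens see exogenous information only through the global token. This is a single-head toy with random weights and a residual update; the three-channel funding stack, the dimensions, and the helper names are assumptions.

```python
import math
import random

def matvec(W, x):
    """Multiply matrix W (rows x cols) by vector x (length cols)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def cross_attention(query, context, Wq, Wk, Wv):
    """One query token attends over context tokens; returns (mixed vector, attention weights)."""
    q = matvec(Wq, query)
    keys = [matvec(Wk, t) for t in context]
    values = [matvec(Wv, t) for t in context]
    scale = math.sqrt(len(q))
    weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
    mixed = [sum(w * v[c] for w, v in zip(weights, values)) for c in range(len(values[0]))]
    return mixed, weights

rng = random.Random(1)
lookback, d_model = 16, 8
# Three hypothetical exogenous funding channels, each a full lookback-length series (toy values).
funding_stack = [[rng.gauss(0.0, 1.0) for _ in range(lookback)] for _ in range(3)]
W_var = rand_matrix(d_model, lookback, rng)            # variate embedding: whole series -> one token
exo_tokens = [matvec(W_var, s) for s in funding_stack]
global_token = [rng.gauss(0.0, 0.1) for _ in range(d_model)]  # endogenous global token after self-attention
Wq, Wk, Wv = (rand_matrix(d_model, d_model, rng) for _ in range(3))
mixed, weights = cross_attention(global_token, exo_tokens, Wq, Wk, Wv)
updated_global = [g + m for g, m in zip(global_token, mixed)]  # residual update
```

`weights` shows how much the global token drew from each funding channel at this step, which is one hook for checking whether the funding stack is actually being used.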

Pros and cons on this universe

Pros

  • Built for exogenous-aware time-series forecasting, which aligns with the funding-z hypothesis.
  • Published at NeurIPS 2024: recent and well replicated.
  • Ranks #13 by IC (information coefficient), which is comfortably positive.
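
IC is not defined in this section; a common reading in cross-sectional work is the rank information coefficient, i.e. the Spearman correlation between predicted and realized returns across the N coins at each timestamp, averaged over time. The sketch below assumes that definition; ties get average ranks, periods with an undefined IC are skipped, and the helper names and toy numbers are hypothetical.

```python
import math
from statistics import mean

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else float("nan")

def rank_ic(pred, realized):
    """Spearman correlation across coins at one timestamp."""
    return pearson(ranks(pred), ranks(realized))

def mean_ic(pred_rows, realized_rows):
    """Average the per-timestamp rank IC, skipping undefined (NaN) periods."""
    ics = [rank_ic(p, r) for p, r in zip(pred_rows, realized_rows)]
    valid = [ic for ic in ics if not math.isnan(ic)]
    return (mean(valid) if valid else float("nan")), ics

# Synthetic toy data: 2 timestamps x 3 coins.
pred_rows = [[0.3, 0.1, 0.2], [0.2, 0.2, 0.5]]
realized_rows = [[0.02, -0.01, 0.00], [0.01, -0.02, 0.03]]
avg_ic, per_period = mean_ic(pred_rows, realized_rows)
```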

Cons / failure modes

  • A recent MSE-only crypto replication on BTC + M2 reported only marginal lifts.
  • Without exogenous channels, it collapses toward PatchTST and loses its differentiator.
  • Its IC profile is cluster-redundant with PatchTST's.
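
One way to make "cluster-redundant on the IC profile" concrete, assuming the profile is each model's per-period IC series: correlate those series across models and flag pairs above a cutoff. The 0.8 threshold, the helper names, and the numbers below are hypothetical toy values, not results from this study.

```python
import math
from itertools import combinations
from statistics import mean
from typing import Dict, List, Tuple

def pearson(x: List[float], y: List[float]) -> float:
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else float("nan")

def redundant_pairs(ic_profiles: Dict[str, List[float]], threshold: float = 0.8) -> List[Tuple[str, str, float]]:
    """Return model pairs whose per-period IC series correlate at or above `threshold`."""
    pairs = []
    for a, b in combinations(sorted(ic_profiles), 2):
        rho = pearson(ic_profiles[a], ic_profiles[b])
        if rho >= threshold:  # NaN compares False, so degenerate series are never flagged
            pairs.append((a, b, rho))
    return pairs

# Synthetic toy IC series, for illustration only.
profiles = {
    "TimeXer": [0.020, 0.050, -0.010, 0.030],
    "PatchTST": [0.021, 0.048, -0.012, 0.031],
    "OtherModel": [0.030, -0.020, 0.040, -0.010],
}
pairs = redundant_pairs(profiles)
```

A full clustering step (for example, hierarchical clustering on 1 − ρ) would group more than two models at once; the pairwise check is the simplest version of the same idea.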

References