Mechanism diagram · Foundation Model
FOUNDATION MODEL · ZERO-SHOT
Input series (lookback) → Tokenise / patch (fixed vocab) → Pretrained Transformer (frozen weights) → Probabilistic forecast (quantile / next-token)
Pretrained on broad TS corpus · no HL fine-tuning yet

How it works

Moirai uses any-variate attention: all variates are flattened into a single token sequence and attention is computed across every pair of tokens, so the model does not require a fixed number or ordering of variates.
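The flattening step can be illustrated with a toy sketch. It assumes, following the Moirai paper's description, that each attention score receives one of two learned biases depending on whether the two tokens belong to the same variate. The function names and the bias values are hypothetical, and this is not the uni2ts implementation.

```python
def flatten_variates(patches_by_variate):
    """Flatten a list of per-variate patch lists into one token sequence.

    Returns the tokens plus, for each token, its variate index and time index.
    """
    tokens, variate_ids, time_ids = [], [], []
    for v, patches in enumerate(patches_by_variate):
        for t, patch in enumerate(patches):
            tokens.append(patch)
            variate_ids.append(v)
            time_ids.append(t)
    return tokens, variate_ids, time_ids


def variate_bias_matrix(variate_ids, same_bias, diff_bias):
    """Additive attention bias: one value for same-variate pairs, another otherwise.

    In a real model both values would be learned scalars (assumption: one pair per head).
    """
    n = len(variate_ids)
    return [
        [same_bias if variate_ids[i] == variate_ids[j] else diff_bias for j in range(n)]
        for i in range(n)
    ]


# Two variates with different numbers of patches: no fixed structure is needed.
tokens, variate_ids, time_ids = flatten_variates([["a0", "a1"], ["b0", "b1", "b2"]])
bias = variate_bias_matrix(variate_ids, same_bias=0.5, diff_bias=-0.5)
```

Because the bias depends only on whether two variate indices match, adding or reordering variates changes the sequence length but not the parameters.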

Multiple patch sizes are supported through separate input and output projection layers, one per patch size. The patch size used for a series is chosen from its sampling frequency, so higher-frequency series get larger patches.
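A minimal sketch of patching at a frequency-dependent patch size follows. The frequency-to-patch-size table is a made-up example, not Moirai's actual configuration, and `patchify` is a hypothetical helper; left-padding is an assumption chosen so the most recent values fill the last patch.

```python
# Hypothetical mapping for illustration only; real values are a model configuration choice.
PATCH_SIZE_BY_FREQ = {"D": 8, "H": 32, "T": 64}


def patchify(values, patch_size, pad_value=0.0):
    """Split a series into non-overlapping patches of length patch_size.

    The series is left-padded so its length is a multiple of patch_size.
    Returns the patches and a parallel mask marking observed (True) vs padded (False).
    """
    pad = (patch_size - len(values) % patch_size) % patch_size
    padded = [pad_value] * pad + list(values)
    observed = [False] * pad + [True] * len(values)
    patches = [padded[i:i + patch_size] for i in range(0, len(padded), patch_size)]
    mask = [observed[i:i + patch_size] for i in range(0, len(observed), patch_size)]
    return patches, mask


# Toy usage with a small patch size so the result is easy to read.
patches, mask = patchify([1.0, 2.0, 3.0, 4.0, 5.0], patch_size=4)
```

The mask lets downstream layers ignore padded positions, which is also how missing values could be passed through.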

A masked encoder design enables both forecasting and infilling.
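Why one design covers both tasks can be shown in a few lines: the encoder sees a sequence in which some patches are replaced by a mask token and predicts those positions, so forecasting and infilling differ only in where the masks sit. The sketch below is a toy illustration; `MASK_TOKEN` stands in for a learned mask embedding and all function names are hypothetical.

```python
MASK_TOKEN = "[MASK]"  # stand-in for a learned mask embedding (assumption)


def forecasting_input(context_patches, horizon_patches):
    """Forecasting: append mask tokens after the observed context."""
    return list(context_patches) + [MASK_TOKEN] * horizon_patches


def infilling_input(patches, missing_positions):
    """Infilling: replace patches at the given interior positions with mask tokens."""
    missing = set(missing_positions)
    return [MASK_TOKEN if i in missing else p for i, p in enumerate(patches)]


def positions_to_predict(encoder_input):
    """Positions whose outputs are read off as predictions."""
    return [i for i, p in enumerate(encoder_input) if p == MASK_TOKEN]


forecast_seq = forecasting_input([[1, 2], [3, 4]], horizon_patches=2)
infill_seq = infilling_input([[1], [2], [3]], missing_positions=[1])
```

In both cases the model's output at the masked positions is the prediction; only the mask placement changes.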

Pros and cons on this universe

Pros

  • Universal — handles series of different frequencies and missingness natively.
  • Mixture-of-experts variant (Moirai-MoE) reduces inference cost per prediction.
  • ICML 2024 publication with active maintenance.

Cons / failure modes

  • No HL empirical IC — frontier item gated on EXP-012c.
  • Foundation-model context-parroting critique applies here too.
  • Memory footprint substantial at inference.

References