Mechanism diagram · Foundation Model
How it works
Moirai uses any-variate attention: all variates are flattened into a single token sequence, and attention is computed across any combination of variates without enforcing a fixed number or ordering of them.
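A toy sketch of the idea, not the model's implementation: tokens from every variate share one sequence, and the attention score gets one of two bias values depending on whether query and key come from the same variate. The function names, the scalar tokens, and the bias constants are all hypothetical; a trained model would use learned parameters and vector-valued tokens.

```python
import math

def flatten_tokens(series_by_variate):
    """Flatten a list of per-variate patch lists into one token list.

    Each token keeps (variate_id, time_id, value), so attention can tell
    tokens apart without assuming a fixed number of variates.
    """
    tokens = []
    for variate_id, patches in enumerate(series_by_variate):
        for time_id, value in enumerate(patches):
            tokens.append((variate_id, time_id, value))
    return tokens

def any_variate_attention(tokens, same_bias=0.5, diff_bias=-0.5):
    """Single-head attention over scalar tokens with a binary variate bias."""
    outputs = []
    for q_variate, _, q_value in tokens:
        # Score = content similarity + bias that only depends on whether
        # the two tokens belong to the same variate.
        scores = [
            q_value * k_value + (same_bias if q_variate == k_variate else diff_bias)
            for k_variate, _, k_value in tokens
        ]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        outputs.append(sum(w / total * k[2] for w, k in zip(weights, tokens)))
    return outputs

tokens = flatten_tokens([[1.0, 2.0], [0.5, 0.1, 0.3]])  # 2 variates, unequal lengths
mixed = any_variate_attention(tokens)
```

Because the bias depends only on whether two variate ids are equal, relabelling or reordering the variates permutes the outputs but does not change them.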
Multiple patch sizes are supported, each with its own input and output projection layer; the patch size for a series is selected from its frequency, which is how the model accounts for the cadence of each input series.
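A minimal sketch of frequency-dependent patching, for illustration only. The frequency codes and the patch sizes in `PATCH_SIZE_BY_FREQ` are hypothetical placeholders, not the model's actual mapping, and `patchify` is an assumed helper name.

```python
# Hypothetical mapping from series frequency to patch size (illustrative only).
PATCH_SIZE_BY_FREQ = {"M": 8, "D": 16, "H": 32, "T": 64}

def patchify(values, freq, pad_value=0.0):
    """Split a series into non-overlapping patches sized for its frequency.

    The series is left-padded so its length is a multiple of the patch size;
    a parallel mask marks which positions hold real observations.
    """
    size = PATCH_SIZE_BY_FREQ[freq]
    pad = (-len(values)) % size  # padding needed to reach a multiple of size
    padded = [pad_value] * pad + list(values)
    observed = [False] * pad + [True] * len(values)
    patches = [padded[i:i + size] for i in range(0, len(padded), size)]
    masks = [observed[i:i + size] for i in range(0, len(observed), size)]
    return patches, masks

monthly_patches, monthly_masks = patchify(range(20), "M")  # 3 patches of 8
hourly_patches, hourly_masks = patchify(range(20), "H")    # 1 patch of 32
```

The same 20 observations become three tokens at the monthly patch size and one token at the hourly patch size, which is why higher-frequency series usually get larger patches.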
A masked encoder design enables both forecasting and infilling.
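As a sketch of why one masked encoder covers both tasks: forecasting and infilling differ only in which patches are masked. The helper names below are hypothetical, and `mask_token` stands in for what would be a learned mask embedding in a real model.

```python
def forecast_mask(num_patches, horizon_patches):
    """True marks patches to predict: the last `horizon_patches` positions."""
    if not 0 <= horizon_patches <= num_patches:
        raise ValueError("horizon_patches must be between 0 and num_patches")
    context = num_patches - horizon_patches
    return [False] * context + [True] * horizon_patches

def infill_mask(num_patches, gap_start, gap_len):
    """True marks an interior gap to reconstruct using context on both sides."""
    return [gap_start <= i < gap_start + gap_len for i in range(num_patches)]

def apply_mask(patches, mask, mask_token=None):
    """Replace masked patches with a placeholder before encoding."""
    return [mask_token if masked else patch for patch, masked in zip(patches, mask)]

patches = ["p0", "p1", "p2", "p3", "p4"]
forecast_input = apply_mask(patches, forecast_mask(5, 2))  # tail hidden
infill_input = apply_mask(patches, infill_mask(5, 1, 2))   # middle hidden
```

The encoder sees the same kind of input in both cases; only the mask positions change.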
Pros and cons on this universe
Pros
- Universal — handles series of different frequencies and missingness natively.
- Mixture-of-experts variant (Moirai-MoE) reduces inference cost per prediction by activating only a subset of experts per token.
- ICML 2024 publication with active maintenance.
Cons / failure modes
- No HL empirical IC — frontier item gated on EXP-012c.
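- Foundation-model context-parroting critique applies here too: forecasts may largely repeat motifs copied from the context window rather than reflect learned dynamics.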
- Memory footprint substantial at inference.
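A rough back-of-envelope for the memory item above. It assumes all variates are flattened into one token sequence and that attention scores are fully materialized (no fused or memory-efficient kernel). It counts only the score matrix of a single layer, not weights or activations, and the function name and default values are hypothetical.

```python
def attention_score_bytes(num_variates, context_len, patch_size, num_heads, bytes_per_el=4):
    """Estimate bytes for one layer's attention score matrix."""
    patches_per_variate = -(-context_len // patch_size)  # ceiling division
    tokens = num_variates * patches_per_variate
    # One score per (query token, key token) pair, per head.
    return num_heads * tokens * tokens * bytes_per_el

single = attention_score_bytes(1, 512, 32, 8)   # 16 tokens
wide = attention_score_bytes(10, 512, 32, 8)    # 160 tokens
growth = wide // single                         # 100: quadratic in variate count
```

Under these assumptions, ten times as many variates means one hundred times the score memory, so wide universes are the case to watch.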