Mechanism diagram · RNN / GRU / LSTM
How it works
A GRU processes the lookback window one step at a time, updating a hidden state at each step through an update gate and a reset gate.
The final hidden state at step T is mapped to the forecast horizon by a linear head.
It is cheaper than an LSTM (two gates instead of three, and no separate cell state) and has been competitive in our experience.
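The recurrence described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the production model: the function names (`init_params`, `gru_forecast`, `recurrent_param_count`), the layer sizes, the random initialization, and the single bias per block are all assumptions made for the example. The interpolation convention `h = (1 - z) * h + z * h_cand` is one of two equivalent conventions in use; some libraries swap the roles of `z` and `1 - z`.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_params(n_features, hidden, horizon, seed=0):
    """Random weights for one GRU layer plus a linear head (illustrative only)."""
    rng = np.random.default_rng(seed)
    p = {}
    # Three weight blocks: update gate (z), reset gate (r), candidate state (h).
    for g in ("z", "r", "h"):
        p["W" + g] = rng.normal(0.0, 0.1, (n_features, hidden))  # input weights
        p["U" + g] = rng.normal(0.0, 0.1, (hidden, hidden))      # recurrent weights
        p["b" + g] = np.zeros(hidden)
    p["W_out"] = rng.normal(0.0, 0.1, (hidden, horizon))
    p["b_out"] = np.zeros(horizon)
    return p


def gru_forecast(x, p):
    """Map a lookback window x of shape (T, n_features) to a (horizon,) forecast."""
    h = np.zeros(p["Uz"].shape[0])
    for x_t in x:  # strictly sequential: step t needs h from step t-1
        z = sigmoid(x_t @ p["Wz"] + h @ p["Uz"] + p["bz"])  # update gate
        r = sigmoid(x_t @ p["Wr"] + h @ p["Ur"] + p["br"])  # reset gate
        h_cand = np.tanh(x_t @ p["Wh"] + (r * h) @ p["Uh"] + p["bh"])
        h = (1.0 - z) * h + z * h_cand  # blend old state and candidate
    return h @ p["W_out"] + p["b_out"]  # linear head on the final hidden state


def recurrent_param_count(n_features, hidden, n_blocks):
    """Parameters in the recurrent layer: n_blocks=3 for GRU, 4 for LSTM."""
    return n_blocks * (n_features * hidden + hidden * hidden + hidden)


params = init_params(n_features=5, hidden=8, horizon=3)
window = np.random.default_rng(1).normal(size=(20, 5))  # synthetic lookback, T=20
print(gru_forecast(window, params).shape)  # (3,)
print(recurrent_param_count(5, 8, 3) / recurrent_param_count(5, 8, 4))  # 0.75
```

The parameter-count helper makes the cost comparison concrete: at equal input and hidden sizes, a GRU has three weight blocks (two gates plus the candidate state) against four in an LSTM (three gates plus the candidate cell), so its recurrent layer holds three quarters of the parameters.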
Pros and cons on this universe
Pros
- IC vector is approximately orthogonal to the modern mixer cluster, so it adds genuine diversification in a four-arm stack.
- Cheap, well-understood, fast to train.
- Useful baseline / sanity check.
Cons / failure modes
- Rank #21 on IC. The pre-audit figure of +0.105 was misreported; the audited value is +0.061.
- Sequential processing: time steps cannot be computed in parallel, so it is slow at long lookbacks compared to convolution or attention.
- Many CPCV cards fail the Mistake #21 gate (training metric not tracking Sharpe).
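
The orthogonality claim in the Pros list can be checked with a cosine similarity between IC vectors. The sketch below assumes an "IC vector" is a model's IC series over a shared set of periods or folds; that definition, the helper name `ic_cosine`, and the toy inputs are assumptions for illustration, not the audit's actual procedure or data.

```python
import numpy as np


def ic_cosine(ic_a, ic_b):
    """Cosine similarity between two IC vectors over the same periods.

    Values near 0 indicate approximately orthogonal signals,
    values near 1 indicate the two models are largely redundant.
    """
    a = np.asarray(ic_a, dtype=float)
    b = np.asarray(ic_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("IC vectors must cover the same periods")
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy vectors chosen only to show the two extremes (not real ICs).
print(ic_cosine([1.0, 0.0], [0.0, 1.0]))  # 0.0, fully orthogonal
print(round(ic_cosine([1.0, 2.0], [2.0, 4.0]), 6))  # 1.0, fully redundant
```

Subtracting each vector's mean before the call turns the same quantity into a Pearson correlation, which may be the more useful variant when the models differ in average IC level.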