GRU — Axon Ridge Capital

Mechanism diagram · RNN / GRU / LSTM

How it works

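The GRU processes the lookback window one step at a time. At each step it updates a hidden state using two gates: an update gate, which controls how much of the previous state is carried forward, and a reset gate, which controls how much of the previous state feeds the candidate state.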
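
A minimal NumPy sketch of that recurrence, for illustration only. It uses one common GRU formulation (conventions for the update gate and reset placement differ between libraries), and the parameter names, initialization, and dimensions are assumptions, not the production model.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_gru_params(input_dim, hidden_dim, seed=0):
    """Hypothetical initializer: one weight block each for z, r and the candidate n."""
    rng = np.random.default_rng(seed)
    scale = 1.0 / np.sqrt(hidden_dim)
    params = {}
    for name in ("z", "r", "n"):  # update gate, reset gate, candidate state
        params[f"W_{name}"] = rng.uniform(-scale, scale, (hidden_dim, input_dim))
        params[f"U_{name}"] = rng.uniform(-scale, scale, (hidden_dim, hidden_dim))
        params[f"b_{name}"] = np.zeros(hidden_dim)
    return params

def gru_step(x_t, h_prev, p):
    """One GRU update: gates first, then a gated blend of old and candidate state."""
    z = sigmoid(p["W_z"] @ x_t + p["U_z"] @ h_prev + p["b_z"])        # update gate
    r = sigmoid(p["W_r"] @ x_t + p["U_r"] @ h_prev + p["b_r"])        # reset gate
    n = np.tanh(p["W_n"] @ x_t + p["U_n"] @ (r * h_prev) + p["b_n"])  # candidate
    return (1.0 - z) * n + z * h_prev

def gru_forward(window, p):
    """Run the cell over a (T, input_dim) lookback window; return the state at step T."""
    h = np.zeros(p["b_z"].shape[0])
    for x_t in window:  # strictly sequential: step t needs the state from step t-1
        h = gru_step(x_t, h, p)
    return h

# Assumed toy sizes: 20-step lookback, 4 features, 8 hidden units.
params = init_gru_params(input_dim=4, hidden_dim=8)
window = np.random.default_rng(1).normal(size=(20, 4))
h_T = gru_forward(window, params)
```

With a zero initial state, every step is a convex blend of the previous state and a tanh output, so each entry of the hidden state stays within [-1, 1].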

The final hidden state at step T is mapped to the forecast horizon by a linear head.
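
The readout can be sketched as a single affine map. This assumes the head emits one value per horizon step; `linear_head`, `W_out`, `b_out`, and the sizes are hypothetical names and values chosen for illustration, and the hidden state here is a random stand-in, not a trained one.

```python
import numpy as np

def linear_head(h_T, W_out, b_out):
    """Map the final hidden state (hidden_dim,) to a forecast vector (horizon,)."""
    return W_out @ h_T + b_out

hidden_dim, horizon = 8, 5  # assumed sizes
rng = np.random.default_rng(0)

h_T = np.tanh(rng.normal(size=hidden_dim))                  # stand-in for the GRU state at step T
W_out = rng.normal(scale=0.1, size=(horizon, hidden_dim))   # untrained placeholder weights
b_out = np.zeros(horizon)

forecast = linear_head(h_T, W_out, b_out)  # one value per horizon step
```

Only the state at step T reaches the head, so everything the forecast uses from earlier steps must already be summarized in that one vector.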

It is cheaper than an LSTM (two gates instead of three) and competitive with it in our experience.
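
A rough parameter count makes the cost gap concrete. The sketch assumes a single bias vector per block (some libraries use two) and arbitrary example sizes. A GRU has three weight blocks (two gates plus the candidate state), while an LSTM has four (three gates plus the cell candidate).

```python
def recurrent_params(input_dim, hidden_dim, n_blocks):
    """Parameters in one recurrent layer with n_blocks gate/candidate blocks."""
    # Each block: input weights + recurrent weights + one bias vector.
    per_block = hidden_dim * input_dim + hidden_dim * hidden_dim + hidden_dim
    return n_blocks * per_block

# Assumed example sizes: 8 input features, 32 hidden units.
gru = recurrent_params(input_dim=8, hidden_dim=32, n_blocks=3)
lstm = recurrent_params(input_dim=8, hidden_dim=32, n_blocks=4)

print(gru, lstm, gru / lstm)  # 3936 5248 0.75
```

Under this convention the ratio is 3/4 regardless of the layer sizes, since both cells use blocks of the same shape.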

Pros and cons on this universe

Pros

IC vector is approximately orthogonal to the modern mixer cluster, so it adds genuine diversification in a four-arm stack.
Cheap, well-understood, fast to train.
Useful baseline / sanity check.
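
One way to check the orthogonality claim in the first item is the cosine similarity between two IC vectors. The sketch below assumes an IC vector is a per-period series of information coefficients; the numbers are made-up toy values, not results from this universe, and `cosine_similarity` is a hypothetical helper.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; values near 0 mean near-orthogonal."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy per-period ICs (illustrative only, NOT measured values).
ic_gru = [0.06, -0.02, 0.04, 0.00]
ic_mixer = [0.01, 0.05, 0.00, 0.07]

similarity = cosine_similarity(ic_gru, ic_mixer)
```

Demeaning both vectors first turns the same quantity into a Pearson correlation. Which of the two the orthogonality statement refers to is not specified here.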

Cons / failure modes

Ranks #21 by IC. The IC was misreported as +0.105 before the audit; the audited value is +0.061.
Sequential processing: slow at long lookbacks compared with convolution or attention, which can process time steps in parallel.
Many CPCV cards fail the Mistake #21 gate (training metric not tracking Sharpe).

References