Forecasting rate limits in claumon
claumon is a small, fast dashboard for Claude Code - a single binary, zero config, one browser tab. It shows rate-limit gauges, per-session token and cost breakdowns, and historical trends. This page covers one piece: the forecaster behind the rate-limit gauges.
The forecaster reads the time series of utilization snapshots in each open rate-limit window (session, weekly, per-model weekly) and produces, per gauge, a point estimate of utilization at reset, an 80% credible interval, and a median ETA to a threshold (with its own interval) when that threshold is reachable before reset.
The model
Four pieces carry the forecast. The full derivation, calibration, and a worked example live in the spec.
The path law. Inside the open window, utilization accumulates as a Gamma process - a non-decreasing jump process. Conditioned on a mean rate , the increment over a horizon is
so its mean and variance grow linearly in time:
Every increment is non-negative, so simulated paths are monotone and floored at the value now.
The rate, by empirical Bayes. The unknown rate combines the recent OLS slope (, with standard error ) and a prior fit on past sessions (mean , variance ) through a normal-normal conjugate update:
The prior dominates early, when the slope is noisy; the data takes over as snapshots accumulate.
The forecast and its spread. The point forecast extrapolates the posterior rate to reset,
and the law of total variance splits its spread into rate uncertainty (quadratic in the remaining horizon) and path noise (linear in it):
The reported 80% interval is read off the Monte Carlo terminal quantiles rather than , so it keeps the right skew and the floor at the current value instead of forcing a symmetric Gaussian band.
The ETA. Because the paths are monotone, a threshold is crossed once and stays crossed, so the first-passage time is well defined:
The mean trajectory gives an early deterministic anchor,
but the reported median ETA and its interval come from the same Monte Carlo. The Gamma skew pushes the median past this anchor, while near-zero rate draws - which may never reach the threshold before reset - stretch the upper tail.
The method is a normative, versioned spec: it fixes the generative model and its calibration so implementations must match, with retired versions archived alongside a changelog.