thepragmaticquant.com

The error analysis everyone cites is for a kernel nobody runs

TL;DR: Production deployments of the primitive (SCAMP, STUMPY) do not run the textbook sliding inner product the earlier articles analyzed; they accumulate a mean-centred covariance from cached, reused increments. At mu = 0 the two kernels compute the same values, yet their floating-point computation graphs still differ, by two structural breaks and one bias amplification that defeat the raw proof’s machinery and take a second paper to handle (doi:10.5281/zenodo.20599478). The centred envelope held in all 100 measured containment cells, with a worst ratio of 7.54e-3.

A reviewer will say the centred analysis is a corollary of the raw bound: set the mean to zero and the centred kernel’s values coincide with the raw one’s. I re-derived both kernels symbolically to settle it, and the answer is no. The values coincide at mu = 0; the floating-point computation graphs do not. They differ by two structural breaks and one bias amplification that the raw proof’s machinery cannot absorb, and that distinction is the entire second paper.

What production actually accumulates

SCAMP and STUMPY do not slide a raw dot product. They carry a mean-centred covariance along each diagonal with the SCAMP update:

```text
cov[i+1, j+1] = cov[i, j] + df[i] * dg[j] + dg[i] * df[j]
```

where the mean-deviation increments, df[i] = (x[i+m] - x[i]) / 2 and its dg sibling, are precomputed once per stream position and reused at every diagonal cell that visits position i. I verified this against the public release tags: at SCAMP@v4.0.3 the per-diagonal covariance accumulation lives in the CPU stats and tile-kernel sources, and at stumpy@v1.13.0 it lives in the diagonal-compute routine. Both round each product before the staged add, so every carry tick commits four rounding events: two products and two additions.
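To make the update concrete, here is a minimal NumPy sketch of one self-join diagonal (cells (i, i+k)). It assumes dg[i] = (x[i+m] - mu[i+1]) + (x[i] - mu[i]), the definition I take from the SCAMP paper rather than from reading either codebase, and the function names are hypothetical. The increments are computed once and shared by every diagonal; the check confirms in float64 that the carried values match a direct centred dot product on every cell.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def scamp_increments(x, m):
    """Hypothetical helper: window means plus the cached increments.

    Assumed definitions (not copied from SCAMP or STUMPY):
      df[i] = (x[i+m] - x[i]) / 2
      dg[i] = (x[i+m] - mu[i+1]) + (x[i] - mu[i])
    """
    mu = sliding_window_view(x, m).mean(axis=1)  # mu[i] = mean of x[i:i+m]
    i = np.arange(len(mu) - 1)                   # one increment per stream position
    df = (x[i + m] - x[i]) / 2
    dg = (x[i + m] - mu[i + 1]) + (x[i] - mu[i])
    return mu, df, dg


def centred_cov_direct(x, mu, m, i, j):
    """Direct centred dot product: sum_t (x[i+t]-mu[i]) * (x[j+t]-mu[j])."""
    return np.dot(x[i:i + m] - mu[i], x[j:j + m] - mu[j])


def diagonal_by_carry(x, m, k, mu, df, dg):
    """Covariances C(i, i+k) along diagonal k: seeded directly, then carried."""
    n_cells = len(mu) - k
    cov = np.empty(n_cells, dtype=x.dtype)
    cov[0] = centred_cov_direct(x, mu, m, 0, k)  # seed
    for i in range(n_cells - 1):
        # Carry tick: two products added, nothing evicted. df/dg come from
        # the shared cache, so every diagonal reuses the same rounded values.
        cov[i + 1] = cov[i] + df[i] * dg[i + k] + dg[i] * df[i + k]
    return cov


rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
m = 32
mu, df, dg = scamp_increments(x, m)  # computed once, reused by all diagonals
worst = 0.0
for k in (m, 100, 500):
    carried = diagonal_by_carry(x, m, k, mu, df, dg)
    direct = np.array([centred_cov_direct(x, mu, m, i, i + k)
                       for i in range(len(carried))])
    worst = max(worst, float(np.max(np.abs(carried - direct))))
print(f"worst |carry - direct| over three diagonals: {worst:.2e}")
```

The discrepancy printed is float64 rounding only; the identity itself is exact in real arithmetic.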

Two things now break the raw kernel’s structure. First, each cached increment is read at up to min(i+1, P−i) distinct diagonal steps. Second, the carry adds two products per tick and evicts none. A third feature, the bias amplification, is shared with the raw kernel but enters differently: because increments are reused across all near-diagonal cells, the deterministic rounding bias they carry is correlated across those cells. The square-root-of-cell-count averaging that an independent-bias model would grant is simply not available, so the bias contribution is stronger, not new.
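The averaging point can be seen in a toy Monte Carlo. This is my own simplification, not the paper’s model: each of n cells carries a rounding error of +u or −u. With independent signs the summed error grows like sqrt(n)·u; when every cell inherits one sign from a shared cached increment it grows like n·u, and the square-root discount is gone.

```python
import numpy as np


def summed_error_rms(n_cells, u, shared, trials, rng):
    """RMS of the summed rounding error over n_cells cells (toy model).

    Each cell's error is +u or -u (an illustrative assumption).
    shared=True  -> all cells reuse one sign, like a cached increment's rounding.
    shared=False -> every cell draws its own sign independently.
    """
    if shared:
        signs = rng.choice([-1.0, 1.0], size=(trials, 1)) * np.ones((1, n_cells))
    else:
        signs = rng.choice([-1.0, 1.0], size=(trials, n_cells))
    totals = u * signs.sum(axis=1)
    return float(np.sqrt(np.mean(totals ** 2)))


rng = np.random.default_rng(1)
u = 2.0 ** -24  # unit roundoff of IEEE float32 with round-to-nearest
for n in (100, 10_000):
    indep = summed_error_rms(n, u, shared=False, trials=400, rng=rng)
    corr = summed_error_rms(n, u, shared=True, trials=400, rng=rng)
    print(f"n={n:>6}: independent {indep / u:8.1f} u   shared {corr / u:8.1f} u")
```

With 10,000 cells the independent case lands near 100u while the shared case sits at exactly 10,000u.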

The cache-reuse trap

The raw paper’s probabilistic bound rests on a no-double-count lemma: every rounding event occupies a distinct martingale index, exactly once. Increment reuse kills that hypothesis directly. One rounding committed when df[i] was formed re-enters up to min(i+1, P−i) later computations — the same coin flip appearing at many martingale positions. Counted naively, the argument dies.

The fix is a re-partition, one of the paper’s two new lemmas; the companion composition algebra that assembles the resulting martingale bound is machine-checked in Coq. Fix the precompute rounding draws once and fold the reuse into deterministic coefficient perturbations of the form (1+ζ). What remains is a clean event count in which every index is distinct: the seed fires 4m−1 rounding events and each of the p carry ticks fires 4 genuinely fresh ones, for (4m−1) + 4p in total.
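The event count can be checked by tallying operations, under one assumption of mine: the seed is a direct centred dot product with the window means already in the precompute, so it spends 2m centring subtractions, m products and m − 1 accumulation adds (4m − 1 in all), while each carry tick spends two products and two adds. The counter class and helper names below are hypothetical instrumentation, not code from either library; precompute rounding is deliberately left uncounted, matching its folding into the (1+ζ) coefficients.

```python
import random


def precompute(x, m):
    """Window means and cached increments (precompute: not counted below).

    Assumed definitions: df[i] = (x[i+m] - x[i]) / 2,
    dg[i] = (x[i+m] - mu[i+1]) + (x[i] - mu[i]).
    """
    n_win = len(x) - m + 1
    mu = [sum(x[i:i + m]) / m for i in range(n_win)]
    df = [(x[i + m] - x[i]) / 2 for i in range(n_win - 1)]
    dg = [(x[i + m] - mu[i + 1]) + (x[i] - mu[i]) for i in range(n_win - 1)]
    return mu, df, dg


class RoundingCounter:
    """Hypothetical instrumentation: each arithmetic call is one rounding event."""

    def __init__(self):
        self.events = 0

    def sub(self, a, b):
        self.events += 1
        return a - b

    def mul(self, a, b):
        self.events += 1
        return a * b

    def add(self, a, b):
        self.events += 1
        return a + b


def seed_and_carry(x, m, k, p, mu, df, dg, ctr):
    """Seed C(0, k) with a direct centred dot product, then run p carry ticks."""
    cov = None
    for t in range(m):
        a = ctr.sub(x[t], mu[0])        # centring subtraction
        b = ctr.sub(x[k + t], mu[k])    # centring subtraction
        prod = ctr.mul(a, b)            # one product per window element
        cov = prod if cov is None else ctr.add(cov, prod)  # m - 1 adds in total
    for i in range(p):
        t1 = ctr.mul(df[i], dg[i + k])
        t2 = ctr.mul(dg[i], df[i + k])
        cov = ctr.add(ctr.add(cov, t1), t2)  # staged add: two more roundings
    return cov  # C(p, p + k), up to rounding


random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(400)]
m, k, p = 16, 40, 100
mu, df, dg = precompute(x, m)
ctr = RoundingCounter()
result = seed_and_carry(x, m, k, p, mu, df, dg, ctr)
print(ctr.events, (4 * m - 1) + 4 * p)  # prints: 463 463
```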

Increment reuse fan-out versus the raw telescope, and the healing-in-the-gap locality of the localization corollary: a spike in the dead zone between a cell's two anchor windows provably cannot touch that cell. Mechanism sketch, no measured data; the localization corollary it depicts is proved in the preprint (doi:10.5281/zenodo.20599478).

Add two, evict none

The raw carry adds one product and removes one, so errors can partially telescope. The centred carry adds two and removes nothing. Counterintuitively, this makes the lower bound easier to prove: the summed errors come from 2p entering products with no leaving term to cancel against, so the adversary simply pushes every rounding the same way. The paper’s lower bounds land at Θ(p·u·M_cov) and Θ(p·m·u·M_cov) against an upper envelope of Θ((2p+m)(p+m)·u·M_cov). They match the envelope on the carry rate in p and on the magnitude weighting; a residual gap of order m/p remains when p is much smaller than m.
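The order of that gap follows from dividing the upper envelope by the stronger lower bound, Θ(p·m·u·M_cov): the common factor u·M_cov cancels, and when p ≪ m both 2p + m and p + m are dominated by m.

```latex
\frac{(2p+m)(p+m)\,u\,M_{\mathrm{cov}}}{p\,m\,u\,M_{\mathrm{cov}}}
  = \frac{(2p+m)(p+m)}{p\,m}
  \approx \frac{m \cdot m}{p\,m}
  = \frac{m}{p}
  \qquad (p \ll m)
```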

Conditioning is not a prefactor

The usual justification for mean-centring conflates two effects; I conflated them myself until the derivation forced them apart. Centring wins because it avoids catastrophic cancellation: the small covariance you divide by is computed directly, not as the difference of two large inner products. It does not win by a smaller error constant, since the centred magnitude scale M_cov can be up to twice the raw scale T★. The bounds quantify each effect separately.
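A small NumPy experiment isolates the cancellation effect. The data are synthetic and chosen by me to make the point visible: one window of m = 256 float32 samples riding on an offset of 1e4 with unit-scale fluctuations. The raw route forms the covariance as the sum of products minus m·mu_x·mu_y, a difference of two numbers near 2.6e10; the centred route subtracts the means first. Both run in float32 and are compared against float64 arithmetic on the same stored inputs. This illustrates only the cancellation effect, not the paper’s bounds.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 256
noise = rng.standard_normal(m)
# Synthetic window (an assumption): large offset, small correlated fluctuations.
x = (1e4 + noise).astype(np.float32)
y = (1e4 + noise + 0.5 * rng.standard_normal(m)).astype(np.float32)

# Reference: float64 arithmetic on the same stored float32 values.
x64, y64 = x.astype(np.float64), y.astype(np.float64)
ref = np.dot(x64 - x64.mean(), y64 - y64.mean())

# Both routes below stay in float32 end to end.
mu_x, mu_y = x.mean(dtype=np.float32), y.mean(dtype=np.float32)
raw = np.dot(x, y) - np.float32(m) * mu_x * mu_y  # difference of two ~2.6e10 terms
centred = np.dot(x - mu_x, y - mu_y)              # deviations are O(1)

rel_raw = abs(float(raw) - ref) / abs(ref)
rel_centred = abs(float(centred) - ref) / abs(ref)
print(f"reference {ref:.4f}  raw rel. err {rel_raw:.1e}  centred rel. err {rel_centred:.1e}")
```

Near 2.6e10 the float32 spacing is 2048, so the raw route cannot resolve a covariance of a few hundred at all, while the centred route keeps most of its digits.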

The remaining question is whether the centred envelope holds on the kernel that ships. The measured check ran the genuine two-product off-diagonal carry against an f64-exact reference at N = 2^20, across 7 synthetic corpora with 30 seeds per cell, windows 16 through 256, reference distances up to 4096, and both near and far diagonal bands. All 100 (precision, window, distance, band) cells were contained; the table below prints the worst-case row for each (precision, window) pair. The worst ratio of measured error to envelope was 7.54e-3, two orders of magnitude of headroom, and at f32 with window 256 the measured error was 1.89 against an envelope of 290.
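A scaled-down version of such a harness can be sketched in a few lines; it is not the paper’s code. It assumes a single synthetic random-walk series of length 2^14 instead of N = 2^20 and seven corpora, the increment definitions used above (with dg taken from the SCAMP paper), and a float64 direct centred dot product as the reference. It reports the worst forward error of the carry over a few diagonals; it does not evaluate the paper’s envelope, whose constants are in the preprint. All function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def increments(x, m):
    """Window means plus cached df/dg increments, in x's own precision."""
    mu = sliding_window_view(x, m).mean(axis=1, dtype=x.dtype)
    i = np.arange(len(mu) - 1)
    df = (x[i + m] - x[i]) / 2
    dg = (x[i + m] - mu[i + 1]) + (x[i] - mu[i])
    return mu, df, dg


def carry_diagonal(x, m, k, mu, df, dg):
    """Seed C(0, k) directly, then run the two-product carry in x's precision."""
    n_cells = len(mu) - k
    cov = np.empty(n_cells, dtype=x.dtype)
    cov[0] = np.dot(x[:m] - mu[0], x[k:k + m] - mu[k])
    for i in range(n_cells - 1):
        cov[i + 1] = cov[i] + df[i] * dg[i + k] + dg[i] * df[i + k]
    return cov


def reference_diagonal(x64, m, k):
    """Direct centred covariances C(i, i+k), evaluated in float64."""
    windows = sliding_window_view(x64, m)
    dev = windows - windows.mean(axis=1, keepdims=True)
    return np.einsum("ij,ij->i", dev[:-k], dev[k:])


rng = np.random.default_rng(7)
x32 = np.cumsum(rng.standard_normal(2 ** 14)).astype(np.float32)  # random walk
x64 = x32.astype(np.float64)  # same stored values, wider arithmetic
m = 64

mu32, df32, dg32 = increments(x32, m)
mu64, df64, dg64 = increments(x64, m)
worst32 = worst64 = 0.0
for k in (m, 256, 1024, 4096):
    ref = reference_diagonal(x64, m, k)
    err32 = np.abs(carry_diagonal(x32, m, k, mu32, df32, dg32) - ref)
    err64 = np.abs(carry_diagonal(x64, m, k, mu64, df64, dg64) - ref)
    worst32 = max(worst32, float(err32.max()))
    worst64 = max(worst64, float(err64.max()))
print(f"worst forward error: float32 carry {worst32:.2e}, float64 carry {worst64:.2e}")
```

Running the same carry in both precisions against one reference separates the kernel’s rounding from any flaw in the harness itself.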

Worst-cell ratio of measured forward error to the centred envelope, per precision and window. Every cell sits roughly two orders of magnitude below the containment line. Data: Table 1 of the preprint (doi:10.5281/zenodo.20599478).
| precision | window | tightest (p, band) | measured error | centred envelope | ratio |
|---|---|---|---|---|---|
| f64 | 16 | (64, near) | 2.04e-10 | 5.06e-08 | 4.02e-3 |
| f64 | 32 | (64, near) | 4.66e-10 | 6.75e-08 | 6.90e-3 |
| f64 | 64 | (64, near) | 7.57e-10 | 1.08e-07 | 7.01e-3 |
| f64 | 128 | (64, near) | 1.63e-09 | 2.16e-07 | 7.54e-3 |
| f64 | 256 | (64, near) | 3.61e-09 | 5.40e-07 | 6.68e-3 |
| f32 | 16 | (64, near) | 9.74e-02 | 2.72e+01 | 3.59e-3 |
| f32 | 32 | (64, near) | 1.89e-01 | 3.62e+01 | 5.22e-3 |
| f32 | 64 | (64, near) | 3.81e-01 | 5.80e+01 | 6.57e-3 |
| f32 | 128 | (64, near) | 7.25e-01 | 1.16e+02 | 6.25e-3 |
| f32 | 256 | (64, near) | 1.89e+00 | 2.90e+02 | 6.51e-3 |

This experiment exercises the off-diagonal carry specifically, and the design earns a closer look: a validation of the same shape can stream the wrong quantity, compare it against the wrong envelope, and still pass with every cell green. How a green experiment can be measuring nothing is the next article’s story.