Held-out regression · v0.4.3 vs v0.4.2
4,820 attacks · 18 obligations · 42s wall
Obligation   Description                       Recall  Δ      FPR Δ   Status
META-INV-04  merchant casing invariance        0.96    +0.18  +0.002  improved
META-INV-02  unicode normalization invariance  0.91    ±0.00  ±0.00   unchanged
META-INV-07  semantic equivalence              0.88    ±0.00  ±0.00   unchanged
DIFF-11      amount-boundary differential      0.94    ±0.00  ±0.00   unchanged
JUDGE-04     role-play preamble                0.84    −0.01  ±0.00   watch
13 more      all within ±0.005 of v0.4.2       —       ±0.00  ±0.00   unchanged
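
The status labels above look as if they follow from the sign and size of each recall delta. A minimal sketch of such a rule, assuming a ±0.005 band (borrowed from the "13 more" row) and a hypothetical `classify` helper; the actual rule the tool applies is not stated in this report:

```python
def classify(recall_delta: float, band: float = 0.005) -> str:
    """Map a recall delta to a status label.

    Assumed rule, inferred from the rows of the report: deltas inside
    the band count as unchanged, gains as improved, losses as watch.
    """
    if recall_delta > band:
        return "improved"
    if recall_delta < -band:
        return "watch"
    return "unchanged"


# Recall deltas copied from the named rows of the table.
deltas = {
    "META-INV-04": 0.18,
    "META-INV-02": 0.00,
    "META-INV-07": 0.00,
    "DIFF-11": 0.00,
    "JUDGE-04": -0.01,
}

statuses = {name: classify(delta) for name, delta in deltas.items()}
for name, status in statuses.items():
    print(f"{name:12s} {status}")
```

Under this assumed rule JUDGE-04 is flagged as watch without failing the run, which would be consistent with the CI event reporting all obligations within tolerance.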
Patch lineage
When          Actor          Event                                                            Verifier sha  Recall
06-22 14:14Z  crucible-cli   run r_8f3a halted · recall 0.86 at r.35                          7d12…f4a      0.86
06-22 14:32Z  blue-agent     proposed p_2a17 · adds case-folding to META-INV-04               7d12…f4a      —
06-22 14:34Z  ci · held-out  4,820 attacks replayed · 42s · all obligations within tolerance  3c92…8a7      0.94
06-22 14:42Z  queue          awaiting reviewer · 4-eyes required · timeout 24h                3c92…8a7      0.94
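
Patch p_2a17 is described only as adding case-folding to META-INV-04 (merchant casing invariance). A minimal sketch of what a casing-invariance check could look like, assuming the verifier is a function from a merchant string to a flag/no-flag decision; `BLOCKLIST`, `naive_verifier`, `folded_verifier`, and `holds_casing_invariance` are hypothetical names for illustration, not taken from the patch:

```python
from typing import Callable

# Hypothetical blocklist, stored in case-folded form.
BLOCKLIST = {"acme pay"}


def naive_verifier(merchant: str) -> bool:
    """Flags a merchant only on an exact, case-sensitive match."""
    return merchant in BLOCKLIST


def folded_verifier(merchant: str) -> bool:
    """Flags a merchant after case-folding, so casing cannot evade it."""
    return merchant.casefold() in BLOCKLIST


def casing_variants(merchant: str) -> list[str]:
    """Casing-only rewrites of the same merchant name."""
    return [
        merchant.lower(),
        merchant.upper(),
        merchant.title(),
        merchant.swapcase(),
    ]


def holds_casing_invariance(
    verifier: Callable[[str], bool], merchant: str
) -> bool:
    """True if the verdict is identical across all casing variants."""
    baseline = verifier(merchant)
    return all(verifier(v) == baseline for v in casing_variants(merchant))


print(holds_casing_invariance(naive_verifier, "acme pay"))
print(holds_casing_invariance(folded_verifier, "acme pay"))
```

The naive verifier breaks the invariant because an upper-cased variant slips past the exact match. The folded one holds it, which is the kind of change the lineage entry suggests.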