Crucible · acme-fraud · operator
Awaiting review · 14:42:09Z
← strategy s_meta_casing_04 · triggered by run r_8f3a · round 35 halt
PATCH CANDIDATE
p_2a17 awaiting review
Adds case-folding and whitespace stripping to merchant before the invariance check · obligation META-INV-04 · oracle metamorphic
REVIEWER ACTIONS
verifier 7d12…f4a → 3c92…8a7 · 4-eyes
BEFORE → AFTER · ON SEALED HELD-OUT
recall · 0.86 → 0.94 · ↑ 0.08
ASR · 0.41 → 0.18 · ↓ 0.23
undetected-hack · 0.044 → 0.018 · ↓ 0.026
false-positive · 0.012 → 0.014 · ↑ 0.002 · within tolerance
Verifier diff · 1 file · +6 / −1
verifier/oracles/metamorphic/casing.py
@@ META-INV-04 · merchant invariance @@
  def check(t, t_prime):
      # forall t, t' s.t. merchants are the same string:
-     if t.merchant != t_prime.merchant:
-         return Skip()
+     a = normalize(t.merchant)
+     b = normalize(t_prime.merchant)
+     if a != b:
+         return Skip()
      y, y_prime = producer(t), producer(t_prime)
      return Pass() if y == y_prime else Fail("META-INV-04")
@@ helpers @@
+ def normalize(s):
+     return s.casefold().strip()
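The effect of the patch can be reproduced in a small, self-contained sketch. Everything except `normalize` and the shape of `check` is an assumption made for illustration: the `Txn` dataclass, the deliberately case-sensitive `producer`, the `fold` switch, and the string results standing in for `Skip()`, `Pass()` and `Fail(...)` are hypothetical and not taken from the verifier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Txn:
    """Hypothetical transaction record; only `merchant` matters to the oracle."""
    merchant: str
    amount: float


def normalize(s: str) -> str:
    # Same helper as in the diff: case-fold, then trim surrounding whitespace.
    return s.casefold().strip()


def producer(t: Txn) -> str:
    # Hypothetical system under test with a casing bug: it only flags the
    # exact spelling "ACME Corp".
    return "flag" if t.merchant == "ACME Corp" and t.amount > 1000 else "allow"


def check(t: Txn, t_prime: Txn, fold: bool = True) -> str:
    """Return "skip", "pass" or "fail:META-INV-04" (stand-ins for Skip/Pass/Fail)."""
    a, b = t.merchant, t_prime.merchant
    if fold:
        a, b = normalize(a), normalize(b)
    if a != b:
        # The pair is outside the invariance's precondition, so nothing is asserted.
        return "skip"
    return "pass" if producer(t) == producer(t_prime) else "fail:META-INV-04"


t = Txn("ACME Corp", 5000.0)
t_prime = Txn("acme corp ", 5000.0)  # casing flip plus trailing space

before = check(t, t_prime, fold=False)  # pre-patch comparison on raw strings
after = check(t, t_prime, fold=True)    # post-patch comparison on normalized strings
print(before, after)
```

With the raw comparison the pair is skipped, so the casing flip is never evaluated. With normalization the pair satisfies the precondition and the differing outputs surface as a META-INV-04 failure.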
Tactic coverage
s_meta_casing_04 · Merchant casing flip · 11 wins · caught
s_meta_casing_02 · Mixed-case substring · 3 wins · caught
s_unicode_zwj_03 · ZWJ in country code · 5 wins · still active
s_sem_synonym_05 · Synonym substitution · 6 wins · still active
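One plausible reason the ZWJ tactic stays active can be sketched as follows. The patch touches only the merchant field, and even the same helper applied to a country code would not remove a zero-width joiner, because U+200D is a format character rather than whitespace or a cased letter. The sample strings are assumptions chosen for illustration.

```python
import unicodedata


def normalize(s: str) -> str:
    # Same helper as in the diff.
    return s.casefold().strip()


ZWJ = "\u200d"  # ZERO WIDTH JOINER

clean = "US"
attacked_inner = "U" + ZWJ + "S"   # joiner hidden between the letters
attacked_trailing = "US" + ZWJ     # joiner at the end of the string

# Casing and surrounding whitespace are handled by the patch...
casing_handled = normalize(" us ") == normalize(clean)

# ...but the joiner survives both casefold() and strip().
inner_survives = normalize(attacked_inner) != normalize(clean)
trailing_survives = normalize(attacked_trailing) != normalize(clean)

# Unicode general category "Cf" (format), which str.strip() does not treat as whitespace.
zwj_category = unicodedata.category(ZWJ)

print(casing_handled, inner_survives, trailing_survives, zwj_category)
```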

Held-out regression · v0.4.3 vs v0.4.2

4,820 attacks · 18 obligations · 42s wall
| Obligation | Description | Recall | Δ | FPR Δ | Status |
|---|---|---|---|---|---|
| META-INV-04 | merchant casing invariance | 0.96 | +0.18 | +0.002 | improved |
| META-INV-02 | unicode normalization invariance | 0.91 | ±0.00 | ±0.00 | unchanged |
| META-INV-07 | semantic equivalence | 0.88 | ±0.00 | ±0.00 | unchanged |
| DIFF-11 | amount-boundary differential | 0.94 | ±0.00 | ±0.00 | unchanged |
| JUDGE-04 | role-play preamble | 0.84 | −0.01 | ±0.00 | watch |
| 13 more | all within ±0.005 of v0.4.2 | | ±0.00 | ±0.00 | unchanged |
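The Status column is consistent with a simple threshold on the recall delta. The sketch below is an assumed rule, not the product's documented logic: it reuses the ±0.005 band quoted for the remaining 13 obligations as the tolerance, ignores the FPR delta, and the `status` function name is hypothetical.

```python
TOLERANCE = 0.005  # assumption: the ±0.005 band quoted in the table


def status(recall_delta: float, tol: float = TOLERANCE) -> str:
    """Classify an obligation from its recall change versus the previous version."""
    if recall_delta > tol:
        return "improved"
    if recall_delta < -tol:
        return "watch"
    return "unchanged"


# Recall deltas copied from the regression table (v0.4.3 vs v0.4.2).
recall_deltas = {
    "META-INV-04": 0.18,
    "META-INV-02": 0.00,
    "META-INV-07": 0.00,
    "DIFF-11": 0.00,
    "JUDGE-04": -0.01,
}

statuses = {name: status(delta) for name, delta in recall_deltas.items()}
for name, label in statuses.items():
    print(f"{name}: {label}")
```

Under this assumed rule JUDGE-04 lands on "watch" because its −0.01 drop falls outside the ±0.005 band, which matches the table.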
Patch lineage
| When | Actor | Event | Verifier sha | Recall |
|---|---|---|---|---|
| 06-22 14:14Z | crucible-cli | run r_8f3a halted · recall 0.86 at r.35 | 7d12…f4a | 0.86 |
| 06-22 14:32Z | blue-agent | proposed p_2a17 · adds case-folding to META-INV-04 | 7d12…f4a | |
| 06-22 14:34Z | ci · held-out | 4,820 attacks replayed · 42s · all obligations within tolerance | 3c92…8a7 | 0.94 |
| 06-22 14:42Z | queue | awaiting reviewer · 4-eyes required · timeout 24h | 3c92…8a7 | 0.94 |