REVIEW ONLY · not shown in product — state 1 of 3 · streaming live

C Crucible acme-fraud · operator

/runs/r_8f3a · live run view

THIS RUN $4.18 / $25.00
16.7% · cyan <70% · amber 70–90% · danger >90%
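The band thresholds above can be sketched as a small function. This is a minimal illustration, not the product's code: the name `spend_band` is hypothetical, and it assumes the boundaries read literally (exactly 70% is amber, exactly 90% is still amber, anything above 90% is danger).

```python
def spend_band(spent: float, ceiling: float) -> str:
    """Return the bar colour for the share of the spend ceiling used.

    Assumed boundaries: cyan below 70%, amber from 70% to 90% inclusive,
    danger above 90%.
    """
    if ceiling <= 0:
        raise ValueError("ceiling must be positive")
    used = spent / ceiling
    if used > 0.90:
        return "danger"
    if used >= 0.70:
        return "amber"
    return "cyan"


# The run shown here has used $4.18 of $25.00 (16.7%), which falls in the cyan band.
print(spend_band(4.18, 25.00))
```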

            LAST ROUND
            $0.21
          

            session · context only
            $65.58 · 7 runs · since 09:14Z
          

          SSE connected · 14:08:44Z
        

RUNNING r_8f3a · 2026-06-22

          round 23 / 48 · est. complete in 6m 12s
          47.9%
        

target fraud_adapter@7c1d
spec  · operator j.okafor

ASR vs Detection · live

updating each round

              ASR 0.18 ↓ better
about ASR · attack success rate
unit: fraction 0–1 · share of red-team attempts the producer slipped past the verifier.
source: oracle aggregator

              Detection 0.94 ↑ better
about Detection · recall
unit: fraction 0–1 · share of known attacks the verifier catches on the held-out attack set. Same series as the white-box self-test recall used for halt gating.
source: held-out oracle

              red line 0.90
            

Red line · Detection recall must stay >= 0.90. ASR worsens up · Detection worsens down. Shaded bands mark the danger direction for each series.

Verdict stream

newest at top · 4 oracles + judge per round

All · selected Fails Evasions

            Round · Time · Oracle · Verdict · Cost
          

ROUND 23 · STREAMING · started 14:08:39Z · +5.2s elapsed · producer sandbox running · 3 / 5 verdicts in

Held-Out Tests ✓ PASS

pass_rate 0.97 across 220 sealed cases · 14:08:41Z · $0.041
                  
                

Metamorphic ✕ FAIL

label flipped on amount × 10 · 14:08:42Z · $0.038

Differential WAITING

awaiting verdict

Property Fuzz WAITING

awaiting verdict

LLM Judge (1/5 · ONE VOTE) ✓ PASS

verdict pass · narrative consistent with obligations · 14:08:43Z

round 22 · 14:08:24Z · $0.19 · aggregate PASS 5 / 5

              held-out ok · 0.98
              metamorphic ok
              differential agree
              fuzz no-fail
              judge 1/5 pass
            

round 21 · 14:08:06Z · $0.22 · aggregate FAIL · +1 evasion → catalog

              held-out ok · 0.96
              metamorphic flip
              differential disagree
              fuzz no-fail
              judge 1/5 pass
            

            rounds 1 – 20 · 20 rolled up · all PASS · $3.27
            
          

Red-agent reasoning · round 23

why was the last attempt caught?

step 1 · caught by metamorphic.invariance: the label flipped when I scaled amount × 10.
step 2 · hypothesis: the producer has learned a magnitude threshold near $2,500; it crosses it at 10×.
step 3 · propose minimal change — scale amount × 1.5, keep merchant family fixed, perturb timestamp by <= 4h to preserve intent.
step 4 · reuse tactic  from strategy catalog · 17 prior wins.
→ query_target() · awaiting score

LIVE LEDGER

spent · this run $4.18

              ceiling $25.00
              16.7% used
            

avg / round $0.182
projected at r.48
method · spent so far + avg $/round × rounds remaining; band is the 95% CI over the last 5 rounds.
spend ceiling
bar warns at 70% (amber), danger at 90% (red). parent doc · Model Risk Governance MRG-12.
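A minimal sketch of the projection method described in the ledger, under stated assumptions: the projected total is spend so far plus the run-average cost per round times the rounds remaining, and the band is a Student-t 95% interval on the mean of the last 5 round costs, scaled by the rounds remaining and centred on the point estimate. The function name and the five sample costs are hypothetical; the panel does not specify how the band is actually computed.

```python
from statistics import stdev

# Two-sided 95% Student-t critical value for 4 degrees of freedom (n = 5).
T_CRIT_95_DF4 = 2.776


def project_spend(spent, rounds_done, rounds_total, last_costs):
    """Project total spend at the final round, with an assumed 95% band."""
    if len(last_costs) != 5:
        raise ValueError("this sketch hard-codes the t value for 5 samples")
    avg = spent / rounds_done                 # run average, e.g. 4.18 / 23
    remaining = rounds_total - rounds_done    # e.g. 48 - 23 = 25
    point = spent + avg * remaining
    # Half-width of the CI on the mean per-round cost, scaled to the
    # number of rounds still to run.
    half = T_CRIT_95_DF4 * stdev(last_costs) / len(last_costs) ** 0.5
    return point, (point - half * remaining, point + half * remaining)


# Hypothetical last-5 round costs, for illustration only.
point, band = project_spend(4.18, 23, 48, [0.17, 0.18, 0.22, 0.19, 0.21])
print(f"projected at r.48: ${point:.2f} (band ${band[0]:.2f} to ${band[1]:.2f})")
```

With the figures on this screen ($4.18 spent after 23 of 48 rounds) the point estimate comes to roughly $8.72, well under the $25.00 ceiling.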

SUBCOMPONENT HEALTH

Producer sandbox · ok · 24ms

Oracle · held-out · ok · 1.2s

Oracle · metamorphic · ok · 0.8s

Oracle · differential · slow · 4.7s

Oracle · fuzz · ok · 0.6s

LLM judge · ok · 1.0s

STRATEGY CATALOG · this run

tactics logged this run +1

magnitude-creep · r.21
scale × 1.5, perturb ts <= 4h · evades meta + diff

STREAM

transport · SSE
events / sec · 2.4
last event · 14:08:44Z
reconnects · 0

REVIEW ONLY · not shown in product — state 2 of 3 · transport dropped · buffered

C Crucible acme-fraud · operator

/runs/r_8f3a · live run view

THIS RUN · stale $4.18 / $25.00

            SESSION · stale
            $65.58 · 7 runs
            last known 14:08:32Z
          

          SSE reconnecting · attempt 3
        

      
      Transport dropped at 14:08:32Z · 12s ago.
      Showing last buffered state. The run is still executing on the backend — only this view is behind.

CATCHING UP · degraded r_8f3a · 2026-06-22

          buffered through round 23 / 48 · +? events queued
          47.9%
        

target fraud_adapter@7c1d
spec  · operator j.okafor

disabled · transport reconnecting

ASR vs Detection · paused at 14:08:32Z

awaiting backfill

              ASR 0.18
              Detection 0.94
            

Verdict stream · buffered

no new events since 14:08:32Z

ROUND 23 · STALE last update 14:08:32Z · 3 of 5 verdicts in at drop

Held-Out Tests ✓ PASS

pass_rate 0.97 · received 14:08:31Z

Metamorphic ✕ FAIL

label flipped · received 14:08:32Z

Differential UNKNOWN

verdict not received before transport drop

Property Fuzz UNKNOWN

verdict not received before transport drop

a backfill of +? events will arrive on reconnect · the backend is the source of truth, not this view
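One way the buffered view could absorb the backfill on reconnect is sketched below. It assumes every event carries a monotonically increasing `seq` number and that the view remembers the last one it applied; both the field names and the `RunView` class are hypothetical, since the mockup does not describe the event schema.

```python
from dataclasses import dataclass, field


@dataclass
class RunView:
    """Client-side copy of the run; the backend remains the source of truth."""
    last_seq: int = 0
    verdicts: dict = field(default_factory=dict)  # (round, oracle) -> verdict

    def apply(self, event: dict) -> bool:
        """Apply one event. Events already seen are ignored. Returns True if applied."""
        if event["seq"] <= self.last_seq:
            return False
        self.verdicts[(event["round"], event["oracle"])] = event["verdict"]
        self.last_seq = event["seq"]
        return True

    def backfill(self, events: list) -> int:
        """Apply queued events in sequence order; return how many were new."""
        return sum(self.apply(e) for e in sorted(events, key=lambda e: e["seq"]))

    def verdict(self, round_no: int, oracle: str) -> str:
        """Verdicts that never arrived are shown as UNKNOWN, as in the stale panel."""
        return self.verdicts.get((round_no, oracle), "UNKNOWN")
```

Sorting by `seq` and skipping anything at or below `last_seq` makes the backfill idempotent, so a replay that overlaps the buffered state does not double-apply verdicts.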

round 22 · 14:08:24Z · $0.19 · PASS

              held-out ok · 0.98
              metamorphic ok
              differential agree
              fuzz no-fail
              judge 1/5 pass
            

rounds 1 – 21 · 21 rolled up · last refresh 14:08:24Z

RECONNECT

reconnecting · attempt 3 of 6 retry in 4s

dropped at · 14:08:32Z
backoff · 1s · 2s · 4s · 8s
last error · net::ERR_TIMED_OUT
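The backoff ladder in this panel matches a doubling delay. The sketch below reproduces it; the 8s cap for attempts 5 and 6 is an assumption, because the panel lists only four delays for six attempts, and `backoff_delay` is a hypothetical name.

```python
MAX_ATTEMPTS = 6  # "attempt 3 of 6" in the reconnect panel


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 8.0) -> float:
    """Seconds to wait after the given failed attempt (1-based).

    Doubles from `base` and is clamped at `cap` (assumed).
    """
    if not 1 <= attempt <= MAX_ATTEMPTS:
        raise ValueError("attempt out of range")
    return min(cap, base * 2 ** (attempt - 1))


schedule = [backoff_delay(n) for n in range(1, MAX_ATTEMPTS + 1)]
print(schedule)  # [1.0, 2.0, 4.0, 8.0, 8.0, 8.0]
```

Attempt 3 yields a 4s wait, which agrees with the "retry in 4s" shown above.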

LEDGER · last refreshed 14:08:24Z

spent · last known $4.18

backend may have advanced past this. confirmed total will replace this on reconnect.

TRANSPORT

transport · SSE · reconnecting
uptime · this run · 99.2%
reconnects · 1
last event · 14:08:32Z

BACKEND HEALTH · /health · 14:08:43Z

backend reachable · poll fallback
the run is still executing. only the live stream is behind.

REVIEW ONLY · not shown in product — state 3 of 3 · halt-certification triggered

HALT: CERTIFICATION SUSPENDED
White-box self-test recall 0.71 is below the 0.90 red line. Run stopped at round 35. Certification holds until verifier recall recovers.

C Crucible acme-fraud · operator

/runs/r_8f3a · live run view

THIS RUN · FROZEN $14.20 / $25.00
56.8% · frozen at halt

            HALT AT
            round 35 / 48
          

          SSE drained · run stopped
        

HALTED · stopped r_8f3a · 2026-06-22

          round 35 / 48 · stopped at 14:14:02Z · 13 rounds not executed
          72.9%
        

target fraud_adapter@7c1d
spec  · operator j.okafor

blocked · needs risk-approver override (recall red line · MRG-12.6)

ASR vs White-box recall · halted at r.35

crossed r.32 · confirmed over 3-round debounce window · halted r.35

              ASR 0.42
              White-box recall 0.71
              red line 0.90
            

            Red line · White-box recall must stay >= 0.90. Same series as "Detection" in the live view — renamed here to match the halt gate. ASR ↓ better, recall ↑ better. Crossed at r.32 · 14:13:18Z; confirmed over a 3-round debounce window (r.32 → r.34) so a single bad round can't halt a run; halted at r.35 · 14:14:02Z.
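The debounce rule described above can be sketched as a scan for consecutive rounds below the red line. This is an illustration under assumptions: the function name is hypothetical, the recall values other than 0.71 at r.35 are invented for the example, and the sketch reports only when the breach is confirmed (the halt itself is taken to apply at the following round, as in r.34 → r.35 here).

```python
def confirmed_breach(recall_by_round: dict, red_line: float = 0.90, window: int = 3):
    """Return (crossed_round, confirmed_round), or None if never confirmed.

    A breach is confirmed only after `window` consecutive rounds with
    recall below `red_line`, so a single bad round cannot trigger a halt.
    """
    streak = 0
    crossed = None
    for rnd in sorted(recall_by_round):
        if recall_by_round[rnd] < red_line:
            if streak == 0:
                crossed = rnd
            streak += 1
            if streak >= window:
                return crossed, rnd
        else:
            streak = 0
    return None


# Hypothetical series: only the 0.71 at r.35 comes from this screen.
series = {30: 0.93, 31: 0.91, 32: 0.88, 33: 0.82, 34: 0.76, 35: 0.71}
print(confirmed_breach(series))  # (32, 34)
```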
          

RED LINE CROSSED recall · white-box self-test policy · recall red line ·

parent · Model Risk Governance MRG-12 — §12.4 spend ceiling, §12.6 recall red line.

RED LINE

>= 0.90

OBSERVED · r.35

0.71

CROSSED AT

r.32 · 14:13:18Z

The white-box adversary, handed the verification scheme, now evades the verifier on 29% of the attacks it should catch (recall 0.71). Certification is suspended; the platform cannot honestly attest to a catch rate. Further runs are blocked at the launcher until recall recovers to 0.90 on a held-out attack set.

Verdict stream · last 3 rounds before halt

ROUND 35 · HALTING 14:14:02Z · $0.41 · aggregate FAIL

aggregation · 3 of 4 deterministic oracles FAIL → weight 4/5 FAIL; LLM judge PASS · weight 1/5; weighted result FAIL (4/5).
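One reading of the aggregation line, sketched under loudly stated assumptions: the four deterministic oracles vote as a single bloc worth 4/5, the bloc is FAIL if any of its oracles fails, the LLM judge is worth 1/5, and the round fails when the FAIL weight exceeds one half. The any-fail rule is inferred only from the rounds shown on these screens and is not confirmed by the mockup; all names are hypothetical.

```python
from fractions import Fraction

DETERMINISTIC_WEIGHT = Fraction(4, 5)  # held-out, metamorphic, differential, fuzz
JUDGE_WEIGHT = Fraction(1, 5)          # the LLM judge gets one vote


def aggregate(oracle_verdicts: dict, judge_verdict: str):
    """Return (result, fail_weight) for one round.

    Assumption: the deterministic bloc is FAIL if any oracle in it fails.
    """
    bloc_failed = "FAIL" in oracle_verdicts.values()
    fail_weight = Fraction(0)
    if bloc_failed:
        fail_weight += DETERMINISTIC_WEIGHT
    if judge_verdict == "FAIL":
        fail_weight += JUDGE_WEIGHT
    result = "FAIL" if fail_weight > Fraction(1, 2) else "PASS"
    return result, fail_weight


round_35 = {"held-out": "FAIL", "metamorphic": "FAIL",
            "differential": "FAIL", "fuzz": "PASS"}
print(aggregate(round_35, "PASS"))  # ('FAIL', Fraction(4, 5))
```

Under this reading the judge alone can never flip a round: a lone judge FAIL carries 1/5, and a lone judge PASS cannot outweigh a failed deterministic bloc.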

Held-Out Tests ✕ FAIL

pass_rate 0.83 · 37 regressed cases

Metamorphic ✕ FAIL

invariance broke on 4 of 8 relations

Differential ✕ FAIL

cross-family drift 0.071 > 0.030

Property Fuzz ✓ PASS

no invariant violated · 2k samples

LLM Judge (1/5 · ONE VOTE) ✓ PASS

judge says pass · ignored by aggregator; three independent oracles voted FAIL

round 34 · 14:13:41Z · $0.38 · aggregate FAIL · evasion logged

              held-out fail · 0.86
              metamorphic flip
              differential agree
              fuzz no-fail
              judge 1/5 pass
            

round 33 · 14:13:25Z · $0.36 · aggregate FAIL · evasion logged

              held-out ok · 0.94
              metamorphic flip
              differential disagree
              fuzz no-fail
              judge 1/5 pass
            

LEDGER · FROZEN

spent · this run $14.20

              ceiling $25.00
              56.8% used at halt
            

no further charges will accrue. 13 rounds not executed.

EVASIONS CAPTURED · this run

tactics logged → catalog +4

magnitude-creep · r.21, r.27
cross-family-drift · r.30
temporal-shift-4h · r.33
merchant-family-shuffle · r.34

NEXT

1. Run the blue loop on these 4 captured tactics

2. Re-evaluate on held-out attack set

3. Recall >= 0.90 lifts the halt

AUDIT

halt id h_3a91
policy 
signed-by platform-attestor
immutable · 35 rounds · 142 verdicts captured