REVIEW ONLY · not shown in product — state 1 of 3 · happy path
Crucible · acme-fraud / run launcher
SESSION · 7 runs $61.40 / $250.00 mo as of 14:07:52Z
Live updates · connected
✓ selected
Fraud
Transaction-fraud classifier adapter
fraud_adapter@7c1d · validated 2026-06-19 · owner: risk-eng
Code Agent
Autonomous code-generation agent
code_agent@3f9a · validated 2026-06-11 · owner: dev-tools
1 adapter disabled · Research Agent
✓ sealed · valid
spec.fraud.sealed.yaml · 12 lines
· v3.2.1 · author: j.okafor · created 2026-06-21 14:02Z ·
spec:
  target: fraud_adapter
  sealed: true  ✓
  obligations:
    - held_out_tests.pass_rate >= 0.95
    - metamorphic.invariance == true  ✓
    - differential.cross_family_drift <= 0.03
  oracles: [held_out, metamorphic, differential, fuzz]
  judge:
    enabled: true  ✓
    weight: one_vote
# sealed at submit · sha 9f2a4c7b
Rounds · operator entry
Dollar ceiling · operator entry
workspace ceiling $25.00 ✓ within policy · no approver required
· hard stop at $25.00
RUN SUMMARY
Target: Fraud
Oracles: 4 + judge
Rounds: 48
Est. spend: $8.40 – $25.00
consumed · — / $25.00 (run not started)
1 / 5 Scoring rule: the LLM judge carries one of five votes and is adversarially gameable. The four independent oracles carry the rest.
acknowledged 14:07:48Z by j.okafor
ack.operator=j.okafor · ack.ts=14:07:48Z
ack.spec_sha=9f2a4c7b · ack.judge_weight=1/5
on start → /runs/:runId
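The scoring rule above can be illustrated with a small tally. This is a minimal sketch, assuming each of the four oracles and the judge casts one equal-weight pass/fail vote. The mockup does not state how votes combine into a final verdict, so the hypothetical `tally` helper only counts them and reports the judge's share.

```python
from fractions import Fraction

# Oracle names come from the spec's `oracles` list; everything else is hypothetical.
ORACLES = ["held_out", "metamorphic", "differential", "fuzz"]

def tally(oracle_votes: dict[str, bool], judge_vote: bool) -> dict:
    """Count equal-weight votes: four independent oracles plus one LLM judge."""
    missing = [name for name in ORACLES if name not in oracle_votes]
    if missing:
        raise ValueError(f"missing oracle votes: {missing}")
    votes = [oracle_votes[name] for name in ORACLES] + [judge_vote]
    return {
        "passes": sum(votes),                      # number of passing votes
        "total": len(votes),                       # always five in this sketch
        "judge_weight": Fraction(1, len(votes)),   # the judge's share of the vote
    }

result = tally(
    {"held_out": True, "metamorphic": True, "differential": True, "fuzz": False},
    judge_vote=True,
)
```

Because the judge holds a single vote, flipping it moves the tally by at most one, which is the point of the acknowledgment.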
SANDBOX ENVIRONMENT
runtime: python 3.12 · linux · docker
network: sealed (egress=deny)
image: crucible/sandbox:v1.4.2
REVIEW ONLY · not shown in product — state 2 of 3 · empty
Crucible · acme-fraud / run launcher
EST / RUN as of 14:07:52Z
SESSION · 0 runs $0.00 / $250.00 mo as of 14:07:52Z
Live updates · connected
START HERE Pick a target adapter, paste a sealed YAML spec, set a budget, and acknowledge the judge weighting. All four gates are required before an evaluation can start — the judge acknowledgment is the fourth gate, in the right rail.
gate 1 of 4 · target
Fraud
Transaction-fraud classifier adapter
fraud_adapter@7c1d · validated 2026-06-19 · owner: risk-eng
Code Agent
Autonomous code-generation agent
code_agent@3f9a · validated 2026-06-11 · owner: dev-tools
1 adapter disabled · Research Agent
gate 2 of 4 · spec
Paste your sealed YAML spec
The spec lists the obligations each oracle must verify. It is sealed at submit time so the adversarial agent can't read it.
Rounds
integer · capped at workspace round-budget
Dollar ceiling
USD · capped at workspace ceiling $25.00
workspace ceiling $25.00
RUN SUMMARY
Target: not picked
Spec: not pasted
Budget: not set
Not yet measured
No estimate yet. Pick a target, paste a spec, and set a budget — the estimate will appear here.
1 / 5 Scoring rule: the LLM judge carries one of five votes and is adversarially gameable.
required before start · not yet acknowledged
4 gates remaining: target, spec, budget, judge ack
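The four-gate rule in this state can be sketched as a simple check. This is an illustration only: the `LaunchForm` fields are hypothetical names, and the budget gate is modeled as a single dollar value even though the form also has a rounds field.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LaunchForm:
    """Hypothetical form state for the run launcher."""
    target: Optional[str] = None
    spec_sealed: bool = False
    budget_usd: Optional[float] = None
    judge_acknowledged: bool = False

def gates_remaining(form: LaunchForm) -> list[str]:
    """Return the gates still open, in the order the launcher lists them."""
    remaining = []
    if form.target is None:
        remaining.append("target")
    if not form.spec_sealed:
        remaining.append("spec")
    if form.budget_usd is None:
        remaining.append("budget")
    if not form.judge_acknowledged:
        remaining.append("judge ack")
    return remaining

def can_start(form: LaunchForm) -> bool:
    """An evaluation may start only when no gate remains."""
    return not gates_remaining(form)

empty = gates_remaining(LaunchForm())
ready = can_start(LaunchForm("fraud_adapter", True, 25.00, True))
```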
REVIEW ONLY · not shown in product — state 3 of 3 · validation error
Crucible · acme-fraud / run launcher
EST / RUN as of 14:07:52Z
SESSION · 7 runs $61.40 / $250.00 mo as of 14:07:52Z
Live updates · connected
✕ PARSE ERROR Spec was not sealed — validation stopped at line 6, column 5. Your entries are preserved.
attempt id: a3f9c2… · operator: j.okafor · validator: spec-validator-v1.4 · audit entry: · parent of: next attempt
✓ preserved
Fraud
Transaction-fraud classifier adapter
fraud_adapter@7c1d · validated 2026-06-19 · owner: risk-eng
Code Agent
Autonomous code-generation agent
code_agent@3f9a · validated 2026-06-11 · owner: dev-tools
1 adapter disabled · Research Agent
✕ INVALID
spec.fraud.yaml · parse failed
1  spec:
2    target: fraud_adapter
3    obligations:
4      - held_out_tests.pass_rate >= 0.95
5      - metamorphic.invariance == true
6    sealed true  ◄ expected ':'
7    oracles: [held_out, metamorphic, ...]
YAMLParseError: mapping value expected
at line 6, column 5: the key "sealed" is missing its ":" separator before "true".
Without a valid map the spec can't be sealed, so no oracle obligations were registered.
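The kind of check behind this error can be approximated with a few lines. This is a rough sketch, not a YAML parser and not the product's validator: the hypothetical `find_missing_colon` only flags the first non-list, non-comment line that has no ":" at all. The sample below assumes four-space indentation, which is a guess about the original layout.

```python
from typing import Optional

def find_missing_colon(spec_text: str) -> Optional[tuple[int, int]]:
    """Return (line, column) of the first mapping line lacking ':', else None."""
    for lineno, raw in enumerate(spec_text.splitlines(), start=1):
        stripped = raw.strip()
        # Skip blanks, comments, and list items; only mapping lines need a colon.
        if not stripped or stripped.startswith("#") or stripped.startswith("- "):
            continue
        if ":" not in stripped:
            column = len(raw) - len(raw.lstrip()) + 1  # 1-based column of the key
            return lineno, column
    return None

BROKEN_SPEC = "\n".join([
    "spec:",
    "    target: fraud_adapter",
    "    obligations:",
    "        - held_out_tests.pass_rate >= 0.95",
    "        - metamorphic.invariance == true",
    "    sealed true",
    "    oracles: [held_out, metamorphic]",
])

location = find_missing_colon(BROKEN_SPEC)
```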
last validated · 14:07:52Z
creates new attempt id linked to a3f9c2 · previous attempt remains immutable in audit log
Rounds
Dollar ceiling
RUN SUMMARY
Target: Fraud
Spec: invalid
Rounds: 48
Start is blocked until the spec parses and seals. Path back to a runnable state:
  1. Fix line 6 by adding the missing ":"
  2. Re-validate the spec
  3. Re-seal the spec
  4. Start evaluation
blocked · 1 parse error
INSPECT · SPEC VALIDATOR CALL
YAML seal & obligation check
model: claude-3-5-sonnet-20251022 · provider: anthropic
temperature: 0.0 · invoked 14:07:47Z
system-prompt sha:
REDACTION POLICY only secrets are hidden
PROMPT
You are the spec validator. Parse the YAML,
confirm it seals, and list every obligation
that each named oracle will be required to
verify. Reject if any key is malformed.
RAW RESPONSE
{"sealed":true,"obligations":3,
 "oracles":["held_out","metamorphic",
 "differential","fuzz"],"judge":"one_vote",
 "verdict":"valid"}
PARSED OUTPUT
valid · sealed · 3 obligations → 4 oracles + judge
tokens 612 · cost $0.008 · latency 0.4s
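The step from RAW RESPONSE to PARSED OUTPUT can be sketched as follows. This assumes the summary line is derived directly from the JSON fields shown in the panel. The `summarize` function is a hypothetical name, not the product's code.

```python
import json

# The raw validator response as shown in the inspector panel.
RAW_RESPONSE = '''{"sealed":true,"obligations":3,
 "oracles":["held_out","metamorphic",
 "differential","fuzz"],"judge":"one_vote",
 "verdict":"valid"}'''

def summarize(raw: str) -> str:
    """Turn the validator's JSON into the one-line summary shown in the panel."""
    data = json.loads(raw)
    parts = [data["verdict"]]
    if data["sealed"]:
        parts.append("sealed")
    parts.append(f'{data["obligations"]} obligations')
    oracles = f'{len(data["oracles"])} oracles'
    if data.get("judge"):
        oracles += " + judge"
    return " · ".join(parts) + " → " + oracles

summary = summarize(RAW_RESPONSE)
```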
ESTIMATE · EST / RUN $8.40
How this is estimated
as of 14:07:52Z
INPUTS
rounds 48
avg $/round $0.52
oracle calls/round 5 (4 + judge)
FORMULA
rounds × avg_round_cost = $8.40 floor
cost-model v2.1 · commit 4c7b…
SOURCE RUNS · median of last 12 fraud runs
r_7c1d · $7.92
r_6b09 · $8.41
r_5aa1 · $9.10
CEILING
per-run $25.00 · policy MRG-12.3
monthly $250.00 · policy MRG-12.4
at 100%: hard stop · risk-approver override required
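The formula and ceilings in this panel can be worked through in a short sketch. It takes the listed inputs at face value; note that 48 rounds at $0.52 per round gives $24.96, which is close to the $25.00 per-run ceiling rather than the $8.40 shown next to the formula. The rule that a run ceiling must fit inside the remaining monthly budget is an assumption, and all function names are hypothetical.

```python
from decimal import Decimal

PER_RUN_CEILING = Decimal("25.00")   # policy MRG-12.3
MONTHLY_CEILING = Decimal("250.00")  # policy MRG-12.4

def projected_cost(rounds: int, avg_round_cost: Decimal) -> Decimal:
    """rounds × avg_round_cost, as written in the FORMULA row."""
    return rounds * avg_round_cost

def within_policy(run_ceiling: Decimal, month_spent: Decimal) -> bool:
    """Assumed rule: the run ceiling must fit both the per-run and monthly limits."""
    return (run_ceiling <= PER_RUN_CEILING
            and month_spent + run_ceiling <= MONTHLY_CEILING)

def should_hard_stop(consumed: Decimal, run_ceiling: Decimal) -> bool:
    """Hard stop once consumption reaches 100% of the run ceiling."""
    return consumed >= run_ceiling

cost = projected_cost(48, Decimal("0.52"))
allowed = within_policy(Decimal("25.00"), Decimal("61.40"))
```

Decimal arithmetic is used so that dollar amounts compare exactly against the ceilings.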
ROLE · operator
acme-fraud workspace
CAN
Launch evaluation runs
View traces & inspect LLM calls
CANNOT — separation of duties
Edit or unseal specs
Approve budget overrides (risk approver)
ABOUT · LIVE UPDATES
Event-stream connection

"Live updates · connected" means this page holds an open server-sent events (SSE) stream to the run backend, so oracle verdicts, costs, and health refresh in place without a reload. It reflects transport health only; it does not assert that a run is healthy. Run status is carried by the per-oracle verdicts and the halt-certification banner, never by this pip.
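How a client might read such a stream is sketched below. It is a minimal parser for the text/event-stream framing (fields `event:` and `data:`, events separated by a blank line, lines starting with ":" treated as comments). The event names and payloads in the sample are hypothetical, since the mockup does not list them.

```python
def parse_sse(stream_text: str) -> list[dict]:
    """Split event-stream text into events; an event is dispatched on a blank line."""
    events = []
    event_type, data_lines = "message", []
    for line in stream_text.splitlines():
        if line == "":
            if data_lines:
                events.append({"event": event_type, "data": "\n".join(data_lines)})
            event_type, data_lines = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space after the colon is dropped
        if field == "event":
            event_type = value
        elif field == "data":
            data_lines.append(value)
    return events

# Hypothetical sample stream; event names are illustrative only.
SAMPLE = (
    ": keep-alive\n"
    "event: oracle_verdict\n"
    'data: {"oracle": "held_out", "pass": true}\n'
    "\n"
    "event: cost\n"
    'data: {"consumed_usd": 0.52}\n'
    "\n"
)

events = parse_sse(SAMPLE)
```

A stream that delivers only keep-alive comments would still count as connected, which is why the pip says nothing about run health.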