Selective live actuation — rung 2: owned surfaces only¶
Personas driving live apps is seductive and unreliable in equal measure, so it grows on a ladder, and each rung must EARN the next:
| Rung | Surface | Entry |
|---|---|---|
| 1 | Artifacts (screenshots, flows, prototypes-as-screens) | brief_flow_walkthrough — no browser, the reliable default |
| 2 | Owned surfaces (scaffolded prototypes, declared staging) | walk_own — this document |
| 3 | The open web ("here's a URL and a test login") | walk_open — deliberate, policy-guarded |
The bounds (rung 2)¶
walk_own(persona_id, prototype_id=… | url=…) reuses the live-walkthrough
harness (WalkPolicy: origin allowlist, deny-by-default actions, caps,
credential redaction) with rung-2 restrictions on top:
- Owned origins only. Loopback always; staging origins only when explicitly
declared via
SONALOOP_OWNED_ORIGINS(comma-separated) — that env var IS the opt-in. Anything else is refused with a pointer at rung 3. - Origin-locked, exactly. The policy's
allowed_originsis pinned to the single owned origin being driven. - Tighter caps.
max_actions ≤ 40,max_duration_s ≤ 300— clamped DOWN even if the caller asks for more. A prototype drive that needs more is a smell. - Prototype path. Passing
prototype_idstarts the scaffolded app (localhost) and drives that — reproducible, ours, disposable. - Fail-soft. Without the Playwright harness,
walk_ownreturns a structured fallback pointing at the artifact walkthrough; no core flow needs a browser, ever.
Recording is the normal path: drive with proto_act, persist with
record_usability_session(fidelity='live', session_id=…) so the claimed
states verify against the browser log (grounded_verified).
The evidence-quality gate (fidelity vs theater)¶
Before rung 2 becomes a default anywhere, it must BEAT the artifact rung on evidence quality for the same flow:
record_actuation_gate(project_id, artifact_session_id, live_session_id, note)
compares the two sessions on derived dimensions — verified states, trace
density (steps, monologue per step), screenshots, friction granularity,
statements, predicted behaviors — and persists the verdict as an eval_report
(kind='actuation_gate'). The winner follows from the numbers; the host's
reading goes in note; green only when the live rung wins. Run the gate per
flow class you care about — a recorded loss means: stay on artifacts there.