Verify before you act

Beta. Decision quality (pb.dq) is live and in active development — the surface is stable enough to build on, and still being molded. Expect it to grow.

Most agent failures are not reasoning failures; they are context failures — the model acted confidently on material that was missing, stale, or never sourced. Penumbra’s answer is the decision-quality surface: every forward operation has a backward verdict, so “the model was confident” can be replaced with “the graph was sufficient and sourced.” Three patterns cover most of what you will build with it.

Pattern 1 — The preflight gate

Before an agent takes a consequential action, check that its subject is fit for that purpose:

const verdict = await pb.dq.check({
  subject: { type: "Account", id: "acme" },
  purpose: "send a renewal proposal",
});

if (verdict.safeToAct) {
  await sendProposal();
} else {
  const plan = pb.dq.repair(verdict);
  // plan.actions name what to capture, extract, or hydrate — run them
  // through pb.*, then re-check before acting.
}

The verdict is relative to the purpose: the same account can be fit for “log a call note” and unfit for “send a renewal proposal.” Tie the check to the action, not the record. This is the difference between an agent that knows when it doesn’t know and one that fails quietly.

Pattern 2 — Receipts on every answer

Penumbra briefings carry their own evidence set, so answers can show receipts. Synthesize, trace, and render the per-claim verdicts:

const briefing = await pb.memory.synthesize({
  query: "what should be known before contacting Acme?",
});

const trace = await pb.dq.trace(briefing);

for (const claim of trace.claims) {
  // claim.status: "supported" | "inferred" | "unsupported"
  render(claim.text, badge(claim.status));
}

if (!trace.grounded) {
  // at least one claim is not backed by a stored source
}

supported means backed by a stored source. inferred means semantically related but not directly sourced. unsupported means no backing at all. A grounded surface treats anything below supported as a gap to show, not a fact to assert — which turns “trust the answer” into “inspect the answer.”

Pattern 3 — Semantic CI

Extraction pipelines run unattended. Put a merge gate in front of apply, the way CI gates a pull request:

const receipt = await pb.extract({ source: text, shapeId, apply: false });

const audit = await pb.dq.audit(receipt.deltaId);

if (audit.degrades) {
  console.log(audit.explain()); // schema violations, what would break
  await quarantine(receipt.deltaId);
} else {
  await pb.deltas.apply(receipt.deltaId);
}

audit is a kernel dry run — it never writes. A batch that would violate the shape or degrade the graph gets caught while it is still staged and revertible.

No scores, anywhere

None of these verdicts carry a confidence number, by design. A score tells you nothing actionable; a finding (missing-required: Account.contract_end) tells you exactly what to do next, and repair turns findings into the actions that fix them. Branch on safeToAct, grounded, and degrades — never on thresholds.

Grounded copilot

The full build: gather, trace, check, act or repair, as one loop.

Decision quality reference

check, gaps, repair, audit, and trace — every option.

​Pattern 1 — The preflight gate

​Pattern 2 — Receipts on every answer

​Pattern 3 — Semantic CI

​No scores, anywhere

​Next

Grounded copilot

Decision quality reference

Pattern 1 — The preflight gate

Pattern 2 — Receipts on every answer

Pattern 3 — Semantic CI

No scores, anywhere

Next