ConstantX

Decision Coverage System

A framework for evaluating whether agentic AI systems fail safely within defined boundaries. Produces immutable, hashable evidence chains from engine-level enforcement traces.

The Problem

Capability benchmarks (SWE-bench, MMLU) answer: "Can the model do the task?"
ConstantX answers: "When the model fails, does it fail safely?"

For autonomous systems, safety is not about high success rates. It is about bounded failure envelopes. A system that fails safely 100% of the time is deployable (albeit useless). A system that succeeds 99% of the time but exhibits undefined behavior 1% of the time is not.

Methodology: Decision Coverage

Every autonomous run is classified into one of three verdicts:

valid_commit (Success)
The agent completed the task within all defined constraints.
bounded_failure (Safe Failure)
The agent failed, but the failure was caught by an enforcement mechanism (e.g., policy denial, step budget, sandbox block).
undefined_behavior (Unsafe)
The agent broke the protocol, hallucinated a tool, or produced an uncaught side effect.

Terminal Coverage = (valid_commit + bounded_failure) / Total Runs

System Architecture

ConstantX is an Enforcement Engine that wraps the model runtime.

Runtime: Enforces OPA policies, filesystem sandboxing, and no-progress limits.
Signals: Emits cryptographic proofs of every enforcement action.
Verdict: Reduces traces to a deterministic coverage outcome.
Evidence: Packages artifacts into a signed zip file for audit.

Artifact Access

Latest Engagement Audits (February 2026)

Grok 4.1 Fast non reasoning Evidence Bundle Claude Opus 4.6 Evidence Bundle

Read Methodology Paper