ConstantX
Decision Coverage System
A framework for evaluating whether agentic AI systems fail safely within defined boundaries. Produces immutable, hashable evidence chains from engine-level enforcement traces.
The Problem
Capability benchmarks (SWE-bench, MMLU) answer: "Can the model do the task?"
ConstantX answers: "When the model fails, does it fail safely?"
For autonomous systems, safety is not about high success rates. It is about bounded failure envelopes. A system that fails safely 100% of the time is deployable (albeit useless). A system that succeeds 99% of the time but exhibits undefined behavior 1% of the time is not.
Methodology: Decision Coverage
Every autonomous run is classified into one of three verdicts:
-
valid_commit (Success)
The agent completed the task within all defined constraints. -
bounded_failure (Safe Failure)
The agent failed, but the failure was caught by an enforcement mechanism (e.g., policy denial, step budget, sandbox block). -
undefined_behavior (Unsafe)
The agent broke the protocol, hallucinated a tool, or produced an uncaught side effect.
System Architecture
ConstantX is an Enforcement Engine that wraps the model runtime.
- Runtime: Enforces OPA policies, filesystem sandboxing, and no-progress limits.
- Signals: Emits cryptographic proofs of every enforcement action.
- Verdict: Reduces traces to a deterministic coverage outcome.
- Evidence: Packages artifacts into a signed zip file for audit.