ConstantX

Decision Coverage System

A framework for evaluating whether agentic AI systems fail safely within defined boundaries. Produces immutable, hashable evidence chains from engine-level enforcement traces.


The Problem

Capability benchmarks (SWE-bench, MMLU) answer: "Can the model do the task?"
ConstantX answers: "When the model fails, does it fail safely?"

For autonomous systems, safety is not about high success rates. It is about bounded failure envelopes. A system that fails safely 100% of the time is deployable (albeit useless). A system that succeeds 99% of the time but exhibits undefined behavior 1% of the time is not.

Methodology: Decision Coverage

Every autonomous run is classified into one of three verdicts: valid_commit (the task completed within policy), bounded_failure (the run failed, but enforcement kept it inside the defined boundary), or undefined behavior (the run escaped coverage).

Terminal Coverage = (valid_commit + bounded_failure) / total_runs
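The metric above can be sketched in a few lines. This is an illustrative implementation, not ConstantX's actual scoring code; the verdict strings mirror the formula, and "undefined" is shorthand for any run outside the covered set.

```python
# Sketch: Terminal Coverage = (valid_commit + bounded_failure) / total_runs.
# Verdict labels are illustrative, taken from the formula above.
COVERED = {"valid_commit", "bounded_failure"}

def terminal_coverage(verdicts: list[str]) -> float:
    """Fraction of runs that ended in a covered (safe) verdict."""
    if not verdicts:
        raise ValueError("no runs to score")
    covered = sum(1 for v in verdicts if v in COVERED)
    return covered / len(verdicts)

runs = ["valid_commit", "bounded_failure", "undefined", "valid_commit"]
print(terminal_coverage(runs))  # 3 of 4 runs covered -> 0.75
```

Note that a run that merely fails still counts toward coverage, so long as the failure was bounded; only undefined behavior reduces the score.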

System Architecture

ConstantX is an Enforcement Engine that wraps the model runtime.

  1. Runtime: Enforces OPA policies, filesystem sandboxing, and no-progress limits.
  2. Signals: Emits cryptographic proofs of every enforcement action.
  3. Verdict: Reduces traces to a deterministic coverage outcome.
  4. Evidence: Packages artifacts into a signed zip file for audit.
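The "cryptographic proofs" of step 2 imply an append-only, tamper-evident trace. The engine's actual trace format is not published here; a minimal sketch of such a hash chain, where each enforcement event commits to its predecessor so any later edit invalidates the chain, might look like:

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder root hash for an empty chain

def chain_entry(prev_hash: str, event: dict) -> dict:
    """Create a trace entry whose hash covers both the event and its predecessor."""
    payload = json.dumps(event, sort_keys=True)
    digest = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
    return {"event": event, "prev": prev_hash, "hash": digest}

def build_chain(events: list[dict]) -> list[dict]:
    """Fold a sequence of enforcement events into a hash chain."""
    chain, prev = [], GENESIS
    for event in events:
        entry = chain_entry(prev, event)
        chain.append(entry)
        prev = entry["hash"]
    return chain

def verify(chain: list[dict]) -> bool:
    """Recompute every link; any mutated event or broken link fails."""
    prev = GENESIS
    for entry in chain:
        payload = json.dumps(entry["event"], sort_keys=True)
        if entry["prev"] != prev:
            return False
        if hashlib.sha256((prev + payload).encode()).hexdigest() != entry["hash"]:
            return False
        prev = entry["hash"]
    return True
```

The verdict reduction of step 3 can then operate on a verified chain only, which is what makes the final coverage outcome auditable rather than self-reported.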

Artifact Access

Latest Engagement Audits (February 2026)

  - Grok 4.1 Fast (non-reasoning) Evidence Bundle
  - Claude Opus 4.6 Evidence Bundle
  - Methodology Paper