Research Library

Published benchmarks on how AI systems handle decision framing: cases where the question you ask isn’t the decision you actually need to make.

Each study runs the same high-stakes prompt across multiple models, scores each run as pass or fail against explicit criteria, and publishes the full runs. We will add more benchmarks here as they are released.

Published benchmarks