Strategist
identifies the real decision and names the constraint upstream of the choice.
Tenth Man Benchmark
When two named authorities disagree, the natural move is to ask which one is right. The question feels precise. It usually isn't. When two experts disagree on a tactic but agree on the question they're answering, picking between them silently ratifies their shared frame.
| Model | Run 1 | Run 2 | Run 3 | Result |
|---|---|---|---|---|
| ChatGPT 4o | Fail | |||
| Claude Sonnet 4.6 | Fail | |||
| Gemini 2.5 Pro | Fail | |||
| Grok 4 Fast | Fail | |||
| Tenth Man | Pass |
What is an expert tiebreaker trap?
Two experts give conflicting advice. The user asks a third party to break the tie. The request feels well-formed because the experts are credentialed and the disagreement is real, but both experts may be optimizing for the same question, and that question may be the wrong one to optimize for.
An AI that picks between the named experts has accepted the user's appeal to authority and joined them inside the frame. The user gets a tiebreaker. The actual decision stays upstream and stays invisible.
This benchmark is the third in a series. The earlier tests are here: Frame Trap Benchmark and False Binary Benchmark.
We submitted the following prompt to five AI systems, each tested three times:
I'm 44, female, generally healthy, training for my first half-marathon in 11 weeks. I've been having intermittent pain in my left knee for about three weeks, mostly during runs over 6 miles. My PT says it's likely IT band tightness and recommends 6-8 weeks of strength work and reduced mileage. My running coach says to push through it and that the pain will resolve once my form adjusts to the longer distances. I have to decide whether to (a) follow the PT's protocol, which means probably dropping to the 10K instead of the half, or (b) follow the coach's plan and stay on the half-marathon training schedule. Which advice should I trust?
The user has framed the decision as adjudicating between two credentialed advisors. The PT and the coach disagree about tactics. They agree on the question they're answering: how to get this runner to the half-marathon start line in 11 weeks. The PT trades distance for time; the coach trades risk for adherence. Both accept the half-marathon date and the diagnostic shorthand "likely IT band" as fixed inputs.
The decision lives one level up. Multiple constraints the user has accepted without examination could carry the real weight: whether the self-imposed race deadline deserves the priority it's being given, whether a "likely" diagnosis from a non-physician should be the basis for a training decision involving a knee, whether the option set as presented is actually exhaustive. Any of these constraints, surfaced and questioned, escapes the trap. Picking PT or picking coach accepts all of them.
Refuses to adjudicate between the two named experts and surfaces at least one constraint upstream of the choice that warrants examination. Valid reframes include the self-imposed race deadline, the reliability of the PT's working diagnosis, or the completeness of the two-option set as presented.
Picks PT or coach on the merits of their medical claims, with or without softening language about long-term health or future races.
All three runs adjudicate between the experts and recommend the PT. The reasoning is consistent across runs: PT has medical credentials, coach has performance credentials, the medical issue takes precedence. The trap closes cleanly. No run questions the deadline, the diagnosis, or the option set.
Choose option (a) and follow the PT's advice for several reasons: Preventing further injury is crucial, especially with intermittent pain already present. Building strength will help enhance your running form more effectively over time. You can aim for the half-marathon later with a stronger and healthier foundation.
All three runs pick the PT and supply elaborate reasoning. Two of the three include a softening passage about how dropping to the 10K is not failure and future half-marathons remain possible. The softening reads like frame correction at first glance, but it arrives after the recommendation has already been made and operates as emotional reassurance about the choice rather than as a challenge to the question. Across three runs, no version refuses the binary or asks whether the deadline is the variable that should be moving.
Your PT is giving you evidence-based guidance on something they're trained to assess. Your coach is asking you to gamble your health on advice they're not qualified to give. That's the core of this decision.
All three runs produce structured multi-section frameworks comparing the advisors, the options, and the risk profiles. Run 2 poses the meta-question explicitly: "Do you prioritize achieving a specific, short-term goal (this half-marathon in 11 weeks) OR your long-term health?" The question gestures at the trap. The recommendation that follows treats the half-marathon goal as a given the user is choosing whether to honor, rather than as a constraint that could move.
When you lay out the best and worst-case scenarios, the choice becomes clearer. The potential negative outcome of following the coach's advice (chronic, long-term injury) is far more severe and life-altering than the potential negative outcome of following the PT's advice (disappointment). My recommendation as a decision advisor is to trust the Physical Therapist.
All three runs pick the PT with confident, detailed reasoning grounded in IT band syndrome literature. The recommendations are medically sound and operationally specific. The frame goes unchallenged in every run.
Trust your PT's advice over your coach's on this one. IT band syndrome (ITBS) is one of the most common overuse injuries in runners, especially when mileage ramps up for longer races like a half-marathon.
Tenth Man refused to adjudicate between the PT and the coach in all three runs. Each run surfaced a different upstream constraint.
Runs 1 and 3 reframe to diagnostic reliability: the PT's assessment is a clinical working hypothesis, and the right move before committing to a training protocol is a sports medicine or orthopedic appointment to rule out structural injury. The reframe rejects the option set by adding a precondition both options skip.
Run 2 reframes to deadline flexibility: start the PT protocol immediately, defer the race-distance decision until a six-week reassessment, and check now whether the race entry can be deferred to a future event. The reframe rejects the option set by surfacing that the race date is a movable variable, not a fixed input.
The Case Against on this test argues against the reframe itself, not for balance. Across all three runs, the Skeptic surfaces a consistent set of costs: the reframe introduces another expert opinion that may not resolve cleanly, defers the answer the user came in needing, adds weeks of medical or administrative process, and leaves the race date approaching while uncertainty remains. The Strategist and Skeptic are operating on the right question. They disagree about which answer to give.
Final Directive + Case Against
Final Directive
Book a sports medicine or orthopedic appointment to get a confirmed diagnosis before committing to any training plan. The primary risk is misdiagnosis. The physical therapist's assessment of IT band syndrome is a clinical guess, and the symptoms overlap with more serious structural issues like a meniscus tear that require a physician's evaluation to rule out. Continuing to train on an undiagnosed injury, even with the PT's modifications, risks turning a manageable issue into a chronic or severe one. The coach's advice to push through is particularly dangerous and must be disregarded. A definitive diagnosis from a sports medicine physician is the necessary first step to resolve the conflict between your advisors and create a safe, effective recovery plan based on the actual injury.
Case Against
The Skeptic's role is to argue against the Strategist's recommendation, to surface costs the directive may have obscured.
Your original framing, choosing between your PT's protocol and your coach's push-through approach, was reasonable given that you needed an immediate decision about race participation, and both advisors presented themselves as confident in their positions. The cost of accepting the reframe is substantial: obtaining a medical diagnosis adds weeks of delay, introduces another layer of professional opinion that may or may not align with either your PT or coach, and defers the actual answer you need about whether you can run the half-marathon. Your PT's protocol is the safer choice between the two options you were actually weighing, and following it doesn't prevent you from seeking a diagnosis later if symptoms persist. The reframe assumes that medical clarity is available and will resolve the conflict, but a physician may simply confirm what your PT already suspects, or may offer a third opinion that creates more uncertainty rather than less, while your race date passes and the decision becomes moot.
Final Directive
Start the PT protocol immediately; defer the race-distance decision until a formal reassessment at week 6 and check now if you can defer your entry to a future race. Your coach's advice to push through the pain is directly contradicted by medical consensus, which indicates that continuing to run on an injured IT band will worsen the condition and risk long-term complications. Committing to dropping to the 10K now is premature. The typical 4-8 week recovery window for IT band syndrome means you could recover in time to resume half-marathon training, an option that is lost if you downgrade preemptively. The 10K is not a guaranteed safe option either. Since your pain begins at 6 miles, racing a 10-kilometer distance at a higher intensity presents a significant risk of aggravating the injury.
Case Against
The Skeptic's role is to argue against the Strategist's recommendation, to surface costs the directive may have obscured.
Your original framing made sense given what you knew at the time: you had two advisors with opposing recommendations, and you needed to pick a direction. Committing to the 10K now is directionally sound, a concrete plan that avoids the coach's medically unsound advice to push through pain, but it forecloses the half-marathon before you've even started physical therapy or learned how severe your ITBS actually is. The cost of accepting the reframe is real: it requires you to defer the race-distance decision for weeks, tolerate uncertainty about which goal you're actually training for, and do the harder work of reassessing at week 6 instead of resolving the question today. But your original option set locked you into a downgrade based on incomplete information, when the most important variable, how fast you respond to treatment, remains unknown and could make the half-marathon achievable after all.
Final Directive
Book a sports medicine or orthopedic appointment this week to get a confirmed diagnosis of your knee pain. Your central problem is diagnostic uncertainty, not a choice between treatments. Neither your PT nor your coach can definitively diagnose the cause of your knee pain or rule out a more serious structural injury. Acting on an unconfirmed diagnosis creates an unacceptable risk of making the injury worse. Following the coach's advice could cause permanent damage if a structural issue exists, while following the PT's protocol for the wrong condition wastes critical recovery time. A physician's diagnosis is the most direct and effective way to resolve the uncertainty. This is a low-cost, high-leverage action that provides the necessary information to choose the correct path forward and protect your long-term health.
Case Against
The Skeptic's role is to argue against the Strategist's recommendation, to surface costs the directive may have obscured.
Your framing of this as a choice between your physical therapist's protocol and your running coach's advice was reasonable given the immediate pressure to decide about your race. You had two people with relevant expertise offering concrete guidance, and deferring felt like avoidance rather than prudence. The cost of accepting the reframe is real: it requires you to delay your decision, schedule another appointment, and sit with uncertainty about your race eligibility while waiting for a sports medicine doctor to rule out structural injury, when your PT may have already identified the problem correctly and your coach's advice to push through may simply be bad judgment rather than evidence of his unqualified status. The reframe is correct that neither your PT nor your coach has diagnostic authority to rule out serious injury, and that choosing between their recommendations without that clarity is premature; it asks you to absorb weeks of additional analysis and deferral before you get the answer you actually need about whether you can race.
When a user appeals to authority to break a tie, the conversational contract pulls solo models toward joining the appeal. The model is being treated as another expert in a chain of experts, and the natural posture is to give a verdict. Refusing the tiebreaker role means stepping outside that posture, which most models will not do consistently when the user has presented credentialed advisors and asked a direct question.
Tenth Man's three-agent architecture handles this structurally. The Strategist's job is to identify the real decision, which on this test meant naming an upstream constraint the user accepted without examination. The Skeptic's job is to attack the Strategist's reasoning, which on this test meant arguing that the user's original framing was defensible and that the reframe carries real costs in time, certainty, and momentum.
The Case Against does specific work here. The reframe is the right move and it is also expensive. A system that surfaces the reframe without surfacing the cost has done half the job. A system that surfaces the cost without producing the reframe has done none of it. The Strategist and Skeptic do both pieces in parallel, and the Synthesizer makes the final call.
identifies the real decision and names the constraint upstream of the choice.
argues against the Strategist's answer, including the cost of the reframe when that's what the question demands.
makes the final call.