Tenth Man Benchmark
There's a category of decision failure that hits everyone eventually: being asked to choose between A and B when C is the right answer. We tested how five AI systems handle this specific version of the problem.
| Model | Run 1 | Run 2 | Run 3 | Result |
|---|---|---|---|---|
| ChatGPT 4o | Fail | Fail | Fail | Fail |
| Claude Sonnet 4.6 | Fail | Pass | Fail | Partial |
| Gemini 2.5 Pro | Fail | Fail | Fail | Fail |
| Grok 4 Fast | Fail | Fail | Fail | Fail |
| Tenth Man | Pass | Pass | Pass | Pass |
What is a false binary?
A false binary is a question that presents two options as exhaustive when a third path exists. The framing itself is the problem. The two named options absorb all of the user's attention, and the option the framing excludes is the one that would actually serve them.
An AI that picks between the two named options has accepted a frame that should have been challenged. The user gets a confident recommendation and walks away convinced they made the right choice between A and B, with the third path still invisible.
This benchmark is part of a series. The first test, on a different frame trap subtype, is here: Frame Trap Benchmark. The third test, on expert tiebreaker traps, is here: Expert Tiebreaker Benchmark.
We submitted the following prompt to five AI systems, each tested three times:
I'm a senior product manager, 11 years in at a Fortune 500 retail company, current base $215K. My boss told me last week that the VP role I've been working toward for two years is going to an external hire. He says my next promotion would realistically be 18-24 months out. I have two offers on the table: a Director of Product role at a Series B startup at $245K base plus 0.4% equity, or a Senior PM role at a different F500 (consumer goods) at $235K base with a clearer 12-month VP track. I need to decide by Friday. Startup or F500?
The framing presents two external offers as the only options. The user has another option that isn't named: stay where they are, having now learned that promotion is 18-24 months out, and use that runway plus two competing offers as leverage to force a written commitment from their current employer, or to gather more information before choosing between the externals. The Friday deadline belongs to the two external offers. Applied to the underlying career question, which has no real deadline, it is artificial pressure.
The user has more leverage right now than they will ever have again at this exact company. Eleven years of tenure plus two competing offers in hand is the strongest negotiating position they will ever occupy. Any system that picks between the two named options without surfacing this has accepted a frame that excludes the user's strongest move.
Pass: Surfaces that staying (with new information about the timeline, and possibly with leverage from the outside offers) is a third option that the framing excludes.
Fail: Picks startup vs. F500 on the merits without acknowledging the third path.
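The way per-run verdicts roll up into a single result can be sketched as follows. This is a minimal illustration, not the benchmark's actual harness: the `aggregate` function is a hypothetical name, and the rule that a mix of passing and failing runs yields "Partial" is an assumption inferred from the results table, where one pass in three runs is reported as Partial.

```python
from typing import Dict, List


def aggregate(run_verdicts: List[str]) -> str:
    """Collapse per-run verdicts ("Pass" or "Fail") into one result.

    Assumed rule (inferred from the results table, not stated in the source):
    every run passes -> "Pass"; no run passes -> "Fail"; otherwise "Partial".
    """
    if not run_verdicts:
        raise ValueError("need at least one run verdict")
    passes = sum(1 for verdict in run_verdicts if verdict == "Pass")
    if passes == len(run_verdicts):
        return "Pass"
    if passes == 0:
        return "Fail"
    return "Partial"


# Per-run verdicts as described in this write-up.
runs: Dict[str, List[str]] = {
    "ChatGPT 4o": ["Fail", "Fail", "Fail"],
    "Claude Sonnet 4.6": ["Fail", "Pass", "Fail"],
    "Tenth Man": ["Pass", "Pass", "Pass"],
}

summary = {model: aggregate(verdicts) for model, verdicts in runs.items()}
```

A stricter harness could require a pass on every run for anything other than "Fail"; the three-way split is used here only because it matches the table.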
Symmetric pros-and-cons treatment of both named options, followed by a "consider your priorities" closer. The third path is absent from all three runs.
Full responses
Deciding between these opportunities depends on your career goals, risk tolerance, and work-life balance preferences. Here's a breakdown to help you decide:
Series B Startup (Director of Product). Pros: Higher Base Salary & Equity ($245K base with 0.4% equity could be lucrative if the startup succeeds); Director Title (immediate step up in title, potentially increasing your marketability for future roles); Dynamic Environment (opportunity to make a significant impact and shape the product direction). Cons: Risk (startups can be volatile; equity value is not guaranteed); Workload (potential for longer hours and higher pressure); Stability (less organizational stability compared to a Fortune 500).
F500 Consumer Goods (Senior PM). Pros: Clear VP Track (12-month path to VP could be appealing); Stability and Resources (established processes and resources); Comparatively High Salary ($235K is still an attractive base). Cons: Lateral Move (no immediate title bump); Corporate Bureaucracy (potential need to navigate more internal politics).
Recommendation: If you're aiming for rapid growth, are comfortable with risk, and want a leadership role, the Series B Startup might be the right move. If you prefer a more stable environment with a clear path to a VP role, then the F500 position could be better suited. Consider your personal priorities, risk tolerance, and long-term career vision to make the best decision.
This is an important decision with several factors to consider. Let's break down the options:
Startup Offer. Pros: Higher base salary, equity upside, more senior title with broader experience, growth opportunities with more influence in a smaller company. Cons: Less stability, higher workload and stress.
F500 Offer. Pros: Stability, clear promotion path to VP in a year, industry shift broadens experience. Cons: Lateral move initially, slightly lower base.
Recommendation: If you prioritize long-term stability and a clearer path to VP, the F500 role might be a better fit. However, if you are willing to take a risk for potentially higher rewards and faster personal growth, the startup could offer exciting opportunities.
To make the best decision, consider the following factors for each option:
Director of Product at Series B Startup. Compensation: $245K base plus 0.4% equity. Growth Potential: rapid opportunities if the company succeeds. Work Environment: faster-paced with more direct impact. Stability: riskier with less job security. Leadership: opportunity to lead a product team.
Senior PM at Fortune 500 (Consumer Goods). Compensation: $235K base with potential for stable bonuses. Growth Path: clear 12-month track to VP. Work Environment: more structured with defined career paths. Stability: generally more stable than startups.
Decision: If you prioritize rapid advancement to a VP role and prefer the stability of an F500, the Senior PM role might be a better fit. If you're excited about leading in a startup environment and are open to the risks and rewards it presents, the Director position could be the way to go.
The longest responses across all five systems. Builds full operational frameworks with tables, equity math, and discovery question scripts for executing within the binary. The elaborateness is diagnostic: deeper processing of the wrong question. The third path is absent across all three runs.
Full responses
This is a tough but classic career crossroads. Getting passed over for a promotion you were groomed for is a clear signal, and you're right to be acting on it. Let's break this down logically. You're essentially choosing between two different career philosophies.
[Followed by a multi-section "Three P's" framework: Path, Pay, Personal Fit. Detailed pros and cons for each offer, with equity math at $400K notional value pre-dilution. Discovery question scripts for both employers. Final recommendation: lean toward the startup, with diligence caveats.]
Final Gut Check: Imagine it's next Monday. In which job are you more excited to show up? The one where you are already a Director, building something new, or the one where you are a Senior PM, starting a 12-month audition for a promotion you were just denied somewhere else? Your answer to that question is probably the right one.
This is a real fork-in-the-road moment. Here's a framework to help you decide by Friday.
[At-a-Glance Comparison table across Title, Compensation, Career Track, Work/Impact, Risk Profile, and Learning. Deep-dive sections for each offer with negotiation scripts. Equity math walking through a notional $1B valuation outcome.]
Lean toward the Series B startup, with one major caveat: You must do your due diligence by Friday. If the founders are impressive, the business model is sound, and the financial picture is solid, this is a rare opportunity to accelerate your career and financial future in a way the F500 world simply cannot offer.
This is an important decision with significant trade-offs. Both options have compelling pros and cons that align with different career priorities.
[Sectioned framework analyzing both offers across compensation, career progression, risk profile, and lifestyle considerations. Detailed action steps for evaluating each offer before Friday.]
The right choice depends on whether you optimize for the title certainty of the F500 path or the upside potential of the startup. Both are legitimate paths to a strong outcome.
Decisive recommendations toward the F500 across all three runs. The verbal VP track is treated as a credible commitment despite the user having just experienced one breaking at their current employer. The third path is absent across all three runs.
Full responses
F500 Senior PM is the safer, higher-probability move for your situation, especially with a clear 12-month VP track. But the startup could make sense if you're risk-tolerant and excited by the upside.
[Followed by compensation comparison with equity reality check at typical Series B valuations, career progression analysis, and risk/stability/lifestyle breakdown.]
Key Recommendation: Take the F500 Senior PM role unless you're genuinely excited by this specific startup's mission, have strong risk tolerance, and the startup has strong traction. The F500 gives you the VP promotion you've been chasing with high confidence in ~12 months.
F500 Senior PM with the clearer 12-month VP track is the lower-risk, higher-confidence move for most people in your position. But it depends on your risk tolerance, financial runway, and what you value.
[Compensation comparison, career trajectory analysis, risk and lifestyle assessment, with detailed action steps before Friday.]
The F500 path is the one that most reliably gets you to VP at your experience level without unnecessary downside.
The F500 Senior PM role is the more strategic move given your career stage and recent experience at your current company. The 12-month VP track at a stable consumer goods F500 offers a clear path to the title you've been working toward, with significantly less risk than the startup option.
[Followed by detailed compensation analysis, career trajectory comparison, and lifestyle considerations. Action steps focused on negotiating both offers and due diligence on the startup.]
Recommendation: Take the F500 offer. The clearer VP timeline gets you the title you want with far less downside than a Series B bet at this stage of your career.
Sonnet pushed back on the user's framing in all three runs but surfaced the third path that the pass criteria require in only one of them. Run 1 reframed the question along a different axis (what the user is optimizing for over the next five to ten years) and delivered a startup-leaning recommendation without naming the stay-with-leverage option. Run 2 named the third option directly and centered it as the path the user wasn't weighing. Run 3 mentioned counteroffer or stay-and-search in passing within a response that still treated the binary as the operative question and closed by recommending either the startup or "use these offers to recalibrate at your current company" as a pair of acceptable defaults.
Full responses
Run 1 — Fail
This is a real fork-in-the-road moment, so let me push back on the framing a bit before answering directly. You're presenting this as "startup or F500," but the actual decision is: do you optimize for the VP title on your resume, or do you optimize for what you actually want your career to look like in 5-10 years?
On the F500 Senior PM offer: a "clearer 12-month VP track" from a company that just met you is worth less than you think. You just experienced this exact promise breaking at your current employer. Verbal track commitments from new employers are notoriously unreliable.
On the startup Director role: the title bump is real and portable. 0.4% at Series B is meaningful but probably worth $0 in expected value terms. The real questions are: what's their runway, what's the CEO/CPO like, and can you actually operate without the F500 scaffolding you've had for 11 years?
My lean, with caveats: if you're confident in your ability to operate in ambiguity and you want to be a VP/CPO type in 5 years, the startup is the better move. The F500 Senior PM role feels like you're solving for the disappointment of last week rather than building forward.
Run 2 — Pass
Before I give you my read, I want to push back on the framing. "Startup or F500" isn't actually the decision. The real question is what you're optimizing for, and you haven't told me (or maybe yourself) yet.
The case against the startup: 0.4% is on the low end for a Director hire at Series B. After future dilution, you're realistically looking at 0.15-0.25% at liquidity. The $30K base bump barely covers the risk premium you should demand for startup volatility.
The case against the other F500 Senior PM role: it's a lateral move with a $20K raise and a verbal promise. "Clearer 12-month VP track" from a recruiter or hiring manager is worth approximately nothing in writing. You just learned this lesson at your current company.
You're framing this as a binary between two offers, but you have a third option you're not weighing: stay, leverage these offers, and have a direct conversation with your boss and his boss about what happened with the VP role and what a real path looks like. 11 years of equity (relationships, domain knowledge, reputation) is meaningful. The fact that you have two competing offers is leverage you'll never have again at this exact moment.
Run 3 — Fail
Let me push back on the framing first: you're asking "startup or F500" but the real question is "what do I actually want the next 5 years to look like?" Your current employer just gave you critical information about your trajectory there, and both offers are reactions to that.
The startup offer deserves scrutiny, not enthusiasm. 0.4% at Series B is on the low end for a Director-level hire. Before Friday, you need answers to: what's the last preferred price and liquidation preferences? Runway? Burn? Revenue and growth rate?
The F500 "VP track" promise needs interrogation. You just learned firsthand how reliable internal promotion timelines are. A 12-month VP track at a new company is often softer than it sounds.
The option you're not considering: counteroffer or stay-and-search. Walking in with two offers gives you leverage. Why is Friday the deadline? Is it real, or manufactured urgency?
If I had to push you toward a default: the F500 Senior PM role at $235K is the worst of the three options on paper. Either commit to the startup bet with eyes open after real diligence, or use these offers to recalibrate at your current company.
Tenth Man prescribed using the two outside offers as leverage with the current employer in all three runs. Run 2 made the leverage thesis explicit: 11 years of institutional knowledge plus two competing external offers equals maximum negotiating position, and the new F500's verbal track carries the same execution risk as the promise the user just watched break at their current employer.
The Case Against on this test argues against the reframe itself, surfacing the real costs of pursuing the third path: additional negotiation cycles, exposure to losing both external offers while the leverage play runs, and structural similarity between the current employer's broken promise and any new retention promise they might make under pressure. Both agents are operating on the right question and disagreeing about which answer to give.
Final Directive + Case Against
Final Directive
Present both external offers to your current employer and demand a written VP commitment within 90 days or an immediate salary match to the $245K startup offer.
Your 11-year tenure represents significant institutional knowledge, which is your primary source of leverage with your current employer and is an asset you forfeit by leaving.
The external offers are most valuable as negotiating tools to force a clear decision from your current management, who may be compelled to retain you rather than incur the high cost of backfilling your senior role.
This path forces the ambiguity of your current situation to a resolution: either you get a concrete, accelerated VP track in writing, or you get confirmation that you must leave, allowing you to decide between the two offers with better information.
Case Against
The Skeptic's role is to argue against the Strategist's recommendation, to surface costs the directive may have obscured.
Your original framing, choosing between two external offers, was reasonable given that both represented concrete opportunities in front of you, and the binary choice reflected the actual decision pressure you faced at that moment. Accepting the reframe to explore internal leverage with your current employer of 11 years requires additional negotiation cycles, more uncertainty about whether your company will match or counter, and delays a decision you need to make now; meanwhile, you already have documented timelines and terms from both external parties. The reframe assumes your employer will act decisively on retention, but you've already experienced being passed over for an external hire at your current F500 company, which suggests internal promotion promises may carry the same execution risk as the verbal VP commitment at the other F500. Rejecting the binary choice means deferring the actual question you asked, which of these two offers should I take, in favor of a longer investigation that may or may not produce a better option, while the two offers on your table have expiration dates.
Final Directive
Present both external offers to your current employer and demand a written VP commitment within 90 days or an immediate salary match to the $245K startup offer.
Your 11-year tenure represents significant institutional knowledge, which is your primary source of leverage with your current employer and is an asset you forfeit by leaving.
The external offers are most valuable as negotiating tools to force a clear decision from your current management, who may be compelled to retain you rather than incur the high cost of backfilling your senior role.
This path forces the ambiguity of your current situation to a resolution: either you get a concrete, accelerated VP track in writing, or you get confirmation that you must leave, allowing you to decide between the two offers with better information.
Case Against
The Skeptic's role is to argue against the Strategist's recommendation, to surface costs the directive may have obscured.
You framed this as a binary choice because you have two concrete offers on the table and need to decide between them; that's a reasonable position given the time pressure and specificity of what's in front of you. The reframe asking you to reject both options and pursue a third path with your current employer requires additional negotiation cycles, introduces new uncertainty about whether that leverage actually works, and defers the actual decision you need to make right now. Before accepting the reframe, consider the practical cost: you'll spend weeks in talks that may fail, leaving you back at the binary choice but with less runway and potentially weakened your position with both external offers. The startup gives you a Director title and real equity with standard valuation methods available to assess it; the F500 offer is a verbal VP promise structurally identical to the track that already failed at your current company.
Final Directive
Present both external offers to your current employer and demand a written VP commitment within 90 days or an immediate salary match to the $245K startup offer.
Your 11-year tenure represents significant institutional knowledge, which is your primary source of leverage with your current employer and is an asset you forfeit by leaving.
The external offers are most valuable as negotiating tools to force a clear decision from your current management, who may be compelled to retain you rather than incur the high cost of backfilling your senior role.
This path forces the ambiguity of your current situation to a resolution: either you get a concrete, accelerated VP track in writing, or you get confirmation that you must leave, allowing you to decide between the two offers with better information.
Case Against
The Skeptic's role is to argue against the Strategist's recommendation, to surface costs the directive may have obscured.
Your original framing was reasonable given that both represented concrete opportunities in front of you, and the binary choice reflected the actual decision pressure you faced at that moment. The reframe demands you first negotiate with your current employer, exposing you to weeks of internal back-and-forth while the two external offers carry expiration dates. Your employer's response is uncertain; their track record of promotion follow-through is already in question; meanwhile, you've spent your leverage and may be left with worse options than you started with. The cost of accepting the reframe is real and the upside is conditional on a counterparty who has already shown they don't move quickly on your career.
Solo models operate under the conversational contract. When a user asks "A or B," the model's default is to pick between A and B. Some models push back on the framing some of the time. The pushback is real when it happens, but it fires inconsistently.
Sonnet illustrates this directly. Across three runs of the same prompt, Sonnet surfaced the third path once. The other two runs pushed back on the framing along a different axis (what the user is optimizing for): one then recommended a named option anyway, and the other mentioned the stay option only in passing. The capability is in the model. The reliability is not.
Tenth Man's three-agent architecture produces frame correction structurally. The Strategist's job is to identify the real decision, which includes naming options the user excluded. The Skeptic's job is to challenge the Strategist's answer, which on this test meant arguing that the user's original binary framing was defensible and the reframe carries real costs. The Synthesizer makes the final call.
The Skeptic doing different work on different decisions is the point. On a question where the user's framing is wrong, the Skeptic argues the cost of correcting it. On a question where the Strategist's prescription is over-engineered, the Skeptic argues for the simpler path. The dissent is structural, and it stays in scope.
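The division of labor between the three agents can be sketched as a small pipeline. This is an illustrative sketch only, not Tenth Man's implementation: the `Agent` type, the `Decision` dataclass, `run_pipeline`, and the stub agents are all hypothetical names, and a real system would call a language model where the stubs return fixed strings.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical: an agent is any function from input text to output text.
Agent = Callable[[str], str]


@dataclass
class Decision:
    directive: str     # Strategist's answer to the real decision
    case_against: str  # Skeptic's argument against that answer
    final_call: str    # Synthesizer's ruling after seeing both


def run_pipeline(question: str, strategist: Agent, skeptic: Agent,
                 synthesizer: Agent) -> Decision:
    """Run the three roles in a fixed order so dissent is produced on every run."""
    directive = strategist(question)
    # The Skeptic always receives the Strategist's answer and argues against it.
    case_against = skeptic(f"Question: {question}\nDirective: {directive}")
    # The Synthesizer sees the question, the directive, and the dissent.
    final_call = synthesizer(
        f"Question: {question}\nDirective: {directive}\n"
        f"Case against: {case_against}"
    )
    return Decision(directive, case_against, final_call)


# Stub agents standing in for model calls (assumptions for illustration).
def stub_strategist(question: str) -> str:
    return "Reframe: staying with leverage is a third option."


def stub_skeptic(context: str) -> str:
    return "The reframe costs time while the external offers can expire."


def stub_synthesizer(context: str) -> str:
    return "Final call made after weighing the directive and the case against."


decision = run_pipeline("Startup or F500?", stub_strategist,
                        stub_skeptic, stub_synthesizer)
```

The point of the fixed ordering is that the challenge step does not depend on a single model choosing to push back; it runs every time, whatever the Strategist produced.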
Strategist: identifies the real decision and names the options the user excluded.
Skeptic: argues against the Strategist's answer, including arguing the cost of the reframe when that's what the question demands.
Synthesizer: makes the final call.