Tolerance for Ambiguity, Locus of Control, and What These Leader Traits Actually Measure
Two HBR pieces from January and March 2026 sit on the same kind of methodological territory. Simone Stolzoff, in a piece pegged to his forthcoming book on uncertainty, argues that tolerance for uncertainty is a core leadership skill and offers three principles for building it. Merete Wedell-Wedellsborg, drawing on her clinical-psychology background, describes a quiet “psychological withdrawal” she observes in senior leaders, defines it as a pullback driven by lost illusions and disempowerment, and prescribes building “negative capability” in response.
Both pieces are well-written. Both name a leader-level psychological state and give the reader a vocabulary for noticing it. What neither piece does, and what most leadership writing of this kind does not do, is anchor the construct it names to the validated instrument that exists for measuring it. Tolerance for uncertainty has a scale. Leader agency has at least three. The instruments tell a more uncomfortable story than the advice does, and the gap between the two is where most of the actually-actionable work sits.
What follows is a short walk through the instruments and what they actually predict, with two caveats up front: most of the scales were validated on student or clinical samples rather than on executives, and the gap between scale-level evidence and senior-leader populations is itself a measurement problem the leadership-writing genre habitually papers over.

Table of Contents
- Tolerance for ambiguity, uncertainty, and the IUS-12
- Yagan’s claim is testable
- Agency erosion, Rotter’s locus of control, and Bandura’s self-efficacy
- What measuring leader traits in an executive population requires
- Why advice without instruments is hard to act on
Tolerance for ambiguity, uncertainty, and the IUS-12
The dominant scale for measuring tolerance of uncertainty is the Intolerance of Uncertainty Scale, originally a 27-item instrument developed by Freeston and colleagues in 1994 and shortened to a 12-item form by Carleton, Norton, and Asmundson in 2007. The IUS-12 is now the workhorse measure across clinical, organizational, and decision-making research. It loads on two factors that Carleton labeled prospective anxiety (worry about future uncertainty) and inhibitory anxiety (avoidance of decisions under uncertainty), and the two factors have been replicated across dozens of studies and several languages.
The first thing to notice about the IUS is its origin. It was built in the clinical anxiety literature, and the intolerance it measures is closer to a maladaptive-anxiety construct than to a managerial-skill construct. Higher IUS scores predict generalized anxiety disorder, social anxiety, depression, and obsessive-compulsive symptoms with effect sizes that have been re-estimated in multiple meta-analyses. The IUS does what it was built to do, and what it was built to do is identify clinically significant difficulty with not-knowing.
The second thing to notice is that the IUS has been imported into organizational research without much modification. Studies on managerial decision-making under ambiguity, on leader risk preferences, and on entrepreneurial cognition have used the IUS-12 to measure something that is theoretically adjacent to the clinical construct but not identical. The question of whether the same scale measures the same thing in an executive sample as in a clinical sample is a measurement-invariance question. The published evidence is mixed. Some studies report adequate invariance across populations. Others find meaningful loading differences. The honest reading is that the IUS measures something stable across contexts, but exactly what that something is shifts in non-trivial ways between a clinical population and a working executive population.
The older alternative is Budner’s 1962 Tolerance of Ambiguity scale, which was built in the social-psychology tradition for use with general adult populations. It has more directly organizational validation evidence than the IUS but worse psychometric properties at the item level. McLain’s 1993 update, the MSTAT-I and the later MSTAT-II, addressed some of the item-level issues. The MSTAT scales have been used in management research with more frequency than the IUS, partly because their construct fit is more obviously aligned with the kind of ambiguity an executive faces.
For the leadership audience Stolzoff is writing for, the right comparison is not “the IUS exists and you should use it.” It is “there are at least three serviceable scales that measure something close to the construct the article names, the scales have different validation profiles, and selecting among them depends on what you want to do with the score.” Pointing the reader at the construct without pointing at the scales is the move that leaves the prescription unactionable.
Yagan’s claim is testable
Stolzoff opens the article with a quote from OkCupid co-founder Sam Yagan: “The single biggest predictor of executive success is how you deal with ambiguity.” The quote is doing rhetorical work in the article and is not presented as an empirical claim, but it is worth taking literally for a moment because the empirical literature on the question exists.
The closest match in the published research is the body of work on managerial cognition under ambiguity, going back through Charles Schwenk’s work in the 1980s and continuing through more recent strategy-as-cognition research. The general finding is that ambiguity tolerance, as measured by Budner-derived scales or the MSTAT, predicts a non-trivial share of variance in strategic decision quality, decision speed, and willingness to commit under incomplete information. The effect sizes are moderate, comparable to what Big Five conscientiousness predicts for general job performance.
The general finding does not support Yagan’s specific claim. It is consistent with ambiguity tolerance being one of several traits that predict executive performance. It is not consistent with ambiguity tolerance being the single largest predictor. The single largest predictor in most of the published literature on managerial performance, when general intelligence is allowed in the regression, is general intelligence. When intelligence is held out, the largest single predictor across most studies is conscientiousness, not ambiguity tolerance.
This is the kind of correction the published literature lets you make once you anchor the construct to its instrument. Without the instrument anchoring, the Yagan quote stands as a striking aphorism. With it, the quote is a hypothesis that has been tested, and the test came back negative for the strong version of the claim.
I wrote about a structurally similar finding in my post on what years of professional experience predicts in psychometric scores. Two custom psychometric studies, run on different instruments in different populations, both found that experience predicted exactly two traits and none of the others. That is the kind of empirical specificity that emerges when you put the construct through an actual measurement model. The Yagan quote, treated as an empirical claim, falls into the same category.
Agency erosion, Rotter’s locus of control, and Bandura’s self-efficacy
Wedell-Wedellsborg’s article describes a state in which leaders feel that their actions no longer change outcomes, that effort and consequence have decoupled, and that the resulting psychological withdrawal manifests as oscillation between passivity and control, with bursts of symbolic action when the pressure becomes unbearable. The article is acute. It is also a description of a construct that has been measured for decades.
The closest scale-level match is Rotter’s 1966 Internal-External Locus of Control scale, which measures the degree to which a person believes outcomes are determined by their own actions versus by external forces. People high on internal locus of control tend to behave as if effort changes outcomes; people high on external locus tend to behave as if outcomes are determined by luck, fate, or powerful others. The Rotter scale has 23 items and has been used in thousands of studies; updated versions like Levenson’s 1981 multidimensional scale break the external dimension into chance and powerful-others factors.
The pattern Wedell-Wedellsborg describes maps onto a movement from internal toward external locus of control under sustained pressure. The behavioral consequences she lists, including rigidity, defensive control, and symbolic action, are consistent with the experimental literature on perceived control. Glass and Singer’s 1972 work on stress and noise tolerance, Seligman’s learned-helplessness research from the 1970s, and the more recent work on stress and decision-making in organizational psychology all converge on the finding that perceived control mediates the relationship between objective stressors and behavioral outcomes.
A second relevant scale is Bandura’s General Self-Efficacy Scale, which measures the degree to which a person believes they can effectively act on the world. Self-efficacy is theoretically distinct from locus of control (you can believe outcomes depend on your actions and still believe your actions are unlikely to be effective), and the two scales correlate moderately but not redundantly. The Pearlin Mastery Scale, developed in 1978 for stress research, is a third relevant instrument, focused specifically on the perceived ability to control significant life outcomes.
For the executive population Wedell-Wedellsborg writes about, all three scales have a relevant validation history but none of them was built specifically for senior leaders. The closest leader-population work has been done on entrepreneurial self-efficacy, where the Chen, Greene, and Crick scale from 1998 has accumulated a respectable evidence base across founder samples. For senior corporate leadership, the published norms are sparse, which is its own problem.
The diagnostic the article points at is real, and the framework of “negative capability” is a usable interpretation of it. The thing the article does not give the reader is the way to operationalize the diagnosis, which is to actually administer one of these scales and look at the score. A leader who suspects they are withdrawing has no quick way to check that suspicion against the article’s vocabulary. They could check it against the IUS, the Rotter scale, or the Pearlin Mastery in about ten minutes each.
What measuring leader traits in an executive population requires
The honest answer is that running the existing instruments on a senior-leader sample would require some methodological work that has not been done at scale. Three issues need attention.
The first is norming. Most of the validated scales above were normed on student or clinical samples. Executive populations are systematically different in ways that affect scale-level interpretation: higher in measured intelligence, more variable in tenure, exposed to a different class of stressors. Scoring an executive against a norm built on undergraduate psychology participants gives you a number whose interpretation is uncertain.
The second is response bias. Senior leaders are unusually socially-desirable in their self-report patterns (social desirability bias is the named confound), partly because the stakes of admitting to certain traits are high in their professional context. The IUS-12, the Rotter scale, and the Pearlin Mastery all have face-valid items that an executive can easily game in either direction. For research use this is a moderate problem; for high-stakes use it is a serious one. Implicit measures and behavioral indicators sit on the agenda for any serious leader-trait research, but the practical instrumentation is much heavier than a 12-item self-report.
The third is the construct-stability question. Some of the traits these scales measure are hypothesized to be stable across the lifespan; others are state-and-trait composites whose state component fluctuates with circumstance. Wedell-Wedellsborg’s “psychological withdrawal” is described as a state, not a trait, and measuring it would require either repeated administrations of a state-version scale or a different methodological approach entirely.
None of this is a reason not to measure. It is a reason to measure carefully and to interpret the resulting scores within the limits of the instrument’s validation history.
Why advice without instruments is hard to act on
The reason this kind of post is worth writing is that the gap between leadership advice and leadership measurement is where most of the actually-useful work hides. An article that tells a leader to build uncertainty tolerance gives them a direction without a unit. They cannot tell whether they have moved. They cannot tell whether their team has moved. They cannot tell whether the program they ran moved anyone.
An article paired with a scale, by contrast, gives them all of the above. Run the IUS-12 before and after the program. Run the Rotter scale on the leader and the team and look at the distribution. Take the score seriously enough to ask what threshold matters and what direction movement implies. The advice does not become true because there is a number attached to it; the advice becomes testable, which is a different thing and a much more useful one.
This is the same form of argument I have made about the Q12 and population-level engagement benchmarks, about the build-versus-buy decision for psychometric instruments, about the Buckingham critique of NPS-style metric collapse, and about three popular leadership constructs and the actual scales they refer to. The pattern across those posts is constant. Construct claims are easy to make and easy to repeat. Measurement claims, anchored to specific instruments with specific validation histories, are harder to make and easier to falsify. The harder ones are the ones worth building practice on.
For the two HBR pieces this post engaged with, the practical translation is short. Tolerance for uncertainty is operationalized in the IUS-12, Budner’s tolerance-of-ambiguity scale, and McLain’s MSTAT scales. Leader agency is operationalized in Rotter’s locus-of-control scale, Bandura’s general self-efficacy scale, and the Pearlin Mastery Scale. Pick the scale that fits the construct you actually care about, administer it carefully, interpret the score within the limits of the validation evidence, and treat the resulting number as a starting point for further investigation rather than as a verdict on the leader. That is what leader-trait research looks like when it is being done, and it is also the part of the work the leadership-writing genre tends to leave out.
Related reading
Discriminant Validity in Practice: How to Tell a New Construct From a Renamed One
Trust ambiguity, grit, and the test that separates a real construct from a relabel. Convergent and discriminant validity in practice, with a worked example.
Three Popular Leadership Constructs Through the Lens of Their Actual Scales
Wise empathy SJT, Edmondson's psychological safety scale in human-AI teams, and the contested ALQ. What three leadership constructs actually measure.
What Net Promoter Score Loses by Collapsing the Top of the Rating Scale
Marcus Buckingham called Net Promoter Score problematic in HBR. The reason is a rating scale measurement argument worth unpacking. What NPS loses and what to do.
Custom or Off-the-Shelf Psychometric Instrument?
When does a custom psychometric instrument earn its multi-week build cycle, and when is off-the-shelf personality assessment for hiring enough?