Reading Time: 6 minutes

I’ve been in this conversation more times than I can count.

A TD or L&D leader pulls me aside after a webinar, or messages me, and asks the same question: which personality assessment should we be using with our leaders? DISC? Enneagram? CliftonStrengths? Hogan?

I’ve stopped answering that question directly. Not because it doesn’t matter — it does — but because it’s almost never the right first question. And I want to tell you why.

Here’s the pattern I’ve watched play out over 10 years of building in this space:

The assessment runs. The workshop is actually pretty good — people have real conversations, things click that hadn’t clicked before. Managers leave thinking this is going to change how the team works.

Six weeks later, the reports are in a folder nobody opens. The 1:1s look exactly the same. Someone quietly asks whether the organization should try a different assessment next year.

It’s not the tool. It’s never the tool.

According to a DDI webinar poll, 53% of HR and L&D professionals say the top reason personality assessments fail to drive development is “lots of data but no clear next steps.” Read that again. Not “the tool was bad.” Not “people weren’t engaged.” The data existed. Nobody knew what to do with it.

There are usually two reasons for that. The first: the assessment was chosen without a clear picture of which specific leadership problem it was designed to solve. The second: even when the right tool was used, the insight had no delivery mechanism to get it from a report into the conversation that needed it. This framework addresses both.

Get the 2026 AI coaching playbook for talent development to accelerate team performance.

How to choose the right personality assessment for your leadership team

1. Match the assessment to the leadership problem you’re trying to solve

The question TD leaders most often ask me is: which assessment is best for leadership teams?

The question I wish they’d ask instead is: what specific leadership problem are we trying to solve, and which assessment was built to answer it?

Most major personality assessments are valid instruments for what they measure. DISC is not a better or worse tool than the Enneagram in any absolute sense. They were built to measure different things. When a team uses a self-awareness instrument to solve a communication friction problem — or a strengths assessment when they needed to understand how conflict surfaces — they’re not working with a bad tool. They’re working with a mismatch between the question they’re asking and what the instrument was designed to answer.

So flip the question. It’s not which personality test is best for leadership teams. It’s which test was built to answer the specific leadership question your organization is actually working on.

Here’s what that looks like. Not a ranking — a decision framework. Match the instrument to the goal.

Goal: build self-awareness in individual leaders

The Enneagram and 16 Types (MBTI) are designed for depth of self-understanding — how a person’s motivations, habitual patterns, and stress responses shape their leadership behavior. A manager who has never been able to explain why they shut down under pressure often finds that language in one of these profiles. Use-case boundary: these tools don’t predict how two specific people will interact, or explain observable team behavior. That’s not a flaw. That’s the edge of what they were designed to do.

Goal: improve team dynamics and day-to-day interaction

DISC is purpose-built for this. It maps observable behavioral tendencies — how someone communicates, responds to conflict, processes urgency — rather than internal psychology. A manager can use DISC to anticipate how a High D and a High C will read the same ambiguous situation differently, or calibrate feedback to someone who needs deliberate processing time vs. someone who wants the bottom line first. DISC doesn’t explain why someone behaves the way they do. It shows how. For team dynamics work, that’s often the more useful data.

Goal: identify and activate individual strengths

CliftonStrengths (StrengthsFinder) was built for strengths activation, not behavioral mapping. It identifies a person’s dominant talent themes and is designed to anchor development in what someone already does well — not what’s missing. It works well for high-potential programs, for managers who default to gap thinking, and for coaching conversations oriented toward growth. It’s less useful for diagnosing conflict patterns or communication friction — that requires behavioral-tendency data, not strengths data.

Goal: executive development and succession planning

Hogan assessments, including the Hogan Development Survey, were designed for senior leader development and executive selection. They measure performance-based personality and the derailment risks that emerge under pressure: behaviors that work at one leadership level and become liabilities at the next. For high-stakes succession work or executive coaching, Hogan-class instruments offer the right validity and depth. They’re not the right fit for a broad team rollout.

Goal: build emotional intelligence and interpersonal effectiveness

Blue EQ measures EQ dimensions directly — self-awareness, empathy, social effectiveness, emotional regulation. For leadership programs that center on relationship quality, psychological safety, or navigating difficult conversations, Blue EQ measures what the program is actually trying to move. It’s not a substitute for a behavioral instrument like DISC. It’s measuring a different dimension of the same person.

If you only take one thing from this section, take that: match the tool to the goal.
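
If it helps to see the framework as a lookup rather than prose, here’s a minimal sketch in Python. The mapping just restates the sections above; it’s illustrative, not a recommendation engine.

```python
# A rough restatement of the framework above: match the instrument to the
# leadership goal. Names come from the article; the mapping is illustrative.
ASSESSMENT_BY_GOAL = {
    "individual self-awareness": ["Enneagram", "16 Types (MBTI)"],
    "team dynamics and day-to-day interaction": ["DISC"],
    "strengths activation": ["CliftonStrengths"],
    "executive development and succession": ["Hogan"],
    "emotional intelligence and interpersonal effectiveness": ["Blue EQ"],
}

def instruments_for(goal: str) -> list[str]:
    """Return the instruments built to answer a stated leadership question."""
    if goal not in ASSESSMENT_BY_GOAL:
        raise ValueError(
            f"No instrument mapped to {goal!r}. Name the leadership problem "
            "first, then pick the tool."
        )
    return ASSESSMENT_BY_GOAL[goal]

print(instruments_for("team dynamics and day-to-day interaction"))  # ['DISC']
```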

2. Have a strategy for getting the insight into the flow of work

Here’s the part I find harder to say, because I’ve watched incredible organizations run incredible assessments and still end up right back where they started.

Even perfect data fails if it has no delivery mechanism after the workshop ends.

The forgetting curve tells us why. Research on training retention consistently shows that within a week of a workshop, participants retain as little as 20% of what they learned. Without spaced practice and application in context, assessment insight follows the same curve as any other training content: vivid on the day, mostly gone within a week, and largely inaccessible three weeks later — right at the moment a manager is sitting across from someone in a difficult 1:1 and could actually use it.

Long-term retention — the kind that produces observable behavior change between talent reviews — requires that insight be retrieved and applied in context, repeatedly, over time. That’s the function of a behavioral infrastructure: a system that puts the right data in front of the right person at the moment it’s relevant. Not at the workshop. At the 1:1.
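
To make the retention math concrete, here’s a toy model in Python. It assumes a simple Ebbinghaus-style exponential decay, calibrated only to the roughly-20%-at-one-week figure above; the exact numbers are illustrative, but the shape is the point: resurfacing the insight in context resets the clock.

```python
import math

def retention(days_since_exposure: float, stability_days: float = 4.35) -> float:
    """Toy forgetting curve: R = e^(-t / S). S chosen so retention is ~20% at day 7."""
    return math.exp(-days_since_exposure / stability_days)

def retention_with_resurfacing(day: float, resurfaced_on: list[float]) -> float:
    """Same curve, but the clock restarts each time the insight is resurfaced in context."""
    last_touch = max([d for d in resurfaced_on if d <= day], default=0.0)
    return retention(day - last_touch)

print(f"Day 7, workshop only:       {retention(7):.0%}")    # ~20%
print(f"Day 21, workshop only:      {retention(21):.0%}")   # ~1%
print(f"Day 21, resurfaced at 1:1s: {retention_with_resurfacing(21, [7, 14, 20]):.0%}")  # ~80%
```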

The thing that changes outcomes isn’t the quality of the report. It’s whether the insight shows up when it matters.

When a manager gets a Slack notification 10 minutes before a 1:1 — showing how the person they’re about to meet processes feedback, what communication style lands best, where conflict typically surfaces in their profile — that data functions differently than a PDF they’d have to remember to open. It’s there at the moment it can actually be used.
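
For a sense of what that looks like mechanically, here’s a rough sketch in Python. The names, profile fields, and delivery are hypothetical; in practice the data would come from the assessment platform and the message would arrive in Slack or Teams rather than a console.

```python
from datetime import datetime, timedelta

# Hypothetical profile data and names, for illustration only.
TEAM_PROFILES = {
    "jordan": {
        "feedback": "wants specifics and time to process before responding",
        "communication": "prefers written context ahead of a live discussion",
        "conflict": "tends to go quiet rather than push back in the moment",
    },
}

def pre_meeting_digest(attendee: str, meeting_start: datetime) -> str:
    """Build the nudge a manager would see about 10 minutes before a 1:1."""
    profile = TEAM_PROFILES[attendee]
    send_at = meeting_start - timedelta(minutes=10)
    return (
        f"[{send_at:%H:%M}] 1:1 with {attendee.title()} at {meeting_start:%H:%M}\n"
        f"- Feedback: {profile['feedback']}\n"
        f"- Communication: {profile['communication']}\n"
        f"- Watch for: {profile['conflict']}"
    )

print(pre_meeting_digest("jordan", datetime(2026, 3, 3, 14, 0)))
```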

That’s the real job. Not generating more assessment data. Activating the data that already exists.

Most organizations don’t need a new assessment — they need to activate the ones they already have

Organizations with 1,000+ employees use an average of 20 different assessment tools. Companies with 5,000+ employees average 35. Only 9 of those are typically purchased centrally. The rest accumulate through individual coaching vendors, HR initiatives, and one-off team programs — each producing data that lives in its own portal, disconnected from everything else.

Thirty-five.

Your organization probably already owns more assessment data than you could ever generate fresh. The problem isn’t a data gap. It’s data fragmentation.

Team members have profiles in three different systems. Managers don’t know which assessment applies to which situation, or where to find the data when they need it. A team member’s DISC profile exists somewhere, but it’s not visible when their manager is preparing for a performance conversation. The Enneagram data from two years ago is in a vendor portal nobody logs into. StrengthsFinder results are in a spreadsheet that got emailed around after a team offsite.

The instinct is to consolidate — pick one assessment and standardize on it. Sometimes that’s the right call. But more often, the problem isn’t which assessment to use. It’s that the assessments you already have produce data once and then go quiet.

Assessment data isn’t the problem. Assessment abandonment is.

See How Cloverleaf’s AI Coach Works

What to ask before adding another assessment to your stack

If you’re evaluating a new platform — or trying to get more out of the tools already in your stack — I’d push two questions most vendor conversations never reach.

→ Does this integrate with the assessments we’re already using, or does it add another silo? If the answer is another silo, the fragmentation problem compounds.

→ How does insight from this assessment get activated in the workflow? A platform that produces reports is not the same as a platform that delivers coaching. The question is whether assessment data surfaces at the moment a manager can act on it — before the conversation, during a feedback draft, when staffing a project that will require someone to navigate ambiguity well.

We built Cloverleaf because we believed this. Now we have the data that proves it.

Cloverleaf integrates 13+ assessments — DISC, 16 Types, Enneagram, Insights Discovery, CliftonStrengths®, Blue EQ, and more — in a single platform. The point isn’t to give everyone 13 reports.

It’s to make the decision framework above executable: teams use the assessment that fits their leadership development goal, all the data lives in one place, and a coaching layer puts it in front of the right person at the right moment.

That coaching layer integrates valuable insight through the tools managers already use — Slack, Teams, email, calendar — so it appears before the 1:1, not after the moment has passed. Assessment data stops living in a report and starts functioning as infrastructure for leadership development: persistent, contextual, and available when it’s needed.

The coaching arrives before the problem. That’s the whole point.

Reading Time: 7 minutes

The DISC workshop goes well. The facilitator is good. People recognize themselves in the profiles, laugh at the right moments, and leave with a new vocabulary for why certain relationships have always felt like friction. There is genuine energy in the debrief.

Then the quarter moves on. The report ends up on a shared drive. And six months later, the same team dynamics are back — the same conflict patterns, the same communication breakdowns, the same people getting read as difficult.

This is not a DISC problem. It is a program design problem. Research consistently shows that the vast majority of organizations with 100 or more employees use behavioral assessments. Most of them do not see lasting change in how their teams actually operate. The gap is not between good assessments and bad ones. It is between teams that treat DISC as a data point and teams that build it into how they work.

The five differences below are not theoretical. They are the structural distinctions that separate teams where DISC created a moment of recognition from teams where it changed how they actually function.

Get the 2026 AI coaching playbook for talent development to accelerate team performance.

5 things teams understand that make DISC more effective 

1. They treat self-awareness as a team output, not an individual exercise

Teams that know DISC: everyone understands their own profile.
Teams that use DISC: everyone has a working model of each other.

Most DISC programs are designed to help individuals understand themselves better. That is a legitimate goal — and the research on self-awareness validates the stakes. Dr. Tasha Eurich’s decade of research found that while 95% of people believe they are self-aware, the actual figure is closer to 10–15%. Working alongside colleagues who lack self-awareness can cut a team’s chances of success in half, with measurable effects on stress, motivation, and retention.

But the research finding most directly relevant to program design is this: individual self-awareness compounds when it becomes shared. A team where one person understands their own operating tendencies is marginally better off. A team where everyone has a working model of how the people around them think — and a common language to name those differences in the moment — operates at a categorically different level.

A Korn Ferry study of 6,977 professionals across 486 publicly traded companies found that organizations with self-aware leaders consistently outperformed peers on financial measures. A separate simulation with 300+ leaders found high self-awareness predicted better decision-making, coordination, and conflict resolution at the team level.

The unit of change is not the individual profile. It is the shared map.

Teams that use DISC design their programs with this in mind. The goal is not for each person to know their own type. It is for the team to know each other well enough to use their differences as information rather than evidence of incompatibility.

2. They depersonalize conflict in real time, not in retrospect

Teams that know DISC: they understand style differences in theory.
Teams that use DISC: they name them in the room before the story hardens.

Here is how team conflict typically unfolds without a shared behavioral language. A high-Dominance team member sets an aggressive deadline — not to create pressure, but because forward motion is how they are wired. A high-Conscientiousness team member pushes back with detailed questions — not to obstruct, but because rigor is how they protect quality. A high-Steadiness team member absorbs the tension in silence — not because they agree, but because preserving group harmony is what their instincts prioritize.

Without shared language, all of this registers as interpersonal friction. The D reads the C as obstructionist. The C reads the D as reckless. The S gets read by both as passive. And the team develops a story about each other that has almost nothing to do with intent and everything to do with operating from different defaults — which is exactly what Carl Jung meant when he said that what we leave unconscious will direct our lives, and we will call it fate.

Teams that use DISC have a name for what is happening in that room. Not “why are you being difficult” but “you’re coming at this from a different angle — what’s the risk you’re trying to account for?” The friction does not disappear. But it depersonalizes. And depersonalized friction is something a team can actually work with.

This only happens if the shared language is present at the moment of conflict — not recalled from a workshop six months later. Which is what makes the program design question so consequential.

See How Cloverleaf’s AI Coach Works

3. They understand the lens each style sees through — not just the label it carries

Teams that know DISC: they can name the four styles.
Teams that use DISC: they can predict the question each style brings into any situation.

Most DISC training delivers the taxonomy well. People leave knowing what each letter stands for and which descriptors fit their profile. What it less often conveys is the operational framing that makes DISC usable in real time: each style is, at its core, asking a fundamentally different question whenever it enters a new situation.

A Dominance tendency asks: where are we going, and when do we get there? This is the engine of momentum. It keeps teams from over-processing decisions that need to be made and drives accountability to outcomes. Its risk is urgency that creates pressure without realizing it — an internal deadline that the rest of the team treats as a hard commitment.

An Influence tendency asks: who is involved, and are they energized? This style builds the coalition that gets work done across boundaries. It keeps teams from becoming insular and sustains the engagement that long initiatives require. Its risk is a preference for being liked that can soften necessary clarity.

A Steadiness tendency asks: how does this work, and will it hold up over time? This is the style that builds the systems and processes that make teams scalable. It creates the psychological safety that comes from consistency and reliability. Its risk is absorbing dysfunction to protect harmony rather than naming the conflict that needs to happen.

A Conscientiousness tendency asks: what exactly are we trying to accomplish, and are we doing it right? This style surfaces the assumptions everyone else skipped and holds the standard that the team will eventually be glad someone held. Its risk is that the pursuit of precision can outlast the point where speed matters more.

When a TD leader helps a team internalize this framing — not just the labels but the questions — what changes is how team members interpret each other. The C is no longer being difficult. They are asking a question the team needs answered. The I is not just creating noise. They are managing something the team would lose without them. The shared map goes from a static profile to a live operating model.

4. They use DISC to design roles and work — not just to improve communication

Teams that know DISC: they adjust how they talk to each other.
Teams that use DISC: they adjust what they ask each person to do.

The most common application of DISC in the workplace is communication coaching. Know your colleagues’ styles, adapt your message accordingly. This is useful. It is also the smallest available return on the assessment investment.

The more consequential application is role and work design: using behavioral data to understand where each person on a team is most likely to produce excellent work — and where they are structurally likely to struggle regardless of effort or intention.

A high-C team member placed permanently in an execution role against someone else’s broad-brush strategy is not a performance problem. They are a retention risk created by a role design that systematically requires them to operate outside their zone. A high-I team member given a primarily individual-contributor scope with no collaborative surface area will disengage at a rate that has nothing to do with their manager’s intentions.

Teams that use DISC ask a different set of questions when work gets assigned. Not just “who has capacity” but “whose behavioral tendencies make this assignment likely to produce the outcome we need?” Not just “who should present this?” but “who is energized by visibility and who will perform better with a supporting role?”

This does not require treating DISC as deterministic — profiles are tendencies, not ceilings. But a team that uses its behavioral data to design work around where people are most likely to thrive gets materially different outcomes from one that uses it only to soften the edges of communication.

5. They build DISC insight into the workflow — not just into the training event

Teams that know DISC: they had a great workshop.
Teams that use DISC: the insight shows up before the conversation that matters.

Cloverleaf’s DISC assessment is built on independent validity research across 48,158 users, with confirmed test-retest reliability. The data is stable. The insight is accurate. The structural problem is that even accurate, stable assessment data has a shelf life when it lives in a report.

Three months after a workshop, most team members cannot recall their colleagues’ profiles with enough specificity to use them under pressure. Six months after, the shared language has faded back into informal shorthand or disappeared entirely. This is not a failure of engagement. It reflects a well-documented principle in behavior change research: insight that is not reinforced at the moment of application does not change behavior.

A manager who completes a DISC workshop in January is not reliably better at navigating a conflict in March. The January insight is simply not present in the March moment. The gap is not commitment. It is proximity.

Teams that use DISC build for this reality. They connect the assessment data to the manager’s workflow before the 1:1, before the performance review conversation, when a team is forming around a new initiative. They treat DISC not as a report that gets read once but as a live data layer that informs how people develop each other in the ordinary conditions of work.

This is the design question that most DISC programs leave unanswered: not how to deliver a better workshop, but how to keep the insight active in the moments when behavior actually gets expressed. For a look at what that activation layer looks like in practice, see how Cloverleaf connects DISC results to in-the-flow coaching for managers.

The teams that see lasting change decided the goal was behavior change, not workshop completion

The five differences above share a common root: teams that use DISC have made a design decision that teams that know DISC have not. They decided that the goal of a behavioral assessment program is behavior change — not assessment completion.

That decision changes what gets built. It changes how work gets designed. It changes what managers are equipped to do before the conversations that shape how their teams develop. And it changes what TD leaders measure to know whether the program is working.

Most organizations have the assessment. What they’re missing is the layer that keeps it alive in daily work. That is what Cloverleaf does — surfacing DISC insight before the 1:1, before the feedback conversation, before work gets assigned. Not something to engineer. Something that shows up where managers already are.

Reading Time: 8 minutes

Your coaching platform tracks logins but your CHRO is asking whether managers are ready to lead

Here’s a budget conversation that happens thousands of times a quarter. A Talent Development leader walks in with data on their coaching investment. The slide deck has engagement numbers — 78% logged in, satisfaction is 4.2 out of 5, completion rates are strong. Good metrics. Clean charts.

Then the CHRO asks the question: “Are our managers ready to lead their teams through this reorg?”

Or the CFO’s version: “Is this coaching spend building feedback capability, or are people just clicking around?”

The TD leader has no answer. Not because they don’t care — because the tool they’re using was never designed to answer that question. It was designed to track platform adoption. Logins, clicks, completions. The same metrics you’d use for any SaaS tool. But coaching isn’t software adoption. It’s behavior change. And behavior change requires entirely different instrumentation.

Coaching spend is up 17% since 2023, and 67% of L&D leaders still can’t prove it’s changing behavior

The numbers are hard to reconcile. The coaching industry generates $5.34 billion in annual revenue, up 17% since 2023, according to the ICF’s 2025 Global Coaching Study. Organizations are increasing coaching budgets. Nearly six in ten coaching clients are now employer-sponsored. The investment side of the equation is accelerating.

But the measurement side hasn’t kept up. LinkedIn’s Workplace Learning Report found that 67% of L&D leaders struggle to demonstrate training impact to their executives. Measuring coaching impact remains the single most cited challenge in the ICF’s global study, and it was the top challenge in the 2020 study too. Five years of AI coaching platforms, five years of new analytics tools, and the measurement gap hasn’t closed.

The reason is structural, not technical. Coaching platforms measure what’s easy to instrument — platform activity — and present it as if it answers what leadership is asking. But a CHRO asking “are our managers building feedback capability?” and a dashboard showing “78% of users logged in” are operating in two completely different categories of measurement.

Get the 2026 AI coaching playbook to see how organizations are implementing AI coaching at scale.

Platform analytics and development analytics answer fundamentally different questions

When a TD leader reports that 650 people logged into the coaching platform last month, they’re reporting platform analytics. That data tells you the tool is being used. It tells you nothing about what it’s being used for, whether the usage aligns with organizational priorities, or whether anyone’s behavior is changing as a result.

This is the category confusion at the heart of coaching measurement. Platform analytics — logins, session completions, feature clicks, time-on-platform — belong in the same bucket as any SaaS adoption metric. They’re useful for product teams. They’re nearly useless for a TD leader sitting across from a CHRO who wants to know whether the leadership pipeline is getting stronger.

Development analytics answer a different set of questions entirely: What are people being coached on? Is it aligned with our organizational priorities? Which teams are growing in the areas we need? Where are the gaps we didn’t know about? Are managers building the specific capabilities — feedback, communication clarity, developing others — that we’re accountable for?

One L&D leader at a 1,200-person organization captured this gap precisely. Her team had doubled platform adoption in a year, from 300 to 650 active users. Booked out four months for team sessions. By every platform metric, the program was a success. But when she needed to show impact beyond engagement, the only option was emailing her customer success manager and waiting for a custom report. Nothing connected the activity to the outcomes her leadership cared about.

This is the default state of coaching measurement across the industry. TD leaders accept it, not because they think it’s sufficient, but because no tool has given them anything in the development analytics category to work with.

See How Cloverleaf’s AI Coach Works

Most coaching ROI claims are modeled from benchmarks, not actual measurements from your organization’s coaching data

The coaching industry has converged on ROI multipliers as the answer to the measurement challenge. The numbers vary — 6x, 8x, sometimes higher — and they’re published as headline figures to reassure executives that AI coaching is worth the spend.

These multipliers are useful for industry-level confidence. They’re less useful when a TD leader needs to answer a specific question about what coaching is doing inside their specific organization. A CHRO asking whether managers are building feedback capability doesn’t need an industry average. They need a read on what’s happening on their team, this quarter.

When your CHRO asks whether managers are building feedback capability, the right answer isn’t “coaching has a 6.4x ROI based on industry benchmarks.” The helpful answer is grounded in your own data: “70% of our coaching interactions in the last quarter focused on leadership growth themes — specifically communication clarity and developing others. The Southeast region’s coaching activity in feedback and recognition is 40% below the company average. And 91% of coached members report practicing the skills they’re working on.”

That second answer requires a measurement infrastructure that most coaching platforms haven’t built, one that classifies what coaching is about, not just whether it’s happening.

Four categories of coaching measurement that answer the questions your leadership actually asks

If platform analytics are the wrong instrument, what should TD leaders be tracking? Based on customer research and the measurement frameworks emerging in organizations that are getting coaching ROI conversations right, four categories move the conversation from “are people using the tool?” to “is coaching building what we need?”

1. Coaching readiness: can people actually receive personalized coaching?

Before anything else, measurement starts with infrastructure. Not “did they log in” but “are they set up to receive coaching that’s actually personalized to them?” Do they have assessments completed so coaching can be tailored to their specific communication style and working preferences? Do they have a manager connected so relationship-level insights work? Do they have an integration active so coaching reaches them in Slack or Teams — where they already work — not in a separate portal they’ll forget to check?

This is the foundation. If people aren’t set up, nothing downstream matters. And the gap between “logged in” and “fully set up for personalized coaching” is often enormous. An organization might report 78% login rates while only 45% of users have completed the assessments that make coaching specific to them. That’s not a coaching program; it’s a platform people opened once.

2. Feature adoption by depth: which coaching experiences are driving development?

The second layer asks which specific coaching capabilities people are actually engaging with, and which are underutilized. If 78% of your organization has completed assessments but only 28% have set a development-focused coaching goal, that tells you people are interested in self-awareness but haven’t translated it into active development. That’s not an engagement failure; it’s an enablement opportunity, usually addressed with manager communication about how to use coaching for development planning.

Depth matters more than breadth here. An organization where 200 people are receiving daily coaching tips, engaging in AI-coached scenarios before difficult conversations, and tracking progress against a development focus is getting more value than an organization where 2,000 people logged in and took one assessment. Feature adoption by depth tells you where coaching is producing real development activity versus surface-level engagement.

3. Coaching theme distribution: what people are being coached on, and whether it aligns with what your organization needs

This is the measurement layer that turns activity data into organizational intelligence. When every coaching interaction — every tip delivered, every question asked, every scenario practiced — is classified into organizational themes, you can see what your coaching investment is actually building.

Imagine seeing that 70% of coaching conversations across your organization cluster around leadership growth themes — specifically communication clarity and developing others — while 15% center on workplace climate themes like trust, belonging, and psychological safety. That’s not a login report. That’s a map of where your organization’s development energy is going. You can assess whether it aligns with strategic priorities. You can see which teams are working on feedback capability and which aren’t. You can spot that the East region’s coaching activity in workplace climate themes is 3x above baseline, a signal that would take six months to surface in an engagement survey.
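
Here’s a stripped-down sketch in Python of what that classification layer yields once interactions are tagged, using made-up data. The mechanics are simple: count themes, compare a team’s distribution to the organizational baseline, and flag what’s running above it.

```python
from collections import Counter

# Made-up interactions, each already classified into an organizational theme.
interactions = [
    {"team": "East", "theme": "workplace climate"},
    {"team": "East", "theme": "workplace climate"},
    {"team": "East", "theme": "leadership growth"},
    {"team": "West", "theme": "leadership growth"},
    {"team": "West", "theme": "leadership growth"},
    {"team": "West", "theme": "workplace climate"},
]

def theme_distribution(rows):
    """Share of coaching activity per theme for a set of interactions."""
    counts = Counter(r["theme"] for r in rows)
    total = sum(counts.values())
    return {theme: n / total for theme, n in counts.items()}

baseline = theme_distribution(interactions)
east = theme_distribution([r for r in interactions if r["team"] == "East"])

for theme, share in east.items():
    ratio = share / baseline[theme]
    print(f"East / {theme}: {share:.0%} of activity, {ratio:.1f}x the org baseline")
```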

One L&D leader saw her organization’s coaching data broken down by theme for the first time and immediately recognized it. The top coaching theme, communication clarity, aligned with the leadership framework her team had just started rolling out. Coaching data was confirming that people were working on the same priorities her programs were targeting. She’d never had that visibility before. But more interesting was what she didn’t expect: 34 people across three departments had independently selected “building confidence” as a coaching focus. That wasn’t part of any formal program. It was a development need hiding in plain sight.

4. Self-reported growth: are people actually practicing what they’re learning?

The final measurement layer closes the loop. Coaching activity tells you what people are working on. Self-reported growth tells you whether it’s translating into behavior change. Check-ins where members report whether they’re practicing the skills they’re being coached on, even at a simple “sometimes / often / consistently” scale, provide the growth signal that no platform metric ever will.

This isn’t a replacement for 360-degree reviews or manager observations. It’s the signal that fills the gap between formal talent reviews — which might happen once or twice a year — and the daily reality of whether coaching is producing the behavior change it’s supposed to produce. When 91% of coached members report practicing new skills at least sometimes, that’s a data point a TD leader can bring to a budget conversation with confidence.

Between engagement survey pulses, coaching data is a real-time read on organizational health

Here’s something most coaching measurement approaches don’t account for: the coaching interactions happening every day contain signal about team dynamics, leadership gaps, and organizational climate that would normally take months to surface through formal channels.

When coaching conversations about trust and psychological safety spike in a specific department, that’s not just a coaching metric. It’s an early warning. A VP of Talent at a 5,000-person organization put it directly: engagement surveys happen twice a year, and by the time results are analyzed and action-planned, another three to six months have passed. That’s up to a year between identifying a problem and doing something about it.

Coaching theme data changes the cadence of that insight. When every coaching moment is tagged to sub-themes like burnout awareness, belonging, or growth opportunities, you’re getting a daily read on what people care about most — not from a survey they fill out twice a year, but from what they’re asking about and working on every day, in the flow of their work. This doesn’t replace engagement surveys. It fills the gap between them. Your survey tells you there’s a trust problem in Q3. Your coaching data tells you trust conversations spiked in Division X in August. That’s the difference between a lagging indicator and a leading one.

The organizations that prove AI coaching ROI are building classification infrastructure, not better dashboards

The coaching measurement gap isn’t going to close with better-looking dashboards or more sophisticated engagement metrics. The gap exists because most coaching tools instrument the wrong thing. They measure platform activity and present it as development proof. The organizations that are changing this conversation are the ones building a different kind of infrastructure: systems that classify every coaching interaction into themes their leadership already uses — leadership growth, feedback capability, workplace climate, innovation culture — so that coaching data speaks the same language as the rest of their talent strategy.

This matters because the question TD leaders face isn’t getting easier. Budgets are tighter. CHROs and CFOs are asking for specifics, not multipliers. “Are managers ready to lead?” “Is feedback capability growing?” “Which teams need development support?” These are questions that platform analytics — logins, clicks, completions — are unable to answer. They require development analytics: measurement that tells you what coaching is about, not just that it’s happening.

The infrastructure you build now determines whether you’ll have a credible answer the next time your leadership asks what your coaching investment is building. The organizations that figure this out first won’t just keep their coaching budgets. They’ll expand them, because they’ll be the only ones who can show exactly where coaching is making their managers, their teams, and their leadership pipeline stronger.

Cloverleaf’s ROI Dashboard uses AI to classify every coaching interaction into organizational themes, track feature adoption by depth, and surface growth signals, giving TD leaders the coaching intelligence they need to prove ROI without waiting for a quarterly report.

Reading Time: 6 minutes

Kathy Quarles, Head of People at 80 Acres Farms, knew what she was walking into. When she brought the idea of AI coaching to her CEO, the response wasn’t enthusiastic. It wasn’t even neutral.

“The comment was, ‘You go ahead and try this. When it fails, let’s talk,'” Kathy recalls. “There was for sure an expectation that this wouldn’t be something that would latch on as a meaningful tool.”

If you lead a people function, you’ve heard some version of this. Maybe not that direct — but the skepticism is familiar. Development tools get filed under “fluffy HR programs” that won’t survive contact with the daily rhythm of business. And the skeptics have a point: most of those investments haven’t produced visible results. Not because the content was wrong, but because the evidence of impact never showed up in a way the business could see.

What changed this CEO’s mind wasn’t a dashboard or a participation report. It was that people started behaving differently — and the shift was visible enough that the business noticed without being told to look.

Get the 2026 AI coaching playbook to see how organizations are implementing AI coaching at scale.

87% of employees believe algorithms give fairer feedback than their managers, and the behavioral research explains why

Before the 80 Acres story makes sense, it helps to understand the mechanism behind it. A 2024 Gartner survey of 3,500 employees found that 87% believe algorithms could give fairer feedback than their managers. A separate 2025 study published in Behavioral Sciences went further: when negative feedback came from AI rather than a leader, employees experienced less shame and fewer withdrawal behaviors. The feedback wasn’t softer. The content was the same. But removing the interpersonal weight changed how people received it.

This matters because defensiveness is the thing that kills most feedback. A manager delivers a coaching insight. The employee’s first instinct isn’t to absorb it — it’s to evaluate the messenger. Do they have an agenda? Is this fair? Is this personal? When coaching comes from data instead of a person, that entire layer of self-protection drops. What’s left is the actual content of the feedback — and space to decide what to do with it.

Kathy saw exactly this. “People became less defensive about the feedback,” she says. “It allowed this sort of free space of non-judgment coaching which was really unique.” Employees described getting a coaching tip in the morning, sitting with it, deciding how to incorporate it into their day — and doing it without someone looking over their shoulder to see if they followed through. That’s not engagement. That’s behavior change.

See How Cloverleaf’s AI Coach Works

The behavioral shift that convinced a skeptical CEO happened in Slack threads and hallway conversations, not in a quarterly report

80 Acres Farms is a vertical farming company that had done four acquisitions in 18 months when Kathy introduced AI coaching. New teams, new personalities, new ways of working — and the pressure to integrate fast. The company had used assessments and behavioral workshops before. “Everyone was really excited about it at the time we were doing it,” Kathy says. “And then we all have to go back to our day jobs and we’re super busy. To take the time to think about how those learnings apply in the daily rhythm of things — no one has time for that.”

What she observed after introducing coaching that showed up daily — in morning emails, before meetings, in Slack — wasn’t a spike in engagement metrics. It was something harder to manufacture and harder to ignore: people started voluntarily sharing coaching insights with each other.

“We’ll share the daily emails we get and send it to someone and say, ‘Cloverleaf called me out this morning — haha,'” Kathy describes. “And just 30 minutes ago, I had an email shared by two of my team members saying, ‘Oh, this is why we balance each other out so well.’ There was constructive feedback on both ends of how people like to do their work.”

Think about what this represents. These aren’t people completing a required development activity. They’re voluntarily surfacing their own development areas — and doing it with humor, with openness, without being asked. That’s the kind of behavioral evidence that doesn’t show up in a participation dashboard, but it’s visible to anyone paying attention. Including a skeptical CEO.

A leader shared a coaching tip about a defensive employee and it proved the data right in real time

One of the most concrete moments Kathy described was a leader who received a coaching message about how one of their employees tends to receive feedback — specifically, that the employee often gets defensive. The coaching included guidance on how to approach the conversation differently based on that person’s style.

The leader agreed with the insight and shared it directly with the employee. The employee’s response? Defensiveness — exactly what the data predicted. “What’s funny is that we actually saw the feedback that was shared — we lived it then in that moment,” Kathy says.

But here’s where the story shifts. The leader pulled back, reread the coaching tips on how this specific employee best receives feedback, adapted their approach, and went back. The second conversation was productive. The employee engaged with the constructive feedback.

This is the kind of moment that separates coaching that changes behavior from coaching that generates activity. The leader didn’t just receive an insight — they applied it, watched it fail, recalibrated using assessment-grounded coaching data, and tried again. That loop — insight, application, feedback, adjustment — is behavior change happening in real time. And it didn’t require a workshop, a scheduled session, or a follow-up from HR.

The signs coaching is working look like less defensiveness, voluntary vulnerability, and leaders adapting to each person on their team

Most organizations try to prove coaching’s value with the metrics they already have: participation rates, completion percentages, satisfaction surveys. These numbers feel safe because they’re easy to collect. But they don’t answer the question a skeptical leader is actually asking, which is: is this changing how people work?

The evidence that changed the CEO’s mind at 80 Acres wasn’t a report. It was that he could see it. People talking about their development openly. Leaders adjusting how they give feedback based on who they’re talking to. Team members explaining their working styles to each other using shared language. The real ROI of coaching showed up in behavior before it showed up in any metric.

Kathy describes what this looks like from the inside: “It makes it real. We can continue to build on it — build it into performance conversations, other development and coaching opportunities. As we build careers and have people span into different levels and jobs and functions, we can take our learnings and evolve it even more.”

If you’re trying to earn trust for coaching investments, look for these signals instead of dashboard metrics: Are people less guarded when receiving feedback? Are they sharing development insights with each other without being asked? Are leaders adapting their approach based on who they’re talking to — not defaulting to one style for everyone? Are conversations about growth happening outside of formal reviews? Those are the signs that coaching is actually working. And they’re the evidence that converts skeptics — because they’re visible to the business, not buried in an HR platform.

A pilot generated the behavioral evidence to convert a CEO who expected coaching to fail

Kathy didn’t try to convince her CEO with a pitch deck. She piloted coaching with a couple of small groups first — tested whether people found it accurate, useful, and genuinely reflective of how they work. “We actually piloted it with a couple small groups to start to see if we received good feedback, if people liked it, if they were finding it useful,” she says. “It allowed us to test it, and if it didn’t work, we just wouldn’t move forward.”

This is the approach that works when leadership is skeptical: don’t argue for the investment upfront. Start small enough that the risk is contained. Let the behavioral evidence accumulate. Then let the business see it. The pilot groups at 80 Acres generated the visible shifts — less defensiveness, voluntary sharing, adapted conversations — that made expanding easy to justify. The CEO didn’t need to be convinced by data. He was convinced by watching people change.

“I feel good about it,” Kathy says now, looking back. “Most importantly for me, I want to be able to provide things and support things that have benefit to our employees and our company.” That benefit didn’t show up in a satisfaction score. It showed up in how people started treating each other — and in a CEO who went from expecting failure to seeing coaching as part of how the company operates.

Build the case for AI coaching with the playbook behind these results

The behavioral shifts Kathy describes — less defensiveness, voluntary vulnerability, leaders adapting in real time — don’t happen by accident. They happen when coaching is built into the daily rhythm of work, grounded in assessment data, and delivered without the interpersonal weight that triggers self-protection. The 2026 AI Coaching Playbook for Talent Development lays out how to build this into your organization — from pilot design to stakeholder buy-in to measuring the outcomes that actually matter.

Reading Time: 4 minutes

Most organizations will tell you coaching matters. In a recent study of 177 HR professionals by the HR Research Institute, 71% of those with leadership coaching programs said it’s a strategic priority. Leadership development has held the number-one spot on Gartner’s HR priorities list for three consecutive years. Budget is flowing. Executive buy-in exists. The intent is there.

And yet, when those same organizations were asked whether coaching has actually improved performance to a high degree, only 22% said yes. About one in five couldn’t even say whether it had any impact at all.

This isn’t a new category of problem. McKinsey has documented why leadership development programs fail for over a decade. MIT Sloan estimates that only about 10% of leadership development spending actually delivers results. But the HR.com data adds something those analyses don’t — it shows exactly where the system breaks down for coaching specifically, and what the organizations getting results actually do differently.

Get the 2026 AI coaching playbook to see how organizations are implementing AI coaching at scale.

Only 30% of organizations train leaders to coach, track whether it’s happening, or tie it to performance reviews

The research reveals that most organizations have the aspiration but not the scaffolding. Only 30% train leaders in how to actually coach. Only 35% link coaching to leadership performance reviews. Only 23% monitor and evaluate participation. And just 18% reward or recognize leaders for developing others.

Think about what that means in practice. An organization declares coaching a strategic priority, then asks leaders to coach without teaching them how, doesn’t connect coaching to how those leaders are evaluated, doesn’t track whether it’s happening, and doesn’t recognize the leaders who do it well. The coaching initiative becomes something leaders are expected to do on top of everything else — with no structure, no measurement, and no incentive.

When 58% of respondents say the biggest barrier to coaching is “not enough time,” that’s not a scheduling problem. It’s a prioritization signal. Leaders will make time for what the organization actually measures and rewards. When coaching isn’t connected to anything that counts, it gets crowded out — no matter how many executives say it matters in the all-hands meeting.

This is a system failure, not a motivation failure. And it explains why buying a coaching platform — no matter how good — won’t fix things if the infrastructure around it doesn’t exist. The tool can’t compensate for what the organization hasn’t built.

Four practices that separate the 22% seeing coaching results from everyone else

The HR.com research divided organizations into two groups — those reporting strong coaching results (higher performers) and those reporting weaker results (lower performers) — and compared their practices. The differences are stark, and they’re not about budget or headcount.

1. They train leaders to coach, deliberately and continuously.

Higher-performing organizations are over three times more likely to say their leaders are well-trained in coaching (49% vs. 15%). Most organizations assume leaders know how to coach because they’re experienced managers. The data says otherwise — fewer than half of leaders are rated proficient in listening, instilling confidence, or practicing empathy, the very skills coaching requires. Helping managers give feedback that actually lands is one of the highest-leverage investments an organization can make, and most aren’t making it.

2. They measure outcomes, not just activity.

Higher performers measure leadership performance improvement at more than double the rate of lower performers (51% vs. 24%). They track career advancement trajectories (41% vs. 17%) and learning assessments (31% vs. 11%). Lower performers, meanwhile, are nearly three times more likely to say they don’t measure coaching at all (33% vs. 13%). The gap here isn’t sophistication — it’s whether anyone is asking “is this working?” in the first place. Measuring the real ROI of coaching requires tracking behavior change, not just participation.

3. They build coaching into how the organization already operates.

Higher performers are more than twice as likely to connect coaching to succession planning (39% vs. 17%) and to link it to performance reviews (46% vs. 28%). Coaching isn’t a standalone initiative — it’s woven into the systems leaders already interact with. This is the difference between coaching as a side project and coaching as organizational infrastructure.

4. They use technology because of its scalability.

Higher-performing organizations are nearly twice as likely to use digital tools for coaching and over three times more likely to use in-session support tools (51% vs. 16%). Lower performers are three times more likely to use no technology at all (43% vs. 14%). When coaching depends entirely on one person making time in a packed calendar to have a conversation they haven’t been trained for, it doesn’t scale. Technology doesn’t replace the human element — it creates the infrastructure that makes coaching possible at scale, grounded in data about how people actually work together.

See How Cloverleaf’s AI Coach Works

The most popular way to measure coaching doesn’t predict whether it actually works

Across all organizations in the study, the most common way to measure coaching effectiveness is participant feedback (42%). That’s asking the person being coached whether they liked it — which research consistently shows has no significant relationship to whether they actually learned or changed behavior. Only 37% track leadership behavior change. Only 27% track leadership pipeline readiness. And a quarter don’t measure at all.

This creates a vicious cycle. Without meaningful measurement, coaching can’t prove its value. Without proving its value, coaching doesn’t get the organizational commitment it needs — the dedicated time, the performance review integration, the leader training. And without that commitment, coaching produces exactly the mediocre results that make it hard to justify. A 22% success rate against a 71% strategic priority isn’t a coaching problem. It’s a measurement and accountability problem.

Organizations seeing results with AI coaching are three times more likely to have built the systems around it first

Only 16% of organizations in the study use AI-driven development for coaching. Thirty-two percent use no technology at all. But the higher-performer data tells a different story: organizations seeing results are three times more likely to use AI to predict future development needs (46% vs. 15%) and nearly twice as likely to personalize development with AI (49% vs. 27%).

This doesn’t mean AI is the answer to the priority paradox. An AI coaching platform deployed into an organization that doesn’t train leaders, doesn’t measure outcomes, and doesn’t connect coaching to performance reviews will underperform just like everything else. But for organizations that have built the infrastructure — that train leaders, measure behavior, and treat coaching as a system — AI becomes the mechanism that makes it possible to reach every manager, not just the ones who happen to get paired with a good coach. It’s what moves coaching from a new manager’s first 90 days to their next 900.

See how your coaching program compares across nine research-backed benchmarks

This article draws on the headline findings from the 2026 Leadership Coaching and Mentoring Playbook, published by the HR Research Institute and sponsored by Cloverleaf. The full report includes detailed comparisons between higher- and lower-performing organizations across nine major findings, technology adoption data, competency breakdowns, and actionable takeaways for building the coaching infrastructure that actually produces results.

Reading Time: 5 minutes

Here’s a scenario that plays out at enterprise organizations constantly. Leadership greenlights an AI coaching initiative. The Talent Development team gets budget approval. Someone pulls together a shortlist of four or five vendors. And then, nothing moves. Mostly because the evaluation process itself becomes the project.

Weeks go into drafting an RFP from scratch. IT wants security answers in one format. Procurement wants pricing structured differently. The TD leader is trying to figure out which questions actually matter for a coaching platform versus generic SaaS. By the time the RFP goes out, the original urgency has faded and the committee is fatigued before they’ve reviewed a single vendor response.

At root, this is a process problem. And with 74% of HR leaders now deploying or planning to deploy digital coaching, more enterprise teams are running this gauntlet than ever — most without a playbook built for the category.

Get the 2026 AI coaching playbook to see how organizations are implementing AI coaching at scale.

Every AI coaching vendor tells a different story, and your RFP needs to account for that

Most enterprise RFPs for software follow a predictable structure: vendor background, feature checklist, security and compliance, pricing, references. That structure works well enough for categories with established evaluation criteria — project management tools, HRIS platforms, learning management systems.

AI coaching isn’t one of those categories. It’s new enough that vendors describe their products in fundamentally different ways. One platform calls itself an “AI coaching assistant.” Another positions as a “leadership development platform with AI.” A third says “digital coaching at scale.” When vendors don’t even use consistent language, a generic feature checklist produces responses that are impossible to compare.

The fundamental differences between AI coaching platforms run deeper than feature lists. They include coaching methodology (is the AI grounded in behavioral science or just generating plausible-sounding advice?), contextual awareness (does it know who someone is working with and what they’re walking into?), and delivery model (does coaching happen inside the tools people already use, or does it require yet another app to check?). A standard RFP template won’t surface any of this.

That’s why 55% to 75% of enterprise software implementations fail to meet original objectives within the first year. Many of those failures trace back to evaluation — the buying team asked the wrong questions, compared the wrong things, or optimized for features that didn’t matter once the platform was actually in use.

See How Cloverleaf’s AI Coach Works

Five areas where AI coaching vendors diverge and what to ask about each

There are certain questions that, in a generic SaaS RFP, produce nearly identical answers from every vendor. “Do you support SSO?” Yes. “Are you SOC 2 compliant?” Of course. These are table stakes — important to confirm, but not useful for differentiation.

For AI coaching, the differentiating questions are category-specific. Based on what real enterprise RFP processes have revealed, here’s where vendors actually diverge:

1. Coaching methodology and evidence base.

What behavioral science or coaching frameworks inform the AI? How is coaching content validated? Can the vendor point to peer-reviewed research or established models — not just engagement metrics? The seven capabilities that define effective AI coaching provide a useful framework for evaluating whether a platform’s methodology goes beyond surface-level advice.

2. Behavior change measurement.

Completion rates and satisfaction scores are easy to report and nearly meaningless as indicators of development impact. The real question: does the platform track observable behavior change over time? A meta-analysis in Frontiers in Psychology found that coaching has a larger effect on behavioral outcomes (decision-making, communication, leadership behavior) than on more stable personal traits — which means the platform has to be designed to capture those behavioral shifts, not just session counts.

3. Contextual delivery.

Does coaching reach people in the moment it matters? Before a difficult conversation, when onboarding a new team member, during a project staffing decision — or does it sit in a separate portal waiting to be accessed? This is where the gap between “AI coaching” and meaningful manager enablement gets wide.

4. Assessment integration depth.

Some platforms run a single proprietary assessment. Others integrate with the validated instruments organizations already use: DISC, CliftonStrengths®, Enneagram, Insights Discovery, and more. The question isn’t just “which assessments do you support?” but “how does assessment data inform the coaching the platform delivers?”

5. Scalability model.

Can the platform reach every manager, IC, and director in the organization, not just senior leaders? Enterprise coaching historically served the top 5-10% of an organization. AI coaching’s promise is scaling that to everyone, but only if the platform’s pricing, architecture, and delivery model support it.

How to get your buying committee to use a shared scorecard

Enterprise buying committees for software typically include eight to ten stakeholders and require six to nine months to reach a decision. For AI coaching, those timelines balloon because each stakeholder evaluates the purchase through a different lens. HR cares about coaching quality and development outcomes. IT cares about integration architecture and data security. Procurement cares about pricing structure and contract terms. Analytics wants to know what’s measurable.

Without a standardized evaluation format, each group asks vendors for information differently, gets responses in different formats, and produces assessments that can’t be compared. The vendor who gives IT the best security answers may have given HR a vague description of coaching methodology, but nobody catches it because they’re scoring in different documents.

A structured RFP with a built-in evaluation scorecard solves this by forcing all vendor responses into the same format and giving the committee a shared scoring framework. It also separates must-have requirements from nice-to-have features upfront — so the group doesn’t spend three meetings debating whether a feature that two people care about should disqualify a vendor the rest of the committee ranked first.

If you’re earlier in the process and still vetting whether AI coaching is the right investment, that’s a different conversation. But once the decision is “yes, we’re evaluating vendors,” the speed and quality of that evaluation depend almost entirely on the structure you bring to it.

What an RFP built for AI coaching covers that a standard template won’t

The coaching platform market hit $4.2 billion in 2026 and is growing at 11% annually. The vendor landscape is expanding fast, which makes structured evaluation more important, not less. More options means more noise to filter.

An AI coaching RFP built for this category needs to cover seven areas that map to how these platforms actually differ:

  • vendor background and proof points,
  • product capabilities and coaching methodology,
  • technical architecture and integration,
  • security and compliance,
  • implementation and ongoing support,
  • commercial terms and total cost of ownership,
  • and a weighted evaluation scorecard.

Within those areas, the questions need to be specific enough that vendors can’t hide behind marketing language. Not “describe your AI capabilities” (every vendor will say they use AI). Instead:

👉 “What behavioral science frameworks inform coaching recommendations?”

👉 “How does the system adapt coaching based on the specific relationship between two team members?”

👉 “What observable outcomes do you track beyond engagement metrics?”

And critically, the evaluation scorecard needs weighted scoring, because product capabilities and coaching methodology should carry more weight than, say, vendor company background. When every category counts equally, you end up selecting the best-looking vendor deck rather than the best coaching platform.
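
To show why the weighting matters, here’s a minimal sketch in Python. The categories echo the evaluation areas above; the weights and the two vendor scorecards are placeholders, there only to show how weighting changes the ranking.

```python
# Categories echo the RFP evaluation areas above; weights and scores are placeholders.
WEIGHTS = {
    "vendor background": 0.05,
    "product capabilities & coaching methodology": 0.35,
    "technical architecture & integration": 0.15,
    "security & compliance": 0.15,
    "implementation & support": 0.15,
    "commercial terms / total cost of ownership": 0.15,
}
assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9  # weights must total 100%

def weighted_score(scores: dict[str, float]) -> float:
    """Scores are 1-5 per category; the result is a weighted total on the same scale."""
    return sum(WEIGHTS[category] * scores[category] for category in WEIGHTS)

vendor_a = {  # polished deck, weaker coaching methodology
    "vendor background": 5, "product capabilities & coaching methodology": 3,
    "technical architecture & integration": 4, "security & compliance": 5,
    "implementation & support": 4, "commercial terms / total cost of ownership": 4,
}
vendor_b = {  # stronger methodology, less polish
    "vendor background": 3, "product capabilities & coaching methodology": 5,
    "technical architecture & integration": 4, "security & compliance": 5,
    "implementation & support": 3, "commercial terms / total cost of ownership": 3,
}

print(f"Vendor A: {weighted_score(vendor_a):.2f}")  # 3.85
print(f"Vendor B: {weighted_score(vendor_b):.2f}")  # 4.15
```

Scored with equal weights, Vendor A’s polish wins (an average of 4.17 vs. 3.83); weight methodology at 35% and Vendor B comes out ahead. That’s what the scorecard is for.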

For a deeper look at what separates the platforms themselves, Cloverleaf’s comparison of the best AI coaching platforms for managers and teams breaks down the landscape.

Stop building your AI coaching RFP from scratch

We built an AI Coaching RFP Template because we’ve been on the receiving end of enterprise RFPs, and we’ve seen which ones produce useful evaluations and which ones produce vendor marketing disguised as responses.

The template includes 225+ questions across all seven evaluation categories, pre-tagged as must-have versus nice-to-have for easy customization. It comes with a built-in weighted scorecard that calculates vendor rankings automatically, plus a quick-start guide that walks you through customization, evaluation, and red flags to watch for.

It’s designed to get you from “we need to evaluate vendors” to “RFP sent” in an afternoon, not a month.

Download the Enterprise AI Coaching RFP Template