The AI coaching category has a labeling problem. The term now covers everything from a chatbot that generates leadership tips on demand to a platform that monitors manager behavior across your entire organization and surfaces coaching inside the tools your people are already using. Both are called AI coaching. Neither is wrong to use the term. But they’re not the same thing, and the distance between them is roughly the distance between a gym membership and a personal trainer who shows up at your door every morning.
That distinction matters a lot when you’re a talent development leader evaluating platforms for a manager population of several hundred or several thousand people. A platform that looks impressive in a demo and generates strong engagement metrics in a pilot can still fail to produce any measurable behavior change at scale — not because the AI isn’t sophisticated, but because the architecture isn’t designed to reach people in the moments when behavior actually changes.
Get the 2026 AI coaching playbook to see how organizations are implementing AI coaching at scale.
After more than a decade of working with organizations on manager development, and through research that examines how employees actually spend their time at work — roughly 14,640 interpersonal interactions per employee per year across messaging tools, meetings, and email — a small set of platform design decisions turns out to predict whether AI coaching actually changes behavior or just gets used for a few months before quietly becoming another unused SaaS subscription.
Here are the seven. Use them as an evaluation framework for any platform you’re considering.
7 capabilities an AI coaching platform must have for your organization
1. It comes to your people — without requiring them to go anywhere
Every AI coaching vendor says they’re “in the flow of work.” It’s worth asking exactly what that means, because the phrase covers a wide range of delivery models.
One version: the platform lives in your HR lifecycle. It appears in performance reviews, in onboarding workflows, in goal-setting cycles. It’s there when HR creates the moment. That’s useful — and it still leaves the other 98.5% of employee interactions untouched.
Another version: the coaching shows up in the tools employees are already working in. Email. Slack. Microsoft Teams. Calendar. Not as a link inviting someone to go visit a coaching platform, but as three sentences that appear where the manager’s attention already is, timed to the moment when those sentences will actually matter — before a difficult 1:1, after a performance conversation, when a new person joins the team.
HR and L&D functions have, on average, about 220 meaningful touchpoints per employee per year. That 1.5% of interactions matters. But behavior change happens in the other 98.5%, in the back-to-back meetings and the quick Slack exchanges and the moment someone walks out of a hard conversation not quite sure what went wrong. Coaching that doesn’t reach into those moments is coaching that stays contained to the programs HR already runs — helpful, but not structural.
The question to ask any vendor: does the coaching proactively appear in the tools employees are already in, without requiring a separate visit?
See How Cloverleaf’s AI Coach Works
2. It’s triggered by what’s actually happening in your organization
The most powerful coaching arrives at the right moment — not because someone remembered to open an app, but because the platform detected that a coaching-relevant event just happened.
A manager just got promoted and inherited a new team. An employee’s latest performance review flagged adaptability as a growth area. A new direct report was added to a recurring meeting. A team’s engagement survey showed a dip in recognition. These are moments when coaching is genuinely useful — when the manager has a reason to pay attention and a specific situation to apply the insight to.
This kind of event-triggered delivery requires HRIS integration — a connection to the systems of record that actually know when organizational moments happen. When a platform is integrated with Workday or another HRIS, it can detect a promotion, a role change, a performance review completion, and fire coaching automatically in response, without anyone having to configure a workflow or remember to log in.
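To make the distinction concrete, here is a minimal sketch of what event-triggered delivery looks like architecturally, assuming the HRIS can POST a webhook when a record changes. The endpoint path, event names, and payload fields are illustrative placeholders, not Workday’s actual schema:

```python
# Minimal sketch: HRIS webhook -> coaching trigger.
# Event names, payload fields, and the enqueue step are hypothetical.
from flask import Flask, request

app = Flask(__name__)

# Map organizational events to the coaching moment they should fire.
COACHING_TRIGGERS = {
    "promotion": "new_manager_transition",
    "role_change": "inherited_team_onboarding",
    "review_completed": "development_goal_followup",
}

@app.post("/webhooks/hris")
def handle_hris_event():
    event = request.get_json(force=True)
    moment = COACHING_TRIGGERS.get(event.get("type"))
    if moment:
        # Deliver in the employee's existing tools (Slack/Teams/email),
        # with no login or workflow configuration on their side.
        enqueue_coaching(employee_id=event["employee_id"], moment=moment)
    return {"ok": True}

def enqueue_coaching(employee_id: str, moment: str) -> None:
    # Stand-in for a real delivery queue.
    print(f"queueing '{moment}' coaching for {employee_id}")
```

The property that matters is in the control flow: the system reacts to the event, so no one has to remember to ask.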
Not every platform does this. Many require the manager to initiate. That’s a meaningful distinction — a manager who already knows they need coaching might seek it out; a manager who doesn’t know what they don’t know won’t.
The question: does the platform detect organizational events and respond to them, or does it wait to be asked?
3. It’s built on validated behavioral science
There’s a meaningful difference between an AI coach that knows a person’s name and job title and one that understands, at a behavioral level, how that person processes information, makes decisions, responds to feedback, and experiences stress.
The behavioral profile is what makes personalization real rather than cosmetic. When a manager gets coaching on how to have a difficult conversation with a direct report, “personalized” shouldn’t just mean the direct report’s name is in the prompt. It should mean the coaching reflects how that specific person tends to respond to direct feedback — whether they need context before conclusions, whether they hear criticism as a threat or as useful data, whether they’ll engage more openly in writing than in person.
That kind of insight comes from validated behavioral assessments — DISC, CliftonStrengths®, Enneagram, Insights Discovery, and others — that have been rigorously developed and tested over decades. These aren’t just personality quizzes. They’re behavioral frameworks that organizations have invested in for a reason: they create a shared language and generate reliable predictions about how people work.
One important implication: if your organization has already invested in these assessments, the right AI coaching platform should make those investments compound, not become sunk costs. A platform that requires a new proprietary assessment — or asks employees to manually upload scores from another tool — adds friction and abandons the shared language you’ve already built.
The question: does the platform integrate with the validated assessments your organization has already adopted?
4. It connects behavioral data to your organizational context
Knowing who someone is matters. Knowing who they are in the context of your organization — against your leadership competencies, your values, your team structures — matters more.
A new manager who needs to grow in adaptability benefits from coaching on adaptability in general. But they benefit much more from coaching that knows adaptability is a core competency at your organization, understands what adaptability specifically means in your context (is it speed of decision-making? flexibility with ambiguity? comfort with restructuring?), and connects that to the specific behavioral reasons why adaptability might be hard for this person.
The same principle applies to onboarding, to cross-functional collaboration, to succession planning. Coaching that doesn’t know what your organization cares about can still be helpful — the way a generic leadership book is helpful. Coaching that’s grounded in your frameworks can be transformational, because it closes the gap between insight and the specific situation the manager is actually in.
This requires the platform to be configurable to your organization’s actual competency model, values, and priorities — not to a generic coaching library.
The question: can the platform be trained on your frameworks, not just its own?
5. It speeds up how quickly a manager gets to know their team
New managers — whether they’re first-time managers or experienced leaders inheriting a new team — spend weeks or months trying to understand who their people are. Who operates best with direct feedback and who needs context first. Who’s quietly burning out while saying everything is fine. Who has organizational intelligence that the manager doesn’t yet have access to. Who will advocate for the team’s needs and who will absorb workload silently until it becomes a problem.
In a world without AI, that understanding takes relationship capital that takes time to build. In a world with effective AI coaching, that timeline compresses dramatically — because the platform already knows the behavioral profiles of the team members, can flag likely friction points before they surface, and can help the manager prepare for individual conversations in ways that are specific to each person rather than based on how the manager was once managed themselves.
A manager walking into a 1:1 with a new direct report doesn’t need a 10-page overview of that person’s profile. They need three sentences: here’s how this person prefers to receive feedback, here’s what they need from you right now, here’s what to watch for. That’s the onboarding value of AI coaching — not just onboarding to the company, but onboarding to the team.
The question: does the platform help managers understand their teams faster, or just give managers content to read?
6. It measures behavior change, not just engagement
Usage metrics are easy to generate. Time-in-app, sessions per week, modules completed, NPS scores — these are real numbers and they’re not meaningless. But they don’t answer the question that budget holders are increasingly asking: did behavior actually change?
The HR function has historically been limited to measuring whether people liked a program — sentiment data collected through surveys, often months after the program ended. AI coaching, if it’s truly embedded in the flow of work, generates something more valuable: a continuous record of what managers are working on, what challenges they’re raising, what they’re trying, and whether they’re returning to apply what they practiced. That data, aggregated at the organizational level, is evidence — not proxy metrics but observable indicators of whether the investment is changing how managers lead.
This is the difference between a TD leader who can tell their CHRO “we had 2,000 managers log in last quarter” and one who can say “manager feedback conversations are measurably more specific and constructive than they were six months ago, and here’s the data.” That’s what behavior-level measurement makes possible.
The question: does the platform give you behavior-level measurement, or just engagement metrics?
7. It’s designed for managers who don’t have time to spare
This one sounds simple. It isn’t.
The default design of many AI coaching tools is the long-form conversation: an open-ended chat session that can go wherever the manager wants to take it. There’s genuine value in that for managers who have time and appetite for it. But most managers, on most days, don’t. They’re moving from meeting to meeting with a few minutes between. They’re dealing with the urgent at the expense of the important. A coaching interaction that requires 20 minutes of focused engagement isn’t going to happen consistently — which means it’s not going to change behavior at scale.
Effective AI coaching at scale is designed for the manager with 30 seconds, not the manager with 30 minutes. That means: three sentences, not a page. An actionable suggestion, not an open-ended question. A coaching moment with a designed ending — one that says “you have what you need now” rather than continuing to generate conversation indefinitely. And if the manager has more time and wants to go deeper — role-play the upcoming conversation, explore the situation further — that option is there. But it’s not required.
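One way to picture that “designed ending” is as a constraint on the coaching payload itself rather than on the conversation. A hypothetical sketch, with field names and limits invented for illustration (not Cloverleaf’s schema):

```python
# Illustrative nudge payload: a hard cap on length, one concrete action,
# and deeper options that are offered but never required.
from dataclasses import dataclass, field

@dataclass
class CoachingNudge:
    insight: str                 # the "three sentences"
    action: str                  # one concrete suggestion, not a question
    go_deeper: list[str] = field(default_factory=list)  # e.g. ["role_play"]

    def __post_init__(self) -> None:
        # Enforce the 30-second read at the data level.
        if len(self.insight.split()) > 60:
            raise ValueError("nudge too long for a 30-second read")

nudge = CoachingNudge(
    insight=("Jordan tends to hear unframed criticism as a threat. "
             "Lead with the pattern you value, then the change you need. "
             "Let silence sit after you ask for their read."),
    action="Open with one specific thing Jordan did well this sprint.",
    go_deeper=["role_play", "open_chat"],
)
```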
The feedback from managers who actually use AI coaching consistently is almost always some version of the same thing: I love it because it’s fast. Not because it’s comprehensive. Fast is a feature. The question: does the platform design for the manager who has 30 seconds, or for the one who has 30 minutes?
How to use this AI Coach criteria in your next evaluation
These seven criteria work best as conversation-starters in vendor demos, not as a scoring rubric. Most platforms will say “yes” to most of them in a demo setting. The useful follow-up is always the same: show me what that looks like in the product, and describe what the employee actually has to do to receive it.
The answers that matter aren’t the ones about future roadmap — they’re the ones about how the product works today. A platform that delivers coaching proactively in Slack without requiring a login is architecturally different from one that plans to do that eventually. A platform integrated with Workday for event-triggered coaching is running different code than one that’s planning the integration. These aren’t small distinctions.
The organizations that get the most from AI coaching are the ones that chose a platform aligned with how their managers actually work — not how they aspire to work — and with the assessment infrastructure they’ve already built. Those choices narrow the field considerably. And the platforms that clear all seven criteria are a short list.
Want to see how Cloverleaf addresses each of these criteria? The platform integrates 12+ validated behavioral assessments, delivers coaching directly in Slack, Teams, and email through HRIS-triggered events, and includes behavior-level measurement built in — no separate analytics platform required.
You’ve probably sat through three or four AI coaching demos in the past six months. Maybe more. And if you have, you’ve noticed something: they all sound nearly identical.
Every platform is proactive. Every platform is personalized. Every platform is “in the flow of work.” The language has converged so completely that you could swap the vendor names in most demos and the pitch would still hold together.
This is genuinely confusing — not because the vendors are lying, but because those descriptors are all technically true. The differences live underneath the marketing language, in the architectural choices and philosophical convictions that determine how a platform actually works. And those differences matter a lot, especially if you’re trying to make a decision that will touch your entire manager population.
The most useful question to ask before your next evaluation isn’t “does this platform have roleplay?” or “does it integrate with Slack?” It’s: what does this platform believe about how behavior change happens at work?
Why the Foundation Matters More Than Feature Parity in AI Coaching
Behavior change is hard. That’s not a novel insight for anyone working in talent development — it’s the defining frustration of the function.
You can design a great performance review process. You can run a compelling manager training. You can commission a CliftonStrengths rollout and watch people read their reports, feel seen, and then not change much about how they actually work.
The problem isn’t the content. Most leadership development content is solid. The problem is that insight and behavior change are separated by a gap that good intentions don’t reliably cross.
The challenge isn’t just generating better insight. It’s getting that insight to show up in the moments where behavior actually happens.
Most development efforts still operate inside structured programs — performance reviews, training sessions, workshops. But those moments represent a tiny fraction of the interactions that actually shape how people work day to day.
The real question isn’t whether a platform can produce good coaching. It’s whether that coaching reaches someone in the 10 minutes before a 1:1, in the middle of a Slack conversation, or right after a difficult interaction — when there’s actually something to change.
So what does close that gap? That’s where the philosophies diverge.
Get the 2026 AI coaching playbook to see how organizations are implementing AI coaching at scale.
Two Competing Models of Behavior Change in AI Coaching
Most organizations evaluating AI coaching platforms have already invested significantly in behavioral assessments. DISC, CliftonStrengths, Enneagram, 16 Types, Hogan — the list varies by company, but the investment is real. Assessment licenses, rollout time, facilitation, and the slow cultural work of building a shared language around how people think and work together.
What happens to all of that when you bring in an AI coaching platform?
If the platform requires a proprietary assessment — or asks users to manually upload their existing scores — you’re effectively starting over. The investment becomes a sunk cost, the shared language has to compete with a new vocabulary, and every employee who’s already done the work has to do something new before they can start getting coached. That’s friction at the front door, before the coaching has even begun.
A platform built on the philosophy that validated behavioral science is foundational — not supplemental — takes a different approach: it integrates with the assessments organizations have already adopted.
If your people have CliftonStrengths profiles, those become the behavioral foundation. If they have DISC scores, those inform every coaching moment. Nothing your organization has already built gets abandoned. In practice, this often means AI coaching ends up consolidating spend that was previously split across multiple assessment vendors — organizations frequently save more than 30% compared to paying for assessments and coaching separately.
This isn’t just a budget argument. It’s a behavior change argument. Coaching grounded in assessments someone has already taken, already reflected on, and already built a shared team language around lands faster than coaching that asks them to start fresh.
See How Cloverleaf’s AI Coach Works
The 1.5% Problem: Why Most AI Coaching Misses Where Behavior Is Actually Influenced
Here’s a number worth sitting with: HR, L&D, and talent functions have, on average, about 220 meaningful touchpoints per employee per year. That covers everything from benefits enrollment to performance reviews to manager enablement programs.
Meanwhile, Microsoft’s research on workplace tool usage shows employees have roughly 14,640 interactions with other people per year — through calendar, messaging, email, and meetings. Do the math, 220 out of 14,640, and HR is touching about 1.5% of the interactions that actually shape an employee’s experience of work.
The real promise of AI coaching isn’t making that 1.5% more efficient. It’s reaching the other 98.5% — the manager-to-employee Slack message, the 10 minutes before a 1:1, the moment someone’s walking out of a difficult conversation and trying to figure out what just happened.
That only works if the coaching lives where those interactions live. Not behind a separate login. Not in a dashboard someone has to seek out. In the notification that fires before the meeting. In the three sentences that show up in Slack without requiring the manager to go anywhere.
“In the flow of work” is one of those phrases every vendor uses, but each vendor means something different by it. It’s worth asking specifically: does coaching proactively appear in the tools employees are already in — email, calendar, Slack, Teams — without requiring a separate visit? Or does “in the flow of work” mean available in the vendor’s platform at lifecycle moments like performance reviews? Both are useful. Only one reaches the 98.5%.
There’s also the question of what happens when the coaching arrives. Coaching that has a designed ending — three sentences, an actionable insight, the option to go deeper if time allows — treats a manager’s attention as the scarce resource it is. Coaching that opens into an indefinite conversation, however rich, competes with everything else on their screen. The most common feedback on AI coaching that actually gets used consistently is that people love it because it’s fast. Not because it’s long.
A Word on AI Personas and Organizational Trust
One more distinction worth naming, because it rarely comes up in demos: what it means, organizationally, to have a named and personified AI coach.
The bet on personification is that employees engage more deeply with an AI that feels like a coaching relationship than one that feels like software. There’s probably some truth to that — at least in the short term. Names and personas lower the activation energy for a first conversation.
But organizations navigating AI governance requirements are increasingly asking different questions.
🤔 What does it mean when employees form an ongoing relationship with a named AI system?
🤔 What are the disclosure requirements?
🤔 What happens when the vendor updates the product significantly?
🤔 Who owns the continuity of that relationship?
The International Coaching Federation’s 2025 AI coaching framework requires explicit AI disclosure on every interaction — not buried in an onboarding modal, but present at the point of engagement. For organizations with global privacy requirements, enterprise governance standards, or simply a cultural commitment to transparency about AI use, how a vendor handles this architecture matters. It’s worth asking directly: where does the AI disclosure appear, and what does the employee see?
Three Questions That Cut Through the Marketing Language for Talent Leaders Evaluating AI Coaching
If this framing is useful, here are three questions to bring into your next evaluation — regardless of which vendor you’re evaluating:
1. Where does the coaching actually appear, and what does the employee have to do to receive it? The answer reveals whether “in the flow of work” means native to their existing tools or native to the vendor’s platform.
2. What happens to the behavioral assessments we’ve already invested in? The answer reveals whether this platform compounds your existing infrastructure or asks you to rebuild it.
3. What is the platform’s published stance on AI disclosure, bias mitigation, and coaching ethics standards? The answer reveals how the vendor thinks about organizational trust — not just user satisfaction scores.
Both architectural approaches to AI coaching represent serious bets on how behavior change happens. The question isn’t which bet is winning in the market. It’s which bet is built on the same belief about development that you hold — and which one is designed to reach not just the 1.5% of interactions your team already owns, but the 98.5% where managers and employees actually work.
If that philosophy — validated behavioral science, compounding over time, delivered in the tools people are already in — resonates, Cloverleaf’s AI coaching approach is worth a closer look. Or if you want to bring these questions into your next evaluation, the Talent Leader’s Guide to Vetting AI Coaching breaks down exactly what to look for.
I manage 10 direct reports. We do quarterly feedback, bidirectional, which means I start by asking them what they’d like me to continue, start, stop, or do differently. Then we flip it.
I’ve run this cadence for a while. Before my last round, I was better prepared than usual. I’d been syncing Granola meeting transcripts and 1:1 notes into Claude, so I could pull themes across months of conversations, not just whatever I happened to remember from the past two weeks. I had the patterns. I knew what I needed to say to each person.
I had already said most of it before.
One in Three Feedback Conversations Makes Performance Worse, Not Better
That’s not a rhetorical point. A landmark meta-analysis by Kluger and DeNisi examined 607 effect sizes across 131 studies and found that more than one in three feedback interventions actually decreased performance after they were delivered. Not neutral. Worse. Their explanation: feedback becomes less effective, and sometimes actively counterproductive, the closer it gets to the person’s sense of self. When feedback touches something someone considers core to who they are, the brain stops processing it as information and starts processing it as threat.
When that happens, people don’t change. They cope. They dispute the feedback, reinterpret it favorably, lower their goals, or agree in the moment and move on. The feedback is accurate. It doesn’t matter.
I had been watching this play out with one of my direct reports.
Get the 2026 AI coaching playbook to see how organizations are implementing AI coaching at scale.
The Same Feedback Didn’t Land — Until I Changed How I Framed It
One of my direct reports is genuinely one of the most helpful people I work with. When someone asks if something is possible, they’ll say yes, enthusiastically, warmly, and then go on to explain everything they’re going to do and how. It comes from a real place.
But in a startup where context switches fast, that pattern creates noise. Someone asks a quick question and gets a five-minute answer. The feedback I needed to give was simple: just say yes and move on. Not every question needs a full response.
I’d said something like this before. They understood it, nodded, and seemed to take it in. It came up again anyway.
This time, I prepared differently.
I was using Cloverleaf’s MCP integration alongside my meeting notes, pulling together patterns from past 1:1s and layering in behavioral data from assessments into the same context. Not just what had been happening, but additional signals about how this person tends to operate and how feedback like this might land with them.
The output didn’t just give me talking points. It added guidance on how to frame the feedback for this specific person.
It surfaced the same theme, and then added more helpful nuance and insight:
“This is the single most personality-driven behavior. This person is very people-centered in nature, and helpfulness feels like an identity to them — not just a habit. Be careful here. If they hear ‘stop being helpful,’ that will land as a rejection of who they are. Instead, frame it as how they channel their helpfulness.”
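For readers wondering what an MCP integration involves mechanically: MCP (Model Context Protocol) servers expose tools that a client such as Claude can call to pull external data into the same context as your notes. Here is a minimal sketch using the Python MCP SDK, where the server command and tool name are hypothetical stand-ins rather than Cloverleaf’s actual interface:

```python
# Minimal sketch of calling a behavioral-profile tool over MCP.
# "cloverleaf-mcp" and "get_behavioral_profile" are hypothetical names;
# session.list_tools() shows what a real server actually exposes.
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def fetch_profile(person_id: str) -> str:
    server = StdioServerParameters(command="cloverleaf-mcp", args=[])
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool(
                "get_behavioral_profile",
                arguments={"person_id": person_id},
            )
            return result.content[0].text

print(asyncio.run(fetch_profile("direct-report-id")))
```

In day-to-day use, Claude makes this call itself once the server is configured; the point is that the behavioral profile lands in the same context window as months of meeting notes.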
See How Cloverleaf’s AI Coach Works
When Feedback Touches Identity, It Stops Being Processed as Information
I stopped when I read that. Because I realized what I had been doing, even without using those exact words, was telling someone to stop doing the thing that feels most like them. For someone whose helpfulness is core to their identity, that isn’t a coaching note. It’s an identity threat.
The research on this is clear. Studies on how people respond to identity-threatening feedback consistently show the same pattern: people cope rather than change. They dispute it, misremember it more favorably, or reduce their commitment to improving, none of which is visible in the moment. They nod, they move on, and nothing shifts. The feedback wasn’t wrong. The frame was.
The reframe the system suggested: “Your helpfulness is one of your superpowers. The change is about being strategically helpful, directing it where it can have the most impact, not diffusing it across every moment.”
Same observation. Completely different frame. Their response when I used it: “Yeah, that’s spot on.” And then the conversation actually opened, they had thoughts about specific situations, ideas for what strategically helpful would look like day-to-day. It became a real exchange instead of something they were getting through.
What Behavioral Data Does That Performance Data Can’t
Most of what gets written about AI and feedback is focused on improving the data collection side: surfacing patterns across performance reviews, reducing recency bias, generating first drafts of assessments. That’s genuinely useful. Gallup research shows that employees who receive frequent, specific feedback are nearly four times as likely to be engaged, and better preparation helps managers get there.
But performance data tells you what happened. It doesn’t tell you how to talk about it in a way this specific person can actually receive.
That’s a different problem. The information about the helpfulness pattern was solid. What I was missing was context on how that pattern connects to this person’s identity, and therefore how I needed to frame the conversation for them to actually hear it.
That’s what the assessment data surfaced. Not a profile to study before a review cycle, but a specific note in the preparation flow: here’s how this person will likely receive what you’re about to say. Before the conversation, not after.
Giving Effective Feedback Gets Harder the More People You Manage
I know my team. I spend real time with each person. But managing 10 people at a startup — across product, customers, recruiting, and everything else — means the nuanced detail of how each individual thinks doesn’t stay in active memory. Some of it slips. Some of it I never had clearly to begin with.
This isn’t unique to me. Research on continuous feedback finds that feedback quality, specifically how well it accounts for the individual, is one of the strongest predictors of whether it changes behavior. The bottleneck isn’t manager effort or intent. It’s the cognitive load of holding detailed individual context across many people simultaneously.
Cloverleaf’s insight doesn’t replace knowing your team. What it does is resurface the context that matters at the moment you need it, in a way that changes not just what you say but how you say it for this person.
One Data Point Can Entirely Change How People Give & Receive Feedback
The feedback I’d been trying to give for months finally landed. Not because I said something new, but because I said it in a way this person could actually hear.
That’s the part that’s been missing from most of what I’ve seen in this space. Not better data collection or more frequent check-ins. The translation layer between what you know about someone’s performance and how to communicate it in a way that reaches them, that fits how they think, what they value, and what they’re most likely to act on.
When that behavioral context, the translation between performance and how to communicate it, is present, feedback stops being something people sit through and becomes something they understand, engage with, and actually act on.
In a given year, the average employee has about 220 interactions with HR. This includes everything: onboarding, benefits administration, compensation changes, performance reviews, training courses, talent reviews — the full spectrum of what HR touches across an employee’s year.
Microsoft’s research into Office 365 calendar data and Teams messaging reveals something else: the average employee has approximately 14,640 people interactions per year. Meetings, group messages, email threads, collaborative projects. The actual work of collaboration happens 14,640 times annually.
Which means HR touches 1.5% of the interactions where development actually needs to happen.
This is a buyer’s guide. AI coaching promises to bridge that gap—but the market is filling with tools that vary wildly in what they actually do. Some are chatbots with framework databases. Some are role-play simulators. Some frustrate employees who expect answers. Here’s how to tell which tools can actually activate your talent investments in the moments that matter.
Get the 2026 AI coaching playbook to see how organizations are implementing AI coaching at scale.
The two promises of AI coaching—and which one delivers ROI for talent leaders
Promise 1: An AI coach can help teams do more faster with less.
Generate performance review drafts. Summarize feedback. Create development plans. This is the efficiency promise, and most talent leaders are already experimenting with it.
Talent leaders are already experimenting with tools like Gemini in their performance review cycles because leaders are too busy, and AI can help draft the reviews for them. But most aren’t confident it’s a good idea, and it often feels like a gamble.
That’s the efficiency use case. It streamlines admin work. It saves time on documentation. It’s valuable—but it’s not transformative.
Promise 2: AI coaching can ensure ROI from the talent programs you’ve already paid for
You’ve invested in performance management systems, leadership development programs, feedback processes, 360s, culture initiatives. Those investments create valuable moments—the performance review conversation, the workshop debrief, the 360 results session.
Unfortunately, those critical insights are easily forgotten in the busyness and demands of work.
The manager and employee have a productive performance review. They identify a development goal. They both agree. They’re both clear on it. Three weeks later, the manager is preparing for their next one-on-one. The development goal doesn’t cross their mind. They’re thinking about project status, deadline pressure, what needs to get done this week. The goal that felt so important in the performance review? It’s in a system that hasn’t been opened since that conversation ended.
The workshop creates a breakthrough. The manager finally understands why they keep steamrolling their team in meetings. They leave energized to change. Two weeks later, in a tense project meeting, they do it again. They didn’t forget the workshop insight. They just didn’t remember it in the moment when their stress response took over and they needed it most.
The 360 feedback reveals a growth area. The employee reads it, reflects on it, commits to working on it. The PDF sits in their downloads folder. It doesn’t come up again until the next 360, six months later, when they realize they haven’t actually changed anything.
Your 220 HR moments create real insight. But that insight is not surfacing in the 14,640 interactions where it would actually change behavior.
AI coaching can layer on top of everything you’ve already built and activate it. It takes the goals, the insights, the feedback from those 220 HR moments and surfaces them in the 14,640 interactions where managers and employees actually work together.
Consider this common scenario:
Manager preparing for their weekly one-on-one.
Slack notification appears ten minutes before the meeting: “You and this employee agreed to work on delegation in their performance review two weeks ago. Here’s a way you can coach them on it today.”
Right there in the notification, the manager can launch a role play to practice the conversation. Or they can open a chat with the AI coach: “I’m worried about bringing this up. I think they’re going to get defensive because they’re already stressed about the project deadline.”
The AI coach responds based on that employee’s behavioral data—how they receive feedback, what motivates them, how they handle stress. The manager gets specific guidance for this specific person in this specific situation. Not a framework. Not a reminder to “use your training.” Actual coaching on how to have this conversation with this employee today.
The manager walks into the one-on-one equipped. The performance review goal doesn’t get buried under project updates—it surfaces in the exact moment when the manager can actually coach on it.
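For teams curious about the plumbing, the delivery half of that scenario is straightforward with Slack’s real API. A sketch using slack_sdk, where the channel, copy, and action IDs are illustrative; the hard part, deciding what the message should say, happens upstream:

```python
# Sketch of the pre-1:1 nudge as a Slack message with action buttons.
# The button handlers (role play, coach chat) live in the app backend.
from slack_sdk import WebClient

client = WebClient(token="xoxb-your-bot-token")  # bot token from your Slack app

client.chat_postMessage(
    channel="U0MANAGER",  # the manager's user ID / DM channel
    text="Prep for your 1:1 in 10 minutes",  # plain-text fallback
    blocks=[
        {
            "type": "section",
            "text": {
                "type": "mrkdwn",
                "text": ("*1:1 in 10 minutes.* You two agreed to work on "
                         "delegation in their last review. Here's one way "
                         "to coach on it today."),
            },
        },
        {
            "type": "actions",
            "elements": [
                {
                    "type": "button",
                    "text": {"type": "plain_text", "text": "Practice the conversation"},
                    "action_id": "launch_role_play",
                },
                {
                    "type": "button",
                    "text": {"type": "plain_text", "text": "Chat with the coach"},
                    "action_id": "open_coach_chat",
                },
            ],
        },
    ],
)
```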
That’s the promise worth building a business case for: your 220 HR moments start working in the 14,640 interactions where development actually needs to happen. The investments you’ve already made—they start delivering ROI every single day, not just in the moments you can directly be present or available for.
See How Cloverleaf’s AI Coach Works
What your AI coach should automate—and what it should never touch
Before evaluating specific AI coaching tools, you need a framework for thinking about where AI should and shouldn’t be used. Because if AI coaching is replacing human judgment or relationships, you’re solving the wrong problem.
Where AI wins: Speed, accuracy, and pattern recognition
AI is only as biased as the data it’s trained on: clean data in, consistent judgment out.
That’s a big if. But when the data is good, AI performs consistently no matter what time of day it is, no matter how many meetings happened before, no matter what’s going on in the background.
Humans don’t work that way. What you had for breakfast changes how fast your brain works. If you’ve been in back-to-back meetings all day, you’re tired by 4pm. We all carry unconscious biases we don’t even realize are shaping our decisions.
AI doesn’t have recency bias. January performance and November performance are weighted the same—assuming it’s been trained that way.
AI also processes massive amounts of data instantly and spots patterns humans would never catch. You want to look at 500 employees’ development goals and identify common themes? AI does that in seconds. A human would spend hours manually reviewing and still miss half the connections.
Where human intervention wins: Transferable knowledge, creativity, leadership
You might not be an accountant, but you can generally understand competing departmental goals. You can figure out how finance thinks about risk differently than sales thinks about revenue opportunity. You can navigate cross-functional dynamics because you understand organizational context.
AI trained on finance data has to start from scratch when you ask it about sales. The race toward AGI—artificial general intelligence—is real. But we’re not there yet. Right now, humans significantly outshine AI in cross-functional understanding.
Creativity is another area where humans win. AI can seem creative. It generates novel combinations of existing ideas. But all it can do is imitate. It recombines what it’s been given.
True innovation comes from humans who look at market conditions, competitive dynamics, available tools, economic constraints and say: “Here’s a completely new approach nobody’s tried.”
Leadership is where humans are irreplaceable. Trust, culture, loyalty, inspiration—these don’t come from bots.
One of the biggest barriers to AI adoption isn’t the technology. It’s lack of trust in leadership. Employees worry: “If I use this tool, what data does my employer see about me? If I train AI to do my job really well, am I just going to lose my job?”
Those fears aren’t solved by better AI. They’re solved by better leadership. Trust, vulnerability, courage—that’s how organizations actually win with AI. And that’s work only humans can do.
What this means for talent leaders evaluating AI coaching tools
AI should handle speed and pattern recognition. It should surface insights from your siloed systems—performance reviews, engagement surveys, skills data, learning history, calendar data. It should deliver personalized nudges based on patterns humans can’t track manually across hundreds of employees.
But the AI coach should be pushing people back toward human intelligence. Toward cross-functional collaboration. Toward creative problem-solving. Toward relationships.
If the AI coaching tool you’re evaluating is designed to replace manager conversations or reduce human connection, you’re looking at the wrong tool. It’s solving the wrong problem.
For more on why development infrastructure needs to support human relationships, see why 2026 is the year talent development becomes business infrastructure.
Five features that separate AI coaching systems from glorified chatbots
We’ve been getting a specific type of sales call lately. Talent leader says: “We tried an AI coach last quarter. People used it for two weeks and then stopped. It felt like a glorified chatbot with a framework database. We’re disappointed.”
When you dig into what went wrong, a pattern emerges. The tools that fail share common gaps. The tools that succeed share five specific features.
Feature 1: Proactively shows up in the flow of work before you remember you need it
Even on your best days, you don’t wake up thinking: “You know what I’m going to do? Spend my first 15 minutes reviewing that training I took and figuring out how to apply it today.”
You wake up thinking: “Do I have time to prep for that meeting? I didn’t even pack lunch for my kid yet.” You’re busy. You’re stressed. Development goals don’t surface naturally when you’re focused on immediate work demands.
Nobody wants another login. Nobody wants to remember to find another tool. Your AI coach needs to exist where people already work—Workday, LLMs, Microsoft Teams, Slack, email, wherever the 14,640 people interactions are actually happening.
But it’s not enough for AI coaching to just be available there. The differentiator is whether the AI coach understands what’s happening in your day and proactively surfaces guidance before you think to ask for it.
“I see you’re walking into this meeting with the product team. Remember you wanted to work on not dominating the conversation. Here’s one thing to try: Ask a question and count to five before you speak again.”
Not a five-minute video.
Not a long article.
Three sentences.
That’s all people have time for between meetings. Just a nudge. Just a reminder. Right in the flow of work where it can actually get applied.
Feature 2: Remembers your past conversations
If every interaction with the AI coach starts from scratch, it doesn’t feel like the coach actually knows the person.
The AI coach should remember: “You’ve really struggled with defensiveness when your manager gives you feedback. Let’s keep working on that.” Or: “I’ve noticed you keep asking questions about delegation. What’s making this so hard right now?” Or: “In your last 360, your peers said you need to create more space for others to speak in meetings. How’s that going?”
If your organization has 360 capabilities or lightweight feedback tools integrated with your AI coach, can the coach know what feedback you’re getting from peers and coach you on those specific themes?
Memory creates the experience of working with someone who actually understands you—not just accessing a database that forgets everything the moment you close the window.
Without memory, every interaction starts from scratch. The AI coach asks you the same diagnostic questions every time. You answer the same background information over and over. You stop engaging because it feels like you’re training the tool instead of the tool helping you.
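Mechanically, memory doesn’t need to be exotic. A minimal sketch of the idea, with storage and field names invented for illustration: persist the themes a person keeps returning to, and prepend them to the next session’s context.

```python
# Illustrative session memory: carry coaching themes forward so the next
# interaction starts from context, not from zero.
import json
from pathlib import Path

MEMORY_DIR = Path("coach_memory")
MEMORY_DIR.mkdir(exist_ok=True)

def load_memory(user_id: str) -> dict:
    path = MEMORY_DIR / f"{user_id}.json"
    return json.loads(path.read_text()) if path.exists() else {"themes": []}

def save_theme(user_id: str, theme: str) -> None:
    memory = load_memory(user_id)
    if theme not in memory["themes"]:
        memory["themes"].append(theme)
    (MEMORY_DIR / f"{user_id}.json").write_text(json.dumps(memory))

def session_preamble(user_id: str) -> str:
    themes = load_memory(user_id)["themes"]
    if not themes:
        return ""  # first conversation: nothing to carry forward
    return "Ongoing coaching themes: " + "; ".join(themes)

save_theme("u123", "defensiveness when receiving manager feedback")
print(session_preamble("u123"))
```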
Feature 3: Coaches you on your specific situation, not generic advice for your role
Many tools claim “personalization” but what they actually mean is: “We know your title, so we’ll send you content for people at your level.”
Here’s what false personalization in AI coaching often looks like:
Manager gets a notification about a “personalized learning opportunity”—it’s a seven-minute video on delegation. The subject line says “Based on your role as Manager Level 3.” The manager already knows how to delegate. They’ve been managing people for five years. The video is noise. After the third generic “personalized” notification, they stop opening messages from that system.
True personalization means the coaching addresses what you’re stressed about right now, what you’re excited about, what’s in front of you today. Not generic advice for people with your job title.
You need AI coaching that understands: This person is stressed about an upcoming difficult conversation scheduled for tomorrow. This person is excited about a new project but doesn’t know how to structure the kickoff meeting happening this afternoon. This person just got promoted last week and is overwhelmed by suddenly managing people who used to be peers.
If your AI coach is only trained on frameworks, it will feel generic. If it’s trained on your behavioral data, your performance history, your current projects, your team dynamics—it can deliver coaching that’s actually relevant to what you’re dealing with today.
Feature 4: Connects data across your talent systems
How much data can your AI coach actually access? If it only knows what happens inside its own platform, it can’t connect the dots that make coaching useful.
Can it pull from your performance management system? Your engagement surveys? Your skills taxonomy? Your learning history? Your HRIS data—who reports to whom, when someone got promoted, when teams restructured?
Can it know behavioral assessment data—DISC profiles, Enneagram types, CliftonStrengths, communication preferences?
Can it understand calendar context—who you’re meeting with, when you have one-on-ones scheduled?
Consider what’s possible when data actually connects:
Manager preparing for one-on-one with Jordan. Gets notification ten minutes before the meeting: “You’re meeting with Jordan in 10 minutes. Jordan’s top CliftonStrength is Responsibility, but their current project isn’t utilizing that strength. That might explain the disengagement you mentioned in last week’s skip-level. Consider asking: ‘Does this project feel aligned with what you’re best at?’”
That single notification required three data sources working together: performance system (disengagement observation from skip-level), assessment data (CliftonStrengths profile), calendar (meeting in 10 minutes). If your AI coach can’t connect across those systems, it can’t deliver that kind of insight.
The more data sources your AI coach can integrate, the more useful it becomes. If your AI coach lives in a silo, it’s just another tool with partial information.
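Schematically, the Jordan notification above is a three-way join plus a timing gate. A sketch in which the inputs are stand-ins for whatever your performance system, assessment vendor, and calendar API actually return:

```python
# Sketch of a cross-system join behind a pre-meeting nudge.
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Signal:
    source: str   # e.g. "performance", "assessment"
    detail: str

def build_nudge(meeting_start: datetime, attendee: str,
                signals: list[Signal]) -> str | None:
    # Only fire in the window where the insight is actionable.
    lead_time = meeting_start - datetime.now()
    if not timedelta(0) <= lead_time <= timedelta(minutes=10):
        return None
    context = "; ".join(f"{s.source}: {s.detail}" for s in signals)
    return f"You're meeting with {attendee} soon. {context}"

nudge = build_nudge(
    meeting_start=datetime.now() + timedelta(minutes=8),
    attendee="Jordan",
    signals=[
        Signal("performance", "disengagement noted in last week's skip-level"),
        Signal("assessment", "top strength is Responsibility; current project underuses it"),
    ],
)
print(nudge)
```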
Feature 5: Knows when to ask questions and when to give answers
Most AI coaching tools try to be pure coaches. They try to ask the right questions. They attempt to help you reflect. They guide you to discover your own insights.
However, there is an interesting dynamic that occurs when people use tools like this:
Manager opens the AI coach. “I have a difficult conversation with an employee tomorrow. They’re underperforming and I need to address it.”
AI coach: “What specifically is making this conversation feel difficult for you?”
Manager: “I don’t know how to bring it up without them getting defensive.”
AI coach: “What do you think defensiveness might signal for this person?”
Manager closes the tool. Thinks: “I came here for help and it’s just asking me more questions. I already have questions. I need answers.”
People expect AI to give answers, not just facilitate their thinking.
When the AI keeps asking “How do you feel about that?” or “What do you think you could try?” without ever providing concrete guidance, people get frustrated and stop using it.
Your AI coach needs to be versatile.
Sometimes it should coach—help you process your thinking, ask questions that surface your own wisdom. Sometimes it should mentor—surface the right information from your talent systems and give you specific guidance for this specific situation.
Example of coaching mode: “You seem frustrated about this conversation. What specifically made it frustrating? When have you felt this way before? What helped then?”
Example of mentoring mode: “You’re about to give feedback to someone with High S on DISC—they need softer delivery and time to process. Try starting with what they’ve done well, then frame the feedback as an observation, not a criticism. Give them space to respond without filling the silence.”
For more on how AI coaching activates specific assessment data at the moment of need, see how AI coaching can activate assessment data for manager development.
Four types of AI coaching in the market (and what each one actually does)
When you start looking at AI coaching tools, you’ll notice they’re all pretty different. They’re not trying to solve the same problem. The market breaks into four categories, each with distinct use cases.
Type 1: Q&A functionality
This is the most basic form. It’s essentially a chat interface trained on frameworks, best practices, or your organization’s existing content. You ask it a question—”How do I give feedback in this situation?”—and it retrieves relevant information from its training data.
Some platforms have added this type of AI coach. It’s trained on coaching and leadership frameworks. You can ask whatever question you want and it pulls from what it knows.
Some tools let you customize the training. If you’ve already invested in specific frameworks through your leadership development programs—CCL, Blanchard’s Situational Leadership, others—you can train the AI on those models your organization already uses.
Quick access to frameworks without digging through your LMS. But knowing a framework doesn’t mean you’ll use it in a stressful moment.
Type 2: Role play
Almost every AI coaching tool offers role play. The most common use case is practicing difficult conversations before they happen. Most tools let you customize the role play to frameworks your organization already uses.
One differentiated example: some tools focus specifically on sales and customer support. You can upload past sales calls or customer service interactions, and their AI role-plays based on those actual scenarios. It’s highly specific to sales enablement and frontline support training—not general leadership development.
Managers can practice high-stakes conversations in a low-stakes environment. But knowing you should practice doesn’t mean you will.
Type 3: Human-like coaching experience
A smaller segment of AI coaches aims to make executive-level coaching accessible to everyone in the organization. Their goal is to make the experience feel like talking to a human coach. Some include video simulations of a person. Others use voice chat. You can talk with it as if it were an actual coach—someone who asks thoughtful questions, helps you process your thinking, guides you to insights.
Some tools offer cohort-based leadership development training where the AI coach follows up on what you learned in the program. These are focused on data within their own platform, which makes them a specific use case rather than a broad organizational tool.
A critical feature to look for in this category: memory (see Feature 2 above).
Makes coaching accessible to everyone, not just senior leaders. But people expect AI to give answers, not just ask questions.
Type 4: Full talent lifecycle integration
This is where AI coaching becomes genuinely differentiated. Can your AI coach pull together data from segmented systems—performance reviews, engagement surveys, skills inventories, learning history, career pathing, behavioral assessments, calendar data—and deliver personalized support based on all of that context?
Can it surface a performance review goal three months later when that goal is actually relevant to today’s one-on-one? Can it know who’s attending a meeting and flag: “You’re the only introvert presenting to a room of extroverts—here’s how to structure your message so it lands”? Can it understand that this employee’s top CliftonStrength is Responsibility and their current project isn’t utilizing that strength, which might explain their recent disengagement?
Some HRIS platforms like Workday have opened up their systems for custom integrations, making it possible for AI coaches to connect across the talent lifecycle. If your HRIS doesn’t have that level of openness, there are still ways to connect data—but it requires intentional integration work.
This type activates your existing talent investments in the moments where they’re actually needed. It makes your siloed data useful. The tradeoff: it requires significant integration work and data connectivity. It’s not plug-and-play for most organizations.
Most organizations exploring AI coaching will encounter tools in categories 1-3. Very few tools are attempting category 4. When you’re evaluating, ask explicitly: “Which category does this tool fall into, and does that match what we’re trying to accomplish?”
Your evaluation checklist: Five features and four questions
The infrastructure to bridge that gap—to activate your performance reviews, your 360s, your engagement insights, your behavioral assessments, your leadership programs in the 98.5% of interactions you can’t directly reach—exists now.
But not all AI coaching tools are solving that problem. Some are chatbots with framework databases. Some are role-play simulators. Some are question-asking tools that frustrate people who expect answers.
The tools that work deliver five things:
- Proactive nudging in the flow of work
- Memory that creates continuity
- True personalization beyond role-based templating
- Data breadth across your talent lifecycle
- Versatility between coaching and mentoring
For guidance on measuring ROI and building your business case, see how to calculate AI coaching ROI. For details on data security and integration requirements, see AI coaching integrations and security.
When you’re evaluating AI coaching tools, ask explicitly about those five features. Ask for examples of how the tool delivers each one. Ask to see the data integrations. Ask how it handles privacy and trust.
The market is noisy. The solutions vary wildly. But the organizations that get this right—that choose AI coaching systems instead of glorified chatbots—they’re the ones who’ll activate talent development in the moments that actually matter.
We’ve been getting requests lately for team diagnostics. Organizations want to understand why their teams aren’t performing, why collaboration feels difficult, why certain dynamics keep creating friction.
Team diagnostics serve a purpose. They identify patterns. They give you data about where trust is lacking, where conflict is being avoided, where accountability breaks down. That baseline understanding matters.
But diagnostics are a starting point, not a solution. And they’re often misused as if identifying the problem is the same as solving it.
A team diagnostic tells you “your team avoids conflict” in March. It doesn’t tell you what to do in May when you’re sitting in a room with two teammates—one who communicates directly and pushes for fast decisions, another who goes quiet when tension rises—trying to make a decision about the product roadmap, and you can feel the unspoken disagreement building.
The diagnostic gave you the pattern. It didn’t give you the guidance for this specific moment with these specific people and their different communication styles.
That’s not a flaw in diagnostics. It’s a structural limitation of point-in-time team assessments. Understanding this limitation helps you know what infrastructure to build next.
Get the 2026 AI coaching playbook to see how organizations are implementing AI coaching at scale.
Five gaps between team diagnostic insights and actual team behavior change
1. Single-framework diagnostics force every team problem into the same model
Most team diagnostics are built on a single framework. You’re measuring trust, conflict, commitment, accountability, and results. Or you’re assessing psychological safety and cohesion. Or you’re evaluating communication patterns.
The framework determines what gets measured. What gets measured determines what gets addressed.
But teams don’t fail for the exact same reasons. A product development team struggling with decision speed has different problems than a client service team struggling with handoffs. A newly formed team trying to build trust faces different challenges than a long-tenured team dealing with stagnation.
When you force every team’s problems into the same diagnostic model, you miss the specific dynamics actually creating friction. The framework becomes the lens—not the team’s reality.
See How Cloverleaf’s AI Coach Works
2. Team-specific problems don’t always fit into diagnostic frameworks
Even when a framework is relevant, it’s often too broad to guide specific team interactions.
“Your team lacks trust.” Okay. What does that mean when you’re managing Jordan and Alex? Is it that Jordan doesn’t believe Alex has the technical competence to execute? Is it that Alex doesn’t feel psychologically safe disagreeing with Jordan? Is it that neither of them trusts the priorities because decisions keep changing?
“Your team avoids conflict.” Sure. But what does that mean for tomorrow’s product roadmap meeting? Does Jordan need permission to be more direct? Does Alex need structured turn-taking so they don’t get talked over? Do you need to model productive disagreement yourself so the team sees it’s safe?
The diagnostic label tells you there’s a problem. It doesn’t tell you how to manage the relational dynamic between these two specific people in this specific meeting.
Consider this example:
Sales team at a SaaS company. Diagnostic said “team avoids accountability.”
Recommended solution: institute peer accountability practices. Have team members hold each other accountable, not just the manager.
Sounds great. Here’s what the diagnostic didn’t know: This team was 100% commission-based. Highly competitive. Low trust because everyone was protecting their deals. When they tried to introduce “peer accountability,” it got weaponized. People used it to undermine each other, point out mistakes, protect their own numbers.
The diagnostic recommendation assumed moderate trust and collaboration as a baseline. This team had neither. The “solution” made things worse because it didn’t account for the specific relational context and incentive structure.
3. Team dynamics change faster than diagnostic cycles can capture
You run the diagnostic in March. Results say the team struggles with psychological safety. You do the debrief. Two people admit they don’t feel safe disagreeing with the manager. Manager says “I want you to push back on me.” Everyone feels good.
April: Manager is under pressure from their VP. Someone pushes back on a decision in a meeting. Manager gets defensive. Shuts it down. The person who pushed back thinks “See? It’s not actually safe.” They stop engaging.
May: New person joins the team. They don’t know the diagnostic happened. They don’t know the “team struggles with psych safety” context. They observe the quiet team and adapt to that norm.
June: You’re still operating off March data that said “psychological safety is the issue.” But now the issue is “new team member doesn’t have context,” “manager’s behavior under pressure contradicts stated values,” and “team has adapted to silence as the norm.”
The diagnostic can’t see any of that. It’s frozen in March. Teams aren’t static. They’re living systems that adapt constantly to new members, pressure shifts, reorganizations, and changing priorities.
4. Generic team recommendations ignore the context that determines whether they’ll work
Ideal team behavior depends on context. What works for a team that’s been together for three years doesn’t work for a team that formed last month. What works for a high-trust environment where people can be direct doesn’t work in a low-trust environment where directness gets misread as aggression.
Team diagnostics measure general patterns. They don’t account for:
- Whether this team is new or long-tenured
- Whether they’re under intense deadline pressure or in a planning phase
- Whether they’re co-located or distributed across time zones
- Whether their work requires deep collaboration or parallel execution
- Whether the leader has credibility or is still building it
- Whether compensation structures create competition or collaboration
- Whether team members have existing relationships or are strangers
Generic recommendations applied to specific contexts don’t land. The advice makes sense in theory. It doesn’t fit the actual situation this specific team is navigating right now.
This is part of a broader shift happening in talent development—away from episodic interventions and toward continuous infrastructure that adapts to real-time context. For more on this structural change, see why 2026 is the year talent development becomes business infrastructure.
5. Diagnostic insights don’t translate into what to say in the moment
This is the biggest gap.
The diagnostic tells you “your team avoids accountability.” Great. Now what?
It’s Tuesday morning. You’re about to meet with your team. Jamie missed a deadline on the client deliverable. Everyone knows it. No one has said anything. You need to address it.
What do you actually say? How do you say it in a way that doesn’t create defensiveness? How do you adapt your approach based on whether Jamie is someone who’s motivated by achievement and will be hard on themselves, or someone who needs external accountability and clearer expectations?
The diagnostic gave you the pattern. It didn’t give you the script for this specific moment with this specific person in this specific team context.
Frameworks are helpful for understanding patterns. But frameworks alone don’t create behavior change—they need infrastructure to make them actionable. For more on this gap between frameworks and execution, see why talent development frameworks need behavioral infrastructure.
How continuous AI coaching makes discoveries from team diagnostics actionable
Let me be clear — this isn’t about replacing diagnostics. Team diagnostics serve a real purpose. They surface patterns you can’t see when you’re inside the system — where trust is breaking down, where conflict is being avoided, where accountability has quietly disappeared.
The problem is what happens after.
You run the diagnostic. You get the debrief. The team talks about it — maybe even has a breakthrough conversation where people admit things they’ve been holding back. And for a couple of weeks, it sticks. People reference the findings. The manager tries to create more space for disagreement. Someone speaks up in a meeting who normally wouldn’t.
Then the quarter gets busy. Two people rotate off the team. A reorg shifts priorities. And that diagnostic is sitting in someone’s Google Drive while the team navigates completely different dynamics than the ones that were measured.
The insight was real. The reinforcement wasn’t there.
So instead of treating the diagnostic as the destination, what if it became the starting input — the foundation that continuous coaching builds on every day?
Coaching adapts to each team member’s behavioral preferences
One of the five gaps with team diagnostics is that they typically force every team’s problems into a single model. You’re measuring trust, conflict, commitment, accountability, and results — and that framework becomes the lens for everything.
AI coaching works differently. It can pull from multiple data sources simultaneously — the team diagnostic findings and individual behavioral assessment data. DISC for how people communicate. Enneagram for how they respond under stress. CliftonStrengths for what energizes them. Values assessments for what actually motivates them.
So when a manager is preparing for a team meeting, the coaching isn’t just working from “this team avoids conflict.” It’s accounting for the fact that one person on this team shuts down when they feel rushed, another gets energized by debate, and a third needs to see data before they’ll commit to anything. The diagnostic told you conflict avoidance is the pattern. The coaching tells you what that pattern actually looks like with these specific people — and what to do about it.
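To make that layering concrete, here’s a minimal sketch of what the combined context might look like as data. The structure, field names, and profiles are illustrative assumptions, not Cloverleaf’s actual schema; the point is that the team-level diagnostic pattern and the individual profiles travel together into every piece of guidance.

```python
from dataclasses import dataclass, field

@dataclass
class MemberProfile:
    """Hypothetical per-person record combining several assessment results."""
    name: str
    disc: str = ""            # e.g. "High D": communication tendencies
    enneagram: str = ""       # e.g. "Type 8": motivation and stress response
    strengths: list = field(default_factory=list)  # e.g. CliftonStrengths themes

def build_coaching_context(team_pattern: str, members: list) -> dict:
    """Combine the team-level diagnostic finding with individual profiles,
    so guidance can be generated per person rather than per label."""
    return {
        "team_pattern": team_pattern,  # from the point-in-time diagnostic
        "members": [
            {
                "name": m.name,
                "communicates": m.disc,
                "under_stress": m.enneagram,
                "energized_by": m.strengths,
            }
            for m in members
        ],
    }

team = [
    MemberProfile("Jordan", disc="High D", enneagram="Type 8", strengths=["Command"]),
    MemberProfile("Alex", disc="High S", enneagram="Type 9", strengths=["Harmony"]),
]
context = build_coaching_context("avoids conflict", team)
print(context["team_pattern"], "->", [m["name"] for m in context["members"]])
```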
Proactive coaching arrives before team interactions, not after
Think about when team dynamics actually get tested. It’s not during the debrief when everyone’s on their best behavior. It’s the Tuesday afternoon meeting where there’s tension about a missed deadline and half the team is frustrated.
The diagnostic told you “this team avoids accountability.” But that doesn’t help you at 2 PM when Jamie missed the client deliverable and no one’s saying anything.
Continuous AI coaching can proactively surface guidance before those moments. Something like: “This teammate values achievement and is likely already frustrated with themselves about the missed deadline. Lead with acknowledgment of the challenge, ask what support they need, then clarify expectations going forward. Avoid framing it as a competence issue — frame it as a resource or priority issue.”
That’s not a diagnostic label you have to translate on the fly. That’s what to say, how to say it, adapted to how this specific person is wired — delivered before the conversation where you need it.
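Under the hood, proactive surfacing is essentially a scheduling problem: watch the calendar, and fire guidance a fixed lead time before a coaching-relevant event. A rough sketch, with a hypothetical event shape and a 10-minute lead window:

```python
from datetime import datetime, timedelta

LEAD_TIME = timedelta(minutes=10)

def due_for_guidance(events, now):
    """Return events whose start time falls within the lead window,
    i.e. the moments where pre-meeting coaching should surface."""
    return [
        e for e in events
        if now <= e["starts_at"] - LEAD_TIME < now + timedelta(minutes=1)
    ]

events = [
    {"title": "Roadmap sync", "starts_at": datetime(2026, 3, 10, 14, 0),
     "attendees": ["Jordan", "Alex"]},
]
now = datetime(2026, 3, 10, 13, 50)
for e in due_for_guidance(events, now):
    # In a real system this would render profile-aware guidance into
    # Slack or Teams; here we just show the trigger firing at the right moment.
    print(f"Surface coaching for '{e['title']}' with {', '.join(e['attendees'])}")
```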
When team composition changes, the coaching can keep up
You ran the assessment in March. By June, two people have joined, one has left, the manager is under new pressure from their VP, and the team is operating under completely different conditions than when the diagnostic was run.
Remember the gap about dynamics changing faster than diagnostic cycles can capture? The manager who said “I want you to push back on me” in March gets defensive when someone actually does it in April under pressure. The new person who joined in May doesn’t know the diagnostic happened. They observe a quiet team and adapt to that norm.
AI coaching doesn’t freeze in March. New member joins — the coaching adapts to that shift in composition. Organizational pressure spikes — the coaching adjusts. A manager who’s normally collaborative starts micromanaging under stress — the coaching can surface that pattern and offer guidance before the next high-pressure interaction.
It’s working from who’s actually on this team right now, what’s happening around them, and how they’re showing up today.
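One way to picture “not freezing in March”: treat roster changes as events that rebuild the coaching context rather than invalidate it. The function below is a hypothetical illustration; the flag on new members stands in for whatever onboarding to team norms the coaching would actually deliver.

```python
def apply_roster_change(context, joined=(), left=()):
    """Return an updated coaching context after composition changes.
    New members are flagged so the coaching can bring them up to speed
    on norms the team surfaced in the original diagnostic."""
    members = [m for m in context["members"] if m["name"] not in left]
    for name in joined:
        members.append({"name": name, "needs_diagnostic_context": True})
    return {**context, "members": members}

context = {
    "team_pattern": "struggles with psychological safety",
    "members": [{"name": "Jordan"}, {"name": "Alex"}],
}
updated = apply_roster_change(context, joined=["Priya"], left=["Alex"])
print([m["name"] for m in updated["members"]])  # ['Jordan', 'Priya']
```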
Context-aware guidance instead of generic team recommendations
The fourth gap we talked about is that generic recommendations ignore the context that determines whether they’ll actually work. The sales team that was told to “institute peer accountability” when they were 100% commission-based and already low-trust — the recommendation made things worse because it didn’t account for the actual relational dynamics.
AI coaching knows the context that diagnostics can’t capture. It knows if this is a new team still figuring out how to work together or a long-tenured team stuck in patterns they can’t see anymore. It knows if they’re co-located or spread across time zones. It knows if they’re in the middle of a product launch or a planning phase. It knows the compensation structure, the leader’s tenure, the pressure level.
So when a manager asks for help with delegation, they’re not getting a generic delegation framework that sounds right in theory. They’re getting guidance that accounts for this team’s specific composition, the pressure they’re under right now, and the actual people who’ll be doing the work.
Coaching that turns the patterns the diagnostic uncovered into solutions
The biggest gap — gap five — is that diagnostic insights don’t translate into what to say in the moment. You know the pattern. You don’t know the play.
Continuous coaching closes that translation gap. Before the meeting where you need to address the product roadmap disagreement, it might surface: “One teammate on this call prefers direct communication and will push for decisions quickly. Another processes more slowly and needs time to think before responding. Try this: state the decision that needs to be made, give everyone 2 minutes to think individually, then go around and ask each person for their perspective.”
That’s not a theoretical framework about conflict styles. That’s “here’s what to do in this meeting, with these people, in the next 30 minutes” — informed by the diagnostic findings and each person’s behavioral profile.
The diagnostic gave you the map. Continuous coaching gives you turn-by-turn directions — updated in real time, adapted to who’s actually in the car.
Making diagnostic findings part of how your team works every day
If you invested in team diagnostics, that data has value. You know which teams struggle with what patterns. But that’s a starting point, not an endpoint.
- Turn diagnostic insights into team-specific coaching guidance.
- Integrate coaching where team work happens.
- Make it continuous, not episodic.
- Update as the team changes.
That’s what separates organizations that get value from diagnostics from those that don’t: whether you built the infrastructure to activate the data every day, in the moments that matter, for the specific people who make up this team right now.
You’re the one who made the case. You went to leadership, justified the budget, rolled out DISC or CliftonStrengths or Enneagram — maybe all three. People took the assessments. Some teams had great debrief sessions.
And then the data just… sat there.
Not because anyone decided it was no longer valuable, but because there’s no system that puts it in front of people when they actually need it. The manager preparing for a 1:1 doesn’t pull up a PDF. The person writing feedback at 4pm on a Friday doesn’t pause to look up their direct report’s Enneagram type.
And as long as the assessment data remains structurally disconnected from the moments where it would actually change behavior, managers are left trying to remember and apply complex insights on their own, which rarely happens consistently under the pressure of daily work.
Get the 2026 AI coaching playbook for talent development to accelerate team performance.
How assessment data gets scattered across organizations — and what it costs
The scale of this disconnect is often bigger than talent development leaders realize when they’re evaluating individual tools.
Cloverleaf’s 2025 survey of 155 talent leaders found that organizations with over 1,000 employees use an average of 20 different assessment tools. Companies with more than 5,000 employees average 35 different tools. But only about nine of those assessments are purchased centrally by talent management or L&D. The rest get acquired independently by business lines—different vendors, different platforms, no shared view of who took what or where the results live.
Even among companies that have a talent assessment strategy, only 34% have a formalized procurement process, and only 31% ensure assessments are administered by certified practitioners or use validated tools.
So the data exists. It’s scattered across vendor portals, PDFs, email attachments, and slide decks from debriefs that happened months ago. There’s no single place where a manager can access it and no mechanism to surface it when a coaching moment arrives.
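Fixing that usually starts with consolidation: folding every vendor export, PDF, and slide-deck result into one queryable profile per person. A toy sketch of that normalization step, with invented source records and field names:

```python
# Hypothetical normalizer: each vendor export arrives in its own shape;
# the goal is one record per employee that any coaching moment can query.
RAW_SOURCES = [
    {"source": "vendor_a_csv", "employee": "jordan@acme.com",
     "tool": "DISC", "result": "High D"},
    {"source": "debrief_slides", "employee": "jordan@acme.com",
     "tool": "CliftonStrengths", "result": ["Command", "Activator"]},
]

def consolidate(raw_records):
    """Fold scattered per-tool results into one profile per employee."""
    profiles = {}
    for r in raw_records:
        profiles.setdefault(r["employee"], {})[r["tool"]] = r["result"]
    return profiles

print(consolidate(RAW_SOURCES)["jordan@acme.com"])
# {'DISC': 'High D', 'CliftonStrengths': ['Command', 'Activator']}
```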
The cost isn’t just operational inefficiency. One of the primary benefits of investing in assessments—maybe the primary benefit—is creating a shared language and behavioral understanding across an organization. That benefit gets significantly undermined when teams independently select different tools and nobody connects the results to daily work. Organizations end up paying for insight that never reaches the person who needs it, at the moment when it would actually change their decision.
See How Cloverleaf’s AI Coach Works
How multiple assessments create more precise coaching than any single tool can deliver
People are more complex than a single assessment can capture. That’s not a criticism of any assessment—it’s the reason validated tools exist across different categories in the first place. Each one is designed to answer a different question about how people work.
DISC tells you how someone responds to challenges and collaborative environments — their behavioral tendencies when working with others. Enneagram reveals why they react the way they do under stress — the core motivation and emotional trigger underneath the visible behavior. A strengths assessment like CliftonStrengths shows where someone naturally contributes the most — the work that energizes them versus the work that drains them. 16 Types shows how they process information and make decisions.
If an AI coach has access to only one of those inputs, it can only coach on one dimension. With DISC alone, the coaching might say “this person prefers a slower pace and softer delivery.” That’s accurate. It’s also incomplete.
When you layer a second assessment, the coaching gets meaningfully more specific. Add a third, and something qualitatively different happens: the AI can now connect how someone communicates, why they’re reacting the way they are, and what kind of work is or isn’t utilizing their strengths. The coaching shifts from general guidance to insight that accounts for the whole person in a specific relational context.
In practice, this difference shows up clearly in the quality of the coaching output. When a manager asks an AI coach “How should I give feedback to this person on the marketing team?” and the system has access to one assessment’s data, the answer might be decent but one-dimensional.
When that same AI coach has data from CliftonStrengths, Insights Discovery, motivating values, and 16 Types for that individual, the coaching output can point to specific insights that informed each recommendation—this person’s humor shows up as a natural strength in their profile, they tend to respond better to warmth and connection before directness, and their motivating values are likely shaping how they’ll interpret critical feedback.
Each additional assessment adds another layer of precision that the coaching can draw from when generating recommendations.
That’s the practical difference between coaching that sounds generally reasonable and coaching that might actually change how the manager prepares for and enters that specific conversation.
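One way to make that legible to the manager is provenance: each recommendation carries a pointer back to the assessment insight that produced it, so the manager can see why the coaching says what it says. The output shape below is illustrative only, not an actual API response:

```python
# Illustrative output shape: every piece of guidance cites the
# assessment insight that informed it.
recommendation = {
    "ask": "How should I give feedback to this person?",
    "guidance": [
        {"advice": "Open with warmth and connection before the critique",
         "informed_by": {"tool": "Insights Discovery",
                         "insight": "relationship-first style"}},
        {"advice": "Expect humor to surface; it's a strength, not deflection",
         "informed_by": {"tool": "CliftonStrengths",
                         "insight": "humor as a natural strength"}},
    ],
}
for g in recommendation["guidance"]:
    print(f"{g['advice']}  [{g['informed_by']['tool']}]")
```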
What managers gain when AI coaching can pull from multiple assessments
Layering assessments isn’t about collecting data for the sake of having more data. It’s about understanding the person, the people they work with, and their work context well enough that an AI coach can deliver the right guidance at the right moment.
Here’s what that can look like in four scenarios talent development leaders deal with constantly:
Preparing for a difficult 1:1 with a disengaged employee
With DISC data alone, the manager might get communication style guidance—adjust your pace, soften your delivery. Add Enneagram data, and the coaching can surface that this person’s core motivation is feeling competent and correct (Type 1)—which means their withdrawal probably isn’t disengagement, it’s more likely a stress response to feeling like they’ve failed at something. Add CliftonStrengths data, and the AI coach might flag that their top strength is Responsibility and that strength hasn’t been utilized in their current project assignments.
The coaching can shift from “adjust your delivery” to something far more specific and actionable: consider opening with what they’ve done well this quarter before raising the performance concern, then ask directly whether their current work is actually utilizing what they do best. That’s a fundamentally different conversation than the one the manager was planning to have.
Supporting a first-time manager through their first 90 days
A newly promoted manager inherits a team they’ve never led before. With layered assessment data across the team, AI coaching can surface—before their first 1:1 with each person—how that individual tends to process information, what typically motivates them, how they usually handle stress, and what management style they tend to respond to most effectively.
The manager doesn’t need to memorize any of this information or study profiles before each meeting. The relevant context shows up 10 minutes before the meeting in their Slack or Teams notification, tailored to who they’re about to meet with.
Sustaining development after a performance review
The performance review conversation identified that a manager needs to improve their delegation skills. Without ongoing reinforcement, that feedback typically lives in the HRIS until the next review cycle rolls around.
With layered assessment data, AI coaching can deliver ongoing nudges tied to how each specific direct report actually tends to respond to delegation—one person might need detailed parameters and structured check-ins (High C on DISC), while another person might work better with autonomy and periodic touchpoints (High D). The coaching isn’t offering generic advice about delegation principles. It’s providing specific guidance about the actual humans this manager is trying to delegate to.
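As a sketch, that kind of nudge can be as simple as a lookup keyed to each report’s profile. The mapping below extends the High C and High D contrast above with two assumed entries for illustration; the wording is invented, not validated DISC guidance.

```python
# Hypothetical lookup: delegation guidance keyed to a direct report's DISC style.
DELEGATION_BY_DISC = {
    "High C": "Provide detailed parameters and schedule structured check-ins.",
    "High D": "Set the outcome, grant autonomy, and agree on periodic touchpoints.",
    "High I": "Frame the goal collaboratively and keep feedback loops frequent.",  # assumed
    "High S": "Give context and reassurance; change course gradually.",            # assumed
}

def delegation_nudge(report_name: str, disc_style: str) -> str:
    guidance = DELEGATION_BY_DISC.get(
        disc_style, "No profile on file; ask how they prefer to receive work.")
    return f"Delegating to {report_name}: {guidance}"

print(delegation_nudge("Jamie", "High C"))
```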
Navigating a cross-functional team that’s generating friction
A project pulls people from three departments. No one has worked together before. The team dashboard shows 100% judging preference on 16 Types—which suggests this group will likely move quickly toward spreadsheets and project plans but may skip the brainstorming phase where better ideas often surface.
That’s not an insight most would typically generate on their own just by looking at a roster of names and titles. With that insight surfaced, the team lead can intentionally build in a time-boxed brainstorm session before the team jumps to action items—and potentially avoid the friction that often comes from a team that plans efficiently but innovates poorly.
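That dashboard check is simple to express: aggregate the judging/perceiving letter across the roster and flag homogeneity. A minimal sketch with an invented roster:

```python
# Sketch of the team-dashboard check described above: compute the share of
# members whose 16 Types code ends in J (judging) and flag a uniform team.
def judging_share(types):
    return sum(1 for t in types if t.endswith("J")) / len(types)

roster = ["INTJ", "ESTJ", "ISFJ", "ENTJ"]  # hypothetical cross-functional team
if judging_share(roster) == 1.0:
    print("100% judging: build in a time-boxed brainstorm before locking the plan.")
```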
Teams don’t need every assessment on day one—but relying on just one means the AI coach can only understand part of each person
There’s a common hesitation when discussing multiple assessments: “We can’t ask people to take that many assessments—it’s too much to expect.” It’s worth reframing what “too much” actually means in practice.
Taking three to five assessments might total about 40 minutes of someone’s time, and those assessments don’t have to happen in one sitting or even in the same week. The return on that 40 minutes can compound every single day when an AI coaching engine has access to that data and can use it to deliver more precise, more contextually relevant guidance.
For most teams, a practical starting point is the combination of DISC, Enneagram, and 16 Types—which together can cover behavioral tendencies, core motivations, and thinking/decision-making style.
Add a strengths assessment like CliftonStrengths, Strengthscope, or VIA Character Strengths and you start to see what kind of work energizes each person versus what drains them.
Add something like Culture Pulse or Organizational Culture Assessment and you can begin to understand the norms and expectations that are shaping how the team actually interacts day-to-day.
That assessment stack—five tools, under an hour of total time investment per person—can give an AI coaching platform enough multi-dimensional data to provide coaching on communication style, underlying motivation, performance dynamics, conflict patterns, and cultural context.
One assessment gives you one lens on the person. Multiple assessments can start to give you something closer to the full picture.
The data your organization already owns—the DISC results, the CliftonStrengths reports, the Enneagram types—isn’t sitting unused because people don’t value it. It’s sitting unused because there’s no system that puts it in front of the right person at the right moment in a form they can actually act on.
When that data gets connected to an AI coaching layer and delivered inside the tools your managers already use—before the 1:1, during the feedback draft, while they’re staffing the project—it can stop being something people took once and mostly forgot about. It can become the foundation for coaching that actually knows who your people are, how they tend to work together, and what they might need from each other in specific situations.
That’s what becomes possible when assessment data stops being a report that sits in a folder and starts functioning as infrastructure that supports daily work.
Get the 2026 AI coaching playbook for talent development to see how organizations are activating assessment insights at scale.