Beyond the sandbox: A playbook for scaling L&D innovation enterprisewide

Learning and development teams are the stewards of culture, and we must role-model experimentation. However, for innovation to truly flourish, it cannot stay in the lab.

A pilot that failed

Last year, our learning and development team fell into the classic innovation trap. We launched a pilot with an artificial intelligence-powered coach, designed to help managers navigate difficult performance reviews and polish their communication skills—a critical need during our review cycle.

On paper, the strategy appeared very promising. We invited over 20 motivated managers who had already attended our performance review workshops to participate. We provided them with on-demand access to a safe, AI-driven environment where they could practice tough conversations—like addressing underperformance—before facing their direct reports. 

We expected high engagement. We anticipated deep learning. The result? A ghost town. Across 20 participants and several weeks of the pilot, the total time spent with the AI coach was 10 minutes. Not 10 minutes per person, 10 minutes combined. 

The failure wasn’t in the technology; the AI coach was actually very capable. We failed because we designed a pilot for a “sandbox,” not for the messy reality of a manager’s workday. We chose the wrong audience criteria, we ignored workflow friction and we waited for satisfaction scores when we should have been watching activation rates.

Going beyond the sandbox

For many L&D leaders, this is a familiar pain point. We often find our most promising innovations getting stuck in “pilot purgatory”—failing to gain traction when rolled out to the wider organization.

Consider these sobering stats: 

  • LinkedIn’s 2024 Workplace Learning Report states that fewer than 5 percent of large corporate reskilling programs advance far enough to measure success.
  • According to McKinsey & Company’s November 2025 report, “The state of AI in 2025: Agents, innovation and transformation,” most organizations are still in the experimentation phase with their AI initiatives, with nearly two-thirds of respondents saying their organizations have not yet begun scaling AI across the enterprise.
  • Boston Consulting Group’s 2021 report, “Performance and Innovation Are the Rewards of Digital Transformation,” found that only 35 percent of the more than 850 companies surveyed worldwide achieved their digital transformation objectives.

We plan to run another AI coach experiment this year, but we’re recalibrating our approach. Here are three best practices L&D teams should consider implementing to ensure their experiments go beyond the sandbox and successfully scale across the enterprise. 

Best practice 1: Target the ‘point of pain,’ not the ‘path of enthusiasm’ 

In our initial pilot, we selected managers who were already active, engaged and open to the performance review process. These were our “champions”—people who had already attended workshops to hone their skills. We assumed their enthusiasm would translate into usage. We were wrong. Because these managers were already invested in their development, they likely felt competent enough to handle reviews without the AI coach’s support. For them, the tool was a “nice to have,” not a “must-have.” 

To accurately test an innovation, you must select the audience that feels the problem most acutely. You need the people who are truly feeling the pain.

In our case, we should have targeted managers who lacked the competence or confidence to deliver reviews effectively. We should have looked for managers who dread the conversation, not the ones who sign up for workshops to perfect it.

To find this audience, look for the behavioral signals of struggle. In our context, this would mean identifying managers with historically low review completion rates or those who have received critical feedback from employees about the quality of their past reviews.
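As a rough illustration, these signals could be turned into a simple filter over manager records. The sketch below is in Python and rests on assumptions: the field names, the 70 percent completion threshold and the two-complaint threshold are hypothetical placeholders, not figures from our pilot, and would need to be set from your own HR data.

```python
from dataclasses import dataclass


@dataclass
class ManagerRecord:
    # Hypothetical fields; real HR systems will name and store these differently.
    name: str
    review_completion_rate: float  # share of past reviews completed, 0.0 to 1.0
    critical_feedback_count: int   # employee comments flagging review quality


def select_pilot_audience(managers, max_completion_rate=0.7, min_critical_feedback=2):
    """Return managers who show at least one behavioral signal of struggle.

    Both thresholds are illustrative assumptions, not recommended values.
    """
    return [
        m for m in managers
        if m.review_completion_rate < max_completion_rate
        or m.critical_feedback_count >= min_critical_feedback
    ]


managers = [
    ManagerRecord("A", 0.95, 0),  # engaged "champion": no signal of struggle
    ManagerRecord("B", 0.55, 1),  # low completion rate
    ManagerRecord("C", 0.90, 3),  # repeated critical feedback
]
pilot_group = select_pilot_audience(managers)
print([m.name for m in pilot_group])  # ['B', 'C']
```

The filter deliberately uses "or" rather than "and": either signal alone suggests the manager feels the problem, and a pilot audience that is too narrow produces too little data to judge adoption.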

This principle applies broadly. For example, if you are testing a new candidate assessment tool, do not invite your best hiring managers. Invite the managers who are suffering from high new-hire turnover rates. They are the ones feeling the consequences of poor selection decisions. 

When you target the audience with the most acute pain, you are testing whether your solution provides enough relief to drive adoption. If the people drowning in the problem won’t grab the life raft you’ve built, the raft is broken.

Best practice 2: Solve for workflow integration, not just capability 

In our pilot, the AI coach was a standalone destination. To use it, managers had to leave their daily tools, log in to a separate system, and navigate a new interface. We failed to account for the cognitive load of the performance review period. During this high-pressure window, managers viewed the AI coach not as a helper but as a distraction.

To design for scale, L&D teams must move from “destination learning” to workflow integration. In our new iteration, we plan to embed the solution directly into the flow of work. This means placing direct links to the AI coach inside the system where the performance review actually happens.

Another effective approach is to integrate nudges into the company’s primary communication tools, such as Slack. For example, when sending Slack reminders about the steps and milestones of the performance review process, you could simultaneously prompt managers to complete a round of practice in the AI coach.
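A minimal sketch of such a combined reminder, in Python, might look like the following. It assumes your workspace has a Slack incoming webhook configured; the webhook URL, the coach URL and the message wording are all hypothetical. The example only builds and prints the message; the send function is defined but not called.

```python
import json
import urllib.request


def build_nudge_payload(milestone: str, coach_url: str) -> dict:
    """Combine a review-process reminder with a one-click link to the coach.

    Slack message text renders <url|label> as a clickable link.
    """
    text = (
        f"Reminder: {milestone}. "
        f"Before the conversation, <{coach_url}|run a short practice round with the AI coach>."
    )
    return {"text": text}


def send_nudge(webhook_url: str, payload: dict) -> int:
    """POST the payload to a Slack incoming webhook and return the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


# Hypothetical milestone and coach URL, for illustration only.
payload = build_nudge_payload(
    "self-assessments are due Friday",
    "https://coach.example.com/practice",
)
print(payload["text"])
```

The design point is that the reminder and the practice link arrive in the same message, so the manager is one click away from the tool at the moment the review is already on their mind.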

The strategic shift here is minimizing the “distance” between the need and the solution. Every extra click, login, or window switch is a barrier that can drop adoption by double digits. By placing the tool exactly where the work is being done, we reduce decision fatigue.

We shouldn’t be telling managers to “learn”; we should be offering them a tool to get the job done faster and better. The goal is to make learning the path of least resistance. 

Best practice 3: Measure operational viability 

Perhaps our biggest mistake in the first experiment was our measurement strategy. We planned to measure manager satisfaction with the tool—a standard “vanity metric.” Our go/no-go decision was supposed to be based on how helpful the managers found the tool. 

However, because adoption was near zero, we never gathered enough data to measure satisfaction. We had no basis to make a decision about scaling. 

For enterprisewide innovation, L&D teams need to shift their focus from “sentiment metrics” (Did the learners like it?) to “operational viability metrics” (Can this initiative or program survive at scale?). A better metric for such a pilot would have been activation rate or time to first interaction. These metrics tell us if the tool is intuitive enough to be adopted without hand-holding. 

Equally important are the invisible costs of scaling. It’s critical to measure the load on your support infrastructure. If a pilot is “successful” but generates a massive spike in IT tickets, it is an operational failure.

Success isn’t just a “4.5 out of 5 stars” rating. True success looks like: 

  • Activation: x percent of the target group accessed the tool within 24 hours. 
  • Stickiness: y percent of users returned for a second session without a nudge.
  • Support: Support tickets remained below z percent of the user base. 

These are the metrics that make or break rollouts, and they are the ones L&D teams must test for first.
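To make the three measures concrete, here is one way they could be computed from basic usage logs. This is a Python sketch under stated assumptions: the record structure is hypothetical, stickiness is calculated as a share of users who opened the tool at least once, and support load is calculated as tickets divided by everyone invited. Your own definitions may reasonably differ.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Session:
    started_at: datetime
    nudged: bool  # True if the session followed a reminder


@dataclass
class PilotUser:
    invited_at: datetime
    sessions: list = field(default_factory=list)
    support_tickets: int = 0


def viability_metrics(users, activation_window=timedelta(hours=24)):
    """Return activation, stickiness and support load as percentages."""
    if not users:
        raise ValueError("at least one invited user is required")
    activated = active = sticky = 0
    for user in users:
        sessions = sorted(user.sessions, key=lambda s: s.started_at)
        if not sessions:
            continue
        active += 1
        # Activation: first session falls inside the window after the invitation.
        if sessions[0].started_at - user.invited_at <= activation_window:
            activated += 1
        # Stickiness: a second session exists and was not prompted by a nudge.
        if len(sessions) >= 2 and not sessions[1].nudged:
            sticky += 1
    tickets = sum(u.support_tickets for u in users)
    return {
        "activation_pct": round(100 * activated / len(users), 1),
        "stickiness_pct": round(100 * sticky / active, 1) if active else 0.0,
        "support_pct": round(100 * tickets / len(users), 1),
    }


t0 = datetime(2025, 1, 6, 9, 0)  # illustrative invitation time
users = [
    PilotUser(t0, [Session(t0 + timedelta(hours=2), False),
                   Session(t0 + timedelta(days=3), False)]),
    PilotUser(t0, [Session(t0 + timedelta(hours=30), True)], support_tickets=1),
    PilotUser(t0, [Session(t0 + timedelta(hours=1), False),
                   Session(t0 + timedelta(days=2), True)]),
    PilotUser(t0),  # never opened the tool
]
result = viability_metrics(users)
print(result)
# {'activation_pct': 50.0, 'stickiness_pct': 33.3, 'support_pct': 25.0}
```

In this made-up data, two of four invited managers activated within 24 hours, one of the three who ever used the tool came back unprompted, and one ticket was raised across four users. The x, y and z thresholds these numbers are compared against remain a judgment call for each organization.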

Innovation requires execution 

L&D teams play a critical role in driving organizational innovation. We are the stewards of culture, and we must role-model experimentation. However, for innovation to truly flourish, it cannot stay in the lab. 

Too often, “innovation” in L&D becomes synonymous with “procuring new tools” rather than “solving business problems.” When we allow our best ideas to languish in the sandbox, we waste more than budget: we erode our credibility with the business. The business doesn’t pay us to run interesting pilots; it pays us to build organizational capability.

Our experiments must be designed from day one to survive the harsh reality of the business environment. By stress-testing with our skeptics and the people who feel the most pain, integrating deeply into the workflow and measuring operational viability, we can ensure our best ideas move beyond the sandbox and deliver impact at scale.

As we embrace AI and new technologies, our mandate is clear: Don’t just verify that it works. Prove that it scales.