
Adapting Agile Practices for AI-Focused Product Development

March 16, 2025 | by Shrikant Vashishtha
AI Product Development
Scrum for AI
Agile Research Spikes
Dual-Track Agile
Agile Metrics for AI Development

When it comes to AI product development, teams that previously worked in comparatively predictive development environments are often in for a rude shock.

As most of the work is exploratory and research-driven, a single spike (proof of concept) can span multiple Sprints. Even after several Sprints of work, the research may fail. And sometimes, even when the research succeeds, unforeseen issues during implementation can make the feature unviable.

This can be challenging, as the usual indicators of Sprint success (e.g., meeting the Sprint Goal, producing a potentially shippable increment, or shipping multiple increments) no longer seem to help.

All this uncertainty makes it difficult to showcase progress to stakeholders who may be accustomed to seeing functional deliverables at the end of every Sprint in other projects.

The usual metrics, such as Velocity or Cycle Time, no longer seem to work either, as they don't remain consistent from Sprint to Sprint.

All this puts a big question mark over whether AI-related development is even possible in Scrum, and whether other approaches could handle it better.

Research-Oriented Work Requires a Mindset Shift

In many cases, AI-related product development requires deep research and analysis alongside development. Because research work is unpredictable and can either succeed or fail, the Sprint Goal can be learning-oriented: its focus is to reduce uncertainty rather than just deliver a feature.

In AI research, linear progress is a myth. Your teams aren't just building; they're discovering and testing hypotheses, and often learning what doesn't work is as valuable as finding what does.

If a team discovers that a research direction is not feasible, it's still valuable as it saves future time and resources.

Due to an expectation mismatch with stakeholders who are used to receiving valuable product increments at the end of each Sprint, the team may feel constant pressure during each Sprint. However, hypothesis invalidation provides real value, as it stops the team from spending valuable time on a nonviable direction.

To handle such scenarios, it’s important to set the right expectations for stakeholders—that some Sprints may focus on learning while others focus on feature delivery. Additionally, it should be clear to them that it’s not feasible to have a successful outcome from every Sprint.

The team can use the following strategies when communicating with stakeholders:

  • It is important to emphasize that research failures are part and parcel of AI innovation. Failed spikes, in general, save months’ worth of costly development, and such early failures need to be celebrated.

  • Sprint Reviews generally focus on demonstrating what worked and is potentially shippable. However, they should also include discovery demos that cover what was learned and the next steps.

  • If the team frames each completed spike as a step toward reducing uncertainty, stakeholders can see spikes as an investment in clarity rather than wasted time and money.

Adjust the Definition of Done for Research Spikes

In many cases, teams do not define spikes properly. For instance, sometimes a one-liner spike definition may be sufficient, but in many other cases, it may not. 

Before the team starts working on a spike, the spike definition should state what output is expected and against which dimensions it will be evaluated.

The spike output should provide the team with a recommendation in the form of a Go or No-Go decision.

All spikes need to be timeboxed; otherwise, a research activity can go on indefinitely and fall victim to analysis paralysis.

At the end of a spike timebox, a team may assess whether they want to continue working in the direction they have been exploring and define another timebox to be scheduled in the next Sprint. In some cases, they may realize that research in the current direction is not fruitful and decide to pivot to another research direction.

When it comes to the Definition of Done (DoD) for spikes, instead of focusing on implementing some functionality, the DoD should include evidence-based conclusions (feasible/not feasible), hypothesis validation (did the spike confirm or disprove a technical approach?), recommended next steps, and documented findings.
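
As a minimal sketch, the outcome of a spike can be captured in a small structured record so that this learning-oriented DoD becomes checkable. The `SpikeOutcome` class and its field names below are hypothetical, not a prescribed format:

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

@dataclass
class SpikeOutcome:
    """Illustrative record of a research spike against a learning-oriented DoD."""
    hypothesis: str                       # e.g. "TTS API X integrates with our architecture"
    timebox_end: date                     # spikes are always timeboxed
    validated: Optional[bool] = None      # True = validated, False = invalidated, None = inconclusive
    decision: str = "undecided"           # "go", "no-go", or "pivot"
    findings: List[str] = field(default_factory=list)    # documented evidence and learnings
    next_steps: List[str] = field(default_factory=list)  # recommended follow-up actions

    def meets_dod(self) -> bool:
        """'Done' here means: an evidence-based conclusion, a Go/No-Go (or pivot)
        recommendation, and documented findings -- not shipped functionality."""
        return (
            self.validated is not None
            and self.decision in {"go", "no-go", "pivot"}
            and bool(self.findings)
        )
```

A record like this also makes the Sprint Review conversation concrete: the demo walks through the findings and the recommendation rather than a working feature.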

Moving to Dual-Track Agile (Discovery & Delivery) for a Large Enough Team

Considering that the discovery (research) part itself consumes a lot of the team’s capacity while the rest is used for product development, it may be beneficial for the team to split into two tracks:

  • Discovery Track for research, experimentation, and user validation.

  • Delivery Track for building production-ready features.

The discovery team focuses on researching and investigating unknowns in short iterations, then hands off validated ideas to the delivery team when feasibility is confirmed.

This model helps isolate the unpredictable nature of research from the regular cadence of shipping features.

For this model to work, it's important to have a team large enough to be divided into two tracks. At the same time, the tracks shouldn't work in silos; the Discovery track should treat the Delivery track as its customer, since the Delivery track's work depends on the Discovery output. Sometimes, depending on where the team is, everyone can work together on Discovery or on Delivery, collaborating as needed.

Measuring Progress with Learning Metrics

Instead of delivery-oriented KPIs (e.g., velocity), learning-focused KPIs can help stakeholders see the real value of AI research. Here are some examples:

Research Throughput:

The number of spikes completed (with both feasible and infeasible outcomes), showing how many unknowns have been tackled within a given period (e.g., a Sprint).

Decision Lead Time: 

Measures how long it takes to go from “we need to figure this out” to “we have a decision,” and helps identify bottlenecks where research gets stuck.

Hypotheses Validated vs. Invalidated:

At the beginning of each spike, the team states a hypothesis (e.g., “This text-to-speech API can integrate seamlessly with our architecture”). At the end, they mark it as “validated” or “invalidated,” along with any partial learnings.

This reiterates the idea that invalidating a hypothesis early is valuable, as it saves time by preventing pursuit of the wrong path.
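
As a minimal sketch, assuming the team keeps a simple log of spikes with start dates, decision dates, and outcomes (the field names below are hypothetical), these learning metrics can be computed per Sprint along these lines:

```python
from datetime import date
from statistics import mean

# Hypothetical spike log for one Sprint; "validated" of None means inconclusive.
spikes = [
    {"hypothesis": "TTS API integrates with our stack", "started": date(2025, 3, 3),
     "decided": date(2025, 3, 7), "validated": True},
    {"hypothesis": "On-device model meets latency target", "started": date(2025, 3, 3),
     "decided": date(2025, 3, 12), "validated": False},
]

# Research Throughput: how many unknowns were tackled (feasible or not) in the period.
research_throughput = len([s for s in spikes if s["decided"] is not None])

# Decision Lead Time: average days from "we need to figure this out" to "we have a decision".
decision_lead_time = mean((s["decided"] - s["started"]).days for s in spikes if s["decided"])

# Hypotheses validated vs. invalidated: invalidations count as progress too.
validated = sum(1 for s in spikes if s["validated"] is True)
invalidated = sum(1 for s in spikes if s["validated"] is False)

print(research_throughput, decision_lead_time, validated, invalidated)
```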

Stakeholder Satisfaction/Clarity:

A quick survey to understand if stakeholders feel informed about the R&D process. A good result confirms that stakeholder communication is on the right track; otherwise, the team needs to make necessary adjustments.

If you want to define other KPIs for your project, Basili’s Goal-Question-Metric (GQM) approach is a helpful way to derive them: start from a goal, ask the questions that tell you whether you are reaching it, and only then pick metrics that answer those questions.
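
As a small illustration of that breakdown (the goal, questions, and metric mappings below are made up for this article, not prescribed by GQM):

```python
# Illustrative GQM breakdown: a goal, the questions it raises, and metrics that answer them.
gqm_example = {
    "goal": "Reduce uncertainty about AI feature feasibility each quarter",
    "questions": {
        "How quickly do we turn open questions into decisions?": ["Decision Lead Time"],
        "How many unknowns are we resolving per Sprint?": ["Research Throughput"],
        "Are our technical assumptions holding up?": ["Hypotheses Validated vs. Invalidated"],
    },
}
```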

Conclusion

Building AI-driven features often requires a different approach than traditional, feature-driven Scrum.

It is important to shift the mindset of the team and stakeholders toward a learning focus instead of a delivery focus. Making necessary changes in the Definition of Done (DoD) and adopting learning-oriented metrics can help keep the team agile while embracing the inherent unpredictability of AI R&D.

Key Takeaways

  • Adjust the Definition of Done for research spikes to emphasize validated learning.

  • Communicate frequently with stakeholders, reinforcing that disproving a hypothesis is progress.

  • Measure learning rather than just velocity; metrics like Decision Lead Time and Hypothesis Validation can show the true value of AI research.

By embracing these techniques, an AI team can innovate confidently, build trust with stakeholders, and deliver real value—even when the path to a fully integrated AI feature is full of surprises.
