TL;DR
An AI POC checklist exists to make sure a 90-day proof of concept ships a real decision: scale, kill, or extend. Most AI POCs fail because they have no owner, no baseline, and no kill criterion. The checklist below splits 90 days into setup, execution, and measure-and-decide, and forces a written go/no-go decision on day 90. Copy the inline template and run it.
- One named owner. One anchor metric. One kill criterion.
- Baseline before AI. Two weeks of clean data minimum.
- Two-week milestones, not month-end check-ins.
- Decision on day 90. In writing.
Why most AI POCs fail
I have seen more AI proof-of-concept work die in production-adjacent purgatory than I can count. The pattern is the same every time. A team gets excited about a capability, runs a demo, calls it a POC, declares it a success, and then the work never ships. Six months later, nobody can find the Slack channel and the budget is gone.
Three failure modes account for almost all of it.
- No owner. The POC was sponsored by a committee or a VP who is two steps removed from the workflow. When the work hits its first friction (data access, a vendor delay, a stakeholder objection), nobody has the authority to push through. The POC drifts.
- No baseline. The team never measured what the current-state metric was before the AI was introduced. So when the POC ends, there is no way to claim a lift, because there is no number to compare against. Everyone agrees "it feels better" and the work disappears.
- No kill criterion. The team never wrote down what would cause the POC to be shut down. So when results come in soft, the POC gets extended. And extended. And eventually quietly absorbed into "ongoing work" with no decision ever made.
The 90-day AI POC checklist exists to make those three failures impossible. If the checklist is filled in honestly, the POC has an owner, a baseline, and a kill criterion. The rest is execution.
A POC without a kill criterion is not a POC. It is a budget line waiting to expire.
The 90-day structure
Ninety days is the right window for most consumer-brand AI POCs. Shorter and the team cannot collect enough data to measure anything. Longer and the POC stops being a POC. It becomes a project, with the political weight of a project, which makes it harder to kill.
Ninety days splits cleanly into three phases of roughly 30 days each.
- Days 1-30: Setup. Owner named, anchor metric defined, baseline measured, kill criterion agreed in writing, infrastructure in place.
- Days 31-60: Execution. AI capability running inside a real workflow. Weekly status. Two-week milestone reviews.
- Days 61-90: Measure and decide. Data collected. Anchor metric compared to baseline. Go/no-go decision made on day 90.
The phases are not parallel. You cannot start execution before the baseline is measured. You cannot run the decision before the data is in. The sequencing matters because each phase produces the artifact that makes the next phase possible.
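To put the phase boundaries and two-week milestones on a calendar, a small script can derive every date from the kickoff date. This is a minimal sketch, not part of the original checklist: the function name `poc_schedule`, the key names, and the convention that the kickoff date counts as day 1 are all assumptions.

```python
from datetime import date, timedelta


def poc_schedule(start: date) -> dict[str, date]:
    """Map checklist checkpoints to calendar dates.

    Assumption: the kickoff date counts as day 1, so day N is start + (N - 1) days.
    """
    def day(n: int) -> date:
        return start + timedelta(days=n - 1)

    # Two-week milestone days as listed in the checklist's Communication section.
    schedule = {f"milestone_day_{n}": day(n) for n in (14, 28, 42, 56, 70, 84)}
    schedule["execution_starts_day_31"] = day(31)
    schedule["measure_starts_day_61"] = day(61)
    schedule["decision_day_90"] = day(90)
    return schedule


if __name__ == "__main__":
    # Hypothetical kickoff date, used only for illustration.
    items = sorted(poc_schedule(date(2025, 1, 6)).items(), key=lambda kv: kv[1])
    for name, when in items:
        print(f"{when.isoformat()}  {name}")
```

Deriving the dates from one kickoff date keeps the sequencing fixed: moving the start moves every checkpoint with it, including the day 90 review.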
The AI POC checklist (copy this)
This is the checklist I use, in markdown format. Copy it, fill it in, and post it where the team can see it. Keep it to one page.
# AI POC: [POC Name]
**As of:** [date]
**Owner:** [name, role, email]
**Sponsor:** [executive name, role]
**Workflow:** [the actual workflow this touches]
**Anchor metric:** [single measurable outcome]
**Baseline:** [number, with date range and source]
**Target:** [number] | **By:** [date]
**Kill criterion:** [the condition that would cause shutdown, in writing]
---
## Phase 1: Setup (Days 1-30)
**Goal:** POC is ready to execute on day 31.
### Owner and stakeholders
- [ ] Owner named, role confirmed, in calendar
- [ ] Executive sponsor confirmed, sign-off on anchor + kill criterion
- [ ] Workflow lead identified and bought in
- [ ] Procurement, legal, security pre-cleared
### Anchor metric and baseline
- [ ] Anchor metric defined (one number, not three)
- [ ] Baseline measured over at least 14 days of clean data
- [ ] Data source documented (where the number comes from)
- [ ] Target lift and date written down
- [ ] Kill criterion written down and signed by sponsor
### Infrastructure
- [ ] Vendor or model selected for POC (not necessarily production)
- [ ] Data access in place (read at minimum)
- [ ] Logging and observability set up before first inference
- [ ] Cost ceiling set, with daily and weekly alerts
- [ ] Rollback procedure documented
### Communication
- [ ] Weekly 30-min status meeting on the calendar
- [ ] Slack channel or equivalent with the full team
- [ ] Two-week milestone dates in calendar (day 14, 28, 42, 56, 70, 84)
- [ ] Day 90 go/no-go review scheduled
---
## Phase 2: Execution (Days 31-60)
**Goal:** AI capability running inside the real workflow, generating measurable output.
### Two-week milestone 1 (day 42)
- [ ] Capability live in workflow
- [ ] First 12 days of live data captured (days 31-42)
- [ ] Status memo: on track / off track / blocked
- [ ] Adjustments documented
### Two-week milestone 2 (day 56)
- [ ] 26 days of live data captured (days 31-56)
- [ ] Anchor metric movement vs. baseline calculated
- [ ] Adoption rate calculated (workflow %)
- [ ] Cost-to-date calculated against ceiling
- [ ] Status memo: on track / off track / blocked
### Kill check
- [ ] At day 56, sponsor reviews against kill criterion
- [ ] If kill criterion is tripped, POC is shut down here
- [ ] If not tripped, POC continues to measure phase
---
## Phase 3: Measure and decide (Days 61-90)
**Goal:** Decision in writing on day 90.
### Data collection (days 61-84)
- [ ] Final dataset locked
- [ ] Anchor metric delta vs. baseline calculated
- [ ] Adoption rate finalized
- [ ] Cost-per-unit and total POC cost calculated
- [ ] Qualitative feedback collected from workflow users
### Decision artifact (days 85-90)
- [ ] One-page memo: anchor metric movement, adoption, cost, decision
- [ ] Recommendation: ship to scale, kill, or extend
- [ ] If extend: new date, new kill criterion, new sponsor sign-off
- [ ] If ship: scale plan owner named, kickoff date set
- [ ] If kill: archive note, lessons learned written
### Day 90 review
- [ ] Sponsor and owner sign the decision
- [ ] Decision posted publicly to the team
- [ ] Budget routed to scale, returned, or reallocated
That is the entire checklist. One page if you keep the formatting tight. It is intentionally short. If the POC needs more detail, that detail lives in subsidiary documents (pilot specs, vendor contracts, governance memos). The checklist stays one page because the one-page constraint is what makes it operable.
For the broader context this checklist sits inside, see the AI Transformation Playbook for Consumer Brands and the companion AI Transformation Roadmap Template. The POC checklist is the unit. The roadmap is the program.
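The checklist item "Baseline measured over at least 14 days of clean data" can be enforced in code so that nobody records a baseline from a short or dirty window. The sketch below makes several assumptions not stated in the article: the anchor metric is recorded once per day, unclean days are marked as `None`, and the baseline is the simple mean of the clean days (a median or a ratio may suit some metrics better). The names `measure_baseline` and `MIN_CLEAN_DAYS` are hypothetical.

```python
from statistics import mean
from typing import Optional

# From the checklist: "at least 14 days of clean data".
MIN_CLEAN_DAYS = 14


def measure_baseline(daily_values: list[Optional[float]]) -> float:
    """Average the anchor metric over clean days.

    Assumption: None marks a day excluded as unclean (outage, promo spike, etc.).
    Raises ValueError when fewer than MIN_CLEAN_DAYS clean days are available.
    """
    clean = [v for v in daily_values if v is not None]
    if len(clean) < MIN_CLEAN_DAYS:
        raise ValueError(
            f"need at least {MIN_CLEAN_DAYS} clean days, got {len(clean)}"
        )
    return mean(clean)
```

Failing loudly on a short window is deliberate: a baseline that cannot be measured properly should block the start of execution rather than be estimated.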
The go/no-go decision template
On day 90, you do not have a "results review." You have a decision. The decision template is short.
# AI POC Decision: [POC Name]
**Date:** [day 90 date]
**Owner:** [name]
**Sponsor:** [name]
## What we tested
[One sentence on the AI capability and the workflow it touched.]
## What we measured
- Anchor metric: [name]
- Baseline: [number] (measured [date range])
- POC result: [number] (measured [date range])
- Delta: [+/- %, absolute]
- Adoption: [% of target workflow using it]
- Cost: $[total] | $[per unit]
## What we decided
[ ] Ship to scale
[ ] Kill
[ ] Extend (new end date: [date], new kill criterion: [text])
## Why
[Two to four sentences. Tie the decision to the anchor metric and the kill criterion.]
## What happens next
- Owner of next phase: [name]
- Kickoff date: [date]
- Budget: $[amount]
Signed: _______________ (Owner) | _______________ (Sponsor)
That is the artifact that closes the POC. Without it, the work drifts. With it, the team has a written record of what was tested, what happened, and what was decided. That record is the most valuable thing the POC produces, more valuable than the model output itself.
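The numbers in the "What we measured" section are simple arithmetic, and computing them in one place reduces the chance of a mismatched denominator in the memo. The following is a sketch under assumptions: the class name `PocMeasurement` and its field names are hypothetical, adoption is counted in users (the article only says "% of target workflow"), and the baseline is assumed to be non-zero.

```python
from dataclasses import dataclass


@dataclass
class PocMeasurement:
    baseline: float              # anchor metric before the AI was introduced
    result: float                # anchor metric measured during the POC
    users_in_workflow: int       # size of the target workflow (assumed unit: users)
    users_using_capability: int
    total_cost: float
    units_processed: int         # whatever "per unit" means for this POC

    @property
    def delta_absolute(self) -> float:
        return self.result - self.baseline

    @property
    def delta_percent(self) -> float:
        # Assumes a non-zero baseline.
        return 100.0 * self.delta_absolute / self.baseline

    @property
    def adoption_percent(self) -> float:
        return 100.0 * self.users_using_capability / self.users_in_workflow

    @property
    def cost_per_unit(self) -> float:
        return self.total_cost / self.units_processed


if __name__ == "__main__":
    # Made-up numbers for illustration only, not results from any real POC.
    m = PocMeasurement(200.0, 230.0, 24, 18, 4500.0, 9000)
    print(f"Delta: {m.delta_percent:+.1f}% ({m.delta_absolute:+.1f} absolute)")
    print(f"Adoption: {m.adoption_percent:.0f}%")
    print(f"Cost: ${m.total_cost:,.0f} | ${m.cost_per_unit:.2f} per unit")
```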
The POC's deliverable is not the AI. It is the decision.
What success and failure look like
A successful POC has three properties on day 90.
- The anchor metric moved by the agreed threshold. Not "it felt better." Not "the team likes it." The number moved, against the baseline, by at least what the team committed to in week one.
- Adoption is real. The target workflow used the capability. Logs confirm it. If the capability shipped but nobody used it, the POC did not succeed regardless of what the model can do.
- Cost-to-serve is inside the ceiling. The team knows what an inference costs, what the monthly run-rate is, and what scaling to full workflow would cost. For more on this side of the math, see the LLM cost calculator.
A failed POC has those properties inverted. The anchor metric did not move (or it moved less than the kill threshold), adoption is below 30 percent, and the cost-to-serve is unclear. A failed POC is not a tragedy. It is a result. The team learned something, archived it, and recovered the budget for the next attempt.
The worst outcome is not a failed POC. The worst outcome is the POC that drifts, never gets killed, and consumes budget for two more quarters before quietly being absorbed into "ongoing work." That is the outcome the 90-day checklist exists to prevent.
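The three success properties can be folded into a single function that produces a draft recommendation for the day 90 memo. This is a sketch, not a rule from the article: the function name `recommend` and its parameters are hypothetical, reusing the 30 percent adoption figure as a floor for shipping is an assumption, and so is treating every in-between case as "extend". An extension still needs a new date, a new kill criterion, and sponsor sign-off.

```python
def recommend(
    lift_pct: float,
    target_lift_pct: float,
    adoption_pct: float,
    cost: float,
    cost_ceiling: float,
    kill_tripped: bool,
    min_adoption_pct: float = 30.0,  # assumption: reuse the article's 30 percent marker
) -> str:
    """Return a draft recommendation: 'ship', 'kill', or 'extend'."""
    # The written kill criterion takes precedence over everything else.
    if kill_tripped:
        return "kill"

    hit_target = lift_pct >= target_lift_pct
    adopted = adoption_pct >= min_adoption_pct
    within_ceiling = cost <= cost_ceiling

    if hit_target and adopted and within_ceiling:
        return "ship"
    # Assumption: anything short of all three properties becomes a candidate
    # for extension, which the sponsor must still approve explicitly.
    return "extend"
```

The output is a starting point for the owner and sponsor, who sign the actual decision.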
The bottom line
The 90-day AI POC checklist forces three things that most AI proof-of-concept work lacks: a named owner, a measured baseline, and a written kill criterion. The 90-day window splits cleanly into setup, execution, and measure-and-decide. The decision on day 90 is mandatory and it is in writing.
Copy the template. Fill it in honestly. Run the cadence. Make the decision. Then either scale the work or recover the budget and try again. Either result is fine. Drift is not.
FAQ
What is an AI POC?
An AI POC is a time-boxed proof of concept that tests whether a specific AI capability can move a specific business metric inside a real workflow. It is not a sandbox experiment and it is not a vendor demo. A POC has a named owner, a measured baseline, a kill criterion, and a go/no-go decision date.
How long should an AI POC take?
Ninety days is the right window for most consumer-brand AI POCs. Shorter than 30 days and the team cannot collect enough data. Longer than 120 days and the POC stops being a POC and becomes an unowned project. Ninety days splits cleanly into setup, execution, and measure-and-decide phases.
What is a kill criterion?
A kill criterion is a written, pre-agreed condition that would cause the POC to be shut down. For example: less than 10 percent lift on the anchor metric after six weeks. It is the most important sentence in the POC document because it is the one that makes the decision honest.
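Writing the kill criterion as data makes it hard to reinterpret later. The sketch below encodes the example above (less than 10 percent lift after six weeks); the class name `KillCriterion`, its fields, and the choice to express six weeks as 42 days are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: the criterion should not change once signed
class KillCriterion:
    min_lift_percent: float  # lift on the anchor metric required to continue
    after_days: int          # days of execution before the check applies

    def is_tripped(self, lift_percent: float, days_elapsed: int) -> bool:
        """True when enough time has passed and the lift is below the threshold."""
        return days_elapsed >= self.after_days and lift_percent < self.min_lift_percent


# The example from the text: less than 10 percent lift after six weeks.
criterion = KillCriterion(min_lift_percent=10.0, after_days=42)
```

With this object, `criterion.is_tripped(6.5, 42)` evaluates to `True`, while the same lift at day 28 does not trip the check because the agreed window has not yet elapsed.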
Who owns an AI POC?
One named person with authority to ship or kill. Not a committee. Not "the AI team." The owner sits inside the workflow the POC is touching. If the POC is about CX, the CX lead owns it. If it is about creative production, the creative ops lead owns it.
How do you measure POC success?
Against the anchor metric, with the baseline measured before the POC began. Success means the metric moved by the agreed threshold, the cost-to-serve is within the budget, and the workflow is willing to keep using the capability after the POC ends. Adoption is part of success.
What happens when a POC succeeds?
It graduates into a scale plan. That means production deployment, observability, cost monitoring, and an operating cadence. POCs that succeed but never get a scale plan tend to quietly die in month four when the original team moves on. Plan the scale phase before the POC ends.