TL;DR

V1 Constrain is step three of the V1 Framework. It is the work of defining what each unit of AI work is allowed to receive, allowed to produce, allowed to decide, and allowed to fail at. Constraint feels limiting but is actually liberating. A constrained AI is fast and reliable. An unconstrained AI is slow and hallucinatory.

  • Constrain four dimensions: inputs, outputs, authority, failure modes.
  • Constraints are the operating spec, not the instructions.
  • Every constraint needs an explicit escalation path.
  • Constraints can be relaxed as the system earns trust.
  • Under-constraint is the most common production failure mode.

Why constraint is liberating, not limiting

Most teams approach constraint with reluctance. Constraint feels like a step backward, because the model is technically capable of more than the constraint allows. The reluctance is a misread of how production AI actually works.

An unconstrained AI has the entire search space to wander through on every call. It produces different outputs for the same input across runs. It hallucinates fields that do not exist. It invents tool calls that do not resolve. It charges you for tokens you did not need. Every one of those failure modes is a constraint that was not written.

A constrained AI has a narrow search space. It produces the same shape of output every time. It cannot invent fields, because the output schema says what fields exist. It cannot call tools you did not give it. It cannot return formats your downstream system cannot parse. Every one of those guarantees is a constraint that was written.

Constraint is the difference between a system you can operate and a system you have to babysit. The teams that ship production AI write the constraints first. The teams that ship demos write the prompt first and hope.

The four constraint dimensions

For each unit of work that came out of V1 Step 2: Decompose, four dimensions need explicit boundaries.

1. Input constraints

What is a valid input to this unit? What shape, what fields, what value ranges? What happens if the input does not match? Inputs are constrained by schema. The schema says what the unit will accept. Anything that does not match the schema gets rejected before it ever reaches the model. Schema rejection is cheap. Hallucinated handling of malformed input is expensive.

2. Output constraints

What is a valid output from this unit? The output schema is the contract with the next unit downstream. If the schema says "one of {classify_a, classify_b, classify_c}" then the model cannot return classify_d. Structured output, JSON mode, and tool-use schemas exist for this reason. Use them. Free-text outputs are the single most common source of downstream brittleness.
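A minimal sketch of the closed-set idea, using the three labels from the example above. The `Label` enum and the `parse_label` helper are hypothetical names, and the check is assumed to run on the model's raw text before the next unit sees it:

```python
from enum import Enum


class Label(Enum):
    # The three labels named in the text; anything else is invalid.
    CLASSIFY_A = "classify_a"
    CLASSIFY_B = "classify_b"
    CLASSIFY_C = "classify_c"


def parse_label(raw: str) -> Label:
    """Turn raw model text into a Label, or raise ValueError."""
    # Enum lookup by value raises ValueError for unknown members,
    # so "classify_d" can never leak into the next unit.
    return Label(raw.strip())
```

Provider-side structured output narrows what the model generates. A check like this is the system-side backstop, so the contract holds even if the provider feature is misconfigured or unavailable.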

3. Authority constraints

What is the AI allowed to decide on its own, and what must it escalate? Authority is the most important constraint in any customer-facing AI system. A unit that can issue refunds up to $50 is bounded. A unit that can issue refunds up to whatever it thinks is fair is unbounded and dangerous. Authority is defined by thresholds, allowlists, and explicit "must escalate" rules.
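The three mechanisms named above (thresholds, allowlists, must-escalate rules) could be sketched as one check that runs outside the model. Everything except the $50 cap is an illustrative assumption: the action names, the tag names, and the `check_authority` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Authority:
    # Only the $50 cap comes from the text; the rest is hypothetical.
    refund_cap_usd: float = 50.0
    allowed_actions: frozenset = frozenset({"issue_refund", "send_reply"})
    must_escalate_tags: frozenset = frozenset({"legal_threat", "chargeback"})


def check_authority(action: str, amount_usd: float, tags: set, auth: Authority) -> str:
    """Return 'allow', 'escalate', or 'deny' for an action the model proposed."""
    # Allowlist: an action the unit was never given is denied outright.
    if action not in auth.allowed_actions:
        return "deny"
    # Explicit must-escalate rules win over thresholds.
    if tags & auth.must_escalate_tags:
        return "escalate"
    # Threshold: refunds above the cap go to a human.
    if action == "issue_refund" and amount_usd > auth.refund_cap_usd:
        return "escalate"
    return "allow"
```

The design choice that matters is where the check lives: in system code, after the model proposes an action and before anything executes, so the model's output is a request and never a decision.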

4. Failure mode constraints

What does this unit do when it cannot succeed? Every constraint creates a boundary, and every boundary creates a failure mode. The failure mode has to be designed. Does the unit return an error? Hand off to a human? Retry with different parameters? Default to a safe fallback? Unspecified failure modes get filled in by the model, which means they get filled in unpredictably.

An unspecified failure mode is a constraint the model gets to invent at runtime. That is not a constraint. That is a wish.

Constrained AI versus unconstrained AI

The performance gap between constrained and unconstrained AI in production is not subtle. Constrained AI is consistently faster, cheaper, and more reliable. The reasons are mechanical.

Speed. A constrained output schema means the model can stop generating as soon as the schema is satisfied. Free-text outputs tend to run long, sometimes all the way to the maximum token limit. Structured output cuts latency by 30 to 60 percent on most workflows I have shipped.

Cost. Fewer tokens, fewer retries, fewer parsing failures. A well-constrained unit costs a fraction of an unconstrained version of the same logic. At any meaningful volume, the cost difference compounds into the budget difference between a pilot and a production system.

Reliability. Constrained outputs are parseable on the first try. Unconstrained outputs require retry logic, fallback parsers, and human review of edge cases. The reliability gap is the difference between a system that ships and a system that stays in QA forever.

Teams sometimes object that constraint reduces the model's "creativity." That objection conflates the unit's job with the system's job. The system can be creative at the level of the workflow design. Each unit inside the workflow should be boring, predictable, and bounded. That is what makes production AI different from demo AI. For more on that distinction, see Production AI vs AI Demos.

A worked example: constraining a CX deflection workflow

Take the decomposed workflow from V1 Step 2: five units that classify a ticket, look up the order, look up the policy, decide a path, and generate a response or escalation. Without constraints, that workflow is a wish. With constraints, it is a system.

Here are the constraints on the "decide path" unit, which is the highest-stakes decision in the workflow.

Input constraints. The unit accepts a JSON object with three required fields: classification (enum of five values), order_status (enum of seven values), and policy_text (string, max 2000 characters). Any input missing a field or carrying an unrecognized enum value is rejected and routed to human review.
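A sketch of that input gate, assuming it runs before any model call. The enum member names are hypothetical (the text gives only the counts, five and seven), as are the function names and the `human_review` and `call_model` route labels:

```python
# Hypothetical enum members: the text specifies five and seven values,
# not their names.
CLASSIFICATIONS = {"refund_request", "shipping_delay", "damaged_item", "billing", "other"}
ORDER_STATUSES = {"pending", "paid", "shipped", "delivered", "returned", "cancelled", "refunded"}
MAX_POLICY_CHARS = 2000
REQUIRED_FIELDS = ("classification", "order_status", "policy_text")


def validate_decide_input(payload: dict) -> list[str]:
    """Return a list of problems; an empty list means the input is valid."""
    missing = [name for name in REQUIRED_FIELDS if name not in payload]
    if missing:
        return [f"missing field: {name}" for name in missing]
    problems = []
    classification = payload["classification"]
    if not isinstance(classification, str) or classification not in CLASSIFICATIONS:
        problems.append("unrecognized classification")
    status = payload["order_status"]
    if not isinstance(status, str) or status not in ORDER_STATUSES:
        problems.append("unrecognized order_status")
    text = payload["policy_text"]
    if not isinstance(text, str) or len(text) > MAX_POLICY_CHARS:
        problems.append("policy_text must be a string of at most 2000 characters")
    return problems


def route_input(payload: dict) -> str:
    """Reject bad input before the model is ever called."""
    return "human_review" if validate_decide_input(payload) else "call_model"
```

Returning the list of problems, instead of a bare boolean, gives the human reviewer a reason alongside the rejected ticket.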

Output constraints. The unit returns a JSON object with two required fields: path (enum of four values: auto_resolve, draft_for_review, escalate_to_human, refund_within_threshold) and confidence (float between 0 and 1). The schema is enforced. Neither field carries free-text reasoning. Reasoning, if needed, goes in a separate trace field.
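Enforcing that output contract in system code might look like the sketch below. The `Decision` type and `parse_decision` helper are hypothetical names, and treating `trace` as an optional key on the same object (with every other extra key rejected) is an assumption about how the trace field is carried:

```python
import json
from dataclasses import dataclass

PATHS = {"auto_resolve", "draft_for_review", "escalate_to_human", "refund_within_threshold"}
# Assumption: "trace" is the only extra key the unit may emit.
ALLOWED_KEYS = {"path", "confidence", "trace"}


@dataclass(frozen=True)
class Decision:
    path: str
    confidence: float


def parse_decision(raw: str) -> Decision:
    """Parse raw model output into a Decision, or raise ValueError."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError("output is not valid JSON") from exc
    if not isinstance(data, dict):
        raise ValueError("output must be a JSON object")
    extra = set(data) - ALLOWED_KEYS
    if extra:
        raise ValueError(f"unexpected fields: {sorted(extra)}")
    path = data.get("path")
    if not isinstance(path, str) or path not in PATHS:
        raise ValueError(f"invalid path: {path!r}")
    confidence = data.get("confidence")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError("confidence must be a number")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return Decision(path=path, confidence=float(confidence))
```

Every rejection is a ValueError, so the caller needs exactly one failure branch for "the output did not match the schema".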

Authority constraints. The unit can issue refunds up to $50. Above $50, the path is forced to escalate_to_human regardless of model output. The unit can auto-resolve only when confidence is above 0.9. From 0.7 through 0.9 inclusive, the path is forced to draft_for_review. Below 0.7, it is forced to escalate_to_human.
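Those rules are deterministic, so they can live in plain code that overrides whatever the model returned. A sketch, with `enforce_authority` as a hypothetical name and the refund amount assumed to arrive from the order lookup, not from the model:

```python
REFUND_CAP_USD = 50.0
AUTO_RESOLVE_ABOVE = 0.9  # the model's path is honored only strictly above this
ESCALATE_BELOW = 0.7      # strictly below this, always escalate


def enforce_authority(model_path: str, confidence: float, refund_usd: float = 0.0) -> str:
    """Return the path the system will actually take."""
    # Refund cap: above $50 a human decides, whatever the model said.
    if refund_usd > REFUND_CAP_USD:
        return "escalate_to_human"
    # Low confidence: escalate.
    if confidence < ESCALATE_BELOW:
        return "escalate_to_human"
    # Middle band, including exactly 0.7 and exactly 0.9, which are
    # neither "below 0.7" nor "above 0.9": draft for review.
    if confidence <= AUTO_RESOLVE_ABOVE:
        return "draft_for_review"
    # High confidence and within the cap: the model's path stands.
    return model_path
```

The order of the checks is the policy: the refund cap is tested first, so no confidence score, however high, can buy its way past it.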

Failure mode constraints. If the model returns an output that does not match the schema, the unit retries once with the same input. If the retry fails, the path defaults to escalate_to_human with a logged error. If any external lookup in upstream units failed, the path is also forced to escalate_to_human.
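The failure rules above reduce to a small wrapper. In this sketch `call_model` and `parse` are hypothetical callables supplied by the caller, `parse` is assumed to raise ValueError on a schema mismatch, and the logger name is arbitrary:

```python
import logging
from typing import Callable

logger = logging.getLogger("decide_path")
FALLBACK_PATH = "escalate_to_human"


def decide_with_fallback(
    call_model: Callable[[dict], str],
    parse: Callable[[str], str],
    payload: dict,
    upstream_ok: bool = True,
) -> str:
    """Run the unit with one retry, then fall back to escalation."""
    # A failed upstream lookup forces escalation before the model is called.
    if not upstream_ok:
        logger.error("upstream lookup failed; escalating")
        return FALLBACK_PATH
    # One initial attempt plus exactly one retry with the same input.
    for attempt in (1, 2):
        try:
            return parse(call_model(payload))
        except ValueError:
            logger.warning("schema mismatch on attempt %d", attempt)
    logger.error("schema mismatch after retry; escalating")
    return FALLBACK_PATH
```

The retry count is fixed in the loop, not left to the model or to a caller-supplied argument, so the worst-case cost per ticket is two model calls.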

That single unit, with four constraint blocks written down, is now operable. The risk envelope is bounded. The cost is predictable. The escalation rate can be measured and tuned. The failure modes are visible. The team can put this in front of real customers and sleep.

Without those constraints, the same workflow is a customer-facing AI agent that can issue refunds at unspecified thresholds, return outputs in unspecified shapes, and fail in unspecified ways. That version of the system gets pulled from production within a week, and the team writes a postmortem about the model. The model was fine. The constraints were missing.

What under-constraint looks like in production

Under-constrained AI has a recognizable signature in production. If you see any of these, the constraint work was skipped.

Each of those is a constraint dimension that was left to the model to fill in at runtime. The fix is the same in every case: write the constraint, enforce it in the system, and let the model do the bounded work inside.

Every production AI failure I have personally debugged was a missing constraint, not a bad model. The model was doing exactly what the unconstrained spec asked it to do.

How constraints evolve over time

Constraints are not permanent. They are the current operating spec, written tight on day one because trust has not been earned yet. As the system proves itself in production, constraints can be loosened.

The pattern looks like this. Ship the workflow with the refund threshold at $50 and the auto-resolve confidence cutoff at 0.9. Watch production for a quarter. If the auto-resolve rate is too low because confidence rarely clears 0.9, lower the cutoff to 0.85 and measure again. If the refund threshold is too low and humans are spending half their time approving small refunds, raise it to $100.
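One way to put a number behind the cutoff change, sketched under assumptions: production logs are taken to hold (confidence, was_correct) pairs, where was_correct is a human reviewer's verdict on the model's proposed path. Decisions between the proposed and current cutoff were drafted for review under the current rules, so those verdicts should already exist. The function names and record shape are hypothetical.

```python
def auto_resolve_rate(confidences: list[float], cutoff: float) -> float:
    """Share of decisions whose confidence is strictly above the cutoff."""
    if not confidences:
        return 0.0
    return sum(c > cutoff for c in confidences) / len(confidences)


def compare_cutoffs(
    records: list[tuple[float, bool]],
    current: float = 0.9,
    proposed: float = 0.85,
) -> dict:
    """Summarize what lowering the cutoff would change.

    records: (confidence, was_correct) pairs from reviewed decisions.
    """
    confidences = [c for c, _ in records]
    # Decisions that would newly auto-resolve under the proposed cutoff.
    band = [ok for c, ok in records if proposed < c <= current]
    return {
        "rate_current": auto_resolve_rate(confidences, current),
        "rate_proposed": auto_resolve_rate(confidences, proposed),
        "band_size": len(band),
        # None when the band is empty: there is no evidence either way.
        "band_accuracy": sum(band) / len(band) if band else None,
    }
```

The band accuracy is the figure to look at before relaxing: it describes exactly the decisions that would stop getting human review. What accuracy is high enough is a risk decision the sketch does not make.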

Every constraint relaxation is a deliberate decision with a measured baseline behind it. Constraints do not loosen because someone got tired of seeing escalations. They loosen because the production data shows the loosened constraint will still hold the risk envelope. This is how an AI program moves from pilot to scale without losing reliability. It is also how AI governance avoids killing velocity: governance lives in the constraints, and the constraints evolve based on receipts.

The teams that never relax constraints end up with a slow, over-cautious system that human-reviews everything. The teams that relax constraints without measurement end up with a fast, dangerous system that human-reviews nothing. The right path is the deliberate one in between.

Where Constrain fits in the larger framework

Constrain is the third of the five V1 steps. By the end of Constrain, you have a decomposed workflow with every unit's inputs, outputs, authority, and failure modes written down. The system is now buildable, operable, and verifiable, even though no prompt has been written yet. That is the V1 ratio in action: 90% thinking, 10% prompting.

Step 4 will turn the constraints into acceptance criteria: what does a good output look like, and how do we know? Step 5 will render the result into the prompt. By the time you get there, the prompt is almost trivial, because Strip cleaned the brief, Decompose named the units, and Constrain bounded each one.

The bottom line

V1 Constrain is the step that turns a decomposed workflow into a system you can operate. It is the work of writing the boundaries on inputs, outputs, authority, and failure modes for every unit. Constraint is liberating, not limiting. Constrained AI is faster, cheaper, and more reliable than its unconstrained cousin. The teams that ship production AI write the constraints first. The teams that ship demos do not.

The next step is V1 Step 4: Define Done, where the constraints get converted into acceptance criteria the system can be scored against.


FAQ

What is V1 Step 3?

V1 Step 3 is Constrain, the third step of the V1 Framework. It is the work of defining the real boundaries of each unit of AI work: what inputs are valid, what outputs are valid, what authority the AI has, and what failure modes are acceptable. Constraints are the operating spec for the system.

Why do constraints make AI better?

Constrained AI is faster, cheaper, and more reliable than unconstrained AI. Constraints narrow the model's search space, which reduces hallucination, lowers token cost, and makes outputs verifiable. Unconstrained AI looks impressive in a demo and fails unpredictably in production.

What is the difference between constraints and instructions?

Instructions tell the AI what to do. Constraints define what the AI is allowed to do. Instructions live in the prompt. Constraints live in the system design: the input schema, the output schema, the tool authority, the escalation rules. Instructions without constraints produce unpredictable systems.

How do you decide what to constrain?

Constrain the four dimensions: inputs, outputs, authority, and failure modes. For each unit of work from Step 2, write down what valid inputs look like, what valid outputs look like, what the AI is allowed to decide, and what happens when the AI cannot decide. If a dimension is left open, it will be filled in unpredictably at runtime.

Can constraints change over time?

Yes. Constraints are not permanent. They are the current operating spec. As the system proves itself in production and earns trust, constraints can be relaxed: the authority threshold rises, the input schema widens, the human review rate drops. Loosening constraints is a deliberate decision based on observed performance, not a default state.

What happens when AI hits a constraint?

It escalates. The constraint defines the boundary; the escalation path defines what happens when the boundary is reached. A well-designed system has explicit escalation paths for every constraint: refund over threshold goes to a human, malformed input gets rejected, low-confidence output gets reviewed. Constraints without escalation paths are just hopes.

About the author

Nicholas Harris is an AI-native operator at the intersection of generative AI and consumer growth. He is President at CreativeOS, an AI-powered SaaS platform serving 25,000+ brands, and Founder at Automatic, an AI consultancy. The V1 Framework is the methodology behind every production AI system he has shipped, including production LLM, image-generation, and AI agent workflows used by consumer brands at scale.

Prior to CreativeOS, he delivered 110.6% e-commerce revenue growth at NASM, an 11x EBITDA exit at SplitTesting.com, and 27.8% average conversion lift across the Acadia DTC portfolio. He is currently open to VP AI, AI Transformation, Head of Growth, and Fractional CTO roles at consumer-facing companies. Based in Mesa, AZ. Remote or Phoenix metro preferred.
