Prompt Engineering for Quality: Crafting Prompts That Produce Production Code
AI optimizes for whatever you ask for. If you ask for "working code," you get code that works. If you ask for production-grade code with error handling, tests, and contracts — you get that instead. The prompt is the ceiling.
Most AI-generated code works. That is the easy part. The hard part is generating code that works six months from now when someone else needs to modify it, when the API changes its response shape, when the database has 10 million rows instead of 10.
The default output from any AI model optimizes for a single question: does it run? That is the wrong question. The right question is: can it be maintained, tested, debugged, and extended?
The Default AI Output Problem
Give any AI model this prompt: "Write a function to fetch user data from the API."
You will get a function. It will work. It will have no error handling for network failures, no input validation, no retry logic, no types, and no tests. It will use magic strings and hardcoded URLs. It will be a maintenance nightmare disguised as a green checkmark.
This is not the AI's fault. The AI optimized for exactly what you asked for: a function that fetches user data. You did not ask for production quality. You got what you specified.
The gap between the default output and the production-grade version is not a model upgrade. It is not a temperature change. It is not a different API. It is a better prompt.
System Prompts as Quality Enforcers
Individual prompts set the ceiling for individual generations. But system prompts — and specifically CLAUDE.md files — set the floor for every generation in a project.
A CLAUDE.md file is a constitution. It tells the AI: these are the rules that govern every line of code you produce in this codebase. Not suggestions. Rules.
We run a CLAUDE.md on every project at Tesseract Intelligence. The file specifies git workflow, quality gates, testing standards, naming conventions, and explicit anti-patterns. Every agent that touches the codebase inherits these constraints automatically.
The result: AI-generated code that follows our patterns from the first generation, not after three rounds of review feedback.
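A minimal sketch of what such a file might contain — the specific rules below are illustrative, not Tesseract's actual file:

```markdown
# CLAUDE.md (illustrative sketch)

## Quality gates
- All code must pass `tsc --strict` and the linter before commit.
- Every exported function needs at least one test.

## Testing standards
- Tests live next to the code they cover (`foo.ts` → `foo.test.ts`).
- Cover the error path, not just the happy path.

## Banned anti-patterns
- No `any` types.
- No silent `catch` blocks — log and re-throw, or return a typed error.
- No magic strings — extract constants.
```

The file stays short on purpose: rules the AI can verify against every generation, not a style guide nobody reads.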
The "Before You Write Code" Pattern
This is the most underused prompt pattern in AI-assisted development. Before asking the AI to write any implementation, ask it to identify failure modes.
The prompt looks like this:
"Before writing any code, analyze this feature and list: (1) all failure modes, (2) edge cases, (3) error scenarios, (4) state transitions that could go wrong. Then write the implementation that handles all of them."
This forces the AI to think about what can break before it writes anything. The output is dramatically different from "just write it."
We learned this building Foresight, our trading engine. A prompt that said "write a function to place an order" produced code that did not handle: API rate limits, insufficient balance, stale price data, concurrent duplicate orders, or network timeouts during submission. A prompt that first asked for failure modes produced code that handled all five.
The Test-First Prompt Pattern
This pattern inverts the default AI workflow. Instead of prompting for code and then prompting for tests, you prompt for tests first, then for an implementation that satisfies those tests.
The key insight: when tests come first, they become the specification. The AI writes implementation code that satisfies YOUR definition of correct behavior, not its own assumptions about what "correct" means.
The prompt structure:
"Given this function signature:
`createUser(email: string): Promise<Result<User, CreateUserError>>` — write test cases covering: (1) valid email creates user with correct fields, (2) duplicate email returns DuplicateError, (3) invalid email format returns ValidationError, (4) database connection failure returns InfraError. Write only the tests. Do not write the implementation."
Then, separately:
"Here are the tests. Write the implementation that makes all of them pass."
This is how the InDecision Framework approaches decision-making under uncertainty: define what success looks like before you act. Tests first means defining success before generating code.
Constraint Prompting: The Power of No
Positive instructions tell the AI what to do. Constraint prompting tells the AI what NOT to do. In practice, negative constraints are often more powerful because they close specific failure modes.
The pattern:
"Write this function. Do NOT use `any` as a type. Do NOT catch errors silently — every catch block must log and re-throw or return a typed error. Do NOT use magic strings — extract all constants. Do NOT mutate input parameters."
Each constraint maps to a specific class of production bug. `any` types cause runtime crashes. Silent catches hide failures. Magic strings cause typo-driven incidents. Parameter mutation causes spooky action at a distance.
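An illustrative sketch of what honoring those constraints looks like in the output (all names are hypothetical):

```typescript
const STATUS_ACTIVE = "active"; // extracted constant, not a magic string

type LoadError = { kind: "load_failed"; cause: string };

// No `any`: the parsed value is `unknown` until explicitly narrowed.
// No silent catch: the failure comes back as a typed error the caller must handle.
function parseStatus(raw: string): string | LoadError {
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed !== "string") throw new Error("status must be a string");
    return parsed === STATUS_ACTIVE ? STATUS_ACTIVE : parsed;
  } catch (err) {
    return { kind: "load_failed", cause: String(err) };
  }
}

// No input mutation: returns a new sorted copy instead of sorting in place.
function sortedCopy(items: readonly number[]): number[] {
  return [...items].sort((a, b) => a - b);
}
```

Note the `readonly` parameter: the compiler itself now enforces the no-mutation constraint, so the rule survives even when nobody is reviewing.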
Constraints work by subtraction. You are not adding features to the prompt. You are removing failure modes from the output.
Combining the Patterns
The most effective prompts combine all three patterns:
- Pre-analysis: "Before writing code, list failure modes and edge cases."
- Test-first: "Write tests for all identified scenarios. Then write the implementation."
- Constraints: "Do NOT use these anti-patterns. Follow these conventions."
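Put together, a combined prompt might read like this (wording illustrative):

```
Before writing any code for [feature]:
1. List all failure modes, edge cases, and error scenarios.
2. Write tests covering every scenario from step 1. Tests only — no implementation.
3. Write the implementation that makes all tests pass.

Constraints: do NOT use `any`, do NOT catch errors silently,
do NOT use magic strings, do NOT mutate input parameters.
```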
This is not about writing longer prompts. It is about writing prompts that produce code you would actually merge into a production codebase — code that handles failure, is tested, and follows your team's patterns.
Lesson 153 Drill
- Take a function you recently asked AI to generate. Rewrite the prompt using all three patterns: pre-analysis, test-first, and constraint prompting. Compare the output quality.
- Create a CLAUDE.md file for one of your projects. Include at minimum: quality gates, testing standards, and three explicit anti-patterns to ban. Use it in your next AI coding session.
- Pick one module in your codebase. Ask the AI to list all failure modes before touching any code. Count how many you had not considered. That number is your current quality gap.