Prompt Debugging — When AI Gets It Wrong
Bad AI output is not random — it is diagnostic. Each failure mode has a signature, a root cause, and a systematic fix. The operators who debug prompts efficiently run the same four-step loop: Observe, Hypothesize, Modify, Test.
When AI output is wrong, most operators do the same thing: rewrite the prompt and run it again, hoping the new version works better.
That is not debugging. That is guessing with extra steps.
Real prompt debugging is systematic. Operators who can read the failure mode and apply the right category of fix produce working prompts far faster than operators who throw random rewrites at the problem.
The Four Failure Modes
Four failure modes cover the vast majority of what goes wrong in AI output.
Hallucination
Signature: The model produces confident, specific, plausible-sounding content that is factually incorrect or not supported by the provided context.
Root cause: The model is filling a gap in its context with training priors rather than with facts you provided. Common triggers: asking about proprietary information the model cannot know, time-sensitive data, or domain-specific facts not in the context.
Fix strategy: Inject the correct facts as explicit context. If the model is hallucinating an answer, it is because the answer is not in the context. Add it. Use the formulation: "Based only on the information provided above, answer the following question. If the answer is not present in the provided information, say 'I don't have that information' rather than speculating."
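The grounding formulation above can be sketched as a small prompt builder. This is an illustrative template, not a required API; the helper name `grounded_prompt` and the example pricing facts are assumptions for demonstration.

```python
def grounded_prompt(context: str, question: str) -> str:
    """Wrap a question in injected facts so the model answers from
    the provided context rather than from its training priors."""
    return (
        "Context:\n"
        f"{context}\n\n"
        "Based only on the information provided above, answer the "
        "following question. If the answer is not present in the "
        "provided information, say \"I don't have that information\" "
        "rather than speculating.\n\n"
        f"Question: {question}"
    )

# Example: pricing facts the model cannot know on its own
prompt = grounded_prompt(
    context="Plan A costs $29/month. Plan B costs $99/month.",
    question="What does Plan B cost per month?",
)
```

The key move is that the facts and the refusal instruction travel together in every call, so a missing fact produces "I don't have that information" instead of a confident fabrication.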
Refusal
Signature: The model declines to complete the task, citing safety concerns, policy constraints, or inability to help — even when the task is legitimate and not actually violating any policy.
Root cause: Two categories. First, genuine policy alignment — the task is actually in a restricted area. Second, false positive — the model is pattern-matching superficial features of the request to restricted categories without understanding the legitimate intent.
Fix strategy: For false positives, reframe the task to make legitimate intent explicit. "Write a scene depicting a conflict" will trigger fewer false positives than "write a violent scene." Provide context that establishes legitimate purpose. If the task is genuinely restricted, it is restricted — do not attempt jailbreaks.
Drift
Signature: Output starts on the right track, then gradually diverges from the original task — especially in long-form outputs or multi-turn conversations.
Root cause: Task specification was too vague, allowing the model to gradually reinterpret what it is doing. In multi-turn conversations, the task definition from early in the conversation loses influence as the context window fills with recent turns.
Fix strategy: Tighten the task specification — convert "analyze this" into "identify exactly these three things." For multi-turn drift, re-anchor by restating the task in the current turn: "Reminder: the goal of this analysis is [specific goal]. Continuing from where we left off..."
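For multi-turn drift, the re-anchoring move can be made mechanical: prepend the goal restatement to every new instruction. The helper below is a hypothetical sketch; the function name and wording are assumptions.

```python
def re_anchor(goal: str, next_instruction: str) -> str:
    """Restate the task goal in the current turn so it regains
    influence over the most recent context."""
    return (
        f"Reminder: the goal of this analysis is {goal}. "
        f"Continuing from where we left off: {next_instruction}"
    )

turn = re_anchor(
    goal="identifying the top three cost drivers in the Q3 report",
    next_instruction="now cover the infrastructure line items.",
)
```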
Format Break
Signature: The output content is acceptable but the structure is wrong — JSON with a missing field, prose instead of a table, extra preamble before the actual answer, or missing sections.
Root cause: Format instructions were underspecified, or the model's training bias toward verbose conversational output is overriding explicit format constraints.
Fix strategy: Make the format specification more explicit and more constraining. Instead of "return as JSON," specify: "Return only a valid JSON object with exactly these keys: [keys]. No preamble, no explanation, no markdown code blocks — just the raw JSON object starting with {." Providing a format example is often more effective than describing the format.
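A format constraint is only as good as the check that enforces it. One way to catch format breaks automatically is to validate the output before using it; the sketch below assumes a hypothetical three-key schema and uses only the standard library.

```python
import json

REQUIRED_KEYS = {"title", "summary", "score"}  # hypothetical schema


def parse_strict_json(output: str) -> dict:
    """Accept only a raw JSON object with exactly the required keys.
    Raising here surfaces a format break immediately instead of
    letting bad structure flow downstream."""
    obj = json.loads(output)  # fails on preamble, prose, or code fences
    if set(obj) != REQUIRED_KEYS:
        raise ValueError(f"key mismatch: {set(obj) ^ REQUIRED_KEYS}")
    return obj
```

A conversational preamble like "Sure! Here's the JSON:" makes `json.loads` raise, which is exactly the signal you want: the failure is detected at the parser, not three steps later.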
The Debugging Loop
The four-step loop for systematic prompt debugging:
Step 1: Observe. Do not start fixing immediately. Characterize the failure precisely. What was expected? What actually happened? Is this hallucination, refusal, drift, or format break? What specifically is wrong — which part of the output fails? This step takes 60 seconds and saves you from misdiagnosing the root cause.
Step 2: Hypothesize. Form a specific hypothesis about why the failure occurred. "The model is hallucinating the pricing because pricing data was not in the injected context." "The model is drifting because the task was stated as 'analyze this' with no specific deliverables." "The model is breaking format because I described JSON but didn't provide a schema." One hypothesis at a time.
Step 3: Modify. Change exactly one thing — the thing your hypothesis says is wrong. Not the whole prompt. One variable. If your hypothesis is that context is missing, add only that context. If your hypothesis is that the task is vague, tighten only the task. Changing multiple things at once makes it impossible to know what fixed the problem — or what caused the new problem.
Step 4: Test. Run the modified prompt. Compare the output to your expected result and to your success criteria. Did it fix the failure mode? Did it introduce a new failure mode? Document the result before looping back to Step 1.
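The documentation habit in Step 4 can be made concrete with a small record per loop iteration. This structure is a suggestion, not a standard; the field names are assumptions.

```python
from dataclasses import dataclass


@dataclass
class DebugIteration:
    """One pass through Observe -> Hypothesize -> Modify -> Test.
    Logging each pass keeps the one-variable discipline honest."""
    failure_mode: str  # hallucination | refusal | drift | format_break
    hypothesis: str    # why the failure occurred
    change: str        # the single thing modified
    fixed: bool        # did the test pass?


log = [
    DebugIteration(
        failure_mode="hallucination",
        hypothesis="pricing data was not in the injected context",
        change="added the pricing table to the context",
        fixed=True,
    ),
]
```

Counting entries in `log` also gives you the iterations-to-fix metric the drill below asks you to track.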
Debugging by Failure Mode
| Failure Mode | Primary Fix | Secondary Fix |
|---|---|---|
| Hallucination | Inject the missing facts | Add "based only on provided context" |
| Refusal | Reframe with explicit legitimate intent | Move to system prompt context |
| Drift | Tighten task specification | Re-anchor in current turn |
| Format break | Add concrete format example | Use more constraining language |
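The table above can also be expressed as a small dispatch map, which is convenient if you script your debugging notes. The fix wordings are paraphrased from the table; `suggest_fix` is a hypothetical helper.

```python
# Primary and secondary fixes per failure mode, mirroring the table.
FIXES = {
    "hallucination": ("inject the missing facts",
                      "add 'based only on provided context'"),
    "refusal":       ("reframe with explicit legitimate intent",
                      "move the framing to the system prompt"),
    "drift":         ("tighten the task specification",
                      "re-anchor the task in the current turn"),
    "format_break":  ("add a concrete format example",
                      "use more constraining language"),
}


def suggest_fix(mode: str, escalate: bool = False) -> str:
    """Return the primary fix, or the secondary one when the
    primary has already been tried (escalate=True)."""
    primary, secondary = FIXES[mode]
    return secondary if escalate else primary
```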
Building Diagnostic Instincts
Debug 50 prompts systematically and the instincts become automatic: you read the output and immediately recognize the failure signature without walking through the taxonomy. This skill compounds, because every prompt you debug teaches you something about what triggers each failure mode in your specific domain and use case.
The operators who are fast at prompt debugging have not memorized a checklist. They have internalized a diagnostic pattern through repetition. The checklist is how you build the instincts.
Lesson 53 Drill
Take any prompt that has previously produced unsatisfactory output. Apply the debugging loop explicitly:
- Characterize the failure — which of the four modes is it?
- Write a one-sentence hypothesis about root cause.
- Change exactly one thing based on the hypothesis.
- Run. Document what changed.
- Repeat until resolved.
Track how many loop iterations it takes to fix the prompt. Over time, watch that number drop. That drop is your diagnostic instinct compounding.
Bottom Line
Prompt debugging is not guessing — it is a diagnostic skill. Four failure modes cover the landscape: hallucination (missing context), refusal (task framing), drift (vague task definition), format break (underspecified output). The four-step loop — Observe, Hypothesize, Modify, Test — is the systematic method for resolving each. Change one variable at a time. Document every result. The next lesson covers the model parameters that sit underneath the prompts.