Hypothesis-Driven SQL
Write queries that test assumptions, not confirm them. The SELECT that surfaced the 89% calibrator-zero rate had one job: answer a specific question about distribution. Templates for component-distribution queries that make bugs obvious.
Exploratory queries waste time. Hypothesis-driven queries diagnose bugs.
The difference is whether the query was written to answer a specific question with a clear interpretation. Exploratory SQL says "show me some data." Hypothesis-driven SQL says "if component X is dead, the zero rate will be above 50%; let me check."
The Template
For any suspected dead scoring component, run this first:
```sql
SELECT
    COUNT(*) AS total,
    COUNT(*) FILTER (WHERE grok_score = 0) * 100.0 / COUNT(*) AS grok_zero_pct,
    COUNT(*) FILTER (WHERE perplexity_score = 0) * 100.0 / COUNT(*) AS perplexity_zero_pct,
    COUNT(*) FILTER (WHERE calibration_score = 0) * 100.0 / COUNT(*) AS calibration_zero_pct,
    COUNT(*) FILTER (WHERE news_score = 0) * 100.0 / COUNT(*) AS news_zero_pct
FROM hermes_signals
WHERE created_at > NOW() - INTERVAL '14 days';
```
One row of output. Four percentages. The result of running this against Hermes was:
| total | grok_zero_pct | perplexity_zero_pct | calibration_zero_pct | news_zero_pct |
|---|---|---|---|---|
| 64 | 12.5 | 18.8 | 89.1 | 4.7 |
Three components are firing normally. One is dead. The diagnosis is inescapable and takes thirty seconds.
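The template can be checked end to end without a Postgres instance. The sketch below uses Python's bundled SQLite, replacing the Postgres-only `FILTER` clause with `SUM` over a boolean comparison (which SQLite evaluates as 0/1). The table mirrors the article's `hermes_signals`; the rows are toy data invented for the example, not the real Hermes numbers.

```python
# Portable sketch of the zero-rate template. Assumption: SQLite stands in
# for Postgres, so FILTER (WHERE ...) becomes SUM(condition).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE hermes_signals (
    grok_score REAL, perplexity_score REAL,
    calibration_score REAL, news_score REAL)""")

# Toy data: calibration_score is dead (always zero); the others fire.
rows = [(7.0, 5.0, 0.0, 3.0)] * 9 + [(0.0, 0.0, 0.0, 3.0)]
conn.executemany("INSERT INTO hermes_signals VALUES (?, ?, ?, ?)", rows)

row = conn.execute("""
    SELECT COUNT(*)                                          AS total,
           SUM(grok_score = 0)        * 100.0 / COUNT(*)     AS grok_zero_pct,
           SUM(perplexity_score = 0)  * 100.0 / COUNT(*)     AS perplexity_zero_pct,
           SUM(calibration_score = 0) * 100.0 / COUNT(*)     AS calibration_zero_pct,
           SUM(news_score = 0)        * 100.0 / COUNT(*)     AS news_zero_pct
    FROM hermes_signals
""").fetchone()
print(row)  # (10, 10.0, 10.0, 100.0, 0.0)
```

The one-row, one-question shape is the point: a dead component jumps out as the only percentage near 100 while its siblings sit near zero.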
The Percentile Query
For components that are firing but producing suspicious values, run the percentile distribution:
```sql
SELECT
    percentile_cont(0.10) WITHIN GROUP (ORDER BY calibration_score) AS p10,
    percentile_cont(0.25) WITHIN GROUP (ORDER BY calibration_score) AS p25,
    percentile_cont(0.50) WITHIN GROUP (ORDER BY calibration_score) AS p50,
    percentile_cont(0.75) WITHIN GROUP (ORDER BY calibration_score) AS p75,
    percentile_cont(0.90) WITHIN GROUP (ORDER BY calibration_score) AS p90,
    MAX(calibration_score) AS max_val
FROM hermes_signals
WHERE calibration_score > 0
  AND created_at > NOW() - INTERVAL '14 days';
```
This reveals whether a firing component is clustering where expected or drifting. If p50 is much lower than your design expected, the component is under-contributing even when it fires. If p90 is close to the component's theoretical max, the component is saturating.
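To sanity-check what the percentile query returns, it helps to know exactly what `percentile_cont` computes: linear interpolation between the two order statistics surrounding the requested fraction. Here is a minimal Python reimplementation of that semantics; the sample scores are illustrative, not real Hermes data.

```python
# Minimal reimplementation of Postgres's percentile_cont: position
# fraction * (n - 1) in the sorted values, linearly interpolated.
def percentile_cont(values, fraction):
    xs = sorted(values)
    if len(xs) == 1:
        return xs[0]
    pos = fraction * (len(xs) - 1)
    lo, frac = int(pos), pos - int(pos)
    if frac == 0:
        return xs[lo]
    return xs[lo] + frac * (xs[lo + 1] - xs[lo])

# Illustrative nonzero calibration scores (toy data).
scores = [2, 4, 4, 5, 6, 7, 9, 10, 12, 15, 18]
print(percentile_cont(scores, 0.25))  # 4.5
print(percentile_cont(scores, 0.50))  # 7
```

Running your expected score distribution through a function like this tells you where p50 and p90 *should* land, which is what makes drift and saturation visible in the real query's output.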
The Clearance Query
For scoring systems with a threshold, the single most important diagnostic is the clearance rate:
```sql
SELECT
    COUNT(*) AS total_signals,
    COUNT(*) FILTER (WHERE composite_score >= 70) AS cleared,
    COUNT(*) FILTER (WHERE composite_score >= 70) * 100.0 / COUNT(*) AS clearance_pct,
    AVG(composite_score) AS avg_score,
    MAX(composite_score) AS max_score
FROM hermes_signals
WHERE created_at > NOW() - INTERVAL '14 days';
```
If cleared = 0, the system is not trading. If max_score < 70, even the best recent signal could not clear the bar — the ceiling itself is the problem. If avg_score is far below the threshold, the rubric is misaligned with the distribution of inputs.
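The "ceiling" diagnosis is mechanical enough to script. The sketch below runs a SQLite-adapted clearance query (again swapping `FILTER` for a boolean `SUM`) against toy scores where nothing clears the article's threshold of 70:

```python
# Sketch of the clearance query against SQLite. Scores are toy values
# chosen so that no signal clears the 70 threshold.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hermes_signals (composite_score REAL)")
conn.executemany("INSERT INTO hermes_signals VALUES (?)",
                 [(s,) for s in [41, 52, 58, 63, 66]])

total, cleared, max_score = conn.execute("""
    SELECT COUNT(*),
           SUM(composite_score >= 70),
           MAX(composite_score)
    FROM hermes_signals
""").fetchone()
print(total, cleared, max_score)  # 5 0 66.0
# cleared = 0 and max_score < 70: the ceiling, not signal selection,
# is the problem.
```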
Query Ladder
1. Zero-rate query: is any component dead?
2. Percentile query: is a firing component drifting or saturating?
3. Clearance query: can any signal clear the threshold?
The Discipline
The three queries above map to the most common failure hypotheses for a scoring system. Run them in order. Most bugs reveal themselves at step 1 or step 2. The rare ones that survive all three are genuinely subtle and deserve a longer investigation, but until a query has falsified the simple hypotheses, do not assume the bug is subtle.
The Rule
Write queries that test hypotheses. State the hypothesis before running the query. Interpret the result as a binary answer. Move to the next hypothesis if falsified. This is the SQL side of the scientific method, and it is the fastest way through any debugging thread involving persisted data.
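The rule lends itself to a tiny harness: pair each named hypothesis with a query and a predicate that turns the result into a yes/no answer. Everything below is a hedged sketch — the queries are SQLite-adapted versions of the article's templates, and the data and 50%/70 cutoffs are the article's, applied to invented rows.

```python
# The discipline as a loop: state the hypothesis, run the query,
# read the result as a binary answer, move on if falsified.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hermes_signals (calibration_score REAL, composite_score REAL)")
# Toy rows: calibration is mostly dead, one signal clears the bar.
conn.executemany("INSERT INTO hermes_signals VALUES (?, ?)",
                 [(0.0, 45.0)] * 9 + [(6.0, 72.0)])

hypotheses = [
    ("calibration component is dead (zero rate > 50%)",
     "SELECT SUM(calibration_score = 0) * 100.0 / COUNT(*) FROM hermes_signals",
     lambda pct: pct > 50),
    ("no signal can clear the 70 threshold",
     "SELECT MAX(composite_score) FROM hermes_signals",
     lambda mx: mx < 70),
]
for claim, sql, confirmed in hypotheses:
    value = conn.execute(sql).fetchone()[0]
    verdict = "CONFIRMED" if confirmed(value) else "falsified"
    print(f"{claim}: {verdict} ({value})")
```

Each iteration is one rung of the ladder: a stated hypothesis, a query written to answer it, and an interpretation that is either confirmed or falsified — nothing exploratory in the loop.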