// Track
Testing AI-Adjacent Systems
Evaluation, Audit, and Quality Assurance for AI Pipelines
Design evaluations for agent outputs, run audit swarms, treat knowledge cutoff as a testing concern, and build LLM-as-judge systems for automated quality scoring. Drawn from real audit runs across Knox's fleet, including the SP-001 false-positive incident and the Autoresearch prompt quality system.
3 lessons · ~26 min total
Lessons are shown in recommended order. Complete them in sequence for the best experience, or jump to any lesson.