Production Auth Checklist — Everything That Can Go Wrong
Env vars that silently don't reach Production. Redirect URIs that must match exactly. Tokens that exceed 4KB. Two clients racing for the same lock. The complete diagnostic playbook from 8 hours of production debugging.
Authentication failures in production have a characteristic that makes them especially painful to debug: they are usually silent.
The user clicks "Login." Something goes wrong in the background. The user ends up on the home page, not logged in. No error message. No stack trace. No visual indication of what failed. Just: not logged in.
This lesson is the compiled playbook from the April 2026 debugging session — every failure mode encountered, diagnosed, and resolved across 8 hours and 12 PRs. Use it as your production auth diagnostic guide.
The Failure Taxonomy
Auth failures cluster into six categories. Identify which category your failure belongs to, then follow the diagnostic steps.
Category 1: Environment Variable Failures
Symptom: "Auth service unavailable" or "Invalid API key" errors. Works locally. Fails in production or preview.
Diagnostic:
# Check which env vars are present in Vercel Production
vercel env ls production
# Look for NEXT_PUBLIC_SUPABASE_URL and NEXT_PUBLIC_SUPABASE_ANON_KEY
# If missing from Production → Doppler sync did not include Production environment
The Doppler trap: The Vercel + Doppler integration requires you to select which Vercel environments receive the secrets. It defaults to Development and Preview. Production must be explicitly included. Many engineers set this up, verify it works in Preview, and never notice that Production is excluded.
Fix: In Doppler → Integrations → Vercel, edit the sync configuration and add Production to the target environments. Re-trigger the sync. Verify with vercel env ls.
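Because a client built with undefined credentials fails silently, a startup guard can turn the missing variables into a loud error. The following is a minimal sketch, not part of Supabase or Vercel: findMissingAuthEnv and assertAuthEnv are hypothetical helper names, and the env object is passed in explicitly so the check is easy to test.

```typescript
// Hypothetical helpers (not part of any library): fail loudly when the
// Supabase env vars named in this lesson are absent or empty.
const REQUIRED_AUTH_ENV = [
  "NEXT_PUBLIC_SUPABASE_URL",
  "NEXT_PUBLIC_SUPABASE_ANON_KEY",
] as const;

type EnvLike = Record<string, string | undefined>;

// Returns the names of required variables that are missing or blank.
function findMissingAuthEnv(env: EnvLike): string[] {
  return REQUIRED_AUTH_ENV.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// Throws instead of letting the client initialize with undefined credentials.
function assertAuthEnv(env: EnvLike): void {
  const missing = findMissingAuthEnv(env);
  if (missing.length > 0) {
    throw new Error(`Missing auth env vars: ${missing.join(", ")}`);
  }
}
```

In Next.js, NEXT_PUBLIC_ values are inlined at build time only where the code references them by literal name, so it is safer to pass an explicit object (for example, one that reads process.env.NEXT_PUBLIC_SUPABASE_URL and process.env.NEXT_PUBLIC_SUPABASE_ANON_KEY by name) than to pass process.env itself.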
Category 2: Redirect URI Mismatch
Symptom: "redirect_uri_mismatch" error from the OAuth provider, or unexpected redirect to the wrong domain.
Diagnostic checklist:
- Discord Developer Portal → OAuth2 → Redirects: is your Supabase callback URL listed? (https://YOUR_PROJECT.supabase.co/auth/v1/callback)
- Supabase Dashboard → Authentication → URL Configuration → Redirect Allowlist: is your app's callback URL listed? (https://academy.jeremyknox.ai/**)
- Does your redirectTo parameter exactly match a registered URL? Check for http vs https, www vs non-www, trailing slash, port number.
The most common mismatch patterns:
Registered: https://academy.jeremyknox.ai/auth/callback
Sent: https://www.jeremyknox.ai/auth/callback ← subdomain differs
Registered: https://academy.jeremyknox.ai/auth/callback
Sent: https://academy.jeremyknox.ai/auth/callback/ ← trailing slash
Registered: https://academy.jeremyknox.ai/auth/callback
Sent: http://academy.jeremyknox.ai/auth/callback ← http vs https
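The mismatch patterns above are hard to spot by eye, so a component-by-component comparison can help. This is a rough sketch using the standard URL class; diffRedirectUris is a hypothetical helper name, and it does not evaluate wildcard allowlist entries such as /**.

```typescript
// Hypothetical helper: list the ways a sent redirect URL differs from the
// registered one, component by component. Wildcard patterns are not handled.
function diffRedirectUris(registered: string, sent: string): string[] {
  const a = new URL(registered);
  const b = new URL(sent);
  const diffs: string[] = [];

  if (a.protocol !== b.protocol) {
    diffs.push(`scheme: ${a.protocol} vs ${b.protocol}`);
  }
  if (a.hostname !== b.hostname) {
    diffs.push(`host: ${a.hostname} vs ${b.hostname}`);
  }
  if (a.port !== b.port) {
    diffs.push(`port: "${a.port}" vs "${b.port}"`);
  }
  if (a.pathname !== b.pathname) {
    // Distinguish a pure trailing-slash difference from a different path.
    const stripSlash = (p: string): string => p.replace(/\/+$/, "");
    diffs.push(
      stripSlash(a.pathname) === stripSlash(b.pathname)
        ? "trailing slash"
        : `path: ${a.pathname} vs ${b.pathname}`
    );
  }
  return diffs;
}
```

An empty result means the scheme, host, port, and path all match; any entry names the component to fix in the provider portal or in the redirectTo value.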
Category 3: Cookie Size Overflow
Symptom: Session is established on the callback page but lost immediately on the next request. Supabase sometimes shows "session not found" or the user appears logged out after one page navigation.
Root cause: Browsers limit each cookie to roughly 4KB, counting the name, value, and attributes together. A Supabase session token with a Discord OAuth payload (which includes guild memberships, avatar, roles, etc.) can easily reach 5-7KB, so the token does not fit in a single cookie.
Fix: Use @supabase/ssr, not a bare @supabase/supabase-js client, for server-side session management. The SSR package automatically chunks the session across multiple cookies: sb-{project-ref}-auth-token.0, sb-{project-ref}-auth-token.1, etc. Each chunk is under 4KB.
If you are already using @supabase/ssr and still seeing this issue, verify that your middleware's setAll implementation sets ALL cookies in the response, not just the first one.
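To see why dropping any chunk breaks the session, it helps to model the chunking idea directly. The sketch below illustrates the technique only and is not the @supabase/ssr implementation: chunkCookie and joinChunks are hypothetical names, the 3180-character chunk size is an assumed value, and lengths are counted in characters, which equals bytes only for ASCII values.

```typescript
interface CookieChunk {
  name: string;
  value: string;
}

// Split a value into numbered chunks (name.0, name.1, ...) when it is too
// large for one cookie. The default size is an assumption for illustration.
function chunkCookie(baseName: string, value: string, maxChunkSize = 3180): CookieChunk[] {
  if (value.length <= maxChunkSize) {
    return [{ name: baseName, value }];
  }
  const chunks: CookieChunk[] = [];
  for (let i = 0; i < value.length; i += maxChunkSize) {
    chunks.push({
      name: `${baseName}.${chunks.length}`,
      value: value.slice(i, i + maxChunkSize),
    });
  }
  return chunks;
}

// Reassemble the value from whichever cookies actually arrived.
// Stops at the first missing chunk, so a dropped chunk truncates the result.
function joinChunks(baseName: string, cookies: CookieChunk[]): string | null {
  const byName: Record<string, string | undefined> = {};
  for (const cookie of cookies) {
    byName[cookie.name] = cookie.value;
  }
  const whole = byName[baseName];
  if (whole !== undefined) {
    return whole;
  }
  const parts: string[] = [];
  for (let i = 0; ; i++) {
    const part = byName[`${baseName}.${i}`];
    if (part === undefined) break;
    parts.push(part);
  }
  return parts.length > 0 ? parts.join("") : null;
}
```

If a setAll implementation writes only the first cookie, joinChunks on the next request sees just the .0 chunk and returns a truncated token, which matches the "session lost after one navigation" symptom.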
Category 4: Lock Contention
Symptom: Session establishment is intermittent. Sometimes works, sometimes silently fails. Logs show a lock timeout or "storage already acquired" error.
Root cause: The Supabase client uses a lock (lock:sb-{project-ref}-auth-token) to prevent concurrent writes to the session storage. If two clients race to call exchangeCodeForSession, or if detectSessionInUrl fires while your callback route handler is also calling exchangeCodeForSession, one will acquire the lock and the other will time out.
Common scenarios:
// Scenario 1: Callback route + detectSessionInUrl racing
// callback/route.ts calls exchangeCodeForSession
// AND client has detectSessionInUrl: true
// Both fire on the same ?code= parameter
// Scenario 2: Two client instances
const supabase1 = createBrowserClient(url, key) // in a hook
const supabase2 = createBrowserClient(url, key) // in a useEffect
// Both have detectSessionInUrl: true by default
// Both attempt to process ?code= on page load
Fix: Use a single shared Supabase client instance (module-level singleton). If you have a callback route that calls exchangeCodeForSession, disable detectSessionInUrl on the client, or remove the manual call from the route and rely on detectSessionInUrl instead — but never both.
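One way to enforce the single-instance rule is to route every caller through a lazily initialized getter. This is a generic sketch rather than anything Supabase-specific: createSingleton is a hypothetical helper, and the commented usage assumes @supabase/ssr is installed and that url and key hold your project credentials.

```typescript
// Hypothetical helper: wrap any client factory so it runs at most once.
// Every caller of the returned getter receives the same instance.
function createSingleton<T>(factory: () => T): () => T {
  let instance: T | undefined;
  let created = false;
  return (): T => {
    if (!created) {
      instance = factory();
      created = true;
    }
    return instance as T;
  };
}

// Usage sketch (assumption: this lives in one shared module, e.g. a
// hypothetical lib/supabase-client.ts, and hooks import the getter):
//
//   import { createBrowserClient } from "@supabase/ssr";
//   const getSupabase = createSingleton(() => createBrowserClient(url, key));
//
// Hooks and effects then call getSupabase() instead of createBrowserClient.
```

Keeping the factory call in one module means a second hook or effect cannot accidentally create a second client that competes for the same lock.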
Category 5: www Redirect Eating State
Symptom: PKCE flows fail specifically in production. Works on localhost. Works when accessing www. directly. Fails when accessing the apex domain.
Root cause: Vercel commonly redirects jeremyknox.ai to www.jeremyknox.ai. This 307 redirect loses:
- URL hash fragments (breaking Implicit flow)
- sessionStorage values (different origin after redirect)
- PKCE verifier (stored before redirect, gone after)
Diagnostic:
# Test the redirect behavior
curl -I https://jeremyknox.ai
# Look for:
# HTTP/1.1 307 Temporary Redirect
# location: https://www.jeremyknox.ai
Fix: Configure your auth flow to always start from and return to www.jeremyknox.ai. Update Site URL in Supabase to match the www version. Add explicit HSTS and canonical links to prevent apex domain traffic.
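The curl check can also be expressed in code: given the requested URL and the Location header it returned, decide whether the browser ends up on a different origin, and rewrite login links so they start on the canonical host. Both functions below are hypothetical helpers written as a sketch; the host names are passed in as parameters rather than assumed.

```typescript
// Hypothetical helper: does following this redirect change the origin
// (scheme + host + port)? sessionStorage is per-origin, so "true" means
// anything stored before the redirect is not visible after it.
function redirectCrossesOrigin(requestUrl: string, locationHeader: string): boolean {
  const from = new URL(requestUrl);
  // The second argument resolves relative Location values like "/login".
  const to = new URL(locationHeader, requestUrl);
  return from.origin !== to.origin;
}

// Hypothetical helper: rewrite a URL on the apex host to the canonical host
// so the auth flow never depends on the redirect.
function toCanonicalUrl(url: string, apexHost: string, canonicalHost: string): string {
  const parsed = new URL(url);
  if (parsed.hostname === apexHost) {
    parsed.hostname = canonicalHost;
  }
  return parsed.toString();
}
```

For the example in this section, redirectCrossesOrigin("https://jeremyknox.ai", "https://www.jeremyknox.ai") is true, which is the condition under which the verifier is lost.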
Category 6: Provider Configuration Mismatch
Symptom: Discord login button does nothing, or fails at the Discord consent screen.
Checklist:
- Discord Developer Portal → OAuth2: is the application enabled?
- Discord Developer Portal → OAuth2 → Redirects: is https://YOUR_PROJECT.supabase.co/auth/v1/callback listed?
- Supabase Dashboard → Authentication → Providers → Discord: is Discord enabled? Are the Client ID and Secret filled in?
- Does the Client ID in the Supabase dashboard match the Client ID in the Discord Developer Portal?
The Full Diagnostic Flowchart
User reports: "Login isn't working"
↓
Q: Is there an error message visible?
├── Yes → read the error text carefully
│ ├── "redirect_uri_mismatch" → Category 2
│ ├── "invalid_client" or "invalid API key" → Category 1
│ └── "auth code already used" → Category 4
└── No → silent failure
↓
Q: Does ?code= appear in the callback URL?
├── No → OAuth never completed
│ ├── Check: Discord app enabled? → Category 6
│ └── Check: redirectTo in allowlist? → Category 2
└── Yes → code arrived but exchange failed
↓
Q: Is there a PKCE verifier in sessionStorage?
├── No → verifier lost during redirect
│ ├── Check: did the flow cross origins? → Category 5
│ └── Check: flowType mismatch? → Lesson 3
└── Yes → verifier present, exchange failed
↓
Q: Are there multiple Supabase clients on the page?
├── Yes → lock contention → Category 4
└── No
↓
Q: Does auth work in Preview but not Production?
├── Yes → env vars not synced to Production → Category 1
└── No
↓
Q: Is the user logged out after navigating away from callback?
└── Yes → cookie too large → Category 3
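The flowchart above can be encoded as a small function that takes the observations you have gathered and returns the categories to check, in order. This is a sketch of the same decision order; the Observations shape and the diagnose name are hypothetical, and treating an unrecognized error message like a silent failure is an assumption the flowchart does not spell out.

```typescript
type Category = 1 | 2 | 3 | 4 | 5 | 6;

// Hypothetical shape: one field per question in the flowchart.
interface Observations {
  errorText?: string; // visible error message, if any
  codeInCallbackUrl: boolean; // does ?code= appear in the callback URL?
  verifierInSessionStorage: boolean; // is the PKCE verifier present?
  multipleClients: boolean; // more than one Supabase client on the page?
  worksInPreviewOnly: boolean; // works in Preview but not Production?
  loggedOutAfterNavigation: boolean; // session lost after leaving callback?
}

// Returns the categories to investigate, most likely first.
function diagnose(o: Observations): Category[] {
  const error = o.errorText ?? "";
  if (/redirect_uri_mismatch/i.test(error)) return [2];
  if (/invalid_client|invalid api key/i.test(error)) return [1];
  if (/auth code already used/i.test(error)) return [4];

  // Silent failure (or, by assumption, an unrecognized error):
  // walk the remaining questions in flowchart order.
  if (!o.codeInCallbackUrl) return [6, 2];
  if (!o.verifierInSessionStorage) return [5]; // or a flowType mismatch (Lesson 3)
  if (o.multipleClients) return [4];
  if (o.worksInPreviewOnly) return [1];
  if (o.loggedOutAfterNavigation) return [3];
  return [];
}
```

An empty result means none of the flowchart's branches matched, which is a signal to gather more observations rather than a clean bill of health.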
Case Study: The April 2026 Timeline
The full incident as it unfolded:
Hour 1: Env vars were not synced to Vercel Production. Auth failed silently — the Supabase client initialized with undefined credentials. Fixed with Doppler sync update.
Hour 2: Auth callback worked, but users were redirected to jeremyknox.ai instead of www.jeremyknox.ai. Site URL misconfigured in Supabase dashboard. Fixed to https://www.jeremyknox.ai.
Hour 3-4: PKCE verifier loss. Auth started on www.jeremyknox.ai, callback landed on academy.jeremyknox.ai. Different origins. Attempted multiple storage workarounds (all failed — see Lesson 5).
Hour 5-6: Switched to @supabase/ssr, but cookies were scoped to the exact subdomain only. The main site still could not read academy's session. Added Domain=.jeremyknox.ai to cookie options.
Hour 7: Domain-scoped cookies were overwritten on subsequent requests — middleware was resetting them without the Domain attribute. Fixed by applying Domain override consistently in middleware AND server client AND callback route.
Hour 8: www site's login button still triggered its own OAuth flow instead of redirecting to the academy hub. Updated all login links to point to academy.jeremyknox.ai/auth/login.
Resolution: The auth system has been stable since. The final architecture: academy as auth hub, parent domain cookies, main site reads cookies via createBrowserClient. Zero auth incidents in the weeks following.
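The Hour 7 bug came from applying the Domain attribute in some code paths but not others. One way to keep it consistent is a single helper that every cookie write passes through. This is a sketch under stated assumptions: withParentDomain and CookieOptionsLike are hypothetical names, and the option fields shown are a simplified subset of what real cookie libraries accept.

```typescript
// Simplified, hypothetical shape of cookie options for illustration only.
interface CookieOptionsLike {
  domain?: string;
  path?: string;
  maxAge?: number;
  sameSite?: "lax" | "strict" | "none";
  secure?: boolean;
  httpOnly?: boolean;
}

// Parent domain from the Hour 5-6 fix; the leading dot covers all subdomains.
const PARENT_COOKIE_DOMAIN = ".jeremyknox.ai";

// Always force the parent domain, overriding whatever the caller passed,
// so no code path can write a subdomain-scoped copy of the session cookie.
function withParentDomain(options: CookieOptionsLike = {}): CookieOptionsLike {
  return { ...options, domain: PARENT_COOKIE_DOMAIN };
}
```

Calling the same helper from the middleware, the server client, and the callback route means a later request cannot overwrite the shared cookie with one that lacks the Domain attribute.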
The Pre-Deployment Auth Checklist
□ Supabase Site URL matches the exact origin of your auth callback
□ Redirect allowlist covers all deployment environments (local, preview, production)
□ Discord (or other provider) callback URL registered in provider portal
□ Provider enabled in Supabase dashboard with correct Client ID and Secret
□ Env vars present in ALL Vercel environments (verify with `vercel env ls`)
□ Using @supabase/ssr (not @supabase/supabase-js) for server-side auth
□ Domain attribute on cookies matches your subdomain strategy
□ Only ONE Supabase client instance per context (no duplicates)
□ No race condition between detectSessionInUrl and manual exchangeCodeForSession
□ PKCE flowType set on client (matches GoTrue's default)
□ Cookie size tested with realistic payloads (Discord metadata can be large)
□ Auth tested on production domain directly (not just localhost or preview)
Run this list before every auth-related deployment. It takes 5 minutes. The April 2026 incident took 8 hours.