The CLI Fallback
principal-halt works without the broker. Zero-dependency design — pure bash, direct launchctl, raw SQLite. When your safety infrastructure fails, your safety system cannot depend on it.
Every safety system contains an assumption: that its infrastructure is available when you need it.
The Principal Broker's kill switch works through a Python class, FastAPI endpoints, NATS subscriptions, and database connections. Under normal conditions, this is fine. But "when something is badly wrong" is precisely the scenario where your infrastructure is least likely to be fully operational.
The broker could be the thing that crashed. The broker could be the thing that's misbehaving. The broker could be hanging while the trading bots continue running with no oversight. In any of these cases, triggering the kill switch through the broker is not an option.
principal-halt exists for this. It is a bash script. It has no dependencies beyond launchctl, ssh, and sqlite3, all of which ship with macOS. It implements the same four-level halt sequence as the Python kill switch, independently, with no shared code.
Architecture of the Fallback
The script defines its daemon lists as plain bash arrays, kept in sync by hand with the Python module:
# ---- Daemon lists (sourced from broker/safety/kill_switch.py) ----------------
TRADING_DAEMONS=(
    "com.host.foresight"
    "com.host.sports-agent"
    "com.host.political-agent"
    "com.host.perpetuals-bot"
)
ALL_HOST_DAEMONS=(
    "com.host.foresight"
    "com.host.sports-agent"
    "com.host.political-agent"
    "com.host.perpetuals-bot"
    "com.host.indecision-bot"
    "com.host.indecision-price-alerts"
    "com.host.djed"
    "com.host.principal-broker"
)
# These survive ALL shutdown levels
PROTECTED_DAEMONS=(
    "com.host.watchdog"
    "com.host.nats"
    "com.host.sentinel"
)
This duplication is intentional. The script does not import from the Python module. It does not call the Python module. It does not curl the broker. It is completely standalone. The cost of this is that when the daemon lists change in Python, they must be updated in the script too. This is tracked via the comment "sourced from broker/safety/kill_switch.py" — it is a maintenance obligation, not an oversight.
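The maintenance obligation can also be checked mechanically. The following is a minimal sketch, not part of principal-halt itself. It assumes both files write every launchd label as a double-quoted "com.host.<name>" string literal (the layout of kill_switch.py is not shown here, so that is a guess), and the function names extract_labels and check_daemon_drift are hypothetical. It compares the union of labels across all lists, not the membership of each individual list.

```bash
#!/usr/bin/env bash
# Hypothetical drift check between the halt script and the Python module.
# Assumption: every launchd label appears as a double-quoted
# "com.host.<name>" literal in both files.

# Print the unique, sorted set of com.host.* labels quoted in a file.
extract_labels() {
    grep -oE '"com\.host\.[A-Za-z0-9_-]+"' "$1" | tr -d '"' | sort -u
}

# Exit status 0 if both files name the same set of labels.
# Otherwise print a diff of the two label sets and return non-zero.
check_daemon_drift() {
    local script_file="$1" python_file="$2"
    diff <(extract_labels "$script_file") <(extract_labels "$python_file")
}
```

Run from CI or a pre-commit hook as, for example, check_daemon_drift path/to/principal-halt broker/safety/kill_switch.py (the first path is a placeholder). A non-zero exit flags that one side was edited without the other. Because the check only reads files, it adds nothing to the halt path itself.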
The protected daemons check is implemented in pure bash:
is_protected() {
    local daemon="$1"
    for p in "${PROTECTED_DAEMONS[@]}"; do
        [[ "$p" == "$daemon" ]] && return 0
    done
    return 1
}
stop_daemon() {
    local daemon="$1"
    if is_protected "$daemon"; then
        log " SKIP (protected): $daemon"
        return 0
    fi
    if launchctl stop "$daemon" 2>/dev/null; then
        log " STOPPED: $daemon"
        return 0
    else
        log " WARN: launchctl stop $daemon returned non-zero (may not be loaded)"
        return 0
    fi
}
Notice that stop_daemon returns 0 even when launchctl stop fails. This is intentional. A non-zero return from launchctl stop typically means the daemon was not loaded — which is fine during an emergency halt. The important thing is that the script doesn't exit early on a warning. It logs it and continues to the next daemon.
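To see the continue-on-warning behaviour without touching real services, the loop can be dry-run. The sketch below restates is_protected and stop_daemon so it is self-contained, and adds two things that are assumptions rather than part of the original script: a LAUNCHCTL variable that lets a stand-in command replace launchctl, and a stop_all wrapper that walks a list of labels.

```bash
#!/usr/bin/env bash
# Dry-run sketch of the stop loop. LAUNCHCTL and stop_all are hypothetical
# additions; the original script calls launchctl directly.

PROTECTED_DAEMONS=("com.host.watchdog" "com.host.nats" "com.host.sentinel")
LAUNCHCTL="${LAUNCHCTL:-launchctl}"

# Stand-in for the script's timestamped log().
log() { echo "$*"; }

is_protected() {
    local daemon="$1" p
    for p in "${PROTECTED_DAEMONS[@]}"; do
        [[ "$p" == "$daemon" ]] && return 0
    done
    return 1
}

stop_daemon() {
    local daemon="$1"
    if is_protected "$daemon"; then
        log " SKIP (protected): $daemon"
    elif "$LAUNCHCTL" stop "$daemon" 2>/dev/null; then
        log " STOPPED: $daemon"
    else
        log " WARN: launchctl stop $daemon returned non-zero (may not be loaded)"
    fi
    return 0   # never abort the halt sequence on a single daemon
}

# Attempt every label passed in, in order, regardless of earlier failures.
stop_all() {
    local d
    for d in "$@"; do
        stop_daemon "$d"
    done
}
```

Setting LAUNCHCTL=false before calling stop_all makes every stop "fail", which shows that each remaining daemon is still attempted and protected labels are still skipped.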
Audit Trail Without a Database
The Python kill switch persists halt state to SQLite via HaltStateStore. The CLI cannot depend on that. Instead, it writes to a simple append-only text file:
AUDIT_LOG="${HOME}/.principal/shutdown_audit.log"
write_audit() {
    local level="$1"
    local detail="$2"
    ensure_audit_dir
    printf '%s | LEVEL %d | manual-cli | %s\n' "$(ts)" "$level" "$detail" \
        >> "$AUDIT_LOG"
}
The format is: timestamp, level, source (manual-cli so it's distinguishable from broker-triggered halts), and detail. This file is not structured data — it is a plain text record that can be read with cat from anywhere without needing database access.
The ensure_audit_dir call creates the directory if it does not exist. This handles the case where the script is run before the broker has ever initialized its state directory.
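Neither ensure_audit_dir nor ts is shown in the excerpt. A minimal sketch of how they might look follows. The bodies are assumptions: ts uses the ISO 8601 UTC format the script is described as using, ensure_audit_dir is guessed to be a plain mkdir -p, and the environment override on AUDIT_LOG is added here only so the example can be pointed at a scratch path.

```bash
#!/usr/bin/env bash
# Sketch of the helpers write_audit relies on. Bodies are assumed, not
# copied from principal-halt.

# Overridable for experimentation; the script itself hard-codes this path.
AUDIT_LOG="${AUDIT_LOG:-${HOME}/.principal/shutdown_audit.log}"

# ISO 8601 timestamp in UTC, e.g. 2024-05-01T02:00:00Z
ts() {
    date -u "+%Y-%m-%dT%H:%M:%SZ"
}

# Create the audit directory if it does not exist yet (no-op otherwise).
ensure_audit_dir() {
    mkdir -p "$(dirname "$AUDIT_LOG")"
}

# Append one record: timestamp | LEVEL n | manual-cli | detail
write_audit() {
    local level="$1"
    local detail="$2"
    ensure_audit_dir
    printf '%s | LEVEL %d | manual-cli | %s\n' "$(ts)" "$level" "$detail" \
        >> "$AUDIT_LOG"
}
```

Because the file is only ever opened with >>, earlier records are never rewritten, which is what makes it usable as an audit trail.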
Level 1: Intent Logging
The Level 1 implementation is where the CLI fallback is most honest about its limitations:
level_1() {
    shift # remove the level arg
    local assets=("$@")
    if [[ ${#assets[@]} -eq 0 ]]; then
        echo "Usage: principal-halt 1 <asset1> [asset2 ...]" >&2
        exit 1
    fi
    local asset_list asset_json
    asset_list=$(IFS=,; echo "${assets[*]}")
    # Quote each asset separately so the suggested payload is a real JSON
    # array (["A","B"]) rather than one comma-joined string (["A,B"]).
    asset_json=$(printf '"%s",' "${assets[@]}")
    asset_json="[${asset_json%,}]"
    log "Level 1: halting assets [${asset_list}]"
    log "NOTE: Asset-level halts require bot-specific commands. This event has"
    log " been logged. Send halt directives to each bot manually if broker"
    log " is unavailable, or use: curl -X POST http://localhost:8400/halt"
    log " with {\"level\": 1, \"assets\": ${asset_json}}"
    write_audit 1 "Asset halt requested: ${asset_list}"
    log "Level 1 logged successfully."
}
Level 1 in the CLI does not actually stop asset trading — it logs the intent and provides the operator with the correct curl command if the broker is reachable. Asset-level halts are bot-specific mechanisms that cannot be replicated in pure bash without calling each bot's own API.
This is the right tradeoff. The CLI correctly identifies what it cannot do and gives the operator the information to do it manually. Claiming to halt assets when it cannot actually do so would be worse than being transparent about the limitation.
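When the broker is reachable, the operator still has to assemble the JSON body by hand. A small helper can do that safely. This is a sketch under assumptions: the function name build_halt_payload is hypothetical, and the character whitelist for asset names is a guess chosen so that no JSON escaping is needed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build the Level 1 request body as valid JSON.
# Usage: build_halt_payload LEVEL ASSET [ASSET ...]

build_halt_payload() {
    local level="$1"
    shift
    # Assumed whitelist: letters, digits, dot, underscore, slash, hyphen.
    local re='^[A-Za-z0-9._/-]+$'
    local items="" a
    for a in "$@"; do
        if ! [[ "$a" =~ $re ]]; then
            echo "invalid asset name: $a" >&2
            return 1
        fi
        items+="\"${a}\","
    done
    # Strip the trailing comma and wrap the items in a JSON array.
    printf '{"level": %d, "assets": [%s]}\n' "$level" "${items%,}"
}
```

The result can be passed straight to the endpoint named in the log message, for example curl -X POST http://localhost:8400/halt -d "$(build_halt_payload 1 BTC ETH)". Whether the broker also expects a Content-Type header is not stated in the text.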
Level 4: Token Revocation in Bash
The Python _revoke_all_tokens() uses bcrypt to generate a properly-formatted invalid hash. Bash does not have bcrypt. The script generates a random hex string instead:
# Step 3: Revoke agent tokens by overwriting registry DB auth hashes
log "--- Step 3: Revoking agent tokens ---"
if [[ -f "$REGISTRY_DB" ]]; then
    # Generate a random 64-char hex string to overwrite all token hashes.
    # This is not a valid bcrypt hash, so all bearer token validations fail.
    local poison_hash
    poison_hash=$(LC_ALL=C tr -dc 'a-f0-9' < /dev/urandom 2>/dev/null | head -c 64 || true)
    if [[ -z "$poison_hash" ]]; then
        # Fallback: hash date + PID. SHA-256 yields the full 64 hex chars (the
        # shasum default, SHA-1, yields only 40). pipefail lets a failed shasum
        # reach the constant instead of leaving the hash empty.
        poison_hash=$(set -o pipefail; printf '%s-%d-REVOKED' "$(ts)" "$$" | shasum -a 256 | cut -c1-64 || echo "REVOKED-BY-PRINCIPAL-HALT-LEVEL4")
    fi
    if sqlite3 "$REGISTRY_DB" \
        "UPDATE agent_registry SET auth_token_hash = '${poison_hash}' WHERE 1=1;" 2>/dev/null; then
        log " Agent tokens revoked in registry DB."
    else
        log " WARN: Could not update registry DB at ${REGISTRY_DB}. Tokens may still be valid."
    fi
else
    log " NOTE: Registry DB not found at ${REGISTRY_DB} (broker not yet installed or path differs)."
fi
A random 64-character hex string is not a valid bcrypt hash: bcrypt hashes are 60 characters long and begin with a version prefix such as $2b$. Any token validation that checks a presented bearer token against the stored hash will therefore fail. The tokens are effectively revoked.
The fallback for entropy generation (shasum of timestamp and PID) exists because /dev/urandom with tr can occasionally fail in constrained environments. The double fallback (echo "REVOKED-BY-PRINCIPAL-HALT-LEVEL4") is a last resort — even a constant string will invalidate all tokens since it will never match a valid bcrypt hash.
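The claim that none of these values can pass as a bcrypt hash is easy to check in bash. The sketch below is illustrative only: gen_poison_hash and looks_like_bcrypt are hypothetical names, and the regular expression encodes the usual modular-crypt layout of a bcrypt hash (a $2, $2a, $2b, $2x, or $2y prefix, a two-digit cost, then 53 characters of salt and digest).

```bash
#!/usr/bin/env bash
# Sketch: generate a poison value and confirm it is not bcrypt-shaped.
# Function names are hypothetical.

# 64 random lowercase hex characters from /dev/urandom.
gen_poison_hash() {
    LC_ALL=C tr -dc 'a-f0-9' < /dev/urandom 2>/dev/null | head -c 64
}

# Return 0 if the argument has the shape of a bcrypt hash:
# "$2" + optional variant letter + "$" + 2-digit cost + "$" + 53 chars.
looks_like_bcrypt() {
    local re='^\$2[abxy]?\$[0-9]{2}\$[./A-Za-z0-9]{53}$'
    [[ "$1" =~ $re ]]
}
```

A shape check like this says nothing about how the broker's validator reacts to a malformed hash. It only shows that the stored value can never be a well-formed one.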
The Confirmation Gate
Level 4 requires typing the exact phrase:
level_4() {
    echo ""
    echo "WARNING: PRINCIPAL LEVEL 4 SHUTDOWN"
    echo "This will stop ALL services on the primary host and trading server."
    echo "Agent auth tokens will be revoked."
    echo ""
    printf 'Type "SHUTDOWN INVICTUS" to confirm: '
    read -r confirm
    if [[ "$confirm" != "SHUTDOWN INVICTUS" ]]; then
        echo "Cancelled."
        exit 0
    fi
    # ...
}
The phrase matches the Python kill switch. In a 2am emergency, having the same phrase across both interfaces reduces cognitive load. You don't have to remember "which system uses which phrase" — it is always "SHUTDOWN INVICTUS."
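The gate itself can be factored into a function that reads from standard input, which makes it possible to exercise without a terminal. This is a sketch, not the script's code: confirm_phrase and CONFIRM_PHRASE are hypothetical names, and unlike the plain read -r in the excerpt, IFS= read -r keeps leading and trailing whitespace, so the match is stricter.

```bash
#!/usr/bin/env bash
# Sketch of a reusable confirmation gate. Names are hypothetical.

CONFIRM_PHRASE="SHUTDOWN INVICTUS"

# Return 0 only if the next line on stdin is exactly the phrase.
# The prompt goes to stderr so stdout stays clean for callers.
confirm_phrase() {
    local reply
    printf 'Type "%s" to confirm: ' "$CONFIRM_PHRASE" >&2
    # A closed or empty stdin (EOF) counts as "not confirmed".
    IFS= read -r reply || return 1
    [[ "$reply" == "$CONFIRM_PHRASE" ]]
}
```

A caller would write something like confirm_phrase || { echo "Cancelled."; exit 0; }. Treating end-of-file as a refusal means a Level 4 halt can never be confirmed by accident when the script runs with no terminal attached.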
Trading Server Fallback Instructions
If the trading server is unreachable during Level 4:
if ssh -o ConnectTimeout=5 -o BatchMode=yes \
    "${TRADING_SERVER_USER}@${TRADING_SERVER_IP}" \
    "$trading_server_cmds" 2>/dev/null; then
    log " Trading server daemons stopped."
else
    log " WARN: Trading server unreachable or SSH failed. Manual halt required."
    log " SSH to ${TRADING_SERVER_IP} and run: ${trading_server_cmds}"
fi
The script logs the exact command string needed to halt the trading server manually. When you get the warning, you open a new terminal, SSH to the trading server yourself, and run what the log tells you to run. No guessing required.
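The excerpt does not show how trading_server_cmds is assembled. One plausible construction is sketched below, under two assumptions that the text does not confirm: that the trading server also runs launchd, and that the labels to stop there are the ones passed in by the caller. The function name build_remote_cmds is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: build a single command string that can be sent over ssh or
# pasted into a terminal by hand. Assumes the remote host uses launchctl.

# Usage: build_remote_cmds LABEL [LABEL ...]
build_remote_cmds() {
    local cmds="" d
    for d in "$@"; do
        # Join with ';' rather than '&&' so every stop is attempted
        # even if an earlier one fails.
        cmds+="launchctl stop ${d}; "
    done
    # Drop the trailing separator.
    printf '%s' "${cmds%; }"
}
```

Keeping the commands in one plain string is what lets the same value serve both as the ssh argument and as the copy-and-paste instruction in the warning.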
The Log File
Every line written through log() goes to /tmp/principal-halt.log via tee, so it is visible on screen in real time and simultaneously written to disk. If you're triggering this from a phone screen with poor visibility, the log file is there when you get back to a proper terminal.
LOG=/tmp/principal-halt.log
log() {
    local msg="[$(ts)] $*"
    echo "$msg" | tee -a "$LOG"
}
The timestamp format is ISO 8601 UTC (date -u "+%Y-%m-%dT%H:%M:%SZ"). Every log line is timestamped. The post-incident review will have a precise timeline without any ambiguity about local vs. UTC time.
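A practical consequence of fixed-width ISO 8601 UTC timestamps is that they sort lexicographically in time order, so a timeline can be cut with plain string comparison. The sketch below applies that to the pipe-delimited audit file described earlier. The function name audit_since is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: print audit records at or after a given UTC timestamp.
# Relies on ISO 8601 UTC strings comparing in chronological order.

# Usage: audit_since 2024-05-01T02:00:00Z /path/to/shutdown_audit.log
audit_since() {
    local since="$1" file="$2"
    # Fields are separated by " | ", so field 1 is the timestamp.
    # Appending "" forces awk to compare as strings, not numbers.
    awk -v since="$since" -F ' [|] ' '($1 "") >= (since "")' "$file"
}
```

No date parsing is needed, which keeps the post-incident tooling as dependency-free as the halt script itself.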
Why This Exists Separately
The CLI fallback would be unnecessary if the broker were perfectly reliable. But reliability is not the right frame. The frame is: under what conditions will you need to halt the system, and are those conditions correlated with the broker being available?
If the broker has a bug that causes it to authorize bad trades, the broker is the problem. If the broker is consuming 100% CPU and hanging, the broker is unavailable. If the broker is getting DoS'd by a misbehaving agent flooding the message queue, the broker might not respond.
In all of these scenarios, the answer to "can you still halt the system?" needs to be yes. principal-halt is that yes.
Zero-dependency safety infrastructure is not over-engineering. It is the only kind that works when it actually needs to.