Autopilots runbook

Three live systems, what they watch, and what to do when one fires.

1. Overview

Autopilot     | Repo                    | Hosting | Supabase project     | Clients
HH Google Ads | hh-google-ads-autopilot | Vercel  | fcwuqbexvggjvwgdpnya | 14 HH clients with Google Ads
HH Meta Ads   | hh-meta-ads-autopilot   | Vercel  | fcwuqbexvggjvwgdpnya | HH clients with Meta Ads
MM Google Ads | mm-google-ads-autopilot | Vercel  | texktztyyacsvdkgxbkr | 8 MM mechanic clients

All three post approval cards to Slack and log actions to ClickUp. The MM autopilot is internally nicknamed Lightning McQueen.

2. HH Google Ads

Detection logic

Keyword pause: Spend ≥ $50 with 0 conversions (14-day lookback), or CPA > 3× ad group average.
Negative keyword: Search term spend ≥ $20 with 0 conversions, matched against irrelevant terms list. Max 10 per campaign per run.
Ad rewrite: CTR below 1% after 1,000+ impressions. Minimum 2 active ads per group (won't rewrite the last one).
Circuit breaker: Cumulative spend impact > 15% bumps all actions to Tier 2. Max 5 actions per ad group.
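
The keyword pause and ad rewrite thresholds above can be sketched as two small predicates. This is a minimal illustration, not the repo's rules engine: the KeywordStats and AdStats shapes and both function names are assumptions made for the example.

```typescript
// Hypothetical input shapes; field names are assumptions, not the repo's types.
interface KeywordStats {
  spend: number;       // dollars spent over the 14-day lookback
  conversions: number; // conversions over the same window
}

interface AdStats {
  impressions: number;
  clicks: number;
}

const PAUSE_MIN_SPEND = 50;           // spend >= $50 with 0 conversions
const PAUSE_CPA_MULTIPLE = 3;         // or CPA > 3x the ad group average
const REWRITE_MIN_IMPRESSIONS = 1000; // only judge CTR after 1,000+ impressions
const REWRITE_MAX_CTR = 0.01;         // CTR below 1%
const MIN_ACTIVE_ADS = 2;             // never rewrite the last active ad

function shouldPauseKeyword(kw: KeywordStats, adGroupAvgCpa: number): boolean {
  if (kw.conversions === 0) {
    return kw.spend >= PAUSE_MIN_SPEND;
  }
  const cpa = kw.spend / kw.conversions;
  return cpa > PAUSE_CPA_MULTIPLE * adGroupAvgCpa;
}

function shouldRewriteAd(ad: AdStats, activeAdsInGroup: number): boolean {
  if (activeAdsInGroup < MIN_ACTIVE_ADS) return false;
  if (ad.impressions < REWRITE_MIN_IMPRESSIONS) return false;
  return ad.clicks / ad.impressions < REWRITE_MAX_CTR;
}
```

Keeping the thresholds as named constants makes it easier to confirm that MM Google Ads and HH Google Ads stay in step if a value changes.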

Chain runner architecture

Clients are split into two chains (A/B) processed sequentially via /api/run-chain and /api/run-chain-b. Each chain uses Vercel after() to hand off to the next segment. A chain watchdog cron fires every 5 minutes during the run window to rescue stalled chains. Per-client timeout: 240 seconds (60s headroom under Vercel's 300s max).
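
The per-client timeout can be illustrated with a generic promise race. This is a sketch under assumptions: withTimeout and the two constants are hypothetical names, and the repo may enforce the 240-second budget differently.

```typescript
const VERCEL_MAX_MS = 300_000;     // Vercel's 300s function limit, per this runbook
const CLIENT_TIMEOUT_MS = 240_000; // per-client budget, leaving 60s of headroom

// Rejects if `work` has not settled within `ms`. The timer is cleared either way.
// Note: the underlying work is not cancelled, it is only no longer awaited.
async function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

A caller would wrap each client's pipeline run, for example withTimeout(runClient(client), CLIENT_TIMEOUT_MS, client.name), where runClient is a stand-in for whatever the chain segment actually invokes.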

Pipeline per client: fetchAllClientData() → parseAllData() → runRulesEngine() → runAIAnalysis() (6 Claude sub-agents: Keyword, Negative Keyword, Ad Evaluation, Headline, Description, Reviewer) → classifyActions() → applyCircuitBreaker() → execute or queue for approval.

Chain halt was fixed in PRs #8, #10, and #13. The reject/intervene modal sync path was fixed in PR #12. PR #11 (ClickUp subtask ID capture) is open as of June 2026.

Cron schedule (all times AEST)

Job            | Schedule             | Purpose
Full pipeline  | Monday 7:00 AM       | All 14 clients, chained A/B
Weekly digest  | Monday 9:00 AM       | Portfolio summary to Slack
Anomaly check  | Mon-Fri 8:00 AM      | Daily anomaly detection + Slack alerts
Monthly report | 1st of month 7:00 AM | Monthly trend reports
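
Vercel evaluates cron expressions in UTC, so the AEST times in this table need converting before they go into a schedule. The helper below is a hypothetical sketch (weeklyAestToUtcCron is not a function from the repo) and assumes a fixed UTC+10 offset; if the intended times follow daylight saving (AEDT, UTC+11), the offset would differ for part of the year.

```typescript
const AEST_OFFSET_HOURS = 10; // AEST is UTC+10, with no daylight saving adjustment

// Converts a weekly AEST time to a five-field UTC cron expression.
// dayOfWeek uses cron numbering: 0 = Sunday ... 6 = Saturday.
function weeklyAestToUtcCron(dayOfWeek: number, hour: number, minute: number): string {
  let utcHour = hour - AEST_OFFSET_HOURS;
  let utcDay = dayOfWeek;
  if (utcHour < 0) {
    utcHour += 24;
    utcDay = (dayOfWeek + 6) % 7; // roll back to the previous day
  }
  return `${minute} ${utcHour} * * ${utcDay}`;
}

const fullPipeline = weeklyAestToUtcCron(1, 7, 0); // Monday 7:00 AM AEST
const weeklyDigest = weeklyAestToUtcCron(1, 9, 0); // Monday 9:00 AM AEST
```

Both Monday-morning jobs land on Sunday evening in UTC, which is worth remembering when reading Vercel's cron logs. The sketch covers weekly jobs only: 7:00 AM AEST on the 1st is 21:00 UTC on the last day of the previous month, which a five-field expression has no direct way to express.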

Action tiers

Tier 1: Pause keywords, add negative keywords (auto in tiered mode)
Tier 1.5: Ad copy rewrites (always requires approval)
Tier 2: Budget changes, bid changes, campaign/ad group status changes (always requires approval)

Safety bumps: confidence < 0.7 → Tier 2. >3 entities in same ad group → Tier 2. >15% cumulative spend impact → all Tier 2.
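
The tiers and safety bumps can be read as a base classification followed by overrides. The sketch below is one possible reading, not the repo's classifyActions(): the action kinds, the ProposedAction shape, and the interpretation of ">3 entities in same ad group" as "more than three proposed actions target that ad group" are all assumptions.

```typescript
type Tier = 1 | 1.5 | 2;

type ActionKind =
  | "pause_keyword"
  | "add_negative_keyword"
  | "rewrite_ad"
  | "budget_change"
  | "bid_change"
  | "status_change";

interface ProposedAction {
  kind: ActionKind;
  adGroupId: string;
  confidence: number; // model confidence in the range 0..1
}

function baseTier(kind: ActionKind): Tier {
  switch (kind) {
    case "pause_keyword":
    case "add_negative_keyword":
      return 1;
    case "rewrite_ad":
      return 1.5;
    default:
      return 2;
  }
}

// cumulativeSpendImpact is a fraction, so 0.15 means 15%.
function classifyTiers(actions: ProposedAction[], cumulativeSpendImpact: number): Tier[] {
  if (cumulativeSpendImpact > 0.15) {
    return actions.map((): Tier => 2); // circuit breaker: everything needs approval
  }
  const perAdGroup = new Map<string, number>();
  for (const a of actions) {
    perAdGroup.set(a.adGroupId, (perAdGroup.get(a.adGroupId) ?? 0) + 1);
  }
  return actions.map((a): Tier => {
    if (a.confidence < 0.7) return 2;
    if ((perAdGroup.get(a.adGroupId) ?? 0) > 3) return 2;
    return baseTier(a.kind);
  });
}
```

Applying the bumps after the base tier means a safety rule can only ever raise an action to Tier 2, never lower it.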

Slack channel: #hh-ads-autopilot

MCP read: my-mcp-server-nwchtq.fly.dev  |  MCP write: hedgehog-ads-writer.fly.dev

3. HH Meta Ads

Detection logic

Ad set pause: Spend ≥ threshold with 0 conversions, or CPA > N× account average. TOFU/awareness objectives (OUTCOME_AWARENESS, REACH, VIDEO_VIEWS) are excluded.
Creative fatigue: Dedicated fatigue detection module monitors frequency and performance decay.
Landing pages: Landing page performance analysis module.
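
The ad set pause rule, including the awareness-objective exclusion, can be sketched as below. The runbook does not state the spend threshold or the CPA multiple N, so both are passed in as parameters; the AdSetStats shape and the function name are assumptions made for the example.

```typescript
// Objectives the runbook lists as excluded from pause detection.
const EXCLUDED_OBJECTIVES = new Set<string>(["OUTCOME_AWARENESS", "REACH", "VIDEO_VIEWS"]);

// Hypothetical shapes for illustration only.
interface AdSetStats {
  objective: string;
  spend: number;
  conversions: number;
}

interface PauseThresholds {
  minSpend: number;    // the unspecified "threshold" in the rule
  cpaMultiple: number; // the unspecified "N" in "CPA > N× account average"
}

function shouldPauseAdSet(adSet: AdSetStats, accountAvgCpa: number, t: PauseThresholds): boolean {
  // Awareness campaigns are not expected to convert, so they are never paused here.
  if (EXCLUDED_OBJECTIVES.has(adSet.objective)) return false;
  if (adSet.conversions === 0) return adSet.spend >= t.minSpend;
  return adSet.spend / adSet.conversions > t.cpaMultiple * accountAvgCpa;
}
```

Checking the objective first keeps a high-spend, zero-conversion reach campaign from ever reaching the spend test.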

Architecture

A single /api/run-all endpoint processes all clients sequentially, with no chain runner, which makes it simpler than the Google Ads autopilot. The original A/B chain pattern was removed after PR #6 because the ping-pong between chains triggered HTTP 508 (Loop Detected) errors. Approval cards include Hook Rate and Hold Rate metrics. Competitor ad intelligence comes from the Meta Ads Library (PR #23).

Cron schedule (all times AEST)

Job               | Schedule        | Purpose
Full pipeline     | Monday 8:00 AM  | All clients sequentially
Weekly analysis   | Monday 8:30 AM  | Creative fatigue, landing pages, digest
Anomaly check     | Mon-Fri 9:00 AM | Daily snapshot + threshold checks
Expire objections | Daily 4:00 PM   | Expire held client objections older than 7 days

Slack channel: #hh-meta-autopilot

MCP: hedgehog-meta-ads MCP server

4. MM Google Ads (Lightning McQueen)

Detection logic

Identical thresholds to HH Google Ads: keyword pause at $50/0 conversions, negative keyword at $20/0 conversions, ad rewrite at CTR < 1% after 1,000 impressions, circuit breaker at >15% cumulative spend impact. Same 3-tier action classification.

Architecture

Uses staggered per-client crons (3-minute gaps) instead of the A/B chain runner. Each client is its own Vercel cron invocation. This avoids the chain stall bugs that plagued HH.

Cron schedule (all times AEST)

Time                 | Client or job
Mon 7:00 AM          | Auto Response
Mon 7:03 AM          | Automotive Insight
Mon 7:06 AM          | Ballarat Roadworthy
Mon 7:09 AM          | Ballarat Service Centre
Mon 7:12 AM          | Core Diesel
Mon 7:15 AM          | Noranda Service Centre
Mon 7:18 AM          | Performance Plus
Mon 7:21 AM          | Ultra Tune North Ryde
Mon 7:30 AM          | Weekly digest
Mon-Fri 8:00 AM      | Anomaly check (all clients)
1st of month 7:00 AM | Monthly report
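
The staggered client slots follow a simple pattern: start at 7:00 and add 3 minutes per client. The helper below reproduces the table's client rows from that rule; staggeredSlots is a hypothetical name, and the repo most likely lists its cron entries explicitly rather than generating them.

```typescript
interface Slot {
  client: string;
  time: string; // 24-hour H:MM in AEST
}

function staggeredSlots(clients: string[], startHour: number, startMinute: number, gapMinutes: number): Slot[] {
  return clients.map((client, i) => {
    const total = startHour * 60 + startMinute + i * gapMinutes;
    const hour = Math.floor(total / 60) % 24;
    const minute = total % 60;
    return { client, time: `${hour}:${minute < 10 ? "0" : ""}${minute}` };
  });
}

const mmClients = [
  "Auto Response",
  "Automotive Insight",
  "Ballarat Roadworthy",
  "Ballarat Service Centre",
  "Core Diesel",
  "Noranda Service Centre",
  "Performance Plus",
  "Ultra Tune North Ryde",
];

const slots = staggeredSlots(mmClients, 7, 0, 3);
```

With eight clients the last slot is 7:21, which leaves 9 minutes before the 7:30 weekly digest. Adding clients at the same gap would close that window after three more, so the digest time needs revisiting if the roster grows.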

Slack channel: #mm-ads-autopilot

MCP read: mechanic-mcp-server.fly.dev  |  MCP write: hedgehog-ads-writer.fly.dev (shared, multi-MCC)

ClickUp bot name: Lightning McQueen (bot user ID 89551429)

5. Approval flow

1. The autopilot detects an anomaly or optimisation opportunity and classifies the action into a tier.
2. Slack card posted to the autopilot's channel with the recommendation, affected entities, before/after values, and Approve, Reject, and Intervene buttons.
3. Approve: the autopilot executes the change via the MCP write server and logs the result to both Supabase (approval record) and ClickUp (subtask under the run's parent task).
4. Reject: the action is marked rejected in Supabase. The Slack card updates to show rejected status. No ad platform changes are made.
5. Intervene: opens a modal for the reviewer to provide custom instructions or modifications. The autopilot applies the modified action instead.
6. All outcomes are recorded in the Supabase approvals table and synced to ClickUp. The Burrow portal also surfaces pending approvals for client-side review.
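
The three button outcomes in the steps above can be summarised as a small lookup. This is an illustrative sketch only: the status labels and the Outcome shape are hypothetical and are not taken from the approvals table schema.

```typescript
type Decision = "approve" | "reject" | "intervene";

interface Outcome {
  status: "approved" | "rejected" | "modified"; // hypothetical status labels
  executesChange: boolean;     // whether anything is sent to the MCP write server
  usesModifiedAction: boolean; // true when the reviewer's instructions replace the original
}

function resolveDecision(decision: Decision): Outcome {
  switch (decision) {
    case "approve":
      return { status: "approved", executesChange: true, usesModifiedAction: false };
    case "reject":
      return { status: "rejected", executesChange: false, usesModifiedAction: false };
    case "intervene":
      return { status: "modified", executesChange: true, usesModifiedAction: true };
  }
}
```

The property worth checking during an incident is that Reject is the only path that makes no ad platform change; both other paths write through the MCP server.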

6. Intervening

Step-by-step for a human taking over:

  1. Pause the autopilot for a client: Set the client's row to is_active = false in the relevant Supabase table (google_ads_clients or meta_clients). The next cron run will skip them.
  2. Halt a running chain (HH Google Ads only): Set chain_status = 'halted' in the chain_runs table. The watchdog will not restart it.
  3. Check what was last done: Query the Supabase approvals table filtered by client and date. ClickUp subtasks under the autopilot's parent task also show the full action log.
  4. Reverse a change in Google Ads: Use the Google Ads UI or the hedgehog-ads-writer MCP to undo the specific action (re-enable a paused keyword, remove a negative keyword, revert ad copy).
  5. Reverse a change in Meta Ads: Use Meta Ads Manager or the hedgehog-meta-ads MCP to undo the action.
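
Steps 1 and 2 amount to two single-row updates. The sketch below builds them as parameterised SQL statements. The table and column names is_active, chain_status, google_ads_clients, meta_clients and chain_runs come from this runbook; the id key column and both function names are assumptions, so check the actual schema before running anything.

```typescript
type ClientTable = "google_ads_clients" | "meta_clients";

interface Statement {
  text: string;                  // SQL with $1, $2 placeholders
  values: (string | boolean)[];  // parameters bound in order
}

// Step 1: the next cron run skips clients whose row has is_active = false.
// The `id` column is an assumed primary key.
function pauseClientStatement(table: ClientTable, clientId: string): Statement {
  return {
    text: `UPDATE ${table} SET is_active = $1 WHERE id = $2`,
    values: [false, clientId],
  };
}

// Step 2 (HH Google Ads only): the watchdog will not restart a halted chain.
// The `id` column is an assumed primary key.
function haltChainStatement(chainRunId: string): Statement {
  return {
    text: "UPDATE chain_runs SET chain_status = $1 WHERE id = $2",
    values: ["halted", chainRunId],
  };
}
```

The table name is restricted to a two-value union type, so it is the only part interpolated into the SQL text; everything supplied at run time goes through placeholders.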

7. Failure modes

  • Chain halts mid-run (HH Google Ads): Clients with no active campaigns complete in ~6 seconds, which can cause Vercel to reap the function before after() registers. The watchdog cron rescues stalled chains. Fixed across PRs #8, #10, #13 but watch for recurrence with new clients.
  • Slack card posted but buttons unresponsive: Slack trigger_ids are only valid for about 3 seconds, so the trigger_id expires if there's a double after() hop before the modal opens (fixed in PR #12). Also check that the Slack app's interactivity URL points at the autopilot's /api/slack/interactions endpoint.
  • Duplicate recommendations: Expired feedback rules stayed is_active = true even though they were filtered out at query time (fixed in PR #15). If duplicates reappear, check the feedback_rules table for stale active rows.
  • MCP 503s under load: The Fly.io MCP servers can return 503 when multiple autopilots hit them simultaneously. MM autopilot uses 3-minute client gaps to mitigate. HH uses chain sequencing. If 503s spike, check Fly.io machine status.
  • Negative keyword execution failures: Campaign IDs missing from AI agent output, or keywords passed as array instead of comma-separated string. Fixed in PRs #9-#10. Watch for regressions when prompt templates change.
  • Meta HTTP 508 loop: Original A/B chain pattern caused ping-pong. Replaced with single /api/run-all in PR #6. Should not recur unless chain architecture is reintroduced.
  • Wrong account ID stored (Meta): In the Visions incident (Apr 2026), the wrong ad account was stored because two accounts had similar names. Always get the account ID from the client's own Ads Manager, not from search results.
  • Client added without service verification (Meta): In the Doctors on Demand incident (Apr 2026), a client was added to meta_clients without confirming that Meta Ads was a purchased service. Both repos now have a service-scope verification gate.
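
For the negative keyword execution failures listed above, a guard in front of the MCP write call can catch both symptoms: a missing campaign ID and keywords arriving as an array. The sketch below is hypothetical (the payload shape and function name are not from the repos) and only shows the normalisation step.

```typescript
// Hypothetical payload shape; the real MCP write tool may expect different fields.
interface NegativeKeywordPayload {
  campaignId: string;
  keywords: string; // comma-separated, as the failure mode above describes
}

function normalizeNegativeKeywords(
  campaignId: string | undefined,
  keywords: string | string[],
): NegativeKeywordPayload {
  if (!campaignId) {
    throw new Error("campaignId missing from agent output");
  }
  // Accept either form, then emit the comma-separated string.
  const list = Array.isArray(keywords) ? keywords : keywords.split(",");
  const cleaned = list.map((k) => k.trim()).filter((k) => k.length > 0);
  if (cleaned.length === 0) {
    throw new Error("no negative keywords to add");
  }
  return { campaignId, keywords: cleaned.join(",") };
}
```

Failing loudly on a missing campaign ID turns a silent execution failure into an error that shows up in the run log, which makes a prompt template regression easier to spot.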