Your AI Fleet Needs a Budget (Ours Was Burning $155 a Day)
Here's a fun thing to discover at 2am on a Tuesday: your AI agents have been quietly spending $155 per day on API calls, and nobody set a cap.
We found 29 email receipts in 48 hours. Twenty-nine little "auto recharge" notifications from Anthropic, each one between $10 and $15, cheerfully informing us that our AI fleet had decided it needed more tokens. The projected monthly burn? $4,650. On top of our $100/month base plan.
This is the story of how we went from "wait, how much?" to per-agent cost controls in about 16 hours.
The Problem: One Key to Rule Them All
When we started, every AI agent in our fleet — all 140 of them — used the same API key. The dispatcher would grab the key, hand it to whatever agent was running, and off it went. Simple. Elegant. Completely invisible from a billing perspective.
Our LLM Gateway tracked per-call costs beautifully. We had a $50/day cap. We had usage dashboards. We felt very responsible.
What we didn't track was the subscription billing. Turns out, when you route 140 agents through one Team seat, the "extra usage" charges are a separate meter that nobody was watching. Two meters. One watched. One not. Guess which one caught fire.
The Fix: Three Layers of "No"
Layer 1 — Round-Robin Key Pool: Instead of one API key absorbing all the load, we built a weighted round-robin pool. Twelve keys now, distributed across multiple subscription seats. Our heaviest seat gets 64% of the traffic (it has the biggest plan). The rest split the remainder. Same itertools.cycle pattern we already used for Gemini. Thread-safe. Kill switch included because we've learned that lesson.
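The pool itself is only a few lines. Here's a minimal sketch of the pattern; the class name, key strings, weights, and env var names are illustrative, not our real config:

```python
import itertools
import os
import threading

class KeyPool:
    """Weighted round-robin over API keys (illustrative sketch)."""

    def __init__(self, weighted_keys):
        # weighted_keys: list of (api_key, weight) pairs. Repeating each key
        # by its weight means the heavy seat gets proportionally more turns.
        expanded = [key for key, weight in weighted_keys for _ in range(weight)]
        self._cycle = itertools.cycle(expanded)
        self._lock = threading.Lock()  # itertools.cycle itself is not thread-safe

    def next_key(self):
        # Kill switch: pin everything to one known-good key without a deploy.
        if os.environ.get("KEY_POOL_DISABLED") == "1":
            return os.environ["FALLBACK_API_KEY"]
        with self._lock:
            return next(self._cycle)

# Heaviest seat gets 16 of every 25 turns (64%); the rest split the remainder.
pool = KeyPool([
    ("sk-seat-big", 16),
    ("sk-seat-a", 3),
    ("sk-seat-b", 3),
    ("sk-seat-c", 3),
])
```

The weight expansion trades a little memory for dead-simple fairness: no counters to maintain, and the distribution is exact over every full cycle.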
Layer 2 — Per-Agent API Keys: Each high-spend bot now gets its own API key with a daily spending cap. Our benchmark bot? $8/day. Our judge bot? $5/day. Hit the cap and you get a 429 back, plus a NATS alert so we know about it. The caps use pg_advisory_xact_lock to prevent two concurrent requests from both sneaking past the limit — because our agents are fast enough to race each other.
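The transaction shape looks roughly like this; table and column names are invented for illustration, and the real check also publishes the NATS alert:

```sql
BEGIN;

-- Serialize concurrent cap checks per agent. hashtext() maps the agent id
-- to the bigint key pg_advisory_xact_lock expects; the lock is released
-- automatically at COMMIT or ROLLBACK.
SELECT pg_advisory_xact_lock(hashtext('benchmark-bot'));

-- With the lock held, two racing requests can't both read "under cap"
-- and then both record their spend.
SELECT COALESCE(SUM(cost_usd), 0) AS spent_today
  FROM api_calls
 WHERE agent_id = 'benchmark-bot'
   AND created_at >= date_trunc('day', now());

-- Application logic: if spent_today + estimated_cost > 8.00, return 429.
-- Otherwise record the call and commit, releasing the lock.
INSERT INTO api_calls (agent_id, cost_usd) VALUES ('benchmark-bot', 0.02);

COMMIT;
```

A transaction-scoped advisory lock beats a row lock here because there may be no row to lock yet — on an agent's first call of the day, there's nothing in the table to serialize on.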
Layer 3 — The Part We Haven't Done Yet: Setting an actual monthly spend limit in the Anthropic admin dashboard. Sometimes the most sophisticated engineering solution is clicking a checkbox in a settings page.
The Double-Counting Bug Nobody Caught
While we were at it, we discovered our spend dashboard was double-counting everything. Turns out we were logging API calls from two sources — the gateway (real spend) and our transcript ingestion system (already-paid-for replays). Every dollar showed up twice.
The fix was adding AND source_type = 'gateway' to seven SQL queries. Seven. That's the kind of bug that makes you question every dashboard you've ever built.
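The before/after is almost embarrassing. A hypothetical version of one of the seven queries, with the `api_calls` table and its columns invented for illustration:

```sql
-- Before: sums gateway spend AND replayed transcripts — every dollar twice.
SELECT date_trunc('day', created_at) AS day,
       SUM(cost_usd) AS spend
  FROM api_calls
 GROUP BY 1;

-- After: one honest filter keeps only real gateway spend.
SELECT date_trunc('day', created_at) AS day,
       SUM(cost_usd) AS spend
  FROM api_calls
 WHERE source_type = 'gateway'
 GROUP BY 1;
```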
What It Costs Now
Gateway spend: ~$14-16/day against a $50 cap. Subscription extra usage: being redistributed across seats. Per-agent caps: enforced. Monitoring: actually watching both meters now.
The projected savings from redistribution alone should take us from $1,126/month in extra usage down to $520-620/month. Not because we're using less — because we're distributing the load across seats that already exist.
Lessons for Anyone Running an AI Fleet
- You have two billing meters. API dashboard spend and subscription extra usage are different things. Watch both.
- Per-agent caps prevent runaway costs better than global caps. One misbehaving agent shouldn't eat everyone's budget.
- Check your dashboards for double-counting. If you log from multiple sources, you're probably inflating your numbers. Sleep better with one honest filter.
- Kill switches on everything. Every cost control we built has an env var that turns it off. When something goes wrong at 3am, you want one variable to flip, not a code rollback.
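The kill-switch pattern is the same everywhere. A minimal sketch, with a helper name and flag names of our own invention:

```python
import os

def enabled(flag: str, default: bool = True) -> bool:
    """Read a kill switch from the environment at call time.

    Checking the env var on every call (rather than once at startup) is
    what lets you flip one variable at 3am instead of rolling back code.
    """
    raw = os.environ.get(flag)
    if raw is None:
        return default
    return raw.strip().lower() not in ("0", "false", "off", "no")

# usage (hypothetical flag name):
#   if enabled("PER_AGENT_CAPS_ENABLED"):
#       enforce_cap(agent_id, estimated_cost)
```

Defaulting to "on" matters: a cost control that silently disables itself when the flag is missing is worse than no flag at all.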
Total cost of building all three layers: about $0.25 in AI compute (we had our fleet review the plan before we executed it) and one very long night.
This is part of our build-in-public series. Previous post: We Built AI Agents That Hack Our Own Infrastructure Every 6 Hours