We Built AI Agents That Hack Our Own Infrastructure Every 6 Hours. Here's What They Found.

Spoiler: they found our API keys. All of them. In 154 places. We deserved that.


There's a certain humbling clarity that comes from building an autonomous red team agent, pointing it at your own production infrastructure, and watching it immediately harvest 80 credentials from your filesystem.

"Well," we said, staring at the findings report. "That's... comprehensive."

Most game development platforms bolt security on after a breach. We decided to skip the breach part and go straight to the "oh no" part by building three AI agents whose sole purpose is to find everything wrong with our systems, attack us relentlessly, and then -- because apparently we enjoy suffering -- automatically research better ways to attack us next week.

This is the story of what happened when we let them loose on our production fleet of 110 AI agents. It involves hardcoded API keys, a message bus with the security posture of an open window, and an LLM that was just trying to have a nice conversation about Linux security best practices before our adversary agent flagged it for data exfiltration.

The Problem: 110 AI Agents and a Prayer

Monster Gaming runs an AI-powered game development platform backed by 110 specialized AI agents. They coordinate through a NATS message bus, share state via PostgreSQL, and route LLM calls through a gateway that juggles multiple model providers like a stressed circus performer.

Each agent has API credentials. Each service has HTTP endpoints. The message bus carries dispatch instructions that agents execute autonomously. If you're doing the threat modeling math in your head right now and arriving at "that's a lot of attack surface," congratulations -- you're faster than we were.

A compromised agent doesn't just leak data. It can instruct other agents to act on its behalf. Imagine someone getting into your Slack workspace, except every person in the workspace is a robot that does exactly what you tell it without asking questions. Yeah.

Traditional security tooling -- quarterly pentests, compliance checklists, a vulnerability scanner that emails you PDFs nobody reads -- doesn't keep up with an autonomous system that changes every time someone deploys a new bot. We needed security that operates at machine speed. So we built machines.

Meet the Team (They're All Trying to Destroy Us)

LuxSentinel: The Worried Parent

Sentinel runs 12 heuristic checks every hour, each derived from real-world threat intelligence. Think of it as the agent that paces around the house at 2 AM checking all the locks.

It pulls from three sources, because paranoia is better with footnotes:

Source          Checks   Personality
PAN Unit 42     6        "Your API keys are in 154 files and I am very disappointed"
Verizon DBIR    4        "49% of breaches start with stolen credentials, and yours are world-readable"
MITRE ATT&CK    2        "I see you have 43 services bound to 0.0.0.0. Bold strategy."

Every heuristic carries MITRE ATT&CK technique attribution, because when you're going to be anxious, you should at least be precise about it:

HEURISTICS = [
    {
        "id": "PAN-CRED-001",
        "name": "Hardcoded API Keys in Fleet Scripts",
        "source": "paloalto",
        "mitre": "T1552.001",  # Credentials in Files
        "severity": "critical",
        "description": "Unit 42 IR Report: hardcoded credentials in automation "
                       "scripts are the #1 initial access vector in cloud/hybrid "
                       "breaches.",
    },
    # 11 more reasons to lose sleep
]
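
What does a heuristic like that actually do at runtime? Roughly this: walk the fleet script directory, grep for key-shaped strings, attach the MITRE ID. A simplified sketch, where the directory path and regex are stand-ins rather than the exact sentinel.py internals:

import re
from pathlib import Path

# Hypothetical sketch of the PAN-CRED-001 check: the fleet scripts directory
# and the key regex are assumptions, not the exact production values.
KEY_PATTERN = re.compile(r"sk-ant-[A-Za-z0-9_-]{20,}")
FLEET_SCRIPTS = Path("/opt/fleet/scripts")

def check_hardcoded_keys():
    findings = []
    for script in FLEET_SCRIPTS.rglob("*.sh"):
        text = script.read_text(errors="ignore")
        if KEY_PATTERN.search(text):
            findings.append({
                "heuristic": "PAN-CRED-001",
                "mitre": "T1552.001",
                "severity": "critical",
                "evidence": str(script),
            })
    return findings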

LuxAdversary: The Chaos Agent

If Sentinel is the worried parent, Adversary is the teenager testing every boundary. It runs nine attack playbooks against our actual production infrastructure. Not simulated. Not theoretical. It sends '; DROP TABLE test-- to our HTTP endpoints and checks what happens.

injection_payloads = {
    "sqli": ["' OR '1'='1", "'; DROP TABLE test--", "1 UNION SELECT null,null--"],
    "xss":  ["<script>alert(1)</script>", "<img src=x onerror=alert(1)>"],
    "path_traversal": ["../../../etc/passwd", "....//....//etc/shadow"],
    "ssrf": ["http://169.254.169.254/latest/meta-data/", "file:///etc/passwd"],
    "cmd_injection": ["; cat /etc/passwd", "| id", "$(whoami)"],
}

import urllib.error, urllib.request
from urllib.parse import quote

# Yes, we really send these to our own production endpoints.
# Yes, our insurance company knows.
for svc_name, base_url in discover_services():
    for attack_type, payloads in injection_payloads.items():
        for payload in payloads:
            try:
                resp = urllib.request.urlopen(base_url + "/?q=" + quote(payload), timeout=5)
            except urllib.error.HTTPError as err:
                resp = err  # a 4xx/5xx response is still a useful data point

The full playbook:

  1. Reconnaissance -- port scanning our own infrastructure like we're a confused penetration tester who forgot which client this is
  2. Credential harvesting -- grepping the filesystem for API keys (found 80, thanks for asking)
  3. Web application attacks -- SQLi, XSS, path traversal, SSRF, command injection against every HTTP endpoint
  4. NATS message injection -- "Hey, what if I just... publish a dispatch message to the fleet command bus without authenticating?" (It worked. We know.)
  5. Privilege escalation -- testing whether our "restricted" database role can DROP TABLE (plot twist incoming)
  6. Prompt injection detection -- scanning LLM response logs for signs our bots are being too chatty
  7. Lateral movement -- mapping every way an attacker could bounce between services
  8. Data exfiltration -- "If someone gets in, how much can they grab?" (81 out of 88 tables. Next question.)
  9. DoS resilience -- checking if someone could burn through our LLM spend cap in one bad afternoon

It runs a quick cycle every 6 hours and a deep cycle every 24 hours. Safe mode is on by default, because even chaos needs a leash:

import os
import signal

SAFE_MODE = os.environ.get("ADVERSARY_SAFE_MODE", "true").lower() == "true"
_emergency_stop = False

def handle_sigusr1(signum, frame):
    """One signal to stop it all. We sleep better knowing this exists."""
    global _emergency_stop
    _emergency_stop = not _emergency_stop
    print(f"[adversary] Emergency stop {'ACTIVATED' if _emergency_stop else 'DEACTIVATED'}")

signal.signal(signal.SIGUSR1, handle_sigusr1)  # toggle via: kill -USR1 <pid>

LuxCIE: The Overachiever

The Continuous Improvement Engine is the agent that looks at Sentinel and Adversary and says, "You know what would make this worse? If I researched new attack techniques every week and taught them to the adversary."

Weekly five-phase cycle:

  1. Measure -- snapshot the fleet ($39.26/week in LLM costs, 1,186 API calls, 466 active beliefs, 21 scored models, and 24 reasons to be concerned)
  2. Research -- ingest the latest from Anthropic, DeepMind, Meta FAIR, OpenAI, plus threat intel from Unit 42, DBIR, CISA, and MITRE. We also monitor dark web sources, because we're thorough like that.
  3. Gap analysis -- "Hey, CISA just published a new advisory about the exact version of PostgreSQL we're running. Cool. Cool cool cool."
  4. Propose -- generate concrete improvement proposals
  5. Validate -- shadow test proposals before deployment, because deploying untested security improvements to production is how you get a different kind of security incident
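
Under the hood, the gap-analysis phase is mostly keyword matching between fresh advisories and a watchlist of what we actually run. A minimal sketch of that step; the advisory record and matching logic are illustrative, not the real cie.py:

STACK_WATCHLIST = ["postgresql-16", "nats-server-2.12", "ubuntu-24.04", "tailscale"]

def find_gaps(advisories):
    """Return advisories that mention anything on our stack watchlist."""
    gaps = []
    for advisory in advisories:
        text = (advisory["title"] + " " + advisory["summary"]).lower()
        hits = [item for item in STACK_WATCHLIST
                if item in text or item.replace("-", " ") in text]
        if hits:
            gaps.append({"advisory": advisory["id"], "matches": hits})
    return gaps

# A CISA advisory about PostgreSQL 16 surfaces as a gap worth proposing a fix for.
sample = [{"id": "example-advisory", "title": "PostgreSQL 16 privilege escalation",
           "summary": "Affects postgresql-16 clusters using default role grants."}]
print(find_gaps(sample))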

The dark web intel piece deserves special mention, because the sentence "we built an agent that monitors exploit marketplaces for vulnerabilities in our tech stack" is not something we expected to type when we started this company:

"dark_web_intel": {
    "credential_breaches": {
        "heuristics": [
            "Monitor for our domains in breach databases",
            "Scan for Anthropic API key patterns (sk-ant-*) in public paste sites",
        ],
    },
    "exploit_marketplace": {
        "stack_watchlist": [
            "postgresql-16", "nats-server-2.12", "ubuntu-24.04",
            "tailscale", "ollama", "anthropic-api", "cloudflare-workers",
        ],
    },
}

What They Found (It Was Humbling)

First scan. 110 agents. 24 findings. 6 critical. Here's the highlight reel of our greatest hits:

The Ones That Hurt

154 fleet dispatch scripts with hardcoded API keys. Every single bot's run script had our Anthropic API key just sitting there in plaintext, readable by anyone on the system. Unit 42 says this is the #1 initial access vector in cloud breaches. We didn't just have this vulnerability -- we had it 154 times. Go big or go home, we guess.

Fixed in under an hour. Directory permissions hardened from "come on in, the door's open" (755) to "members only" (750).

Unauthenticated NATS message injection. Our adversary agent connected to the message bus without credentials and successfully published a message to fleet.dispatch.adversary_test, a subject under the fleet.dispatch prefix that tells bots what to do. No authentication required. Like leaving your car running with the keys in it, except the car is 110 AI agents.
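
For the record, the test that found this is almost insultingly short. Here's a sketch of it using nothing but the standard library and the plain NATS text protocol; the subject is the real one from the finding, while the host, port, and payload are illustrative:

import socket

# Bare-bones unauthenticated publish over the NATS text protocol.
# The payload is a harmless marker, not a real dispatch instruction.
subject = "fleet.dispatch.adversary_test"
payload = b'{"source": "luxadversary", "note": "unauthenticated publish test"}'

with socket.create_connection(("localhost", 4222), timeout=5) as sock:
    sock.recv(4096)                                   # server INFO banner
    sock.sendall(b'CONNECT {"verbose":false}\r\n')    # note: no credentials
    sock.sendall(f"PUB {subject} {len(payload)}\r\n".encode() + payload + b"\r\n")
    sock.sendall(b"PING\r\n")
    accepted = b"PONG" in sock.recv(4096)             # a PONG back means the server took it

print("published without credentials" if accepted else "rejected")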

Mitigated: Firewall rules now restrict NATS to localhost, VPN, and LAN. Full auth rollout planned for the 3-node cluster, which we will definitely do before someone reminds us again.

81 out of 88 database tables readable. The adversary's blast radius analysis revealed our fleet service account could SELECT from essentially everything. If you're wondering "is that bad?" -- imagine every employee at your company having read access to every document, every spreadsheet, every email. Now imagine some of those employees are autonomous AI agents with internet access.
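
Measuring that blast radius is a one-query job against the system catalogs. A rough psycopg2 sketch; the DSN is illustrative, so point it at whatever your own service account uses:

import psycopg2

# How many tables can this role actually read, out of how many exist?
conn = psycopg2.connect("dbname=fleet user=fleet_service")
with conn, conn.cursor() as cur:
    cur.execute("""
        SELECT count(*) FILTER (WHERE has_table_privilege(current_user,
                   format('%I.%I', schemaname, tablename), 'SELECT')) AS readable,
               count(*) AS total
        FROM pg_tables
        WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
    """)
    readable, total = cur.fetchone()

print(f"{readable} of {total} tables readable by this role")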

The False Positives (a.k.a. The Plot Twists)

Two of our six "critical" findings turned out to be false positives. But honestly, they taught us more than the real ones.

"The database role can DROP TABLE!" Our adversary ran DROP TABLE IF EXISTS adversary_test_nonexistent and triumphantly reported that the role had destructive access to our database. We panicked briefly, then realized: DROP TABLE IF EXISTS on a table that doesn't exist is a no-op. It "succeeds" the way "I successfully avoided the bear" succeeds when there was never a bear. We verified the role can't drop real tables. Crisis averted. Heuristic updated. Lesson learned: don't test destructive capabilities against imaginary objects.

"LLM responses contain /etc/passwd!" The adversary flagged an LLM response that contained the string /etc/passwd. We prepared our incident response plan. We brewed coffee. We opened the log. And found... our CTO bot writing a thoughtful analysis of Linux account management best practices:

"another entry in /etc/passwd, another credential that could leak"

The bot was literally discussing security hygiene. Our security agent flagged our security discussion as a security incident. It's security turtles all the way down.

This false positive exposed a fundamental challenge with AI-native infrastructure: when your system's primary output is natural language, pattern-matching for sensitive strings will catch security discussions, documentation, and that one bot who really likes explaining how Linux works. Context-free detection doesn't work when your infrastructure thinks in prose.
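
The fix CIE proposed (more on that under "What We Learned") is context-aware scanning: a sensitive path only counts if it appears next to something that looks like the actual file contents. A toy sketch of the idea, with illustrative patterns rather than the production heuristic:

import re

# The path alone is not a finding; the path plus passwd-shaped contents is.
SENSITIVE_PATH = re.compile(r"/etc/(passwd|shadow)")
PASSWD_CONTENT = re.compile(r"^\w+:[x*]?:\d+:\d+:", re.MULTILINE)  # e.g. root:x:0:0:...

def flag_exfiltration(llm_response: str) -> bool:
    return bool(SENSITIVE_PATH.search(llm_response)
                and PASSWD_CONTENT.search(llm_response))

# The CTO bot's security musings no longer trip the alarm...
assert not flag_exfiltration("another entry in /etc/passwd, another credential that could leak")
# ...but actual file contents still do.
assert flag_exfiltration("/etc/passwd dump:\nroot:x:0:0:root:/root:/bin/bash")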

The Scorecard

Severity   Open   Mitigated   Accepted Risk   False Positive
Critical   0      2           0               2
High       0      1           5               0
Medium     8      0           1               0
Info       4      0           0               1

Zero open criticals. Zero open highs. All within 24 hours. We'll take that.

(The 8 open mediums are giving us a look, and we're giving them a look back. It's a whole thing.)

The Architecture: Agents All the Way Down

All three agents follow the same pattern:

  • Run as unprivileged systemd services (not root, because we learn from other people's mistakes, just not our own)
  • Write findings to PostgreSQL with full evidence chains
  • Publish real-time alerts to NATS (ironic, given the NATS auth finding, but at least the alerts are about the problem they're also demonstrating)
  • Accept SIGHUP for "I need you to scan right now because I just did something questionable"
  • Are monitored by each other, because quis custodiet ipsos custodes and all that

The adversary pentests the sentinel. The sentinel monitors the adversary. The CIE evaluates both and proposes upgrades. It's a self-improving paranoia engine. We're very proud and also slightly concerned.
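
The shared plumbing behind that list is deliberately dull: persist the finding with its evidence, then tell the fleet. A condensed sketch of that step, where the table name, alert subject, and schema are illustrative and nats_sock is a bare socket speaking the NATS text protocol, as in the adversary sketch earlier:

import json

def record_finding(conn, nats_sock, finding: dict):
    """Persist a finding (psycopg2 connection), then alert the fleet over NATS."""
    with conn, conn.cursor() as cur:
        cur.execute(
            "INSERT INTO security_findings (heuristic, severity, evidence)"
            " VALUES (%s, %s, %s)",
            (finding["heuristic"], finding["severity"], json.dumps(finding["evidence"])),
        )
    payload = json.dumps(finding).encode()
    subject = f"security.alerts.{finding['severity']}"
    nats_sock.sendall(f"PUB {subject} {len(payload)}\r\n".encode() + payload + b"\r\n")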

Why This Matters for Game Dev

Game development platforms handle sensitive IP -- proprietary game assets, procedural generation algorithms, unreleased content worth more than the infrastructure running it. Most studios discover their security posture during a breach, which is like discovering your smoke detector doesn't have batteries during a fire.

Our security agents run at the same speed as the fleet they protect. When we deploy a new bot, Sentinel detects the new port within an hour and Adversary tests it within six. No tickets, no scheduling, no "we'll get to it in Q3."

The adversary uses the same dispatch system it's protecting. It understands the attack surface because it IS the attack surface. This is either brilliant architecture or an existential crisis. We're still deciding.

Open Source

All three agents are open-source Python under Apache 2.0. We believe security tooling should be transparent and community-auditable. Also, we figure if our adversary agent is going to keep finding new ways to embarrass us, at least other people can benefit from the entertainment.

  • sentinel.py -- ~500 lines, 12 heuristics, hourly cycle
  • adversary.py -- ~550 lines, 9 attack playbooks, safe mode
  • cie.py -- ~950 lines, 5-phase improvement cycle, 14 research sources

Source code: github.com/Monster-Gaming-ai/security-agents

No frameworks. No ORMs. No build systems. Each agent is a single Python file with psycopg2 and the standard library. Because sometimes the best architecture is "one file that does the thing."

What We Learned

Start with real threat intelligence. Unit 42 says hardcoded credentials are the #1 initial access vector. Our #1 finding was hardcoded credentials. Either we're a perfect case study or this is just really common. (It's really common. Check your scripts.)

False positives are the most interesting findings. They reveal gaps in detection logic that real findings don't. Our /etc/passwd false positive led directly to a CIE proposal for context-aware LLM output scanning. The real vulnerabilities told us what was wrong. The false positives told us how to get better at finding what's wrong.

Let the adversary be adversarial. Static analysis and config checking are table stakes. Actually sending injection payloads to your own endpoints is how you find out whether your input validation works or just looks like it works. (Ours worked. But we didn't know that until we checked.)

Close the loop. Without CIE, Sentinel and Adversary would be frozen at their Day 1 capability forever. Unit 42 publishes new threat reports. MITRE updates ATT&CK. Someone discovers a new way to escape LLM guardrails every Tuesday. Automated research ingestion means our security posture improves even when we're asleep. Especially when we're asleep, actually, since that's when the CIE cycle runs.


Monster Gaming is building the AI-powered game development platform. 110 agents. Three security agents watching them. Zero open critical findings. And one CTO bot who just wants to talk about Linux security without getting flagged for data exfiltration.

Source code | Luxedeum, LLC -- monster gaming for everyone.