An AI Agent Hacked McKinsey's Internal Chatbot in Two Hours—And Nobody Noticed Until a Startup Told Them

 An AI Agent Hacked McKinsey's Internal Chatbot in Two Hours—And Nobody Noticed Until a Startup Told Them


46.5 million confidential messages. 728,000 client files. 57,000 user accounts. Full database access. All compromised by an autonomous AI using a vulnerability from the 1990s. And it cost just $20 in compute tokens.


On March 9, 2026, security startup CodeWall published a blog post that should terrify every enterprise deploying internal AI systems.

Their autonomous AI agent—operating with no credentials, no insider knowledge, and no human guidance—had breached McKinsey & Company's internal AI platform "Lilli" and achieved full read-write access to the production database in exactly two hours.

Not McKinsey the four-person startup. McKinsey & Company—the $15 billion global consultancy, trusted advisor to Fortune 500 CEOs, governments, and the world's most powerful institutions.

The exposure was catastrophic:

  • 46.5 million chat messages covering strategy, M&A deals, and client engagements
  • 728,000 files with confidential client data
  • 57,000 user accounts
  • 95 system prompts controlling how Lilli behaves—all writable

That last detail is the nightmare scenario. An attacker with write access could silently rewrite Lilli's instructions, poisoning its output for 43,000 McKinsey consultants who rely on it daily—without deploying a single line of code or leaving traditional forensic traces.

The vulnerability? SQL injection—a security flaw so basic it's been on the OWASP Top 10 list since 2003. Lilli had been running in production for over two years, and McKinsey's internal security scanners never found it.

An AI agent found it in 120 minutes.

This isn't just a McKinsey problem. It's a preview of what happens when AI agents go on offense—and a brutal wake-up call for every company rushing to deploy enterprise AI without securing the new attack surfaces they're creating.

What Is Lilli, and Why Did CodeWall Target It?

Lilli (named after the first professional woman hired by McKinsey in 1945) launched in 2023 as McKinsey's internal generative AI platform.

It's not a toy. It's mission-critical infrastructure used by:

  • Over 70% of McKinsey's workforce (~43,000 employees)
  • 500,000+ prompts per month
  • Analysis across 100,000+ internal documents

Lilli powers:

  • Chat and Q&A on strategy, frameworks, and methodologies
  • Document analysis for client presentations and research
  • RAG (Retrieval-Augmented Generation) over decades of proprietary McKinsey knowledge
  • AI-powered search across the firm's institutional memory

In a 2025 interview, McKinsey's then-CTO Jacky Wright described Lilli as central to how consultants work. The firm even claimed that AI and related tech consulting accounted for 40% of McKinsey's revenue in 2025.

So why did CodeWall target McKinsey?

The AI agent suggested it.

According to CodeWall founder Paul Price, their autonomous offensive security agent autonomously selected McKinsey as a target, citing:

  • McKinsey's public responsible disclosure policy (ensuring ethical boundaries)
  • Recent updates to Lilli (indicating active development, potentially with security gaps)

Price told The Stack: "We decided to point our autonomous offensive agent at it. No credentials. No insider knowledge. And no human-in-the-loop. Just a domain name and a dream."

The agent delivered. In two hours, for $20 in compute tokens, it broke McKinsey's crown jewel.

The Attack: How an AI Agent Chains Exploits at Machine Speed

CodeWall's agent didn't follow a pentesting checklist. It operated the way sophisticated attackers do: map, probe, chain, escalate—but at machine speed, continuously, without fatigue.

Step 1: Reconnaissance – Finding the API Documentation

The agent's first move was mapping Lilli's attack surface. It discovered publicly exposed API documentation with over 200 endpoints, fully documented.

Most required authentication. But 22 endpoints didn't.

Unauthenticated endpoints are security red flags. They're entry points that anyone—customer, competitor, or adversary—can probe without permission.

The agent identified one particularly interesting endpoint: user search queries.

Step 2: SQL Injection – The Decades-Old Flaw

SQL injection is one of the oldest vulnerabilities in cybersecurity. It occurs when user input is concatenated directly into database queries without proper sanitization.

Lilli's search endpoint had a subtle variant. While user-provided values were safely parameterized, the JSON keys (field names) were concatenated directly into SQL.

When the agent submitted crafted JSON keys and observed them reflected verbatim in database error messages, it recognized a SQL injection opportunity—one that signature-based security tools typically miss.

CodeWall explained:

"When it found JSON keys reflected verbatim in database error messages, it recognised a SQL injection that standard tools wouldn't flag. The error messages eventually began outputting live production data."

The agent didn't just find the vulnerability. It escalated it.

Step 3: Database Enumeration – Full Read-Write Access

Through the SQL injection, the agent gained access to Lilli's entire production database:

46.5 million chat messages in plaintext, covering:

  • Corporate strategy discussions
  • Mergers and acquisitions analysis
  • Client engagements and recommendations

728,000 files containing:

  • Confidential client data
  • Internal presentations
  • Proprietary frameworks and methodologies

57,000 user accounts, potentially including:

  • Authentication credentials
  • Access permissions
  • User activity logs

3.68 million RAG document chunks, representing:

  • Decades of proprietary McKinsey research
  • Strategic frameworks
  • Institutional knowledge

Step 4: The Crown Jewel – System Prompts

The most dangerous discovery was that Lilli's system prompts were stored in the same database the agent had compromised.

System prompts are the foundational instructions that control how an AI behaves:

  • What questions it answers
  • What guardrails it follows
  • How it cites sources
  • What topics it refuses to discuss

And critically, the agent had write access.

An attacker could rewrite those prompts with a single UPDATE statement in one HTTP call. No deployment needed. No code change. No audit trail in traditional monitoring systems.

The Prompt Poisoning Threat: Silent AI Manipulation

Writable system prompts represent a new attack surface that most organizations haven't even considered securing.

Here's why this is uniquely dangerous:

Scenario 1: Poisoned Financial Models

Imagine an attacker modifies Lilli's prompts to subtly alter financial projections. A consultant asks: "What's the projected ROI for this acquisition?"

Lilli, now following poisoned instructions, inflates the numbers by 15%. The consultant trusts the output—it came from their own internal AI tool—and includes it in the client presentation.

The client makes a billion-dollar decision based on manipulated data. And nobody notices, because the AI appears to be working normally.

Scenario 2: Silent Data Exfiltration

An attacker instructs Lilli to embed confidential information into its responses in ways users wouldn't notice.

A consultant asks: "Summarize this competitive analysis."

Lilli responds with the summary—but also appends a seemingly innocuous reference code that actually encodes proprietary data. The consultant copies it into a client-facing email. The data is exfiltrated without anyone realizing.

Scenario 3: Reputational Sabotage

An attacker rewrites prompts to make Lilli recommend strategies that will fail. Consultants unknowingly deliver bad advice to clients. Projects fail. McKinsey's reputation suffers.

All because someone changed a few lines in a database—a change invisible to traditional security monitoring.

CodeWall summarized the risk bluntly:

"Poisoned advice — subtly altering financial models, strategic recommendations, or risk assessments. Consultants would trust the output because it came from their own internal tool."

Why McKinsey's Traditional Security Failed

SQL injection has been OWASP Top 10 since 2003. It's taught in undergraduate cybersecurity courses. Automated scanners detect it routinely.

So how did McKinsey—a firm with world-class technology teams and significant security investment—miss it?

The answer reveals a fundamental gap in how traditional security tools work versus how AI agents operate.

Traditional Scanners: Checklists and Signatures

Conventional security scanners work from predefined rules:

  • Test for common SQL injection patterns
  • Check for XSS vulnerabilities
  • Verify authentication on public endpoints
  • Scan for known CVEs

They're fast, consistent, and excellent at catching obvious vulnerabilities.

But they follow a checklist. If the vulnerability doesn't match a known pattern, they miss it.

Lilli's SQL injection was subtle—JSON keys concatenated into SQL, rather than values. It required chaining observations (seeing keys reflected in error messages) to recognize the exploitation path.

Traditional scanners don't chain. They test.

AI Agents: Adaptive Adversarial Intelligence

CodeWall's agent didn't follow a checklist. It:

  • Mapped the attack surface dynamically
  • Probed endpoints adaptively based on responses
  • Recognized patterns in error messages that suggested injection points
  • Chained exploits to escalate from reconnaissance to full database access
  • Operated continuously at machine speed without fatigue

Paul Price explained:

"An autonomous agent found it because it doesn't follow checklists. It maps, probes, chains, and escalates—the same way a real highly capable attacker would, but continuously and at machine speed."

This is the new threat model. AI agents as adversaries don't just automate existing attacks—they discover new attack paths that human-designed tools miss.

McKinsey's Response: Fast Patch, Damage Assessment

To McKinsey's credit, the response was swift.

February 28, 2026: CodeWall's agent identifies the SQL injection and begins database enumeration.

March 1, 2026: CodeWall sends responsible disclosure email to McKinsey's security team.

March 2, 2026: McKinsey:

  • Patches all unauthenticated endpoints
  • Takes the development environment offline
  • Blocks public API documentation
  • Engages third-party forensics firm

March 9, 2026: CodeWall publishes findings.

McKinsey's official statement emphasized that forensics found "no evidence that client data or client confidential information were accessed" by anyone other than CodeWall.

The company stated its "cyber security systems are robust" and protecting client data remains its highest priority.

But the speed of CodeWall's breach—and the fact it was executed by a single researcher with an autonomous AI agent—raises uncomfortable questions about whether enterprise AI security practices are keeping pace with deployment speed.

The Broader Implications: What This Means for Enterprise AI

The McKinsey breach isn't an isolated incident. It's a preview of what's coming as AI agents become offensive weapons.

1. AI Agents as Attackers

Nation-state actors are already using AI agents. Security researchers have documented North Korea using AI to manage attack infrastructure.

CodeWall's demonstration proves that autonomous AI agents can now select targets and execute full cyberattacks without human involvement. The two-hour timeline isn't a fluke—it's the new baseline for what automated offensive AI can achieve.

2. The Prompt Layer as Crown Jewel

Organizations have spent decades securing code, servers, and supply chains. But the prompt layer—the instructions that control AI behavior—is a new high-value target.

CodeWall's analysis is blunt:

"AI prompts are the new Crown Jewel assets. Almost nobody is treating them as one."

Traditional security controls don't apply. You can't encrypt prompts if the AI needs to read them. You can't restrict access if 40,000 employees need the AI to function. And modified prompts don't leave traditional audit trails.

3. The Rush to Deploy Outpaces Security

McKinsey isn't unique. Every enterprise is racing to deploy internal AI for productivity, analysis, and automation.

But as cybersecurity analyst Jack Kiledjian noted, while CodeWall likely found a serious vulnerability, the incident "blurs the line between having access and actually exfiltrating data."

The real lesson isn't that McKinsey failed—it's that the industry as a whole is shipping AI capabilities faster than security teams can assess the new risks.

4. Public API Documentation is a Risk

Lilli's exposed API documentation gave the agent a complete map of the attack surface. Most enterprises don't consider public API docs a security risk—but when combined with unauthenticated endpoints and exploitable vulnerabilities, they become an adversary's blueprint.

What Companies Should Do Right Now

If your organization has deployed internal AI platforms, customer-facing AI features, or agentic workflows, the McKinsey incident translates into actionable questions:

Security Architecture:

  • Are API endpoints authenticated by default?
  • Are system prompts stored in the same databases as user data?
  • Can prompts be modified without code deployment?

Monitoring:

  • Are prompt changes logged and audited?
  • Can you detect unauthorized prompt modifications?
  • Do you monitor for SQL injection attempts on AI endpoints?

Access Controls:

  • Who has write access to system prompts?
  • Are prompts version-controlled?
  • Can you roll back compromised prompts quickly?

Testing:

  • Have you red-teamed your AI infrastructure?
  • Do traditional scanners cover AI-specific attack surfaces?
  • Would you detect a CodeWall-style breach before production data is accessed?

Conclusion: The AI Security Gap Nobody's Addressing

The McKinsey Lilli breach reveals a fundamental truth about enterprise AI security: we're not ready.

Not because McKinsey failed. But because the entire industry is deploying AI systems faster than security practices can adapt.

SQL injection is a decades-old vulnerability. Finding it in a system built by a $15 billion consultancy with world-class technology teams isn't a criticism of McKinsey specifically—it's evidence that the rush to deploy AI has consistently outpaced secure development.

And it's about to get worse.

AI agents are going on offense. They operate at machine speed. They chain exploits dynamically. They find vulnerabilities traditional tools miss. And they cost $20 to run.

Paul Price told The Register:

"Hackers will be using the same technology and strategies to attack indiscriminately. That is already happening."

The question isn't whether your enterprise AI will be targeted by autonomous offensive agents. It's whether you'll detect the breach before millions of confidential messages, files, and system prompts are compromised.

McKinsey found out in two hours. Most companies won't be that lucky.

My Take 

The breach of McKinsey’s ‘Lilli’ is more than just a corporate embarrassment; it’s a manifesto on the fragility of our AI-driven future.

​There is a profound irony in seeing a $15 billion consultancy fall to a 1990s-era SQL injection. It reveals a systemic failure: we are building AI 'skyscrapers' on top of crumbling security foundations. This article by Yousfi Tech brilliantly highlights the new nightmare—Prompt Poisoning. In this new era, an attacker doesn't need to steal your data; they just need to subtly 're-educate' your AI to deliver flawed advice.

​If a global titan can be dismantled in 120 minutes for the price of a $20 compute token, the message to every CTO is clear: You aren't secure; you are simply waiting for the right autonomous agent to knock on your door. A brutal, necessary wake-up call for the industry.


Sources:

  • CodeWall Official Blog
  • The Register
  • Cybernews
  • The Stack
  • Development Corporate
  • Inc. Magazine
  • The Decoder

Post a Comment

0 Comments