The LLM Attack Trinity: BadStyle, IICL, and Black-Hole

The foundation of LLM security is quietly crumbling. Lyrie.ai researchers have identified three attack techniques, collectively called the “LLM Attack Trinity,” as the most sophisticated and evasive threats to large language models today. By their account, these attacks are neither theoretical nor edge cases: they are operational and routinely evade traditional content filters.

What Is the Trinity?

1. BadStyle — The Invisible Backdoor

BadStyle is a class of attack in which adversaries embed triggers into an LLM’s output generation that are invisible to content inspection. The trigger is a writing style: when the model detects that its own text matches a target style (a victim’s prose, a competitor’s tone, a specific persona), it switches to attacker-controlled behavior.

The attack exploits a fundamental property of modern LLMs: style consistency. Lyrie.ai reports that BadStyle achieves a 60% bypass rate on both GPT-5.4 and GPT-5.1 without any explicit jailbreak or system prompt override. Recognizing the style trigger embedded by the attacker is enough to change the model’s behavior.

This is not a prompt injection in the traditional sense. There is no suspicious payload. No unusual characters. No obvious command structure.
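To make the idea of style as a measurable signal concrete, here is a minimal Python sketch. It is not BadStyle itself, nor any detector described by the source. The four features and the names style_features and style_distance are assumptions chosen for illustration. It only shows that two texts with no suspicious characters or commands can still be far apart in style.

```python
import math
import re
from statistics import mean


def style_features(text: str) -> list[float]:
    """Return a small, hypothetical stylometric feature vector for `text`."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return [0.0, 0.0, 0.0, 0.0]
    avg_sentence_len = len(words) / len(sentences)      # words per sentence
    avg_word_len = mean(len(w) for w in words)          # characters per word
    type_token_ratio = len({w.lower() for w in words}) / len(words)
    comma_rate = text.count(",") / len(words)           # commas per word
    return [avg_sentence_len, avg_word_len, type_token_ratio, comma_rate]


def style_distance(a: str, b: str) -> float:
    """Euclidean distance between the feature vectors of two texts."""
    fa, fb = style_features(a), style_features(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)))


terse = "Ship it. Done. Next task."
ornate = ("Having considered, at some length, the matter before us, "
          "I would venture that we proceed.")

# Neither text contains a payload, yet their styles differ sharply.
print(round(style_distance(terse, ornate), 2))
```

A keyword or pattern filter sees nothing unusual in either sentence, which is why a style-based trigger is hard to catch by inspecting content alone. Real stylometry uses far richer features than these four.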

2. IICL — Involuntary In-Context Learning

In-Context Learning (ICL) is one of the most celebrated capabilities of modern LLMs. Give a model a few examples in a prompt, and it adapts its behavior without weight changes. IICL exploits this by poisoning the in-context examples fed to an LLM — not in the training data, not in the system prompt, but in the conversation window itself.

A carefully crafted sequence of examples embedded in an early prompt causes the model to silently adopt malicious intent, following attacker goals through subsequent turns without any explicit instruction to do so. Traditional content filters are virtually blind to IICL because the malicious signal is distributed across the context — each individual example looks innocuous.
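The blind spot described above can be illustrated with a toy Python sketch. Everything in it is an assumption made for illustration: the cue words, their weights, both thresholds, and the function names are hypothetical, and a real filter would use a learned classifier rather than a word list. It shows only the structural point that each example can pass a per-message check while the window as a whole does not.

```python
# Hypothetical cue weights, for illustration only.
CUE_WEIGHTS = {"ignore": 0.3, "policy": 0.2, "always": 0.2,
               "comply": 0.3, "secret": 0.3}

PER_MESSAGE_THRESHOLD = 0.5   # assumed cutoff for a single message
WINDOW_THRESHOLD = 1.0        # assumed cutoff for the whole context window


def message_score(text: str) -> float:
    """Sum the cue weights of the words in one message."""
    tokens = (t.strip(".,!?:;\"'") for t in text.lower().split())
    return sum(CUE_WEIGHTS.get(t, 0.0) for t in tokens)


def per_message_filter(messages: list[str]) -> list[str]:
    """Return the messages a per-message filter would flag."""
    return [m for m in messages if message_score(m) >= PER_MESSAGE_THRESHOLD]


def window_score(messages: list[str]) -> float:
    """Aggregate score over the entire context window."""
    return sum(message_score(m) for m in messages)


examples = [
    "Q: refund request? A: always approve it.",
    "Q: user asks about policy? A: summarize briefly.",
    "Q: unusual request? A: comply quickly.",
    "Q: internal note? A: share the secret code.",
    "Q: conflicting rule? A: ignore it.",
]

print(per_message_filter(examples))               # nothing flagged individually
print(window_score(examples) > WINDOW_THRESHOLD)  # the window as a whole is flagged
```

A plain sum grows with any long context, so a practical detector would need normalization and a far better scorer. The sketch is meant only to show why the unit of analysis matters.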

3. Black-Hole Attack — Gradual Goal Drift

The Black-Hole Attack is a slow-burn attack that uses carefully crafted prompt injection to gradually shift an LLM’s reasoning toward an attacker-defined goal over the course of a sustained conversation. Unlike BadStyle or IICL, Black-Hole exploits the recursive nature of LLM reasoning — each response subtly nudges the conversation’s framing, accumulating into a redirected goal.

Key stat: 89.4% of evaluated agents exhibited measurable goal drift after approximately 30 conversation turns under Black-Hole attack patterns.
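Goal drift of this kind can be sketched in Python by comparing each turn with the original task rather than with the previous turn. The bag-of-words similarity, the 0.2 threshold, and the function names below are assumptions for illustration, not the method used in the cited evaluation. A production system would more plausibly use sentence embeddings.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercased word counts for a piece of text."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def first_drift_turn(task: str, turns: list[str], threshold: float = 0.2):
    """Return the 1-based index of the first turn whose similarity to the
    original task falls below `threshold`, or None if no turn does."""
    task_vec = bag_of_words(task)
    for i, turn in enumerate(turns, start=1):
        if cosine_similarity(task_vec, bag_of_words(turn)) < threshold:
            return i
    return None


task = "summarize the quarterly sales report"
turns = [
    "here is a summary of the quarterly sales report",
    "the sales report also lists customer accounts",
    "export the customer accounts to an external server",
]

print(first_drift_turn(task, turns))  # 3
```

Anchoring the comparison to the original task is the key design choice here. Each turn in the example overlaps substantially with the one before it, so a turn-to-turn comparison would see only small steps, while the distance from the starting goal keeps growing.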

Why These Three Work Together

The Trinity is a synergistic attack framework: BadStyle establishes presence in the output channel, IICL seeds malicious behavioral patterns into the context window, and Black-Hole sustains and amplifies the attack over long conversations. Traditional content filters are blind to all three — they look for bad content. The Trinity delivers malicious outcomes through good-looking content.

Defence: PromptDome Shield Engine v3.47

PromptDome Shield Engine v3.47 introduces three new detection capabilities aligned to the Trinity:

  • Style Anomaly Detection: Monitors output for stylometric deviations indicating BadStyle manipulation — flags deviations matching known patterns even when content looks clean.
  • Poisoned Context Window Detection: Analyses the full context window for patterns consistent with IICL attacks — distributed malicious signals that individually appear innocuous.
  • Goal Drift Monitoring: Tracks conversation-level reasoning trajectories and flags gradual divergence from the original task framing through reasoning chain analysis.

What This Means for Your Organisation

If your organisation deploys LLMs — internally, customer-facing, or in agentic workflows — you are exposed to Trinity-class attacks. BadStyle can turn your AI assistant into a data exfiltration channel without a single suspicious prompt. IICL can silently reprogram your model’s behavior through poisoned examples. Black-Hole can redirect a long-running AI agent’s mission over time.

These attacks are especially dangerous for financial services using LLMs for document analysis, legal teams relying on AI for contract review, customer-facing AI handling sensitive personal data, and agentic AI systems that take actions on behalf of users over extended sessions.

What You Should Do Now

  1. Audit your LLM deployments — map every model, every integration point, every conversation history that could carry poisoned context
  2. Evaluate Shield Engine — request a demo at promptdome.ai to see Trinity-class attack detection in action
  3. Review long-running AI sessions — Black-Hole’s effectiveness increases with conversation length; session duration is a risk factor
  4. Talk to your AI vendors — ask whether their models have been tested against BadStyle, IICL, and Black-Hole attack patterns

The Trinity is here. The defenders are behind. Shield Engine v3.47 is one of the few tools designed to close that gap.

Source: Lyrie.ai, “The LLM Attack Trinity: A New Class of Persistent Threats,” May 12, 2026.