AI Red Teaming Knowledge Base

AI Red Teaming

Guides, research, and references on AI red teaming, prompt injection, agent security, and adversarial testing for LLM and agent systems.

Browse Research Open Guides

Featured Reading

Current material worth reading

Curated research, system cards, and security write-ups that are useful for understanding how AI red teaming is evolving in practice.

Microsoft Security Blog March 12, 2026 guide

Detecting and analyzing prompt abuse in AI tools

Microsoft Incident Response walks through how to detect prompt abuse operationally, tying prompt injection risk back to logging, telemetry, and incident response workflows.

Prompt Injection Agent Security

Read summary Source link

OpenAI March 11, 2026 analysis

Designing AI agents to resist prompt injection

OpenAI frames prompt injection as an evolving agent-security problem that increasingly resembles social engineering rather than a simple string-matching issue.

Prompt Injection Agent Security

Read summary Source link

OpenAI March 9, 2026 news

OpenAI to acquire Promptfoo

OpenAI announced plans to acquire Promptfoo, highlighting automated AI security testing, red teaming, and evaluation as core enterprise requirements.

AI Red Teaming Prompt Engineering Prompt Injection

Read summary Source link

MITRE Center for Threat-Informed Defense February 9, 2026 framework

MITRE ATLAS OpenClaw Investigation Discovers New and Likeliest Techniques

MITRE maps incident patterns in an open-source agentic ecosystem to ATLAS techniques, showing how AI-first systems create distinct execution paths for attackers.

Agent Security AI Red Teaming

Read summary Source link

Latest Notes

New additions to the research library

Recent notes and references across prompt injection, agent security, evaluations, and adjacent AI security work.

Krebs on Security April 14, 2026 news

Patch Tuesday, April 2026 Edition

Krebs on Security covers April 2026 patching activity, including a record-sized Microsoft release and active exploitation notes.

AI Red Teaming

Read summary Source link

OpenAI December 22, 2025 analysis

Continuously hardening ChatGPT Atlas against prompt injection attacks

OpenAI describes using automated red teaming and reinforcement learning to discover agent prompt injection attacks before they appear in the wild.

Prompt Injection Agent Security AI Red Teaming

Read summary Source link

Google Cloud Blog December 4, 2025 guide

Building a Production-Ready AI Security Foundation

Google Cloud outlines a defense-in-depth view of AI security spanning application controls, data protections, and infrastructure isolation.

Agent Security Prompt Injection Adversarial ML

Read summary Source link

OpenAI November 7, 2025 guide

Understanding prompt injections: a frontier security challenge

An accessible explanation of prompt injection risk in real AI products, including how third-party content can redirect or manipulate agent behavior.

Prompt Injection Prompt Engineering

Read summary Source link

Guides

Core guides

Structured introductions to the main problem areas that keep showing up in AI red teaming and application security.

/ai-application-security

AI Application Security

How LLM features change application threat models once prompts, retrieval, tools, memory, and downstream systems are tied together.

A clearer system-level threat model for AI featuresA better sense of where to add approvals, isolation, and monitoring

Open guide

/llm-red-teaming

LLM Red Teaming

How adversarial testing is applied to LLM-backed products, including harmful outputs, prompt breakouts, and misuse paths.

Better visibility into failure modes that matter in productionFaster break-fix loops between testing and engineering

Open guide

/prompt-injection-testing

Prompt Injection

The core attack pattern in modern AI applications: malicious instructions arriving through users, retrieved content, tools, or hidden context.

A practical mental model for prompt injection beyond slogansBetter design instincts around content trust boundaries

Open guide

/prompt-engineering-review

Prompt Engineering

Instruction design and prompt structure as part of the security boundary, not just a usability exercise.

Prompts that are easier to reason aboutLower variance when inputs become messy or adversarial

Open guide

/agent-security-review

Agent Security

Security basics for systems that can plan, use tools, persist state, and take actions across multiple steps.

A more grounded model for agent-specific riskBetter boundaries around tools and action execution

Open guide

/adversarial-ml-and-model-risk

Adversarial ML and Model Risk

A compact guide to adversarial ML concepts and how they connect to modern AI product security.

Cleaner distinctions between model risk and system riskBetter alignment between AI security and traditional security controls

Open guide

Topic Coverage

Prompt engineering, prompt injection, agent security, and more

These topic hubs connect introductory guidance with current research, incident patterns, and product-facing security lessons from the broader AI ecosystem.

AI Red Teaming

Methods, case studies, and tooling for red teaming AI systems end to end.

Open topic

Prompt Engineering

Prompt design patterns, instruction hierarchy, and defensive prompt construction.

Open topic

Prompt Injection

Prompt injection attacks, mitigations, detection, and design patterns for safer AI applications.

Open topic

Agent Security

Controls and attack paths for browsing, tool use, memory, identity, and action-taking agents.

Open topic

Model Evaluation

Safety evaluations, system cards, preparedness, and security measurement for frontier models.

Open topic

Adversarial ML

Adversarial machine learning attacks, taxonomies, and mitigations across the ML lifecycle.

Open topic

Profile

Profile and contact

Focused on AI red teaming, prompt injection risk, agent security, and application-layer failures in LLM and agent systems.

View Profile Contact