AI Red Teaming Knowledge Base

AI Red Teaming

Guides, research, and references on AI red teaming, prompt injection, agent security, and adversarial testing for LLM and agent systems.

Featured Reading

Current material worth reading

Curated research, system cards, and security write-ups that are useful for understanding how AI red teaming is evolving in practice.

Latest Notes

New additions to the research library

Recent notes and references across prompt injection, agent security, evaluations, and adjacent AI security work.

Guides

Core guides

Structured introductions to the main problem areas that keep showing up in AI red teaming and application security.

/ai-application-security

AI Application Security

How LLM features change application threat models once prompts, retrieval, tools, memory, and downstream systems are tied together.

A clearer system-level threat model for AI featuresA better sense of where to add approvals, isolation, and monitoring
Open guide
/llm-red-teaming

LLM Red Teaming

How adversarial testing is applied to LLM-backed products, including harmful outputs, prompt breakouts, and misuse paths.

Better visibility into failure modes that matter in productionFaster break-fix loops between testing and engineering
Open guide
/prompt-injection-testing

Prompt Injection

The core attack pattern in modern AI applications: malicious instructions arriving through users, retrieved content, tools, or hidden context.

A practical mental model for prompt injection beyond slogansBetter design instincts around content trust boundaries
Open guide
/prompt-engineering-review

Prompt Engineering

Instruction design and prompt structure as part of the security boundary, not just a usability exercise.

Prompts that are easier to reason aboutLower variance when inputs become messy or adversarial
Open guide
/agent-security-review

Agent Security

Security basics for systems that can plan, use tools, persist state, and take actions across multiple steps.

A more grounded model for agent-specific riskBetter boundaries around tools and action execution
Open guide
/adversarial-ml-and-model-risk

Adversarial ML and Model Risk

A compact guide to adversarial ML concepts and how they connect to modern AI product security.

Cleaner distinctions between model risk and system riskBetter alignment between AI security and traditional security controls
Open guide
Topic Coverage

Prompt engineering, prompt injection, agent security, and more

These topic hubs connect introductory guidance with current research, incident patterns, and product-facing security lessons from the broader AI ecosystem.

AI Red Teaming

Methods, case studies, and tooling for red teaming AI systems end to end.

Open topic
Prompt Engineering

Prompt design patterns, instruction hierarchy, and defensive prompt construction.

Open topic
Prompt Injection

Prompt injection attacks, mitigations, detection, and design patterns for safer AI applications.

Open topic
Agent Security

Controls and attack paths for browsing, tool use, memory, identity, and action-taking agents.

Open topic
Model Evaluation

Safety evaluations, system cards, preparedness, and security measurement for frontier models.

Open topic
Adversarial ML

Adversarial machine learning attacks, taxonomies, and mitigations across the ML lifecycle.

Open topic
Profile

Profile and contact

Focused on AI red teaming, prompt injection risk, agent security, and application-layer failures in LLM and agent systems.