AI red teaming is the practice of testing AI-enabled systems the way an adversary, abusive user, or curious operator would interact with them in production. The real work usually sits in the surrounding application context rather than in isolated model prompts.
AI Red Teaming
Methods, case studies, and tooling for red teaming AI systems end to end.
- Prompt abuse, indirect injection, and trust-boundary failures
- Tool misuse, privilege expansion, and unsafe action chains
- System-level evaluation of how the model, workflow, and controls behave together
- What an attacker can influence, read, or trigger through the model
- Where approvals, isolation, monitoring, or policy controls are missing
- Which failures are model problems versus product and architecture problems
- People studying AI evaluation and red-team programs
- Product and platform teams launching copilots or agents
- Leaders who need concrete examples of AI risk in operational systems
Current notes, events, and source material
These items are included because they add useful evidence, framing, implementation detail, or upcoming context for teams working in this area.
GAISS 2026: IEEE GenAI for Secure Systems
GAISS 2026 is an IEEE conference at the University of Texas at Austin focused on generative AI for secure systems, including red teaming, blue-team automation, governance, and agentic secure AI.
DEF CON 34 / AI Village 2026
DEF CON 34 takes place in Las Vegas and is expected to include AI security activity through villages, workshops, contests, and community-led research tracks as schedules firm up.
Black Hat USA 2026 AI Summit
Black Hat USA 2026 includes an AI Summit and security briefings in Las Vegas focused on how artificial intelligence is changing digital defense.
New Attacks Trick OpenClaw AI Agent Into Running Code and Leaking Secrets
Two security teams have shown, in separate research published this week, that OpenClaw, the popular self-hosted AI agent, can be driven to run attacker-controlled code or hand over sensitive data through ordinary-looking inputs. Imperva buried instructions inside shared contacts, vCards, and location pins that the agen
Turn specs into evals for any agent with ASSERT
Adaptive Spec-driven Scoring for Evaluation and Regression Testing (ASSERT) is an open-source framework for converting natural language behavior requirements into executable evaluations of AI models and agents. The post Turn specs into evals for any agent with ASSERT appeared first on Microsoft Security Blog .
Anthropic Releases Claude Fable 5, Its Most Powerful AI Yet, With Cyber Safeguards
On June 9, Anthropic released Claude Fable 5, the most capable model it has ever made, generally available. It also did something unusual: it shipped one model as two products, split not by capability but by a layer of safety classifiers. Fable 5 goes to the public. Its twin, Claude Mythos 5, the same underlying model
Reconstructing AI activity in investigations
Learn how to investigate AI activity in Microsoft 365 Copilot and Azure AI services using a structured, telemetry-driven approach. This playbook helps security teams reconstruct events, assess data exposure, and detect potential threats faster. The post Reconstructing AI activity in investigations appeared first on Mic
AI brands as bait: How threat actors are using the AI hype in social engineering
As threat actors operationalize AI to accelerate attacks, they are also leveraging the wider global interest around AI itself as a social engineering lure. The post AI brands as bait: How threat actors are using the AI hype in social engineering appeared first on Microsoft Security Blog .
Securing CI/CD in an agentic world: Claude Code Github action case
Microsoft Threat Intelligence identified a prompt injection pathway in Claude Code GitHub Action that allowed access to workflow secrets under specific conditions. This research examines the attack chain, responsible disclosure process, Anthropic's mitigation, and guidance for securing AI-powered CI/CD workflows. The p
Updating the taxonomy of failure modes in agentic AI systems: What a year of red teaming taught us
A surge in real-world attacks against agentic AI systems is reshaping how we think about risk. Based on 12 months of red teaming, this update introduces seven new failure modes, from supply chain compromise to goal hijacking, and the practical mitigations teams need now. The post Updating the taxonomy of failure modes
Introducing new capabilities to GPT-Rosalind
GPT-Rosalind advances life sciences research with enhanced biological reasoning, medicinal chemistry expertise, genomics analysis, and experimental workflow capabilities.
Preinstall to persistence: Inside the Red Hat npm Miasma credential-stealing campaign
A large-scale npm supply chain attack compromised over 90 versions of @redhat-cloud-services packages, silently infecting CI/CD environments and developer systems. The malicious code steals credentials from GitHub, cloud platforms, and local machines, then spreads like a worm by republishing trusted packages. Discover
Microsoft Build 2026: Securing code, agents, and models across the development lifecycle
Discover how Microsoft enables fast, secure AI development with MDASH and new security capabilities. The post Microsoft Build 2026: Securing code, agents, and models across the development lifecycle appeared first on Microsoft Security Blog .
Gartner Security & Risk Management Summit 2026
Gartner Security & Risk Management Summit 2026 brings CISOs and security leaders together in National Harbor, Maryland, with tracks covering AI, cyber risk, application security, data security, operations, privacy, and governance.
Project Glasswing: An initial update
Anthropic reports early Project Glasswing results using Mythos Preview with infrastructure partners and external testers, including large-scale vulnerability discovery and a cautious disclosure posture.
Patch Tuesday, May 2026 Edition
Artificial intelligence platforms may be just as susceptible to social engineering as human beings, but they are proving remarkably good at finding security vulnerabilities in human-made computer code. That reality is on full display this month with some of the more widely-used software makers -- including Apple, Googl
Anthropic Responsible Scaling Policy v3.2
Anthropic’s current Responsible Scaling Policy page lists v3.2 as effective April 29, 2026, adding formal authority for external review of risk reports and regular briefings to its Long-Term Benefit Trust.
Play video
Claude Opus 4.7 - A New Frontier, in Performance … and Drama
This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.
Patch Tuesday, April 2026 Edition
Krebs on Security covers April 2026 patching activity, including a record-sized Microsoft release and active exploitation notes.
Play video
Bending a Public MCP Server Without Breaking It — Nimrod Hauser, Baz
AI Engineer session on Bending a Public MCP Server Without Breaking It, presented by Nimrod Hauser, Baz. It adds practical context for how teams are building and operating AI systems in production.
Play video
Claude Mythos: Highlights from 244-page Release
This AI Explained video reviews a major AI development through the lens of agentic workflows and tool-use risk. It is useful context for AI engineering, evaluation, governance, and operational risk.
Assessing Claude Mythos Preview’s cybersecurity capabilities
Claude Mythos Preview is a new general-purpose language model that is strikingly capable at computer security tasks. This post provides technical details for researchers and practitioners who want to understand exactly how we have been testing this model, and what we have found over the past month. We hope this will sh
Play video
Two AI Models Set to “stir government urgency”, But Will This Challenge Undo Them?
This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.
OpenAI to acquire Promptfoo
OpenAI announced plans to acquire Promptfoo, highlighting automated AI security testing, red teaming, and evaluation as core enterprise requirements.
How AI Assistants are Moving the Security Goalposts
AI-based assistants or "agents" -- autonomous programs that have access to the user's computer, files, online services and can automate virtually any task -- are growing in popularity with developers and IT workers. But as so many eyebrow-raising headlines over the past few weeks have shown, these powerful and assertiv
Reverse engineering Claude's CVE-2026-2796 exploit
This post dives deep into how Claude wrote an exploit for one of the vulnerabilities it found in Firefox.
Play video
What the New ChatGPT 5.4 Means for the World
This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.
Play video
Deadline Day for Autonomous AI Weapons & Mass Surveillance
This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.
MITRE ATLAS OpenClaw Investigation Discovers New and Likeliest Techniques
MITRE maps incidents in an open-source agentic ecosystem to ATLAS techniques, showing how AI-first systems create distinct attacker paths.
LLM-discovered 0-days
AI models can now find high-severity vulnerabilities at scale. This is a moment to empower defenders. We're now using Claude to find and help fix vulnerabilities in open source software.
AI Models on Realistic Cyber Ranges
In a recent evaluation of AI models’ cyber capabilities, current Claude models can now succeed at multistage attacks on networks with dozens of hosts using only standard, open-source tools, instead of the custom tools needed by previous generations.
Finding Bugs with Claude and Property-based Testing
Ensuring that programs are bug-free is one of the most challenging aspects of software engineering. We developed an agent that can efficiently identify bugs in large software projects. Our agent infers general properties of code that should be true, and then applies property-based testing. After extensive manual valida
Experimenting with AI to Defend Critical Infrastructure
AI could help defenders of critical infrastructure identify the vulnerabilities that attackers might exploit—and close them before they are exploited. Anthropic has partnered with Pacific Northwest National Laboratory (PNNL) to explore this defensive application of AI, demonstrating both the potential of AI-accelerated
Play video
What the Freakiness of 2025 in AI Tells Us About 2026
This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.
Continuously hardening ChatGPT Atlas against prompt injection attacks
OpenAI describes using automated red teaming and reinforcement learning to discover agent prompt injection attacks before they appear in the wild.
Play video
You Are Being Told Contradictory Things About AI
This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.
Play video
Is GPT-5.1 Really an Upgrade? But Models Can Auto-Hack Govts, so … there’s that
This AI Explained video reviews a major AI development through the lens of agentic workflows and tool-use risk. It is useful context for AI engineering, evaluation, governance, and operational risk.
Play video
OpenAI Tests if GPT-5 Can Automate Your Job - 4 Unexpected Findings
This AI Explained video reviews a major AI development through the lens of AI safety and model behavior. It is useful context for AI engineering, evaluation, governance, and operational risk.
Play video
ChatGPT Can Now Call the Cops, but 'Wait till 2100 for Full Job Impact' - Altman
This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.
Play video
Scaling AI Agents Without Breaking Reliability — Preeti Somal, Temporal
AI Engineer session on Scaling AI Agents Without Breaking Reliability, presented by Preeti Somal, Temporal. It adds practical context for how teams are building and operating AI systems in production.
Play video
AI Red Teaming Agent: Azure AI Foundry — Nagkumar Arkalgud & Keiji Kanazawa, Microsoft
AI Engineer session on AI Red Teaming Agent: Azure AI Foundry, presented by Nagkumar Arkalgud & Keiji Kanazawa, Microsoft. It adds practical context for how teams are building and operating AI systems in production.
Play video
Production software keeps breaking and it will only get worse — Anish Agarwal, Traversal.ai
AI Engineer session on Production software keeps breaking and it will only get worse, presented by Anish Agarwal, Traversal.ai. It adds practical context for how teams are building and operating AI systems in production.
Play video
When Vectors Break Down: Graph-Based RAG for Dense Enterprise Knowledge - Sam Julien, Writer
AI Engineer session on When Vectors Break Down: Graph-Based RAG for Dense Enterprise Knowledge - Sam Julien, Writer. It adds practical context for how teams are building and operating AI systems in production.
Play video
Prompt Engineering and AI Red Teaming — Sander Schulhoff, HackAPrompt/LearnPrompting
AI Engineer session on Prompt Engineering and AI Red Teaming, presented by Sander Schulhoff, HackAPrompt/LearnPrompting. It adds practical context for how teams are building and operating AI systems in production.
Play video
Break It 'Til You Make It: Building the Self-Improving Stack for AI Agents - Aparna Dhinakaran
AI Engineer session on Break It 'Til You Make It: Building the Self-Improving Stack for AI Agents - Aparna Dhinakaran. It adds practical context for how teams are building and operating AI systems in production.
Play video
Breaking the Chain: Agent Continuations for Resumable AI Workflows - Greg Benson
AI Engineer session on Breaking the Chain: Agent Continuations for Resumable AI Workflows - Greg Benson. It adds practical context for how teams are building and operating AI systems in production.
Play video
When Will AI Models Blackmail You, and Why?
This AI Explained video reviews a major AI development through the lens of agentic workflows and tool-use risk. It is useful context for AI engineering, evaluation, governance, and operational risk.
Play video
Claude 4: Full 120 Page Breakdown … Is it the Best New Model?
This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.
Play video
"OpenAI is Not God” - The DeepSeek Documentary on Liang Wenfeng, R1 and What's Next
This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.
Play video
How Codeium Breaks Through the Ceiling for Retrieval: Kevin Hou
AI Engineer session on How Codeium Breaks Through the Ceiling for Retrieval: Kevin Hou. It adds practical context for how teams are building and operating AI systems in production.
Play video
The Adversarial Path to the Personal Assistant: Sumit Agarwal
AI Engineer session on The Adversarial Path to the Personal Assistant: Sumit Agarwal. It adds practical context for how teams are building and operating AI systems in production.
Play video
Breaking AI's 1-GHz Barrier: Sunny Madra (Groq)
AI Engineer session on Breaking AI's 1-GHz Barrier: Sunny Madra (Groq). It adds practical context for how teams are building and operating AI systems in production.
Play video
Understanding AI Stakes to Break Production Code: Philip Rathle
AI Engineer session on Understanding AI Stakes to Break Production Code: Philip Rathle. It adds practical context for how teams are building and operating AI systems in production.
Play video
AI CEO: ‘Stock Crash Could Stop AI Progress’, Llama 4 Anti-climax + ‘Superintelligence in 2027’ ...
This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.
Progress from our Frontier Red Team
Anthropic shares lessons from frontier red teaming and discusses where models are showing early-warning signs of higher-risk cyber and biology capabilities.
Play video
Manus AI - The Calm Before the Hypestorm … (vs Deep Research + Grok 3)
This AI Explained video reviews a major AI development through the lens of agentic workflows and tool-use risk. It is useful context for AI engineering, evaluation, governance, and operational risk.
Play video
Claude 3.7 is More Significant than its Name Implies (ft DeepSeek R2 + GPT 4.5 coming soon)
This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.
Play video
o3-mini and the “AI War”
This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.
Enhancing AI safety: Insights and lessons from red teaming
Microsoft summarizes lessons from red teaming more than one hundred generative AI products, emphasizing system-level testing, human expertise, and automation.
3 takeaways from red teaming 100 generative AI products
Microsoft Security distills lessons from red teaming more than 100 generative AI products, including multimodal prompt injection and core cyber hygiene.
Play video
o3 - wow
This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.
Play video
o1 Pro Mode – ChatGPT Pro Full Analysis (plus o1 paper highlights)
This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.
Play video
$125B for Superintelligence? 3 Models Coming, Sutskever's Secret SSI, & Data Centers (in space)...
This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.
Play video
Was GPT-5 Underwhelming, Or Not? OpenAI Co-founder Exits, Figure02 Arrives, Character.AI Gutted
This AI Explained video reviews a major AI development through the lens of AI safety and model behavior. It is useful context for AI engineering, evaluation, governance, and operational risk.
Play video
Llama 405b: Full 92 page Analysis, and Uncontaminated SIMPLE Benchmark Results
This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.
Play video
How Far Can We Scale AI? Gen 3, Claude 3.5 Sonnet and AI Hype
This AI Explained video reviews a major AI development through the lens of AI safety and model behavior. It is useful context for AI engineering, evaluation, governance, and operational risk.
Play video
AI Won't Be AGI, Until It Can At Least Do This (plus 6 key ways LLMs are being upgraded)
This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.
Play video
Microsoft Promises a 'Whale' for GPT-5, Anthropic Delves Inside a Model’s Mind and Altman Stumbles
This AI Explained video reviews a major AI development through the lens of AI safety and model behavior. It is useful context for AI engineering, evaluation, governance, and operational risk.
Play video
New OpenAI Model 'Imminent' and AI Stakes Get Raised (plus Med Gemini, GPT 2 Chatbot and Scale AI)
This AI Explained video reviews a major AI development through the lens of agentic workflows and tool-use risk. It is useful context for AI engineering, evaluation, governance, and operational risk.
Play video
Move Fast Break Nothing: Dedy Kredo
AI Engineer session on Move Fast Break Nothing: Dedy Kredo. It adds practical context for how teams are building and operating AI systems in production.
Play video
‘Her’ AI, Almost Here? Llama 3, Vasa-1, and Altman ‘Plugging Into Everything You Want To Do’
This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.
Play video
5 Key Quotes: Altman, Huang and 'The Most Interesting Year'
This AI Explained video reviews a major AI development through the lens of AI safety and model behavior. It is useful context for AI engineering, evaluation, governance, and operational risk.
Play video
GPT-5: Everything You Need to Know So Far
This AI Explained video reviews a major AI development through the lens of model capability and AI systems in practice. It is useful context for AI engineering, evaluation, governance, and operational risk.
Play video
OpenAI Flip-Flops and '10% Chance of Outperforming Humans in Every Task by 2027' - 3K AI Researchers
This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.
Play video
Midjourney v6, Altman 'Age Reversal' and Gemini 2 - Christmas Edition
This AI Explained video reviews a major AI development through the lens of multimodal generation and provenance. It is useful context for AI engineering, evaluation, governance, and operational risk.
Play video
Q* - Clues to the Puzzle?
This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.
Play video
Altman Out: Reasons, Reactions and the Repercussions for the Industry
This AI Explained video reviews a major AI development through the lens of AI safety and model behavior. It is useful context for AI engineering, evaluation, governance, and operational risk.
Play video
AI Declarations and AGI Timelines – Looking More Optimistic?
This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.
Play video
The New Bard and AI Images, Videos, and Translations
This AI Explained video reviews a major AI development through the lens of scaling and compute economics. It is useful context for AI engineering, evaluation, governance, and operational risk.
Play video
SmartGPT: Major Benchmark Broken - 89.0% on MMLU + Exam's Many Errors
This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.
Play video
AI Los Alamos? + New Realistic AI Avatars
This AI Explained video reviews a major AI development through the lens of AI safety and model behavior. It is useful context for AI engineering, evaluation, governance, and operational risk.
Play video
11 Major AI Developments: RT-2 to '100X GPT-4'
This AI Explained video reviews a major AI development through the lens of AI safety and model behavior. It is useful context for AI engineering, evaluation, governance, and operational risk.
Play video
Llama 2: Full Breakdown
This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.
Play video
Time Until Superintelligence: 1-2 Years, or 20? Something Doesn't Add Up
This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.
Play video
Phi-1: A 'Textbook' Model
This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.
Play video
Google Gemini: AlphaGo-GPT?
This AI Explained video reviews a major AI development through the lens of agentic workflows and tool-use risk. It is useful context for AI engineering, evaluation, governance, and operational risk.
Play video
'Show Your Working': ChatGPT Performance Doubled w/ Process Rewards (+Synthetic Data Event Horizon)
This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.
Play video
Hassabis, Altman and AGI Labs Unite - AI Extinction Risk Statement [ft. Sutskever, Hinton + Voyager]
This AI Explained video reviews a major AI development through the lens of AI safety and model behavior. It is useful context for AI engineering, evaluation, governance, and operational risk.
Play video
'This Could Go Quite Wrong' - Altman Testimony, GPT 5 Timeline, Self-Awareness, Drones and more
This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.
Play video
Enter PaLM 2 (New Bard): Full Breakdown - 92 Pages Read and Gemini Before GPT 5? Google I/O
This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.
Play video
8 Signs It's The Future: Thought-to-Text, Nvidia Text-to-Video, Character AI, and P(Doom) @Ted
This AI Explained video reviews a major AI development through the lens of agentic workflows and tool-use risk. It is useful context for AI engineering, evaluation, governance, and operational risk.
Play video
‘We Must Slow Down the Race’ – X AI, GPT 4 Can Now Do Science and Altman GPT 5 Statement
This AI Explained video reviews a major AI development through the lens of agentic workflows and tool-use risk. It is useful context for AI engineering, evaluation, governance, and operational risk.
Play video
GPT 5 Will be Released 'Incrementally' - 5 Points from Brockman Statement [plus Timelines & Safety]
This AI Explained video reviews a major AI development through the lens of AI safety and model behavior. It is useful context for AI engineering, evaluation, governance, and operational risk.
Play video
'Pause Giant AI Experiments' - Letter Breakdown w/ Research Papers, Altman, Sutskever and more
This AI Explained video reviews a major AI development through the lens of AI safety and model behavior. It is useful context for AI engineering, evaluation, governance, and operational risk.
Play video
What's Up With Bard? 9 Examples + 6 Reasons Google Fell Behind [ft. Muse, Med-PaLM 2 and more]
This AI Explained video reviews a major AI development through the lens of AI safety and model behavior. It is useful context for AI engineering, evaluation, governance, and operational risk.
Play video
GPT 4: Full Breakdown (14 Details You May Have Missed)
This AI Explained video reviews a major AI development through the lens of AI safety and model behavior. It is useful context for AI engineering, evaluation, governance, and operational risk.
Play video
GPT 5 is All About Data
This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.
Play video
8 New Ways to Use Bing's Upgraded 8 [now 20] Message Limit (ft. pdfs, quizzes, tables, scenarios...)
This AI Explained video reviews a major AI development through the lens of model capability and AI systems in practice. It is useful context for AI engineering, evaluation, governance, and operational risk.
Play video
9 of the Best Bing (GPT 4) Prompts
This AI Explained video reviews a major AI development through the lens of model capability and AI systems in practice. It is useful context for AI engineering, evaluation, governance, and operational risk.