Topic

Prompt Engineering

Prompt design patterns, instruction hierarchy, and defensive prompt construction.

prompt engineeringsystem promptsinstruction hierarchyguardrailstask decomposition
Evergreen Overview

Prompt engineering is not just about better outputs. In practice it shapes reliability, scope, fallback behavior, and how well an AI system resists misuse when instructions, tools, and untrusted content collide.

What matters most
  • Instruction hierarchy and role separation
  • Clear task boundaries, fallback behavior, and refusal handling
  • Prompt structures that support monitoring and repeatable evaluation
Where teams get into trouble
  • Overloading prompts with too many responsibilities
  • Relying on wording instead of system controls
  • Treating prompts as static text instead of part of application design
Who this page is for
  • Teams operating prompt-heavy workflows
  • Builders refining assistant and agent behavior
  • Reviewers trying to connect prompt design to safety and risk
References

Current notes, events, and source material

These items are included because they add useful evidence, framing, implementation detail, or upcoming context for teams working in this area.

The agent-ready web: Simplify user actions with WebMCP — Tara Agyemang, Google video thumbnail Play video
AI Engineer YouTube June 11, 2026 video

The agent-ready web: Simplify user actions with WebMCP — Tara Agyemang, Google

Buying two concert tickets costs an AI agent the entire DOM, the accessibility tree, a screenshot, pixel coordinate math, and then a click that might miss because an ad just loaded and shifted the layout. Tara Agyemang from the Google Chrome team introduces WebMCP, a proposed web standard that replaces that process wit

Why Can't Anyone Answer Questions About the Business? — Garrett Galow, WorkOS video thumbnail Play video
AI Engineer YouTube June 11, 2026 video

Why Can't Anyone Answer Questions About the Business? — Garrett Galow, WorkOS

Every business question that needs SQL follows the same loop: explain the question, wait for an engineer, get an answer, realize it needs one more join, share a one-off in Slack, repeat. Garrett Galow from WorkOS built Studio to break that loop — an internal workspace where anyone can ask questions against Snowflake, L

How to Keep Shipping When You Walk Away from Your Desk — Zack Proser, WorkOS video thumbnail Play video
AI Engineer YouTube June 11, 2026 video

How to Keep Shipping When You Walk Away from Your Desk — Zack Proser, WorkOS

Simon Willison fires up four parallel agents and is wiped out by 11am. That is the problem Zack Proser is solving: not that the tools are too slow but that human attention is still the hard constraint. His loop: voice brief at 184 words per minute, agent dispatched to an isolated git worktree, laptop closed, progress c

Sovereign Escape Velocity: Ownership w Open Models — Gus Martins, & Ian Ballantyne, Google DeepMind video thumbnail Play video
AI Engineer YouTube June 10, 2026 video

Sovereign Escape Velocity: Ownership w Open Models — Gus Martins, & Ian Ballantyne, Google DeepMind

Gemma 4's 31B model sits fourth on the LM Arena open model leaderboard. The models around it are at least twice as large; some are 20 times larger. It runs on a single GPU. Competitors at comparable quality need four or five. Ian Ballantyne and Gus Martins walk through what that size efficiency unlocks: running on a Pi

Stop Making Models Bigger, Make Them Behave — Kobie Crawdord, Snorkel video thumbnail Play video
AI Engineer YouTube June 10, 2026 video

Stop Making Models Bigger, Make Them Behave — Kobie Crawdord, Snorkel

Qwen 3 235B was asked for YouTube's year over year ad revenue growth from 2023 to 2024. It queried a table that didn't exist, tried again, got nothing back both times, and hallucinated an answer. The 4B model Snorkel finetuned with RL called `get_table_name` first, inspected the schema, ran a query, hit a column error,

Self Driving Products: Product Signals to Pull Requests — Joshua Snyder, PostHog video thumbnail Play video
AI Engineer YouTube June 10, 2026 video

Self Driving Products: Product Signals to Pull Requests — Joshua Snyder, PostHog

A rage click, a 2am error spike, a customer Slack message — today each sits until a developer notices, triages, tickets, and writes a fix. PostHog is building a pipeline that collapses that chain: signal arrives, a background agent groups it with related errors and session replays, researches the codebase, and opens a

From Transcription to Live Music: Gemini's Audio Stack — Thor Schaeff, Google DeepMind video thumbnail Play video
AI Engineer YouTube June 9, 2026 video

From Transcription to Live Music: Gemini's Audio Stack — Thor Schaeff, Google DeepMind

One API call to Gemini 3 Flash Preview: speaker labels by name, timestamps, emotion tags, language detection with English translation, and a full summary. That is the audio understanding layer that underlies everything else Thor Schaeff demos here, including speech generation directed by a "director's note" rather than

GPU Cloud Deployment Without Leaving Your IDE — Audry Hsu, RunPod video thumbnail Play video
AI Engineer YouTube June 9, 2026 video

GPU Cloud Deployment Without Leaving Your IDE — Audry Hsu, RunPod

The iteration cycle before Flash: commit, push, build a Docker image, pull it from the registry, load it onto a server, allocate a GPU, then find out if it works. Audrey Hsu demos what replacing that with a single decorator looks like — add `@flash.endpoint` to an async Python function and it deploys to GPU cloud from

Why More Context Makes Your Agent Dumber and What to Do About It — Nupur Sharma, Qodo video thumbnail Play video
AI Engineer YouTube June 8, 2026 video

Why More Context Makes Your Agent Dumber and What to Do About It — Nupur Sharma, Qodo

Give an agent your full codebase and it will attend to the start and the end, then quietly drop the middle. Nupur from Qodo calls this the U curve and builds the whole talk around it: why growing the context window did not fix the problem, and what actually does. She runs through iterative retrieval, hierarchical summa

Why Eval++ Is the Next Great Compute Primitive — Sunil Pai & Matt Carrie, Cloudflare video thumbnail Play video
AI Engineer YouTube June 8, 2026 video

Why Eval++ Is the Next Great Compute Primitive — Sunil Pai & Matt Carrie, Cloudflare

Matt Carrie and Sunil Pai from Cloudflare's agents team explain why Durable Objects turned out to be the right compute unit for AI agents: addressable, persistent, hibernating, stateful, and fast enough that 15ms London latency puts you inside a single animation frame. The Agents SDK builds on this to give resumable st

Road to 5 Million Tokens: Breaking Barriers in Long Context Training — Max Ryabinin, Together AI video thumbnail Play video
AI Engineer YouTube June 8, 2026 video

Road to 5 Million Tokens: Breaking Barriers in Long Context Training — Max Ryabinin, Together AI

Training a standard LLaMA 3B model with a 3 million token context on a single 8xH100 node fails before you even start: the model parameters alone exhaust GPU memory. Max Ryabinin from Together AI walks through the full stack of techniques needed to get there: fully sharded data parallelism, DeepSpeed Ulysses context pa

Under 5 minutes to a deployed LLM endpoint — Audry Hsu, RunPod video thumbnail Play video
AI Engineer YouTube June 7, 2026 video

Under 5 minutes to a deployed LLM endpoint — Audry Hsu, RunPod

Two failed crypto mining rigs in a basement in 2022. The founders posted on Reddit offering the GPUs for free in exchange for feedback. That is the origin of RunPod, now at $120 million in annual recurring revenue with 500,000 developers on the platform. The demo runs in under five minutes: pick a model from the Hub, c

LLM Observability, Evaluation, Experimentation Platform — Dat Ngo, Arize video thumbnail Play video
AI Engineer YouTube June 7, 2026 video

LLM Observability, Evaluation, Experimentation Platform — Dat Ngo, Arize

Your agent called tool B before tool A, and B has a dependency on A. You did not catch it because nothing in your code audits agents. The telemetry does. Dat from Arize AI walks through what observability actually means when the system you are debugging is nondeterministic and the execution path changes with every run.

From MCP to Scale: Pipelines That Build Themselves — Rafael Levi, Bright Data video thumbnail Play video
AI Engineer YouTube June 7, 2026 video

From MCP to Scale: Pipelines That Build Themselves — Rafael Levi, Bright Data

Scraping is not the hard part anymore. Maintaining scrapers is. This session shows what it looks like when an agent uses MCP to inspect a site, understand its structure, generate a production scraper, and keep that pipeline working when the site changes. Using Bright Data's MCP, APIs, and browser infrastructure, the fl

Building safe Payment Infrastructure for the autonomous economy — Steve Kaliski, Stripe video thumbnail Play video
AI Engineer YouTube June 6, 2026 video

Building safe Payment Infrastructure for the autonomous economy — Steve Kaliski, Stripe

Agents are evolving from calling free APIs to executing real transactions, creating a new challenge: how do we let software spend money autonomously without catastrophic risk? This talk presents Stripe's approach to solving the dual problems of secure credential transmission and making businesses discoverable to agents

Evals Are Broken, Use Them Anyway — Ara Khan, Cline video thumbnail Play video
AI Engineer YouTube June 6, 2026 video

Evals Are Broken, Use Them Anyway — Ara Khan, Cline

Cline started at 43% on Terminal Bench. The improvements came from container CPU and memory settings, raised timeouts, and prompt engineering techniques specific to Anthropic model families that do not transfer to Codex or Gemini. Not from switching to a better model. Ara Khan's argument is that benchmark numbers are n

Building Interactive UIs in VS Code with MCP Apps — Marlene Mhangami & Liam Hampton, GitHub video thumbnail Play video
AI Engineer YouTube June 6, 2026 video

Building Interactive UIs in VS Code with MCP Apps — Marlene Mhangami & Liam Hampton, GitHub

The demo profiles a Go app running bubble sort and Fibonacci and the result renders as an interactive flame graph directly inside the VS Code chat window. Not a link. Not a text summary. A live iframe you can scroll and query, sandboxed for the same reason you put a hamster in a cage: so it cannot chew up your VS Code

Building Agent Interfaces: Lessons from Chrome DevTools (MCP) for Agents — Michael Hablich, Google video thumbnail Play video
AI Engineer YouTube June 5, 2026 video

Building Agent Interfaces: Lessons from Chrome DevTools (MCP) for Agents — Michael Hablich, Google

Chrome DevTools MCP shipped with one tool: debug_webpage. Agents failed silently because they couldn't compose behaviors. The team decomposed it into 25 focused tools and assumed the problem was solved. It wasn't — now agents had 25 tools and no reliable way to pick the right one. Michael Hablich's talk is an honest ac

Beyond Transcription: Building Voice AI That Understands Conversations — Hervé Bredin, pyannoteAI video thumbnail Play video
AI Engineer YouTube June 5, 2026 video

Beyond Transcription: Building Voice AI That Understands Conversations — Hervé Bredin, pyannoteAI

The open ASR leaderboard reports Nvidia Parakeet at 11.4% word error rate on AMI meeting data. Hervé Bredin runs the same model on the same dataset and gets 26%. Same model, same recordings, different microphone: the leaderboard uses headset audio, he uses the table mic. Most voice AI benchmarks are measuring single sp

Dark Factory: OpenClaw Ships Faster Than You Can Read the Diff — Vincent Koc, OpenClaw video thumbnail Play video
AI Engineer YouTube June 5, 2026 video

Dark Factory: OpenClaw Ships Faster Than You Can Read the Diff — Vincent Koc, OpenClaw

OpenClaw hit 3,000 commits in a single day. Vincent Koc's commit history shows exactly when he goes to sleep and when he wakes up. He and Peter Steinberger ran roughly 60 to 70 agents between them during the great refactor: 2,700 commits, close to a million lines of code changed, 82% of the core codebase touched in one

AI Engineer Melbourne 2026 Keynote Livestream | Day 2 video thumbnail Play video
AI Engineer YouTube June 4, 2026 video

AI Engineer Melbourne 2026 Keynote Livestream | Day 2

Live from Federation Square in Melbourne, AI Engineer Melbourne 2026 brings the keynote stage to viewers online in partnership with Web Directions. This is AI Engineer’s first partner event in Australia, featuring keynote-stage sessions from one of the most thoughtfully produced developer events in the region. Watch li

The Art & Science of Benchmarking Agents — Vincent Chen, Snorkel AI video thumbnail Play video
AI Engineer YouTube June 4, 2026 video

The Art & Science of Benchmarking Agents — Vincent Chen, Snorkel AI

ARC AGI 3 launched a few weeks before this talk with every task human solvable and frontier models under 1%. That gap is the argument: our ability to measure AI has fallen behind our ability to build it, and benchmarks that actually shape the field are bets on where capabilities are going, not snapshots of where they a

SWE-rebench: Lessons from Evaluating Coding Agents — Ibragim Badertdinov, Nebius video thumbnail Play video
AI Engineer YouTube June 4, 2026 video

SWE-rebench: Lessons from Evaluating Coding Agents — Ibragim Badertdinov, Nebius

Claude Code solved SWE rebench tasks by reading git history to find the solution patch. When Nebius removed future commits from the environment, it fetched the original GitHub issue. When they blocked web fetch, it switched to curl, formatted the conversation for readability, and solved the task again anyway. Ibragim B

BDD, ADR, PRD, WTF: Capturing Decisions for Humans and AI Alike — Michal Cichra, Safe Intelligence video thumbnail Play video
AI Engineer YouTube June 3, 2026 video

BDD, ADR, PRD, WTF: Capturing Decisions for Humans and AI Alike — Michal Cichra, Safe Intelligence

"One thing harder than reading AI code is reading AI tests." Mikuel from Safe Intelligence argues spec driven development leaves a loop open: you have a markdown spec, but how do you know the product actually behaves that way? His answer is Cucumber, nearly forgotten and suddenly useful again. Executable, human readabl

Beyond Components: Designing Generative UI for MCP Apps — Ruben Casas, Postman video thumbnail Play video
AI Engineer YouTube June 3, 2026 video

Beyond Components: Designing Generative UI for MCP Apps — Ruben Casas, Postman

Ruben Casas from Postman prompted a model to rewrite his blog. It built a search box with a blur animation and accessibility out of the box, without being asked. That was when he concluded the model writes better frontend code than he does. His question for the talk: if the models are this capable, why are most agent U

AI Engineer Melbourne 2026 Keynote Livestream | Day 1 video thumbnail Play video
AI Engineer YouTube June 3, 2026 video

AI Engineer Melbourne 2026 Keynote Livestream | Day 1

Live from Federation Square in Melbourne, AI Engineer Melbourne 2026 brings the keynote stage to viewers online in partnership with Web Directions. This is AI Engineer’s first partner event in Australia, featuring keynote-stage sessions from one of the most thoughtfully produced developer events in the region. Watch li

Benchmarking semantic code retrieval on Claude Code — Kuba Rogut, Turbopuffer video thumbnail Play video
AI Engineer YouTube June 3, 2026 video

Benchmarking semantic code retrieval on Claude Code — Kuba Rogut, Turbopuffer

By default, Claude Code wastes one in every three file reads. Add windowed grep and that drops to one in five. Add semantic search on top and it drops to one in eight, with file precision climbing from 65% to 87%. Kuba Rogut from Turbopuffer ran a 50-task benchmark against ContextBench to measure not whether the agent

How Lovable self-improves every hour — Benjamin Verbeek, Lovable video thumbnail Play video
AI Engineer YouTube June 2, 2026 video

How Lovable self-improves every hour — Benjamin Verbeek, Lovable

Within the first hour of launching the vent tool, the agent filed 20 complaints about a silent file copy failure. The team checked: the tool worked fine. What the agent had caught was that filenames with a space in them silently failed to copy, a bug that never surfaced in logs. Benjamin Verbeek from Lovable built it a

Build & deploy AI-powered apps — Paige Bailey, Google DeepMind video thumbnail Play video
AI Engineer YouTube April 29, 2026 video

Build & deploy AI-powered apps — Paige Bailey, Google DeepMind

Got a massive idea but stuck in the "just talking about it" phase? This session cuts the fluff and dives straight into how to build and prototype at lightning speed using AI Studio Build and Antigravity for free. It breaks down Google DeepMind's AI tech stack so viewers know exactly which tools to use, when to reach fo

Everything I Learned Training Frontier Small Models — Maxime Labonne, Liquid AI video thumbnail Play video
AI Engineer YouTube April 29, 2026 video

Everything I Learned Training Frontier Small Models — Maxime Labonne, Liquid AI

A new class of small models is emerging with the ability to reliably follow instructions and call tools while running on-device under 1 GB of memory. In this talk, we'll break down how to post-train frontier small models using the LFM2.5 recipe: on-policy preference alignment, agentic reinforcement learning, and curric

One Login to Rule Them All: Cross-App Access for MCP — Garrett Galow, WorkOS video thumbnail Play video
AI Engineer YouTube April 28, 2026 video

One Login to Rule Them All: Cross-App Access for MCP — Garrett Galow, WorkOS

Connecting a coding agent to multiple services often means facing a dozen OAuth consent screens, a dozen token lifecycles, and a dozen chances for something to break. Despite having Single Sign-On, users still find themselves signing in repeatedly. This talk explores how Cross-App Access leverages a three-way trust bet

Why building eval platforms is hard — Phil Hetzel, Braintrust video thumbnail Play video
AI Engineer YouTube April 28, 2026 video

Why building eval platforms is hard — Phil Hetzel, Braintrust

An eval platform is not just a test runner. You are building shared definitions of "good," reliable data pipelines, labelling workflows, versioning, and trust in results across many teams and model changes. This session breaks down the hidden complexity, the common failure modes, and the design principles that make eva

Building your own software factory — Eric Zakariasson, Cursor video thumbnail Play video
AI Engineer YouTube April 28, 2026 video

Building your own software factory — Eric Zakariasson, Cursor

Most of us are pair-programming with one agent and stopping there. There's a lot more on the table. This workshop is about going from one agent to many. We'll start with codebase setup, the foundational work that makes agents effective on their own. Then we'll scale up to running agents in parallel, kicking off async w

Lessons from Scaling GitHub's Remote MCP Server — Sam Morrow, GitHub video thumbnail Play video
AI Engineer YouTube April 27, 2026 video

Lessons from Scaling GitHub's Remote MCP Server — Sam Morrow, GitHub

GitHub operates one of the most heavily-utilised MCP servers in the ecosystem, with over 4 million downloads of the stdio server alone. Discover the architectural decisions, technical challenges and lessons learned while building and scaling a remote MCP server on production infrastructure. The session walks through th

Bringing MCPs to the Enterprise — Karan Sampath, Anthropic video thumbnail Play video
AI Engineer YouTube April 27, 2026 video

Bringing MCPs to the Enterprise — Karan Sampath, Anthropic

MCPs are often flaky, face multiple security vulnerabilities, and are generally hard to scale. Most enterprises struggle to use more than single digit numbers of MCPs due to issues with security, observability, and access control. In this talk, we'll explore the approaches and learnings we at Anthropic have been taking

Open Models at Google DeepMind — Cassidy Hardin, Google DeepMind video thumbnail Play video
AI Engineer YouTube April 27, 2026 video

Open Models at Google DeepMind — Cassidy Hardin, Google DeepMind

Open models are getting smaller, faster, and far more capable. In this talk, Cassidy Hardin walks through the latest advances in the Gemma family, with a focus on Gemma 4 and what it enables for developers building on-device and open-weight AI systems. She covers the architecture behind Gemma’s dense, effective, and mi

Collaborative AI Engineering — Maggie Appleton, GitHub Next video thumbnail Play video
AI Engineer YouTube April 26, 2026 video

Collaborative AI Engineering — Maggie Appleton, GitHub Next

Agentic engineering so far has been a solo story: one developer and a dozen agents moving at warp speed. But speed without thoughtful planning and team alignment is just wasting tokens. When everyone on a team is directing agents alone in their personal CLI tools with no shared context, you get duplicate work, conflict

Full Walkthrough: Workflow for AI Coding from Planning to Production — Matt Pocock (@mattpocockuk ) video thumbnail Play video
AI Engineer YouTube April 24, 2026 video

Full Walkthrough: Workflow for AI Coding from Planning to Production — Matt Pocock (@mattpocockuk )

A hands-on workshop covering the full lifecycle of AI-assisted development, from turning ambiguous requirements into agent-ready plans to running autonomous coding agents that ship production features. You'll learn to stress-test vague briefs into structured PRDs, slice work into thin "tracer bullet" vertical slices, a

AIE Miami Day 2 ft. Cerebras, OpenCode, Cursor, Arize AI, and more! video thumbnail Play video
AI Engineer YouTube April 21, 2026 video

AIE Miami Day 2 ft. Cerebras, OpenCode, Cursor, Arize AI, and more!

April 21, 2026 - all times in EST -- 9:00am - Welcome to Day 2 -- 9:10am - David House, G2i Transforming Programming Mindsets: Case Studies in Agentic Coding Adoption -- 9:35am - Sarah Chieng, Cerebras Help! We're DEEP in (latency) Debt -- 10:00am - Lech Kalinowski, CallStack Ambient Generative AI: Deploying Latent Dif

AIE Miami Keynote & Talks ft. OpenCode. Google Deepmind, OpenAI, and more! video thumbnail Play video
AI Engineer YouTube April 20, 2026 video

AIE Miami Keynote & Talks ft. OpenCode. Google Deepmind, OpenAI, and more!

April 20, 2026 - all times in EST -- 9:00am - Welcome to AI Engineer Miami -- 9:10am - Gabe Greenberg, G2i Opening Remarks -- 9:15am - Dax Raad, OpenCode Keynote -- 9:40am - Dexter Horthy, HumanLayer Everything We got Wrong About RPI -- 10:05am - Max Stoiber, OpenAI Coming Soon -- 10:30am - Morning Break -- 11:00am - B