Topic

Model Evaluation

Safety evaluations, system cards, preparedness, and security measurement for frontier models.

system cardevaluationpreparednessbenchmarkfrontier risk

Evergreen Overview

Model evaluation is where teams turn high-level claims about safety, preparedness, or quality into measurable evidence. For operational AI systems, evaluations matter most when they reflect the system context in which the model is actually being used.

What evaluations should cover

Capability, misuse, and safety behavior under realistic tasks
System cards, preparedness reporting, and evidence for launch decisions
Regression testing so known failures do not quietly reappear

Where programs fall short

Benchmarks that do not match the deployed workflow
Safety claims without repeatable evidence
No connection between findings, mitigations, and re-testing

Who this page is for

Teams building evaluation pipelines
Leaders interpreting evidence for safe deployment
Security and policy teams interpreting model documentation

References

Current notes, events, and source material

These items are included because they add useful evidence, framing, implementation detail, or upcoming context for teams working in this area.

NeurIPS December 6, 2026 - December 12, 2026 event upcoming

NeurIPS 2026

NeurIPS 2026 is the fortieth annual Conference on Neural Information Processing Systems, with the primary dates listed for Sydney, Australia, and additional satellite locations in Atlanta and Paris.

Model Evaluation Adversarial ML AI Compliance

View details Open event page

OpenAI September 29, 2026 event upcoming

OpenAI DevDay 2026

OpenAI DevDay 2026 is scheduled for September 29 in San Francisco and is OpenAI’s primary developer event for platform updates.

AI Engineering Agent Security Model Evaluation

View details Open event page

ICML July 6, 2026 - July 11, 2026 event upcoming

ICML 2026

ICML 2026 takes place at COEX in Seoul, South Korea, with tutorials, main conference sessions, and workshops covering core machine learning research.

Model Evaluation Adversarial ML AI Engineering

View details Open event page

Databricks June 15, 2026 - June 18, 2026 event upcoming

Data + AI Summit 2026

Data + AI Summit 2026 is Databricks’ global data and AI conference in San Francisco and online, with 800+ sessions across data engineering, analytics, ML, governance, and agent applications.

AI Engineering AI Compliance Model Evaluation

View details Open event page

Claude Fable Blocked - 11 Quiet Details on What’s Next video thumbnail

AI Explained YouTube June 14, 2026 video

Claude Fable Blocked - 11 Quiet Details on What’s Next

Claude Fable 5 banned, but what’s the bigger story. We go through 11 under-reported details, so you have the context to see what’s coming next for your use of AI. From whether the ban will last, what the possible motives are, what the model can actually do, and some wild over-extrapolations going on. Check out my fast-

Model Evaluation AI Compliance Agent Security

Open notes Watch on YouTube

Google Cloud Security Blog June 12, 2026 news

Powering the next era of Confidential AI

We are thrilled to collaborate with Apple on its expanded Private Cloud Compute (PCC) systems announced this week at WWDC 2026.

Agent Security Prompt Injection Model Evaluation

Read summary Source link

The agent-ready web: Simplify user actions with WebMCP — Tara Agyemang, Google video thumbnail

AI Engineer YouTube June 11, 2026 video

The agent-ready web: Simplify user actions with WebMCP — Tara Agyemang, Google

Buying two concert tickets costs an AI agent the entire DOM, the accessibility tree, a screenshot, pixel coordinate math, and then a click that might miss because an ad just loaded and shifted the layout. Tara Agyemang from the Google Chrome team introduces WebMCP, a proposed web standard that replaces that process wit

AI Engineering Model Evaluation Prompt Engineering

Open notes Watch on YouTube

Why Can't Anyone Answer Questions About the Business? — Garrett Galow, WorkOS video thumbnail

AI Engineer YouTube June 11, 2026 video

Why Can't Anyone Answer Questions About the Business? — Garrett Galow, WorkOS

Every business question that needs SQL follows the same loop: explain the question, wait for an engineer, get an answer, realize it needs one more join, share a one-off in Slack, repeat. Garrett Galow from WorkOS built Studio to break that loop — an internal workspace where anyone can ask questions against Snowflake, L

AI Engineering Model Evaluation Prompt Engineering

Open notes Watch on YouTube

How to Keep Shipping When You Walk Away from Your Desk — Zack Proser, WorkOS video thumbnail

AI Engineer YouTube June 11, 2026 video

How to Keep Shipping When You Walk Away from Your Desk — Zack Proser, WorkOS

Simon Willison fires up four parallel agents and is wiped out by 11am. That is the problem Zack Proser is solving: not that the tools are too slow but that human attention is still the hard constraint. His loop: voice brief at 184 words per minute, agent dispatched to an isolated git worktree, laptop closed, progress c

AI Engineering Model Evaluation Prompt Engineering

Open notes Watch on YouTube

Google DeepMind Blog June 10, 2026 news

Investing in multi-agent AI safety research

Google DeepMind and partners announce a $10M funding call for multi-agent safety research.

Model Evaluation Agent Security AI Engineering

Read summary Source link

Sovereign Escape Velocity: Ownership w Open Models — Gus Martins, & Ian Ballantyne, Google DeepMind video thumbnail

AI Engineer YouTube June 10, 2026 video

Sovereign Escape Velocity: Ownership w Open Models — Gus Martins, & Ian Ballantyne, Google DeepMind

Gemma 4's 31B model sits fourth on the LM Arena open model leaderboard. The models around it are at least twice as large; some are 20 times larger. It runs on a single GPU. Competitors at comparable quality need four or five. Ian Ballantyne and Gus Martins walk through what that size efficiency unlocks: running on a Pi

AI Engineering Model Evaluation Prompt Engineering

Open notes Watch on YouTube

Stop Making Models Bigger, Make Them Behave — Kobie Crawdord, Snorkel video thumbnail

AI Engineer YouTube June 10, 2026 video

Stop Making Models Bigger, Make Them Behave — Kobie Crawdord, Snorkel

Qwen 3 235B was asked for YouTube's year over year ad revenue growth from 2023 to 2024. It queried a table that didn't exist, tried again, got nothing back both times, and hallucinated an answer. The 4B model Snorkel finetuned with RL called `get_table_name` first, inspected the schema, ran a query, hit a column error,

AI Engineering Model Evaluation Prompt Engineering

Open notes Watch on YouTube

Claude Fable 5 - Full 319 page Breakdown video thumbnail

AI Explained YouTube June 10, 2026 video

Claude Fable 5 - Full 319 page Breakdown

Fable 5 is out - and it’s good, very good. But beyond the splashy demos, I want to bring you the 20+ nuggets from the 319 page system card, which I read in full, all day, plus benchmarks you may not have noticed. https://assemblyai.com/aiexplained Plus two worrying trends inside the ‘mind’ of Claude, how OpenAI counter

Model Evaluation AI Compliance Agent Security

Open notes Watch on YouTube

Self Driving Products: Product Signals to Pull Requests — Joshua Snyder, PostHog video thumbnail

AI Engineer YouTube June 10, 2026 video

Self Driving Products: Product Signals to Pull Requests — Joshua Snyder, PostHog

A rage click, a 2am error spike, a customer Slack message — today each sits until a developer notices, triages, tickets, and writes a fix. PostHog is building a pipeline that collapses that chain: signal arrives, a background agent groups it with related errors and session replays, researches the codebase, and opens a

AI Engineering Model Evaluation Prompt Engineering

Open notes Watch on YouTube

Google Cloud Security Blog June 9, 2026 news

Detecting and containing AI-powered threats with Google Security Operations agents

Learn how Google Security Operations works in concert with AI Threat Defense to monitor, detect, and respond to threats, particularly from code you do not own or can not patch.

Agent Security Prompt Injection Model Evaluation

Read summary Source link

From Transcription to Live Music: Gemini's Audio Stack — Thor Schaeff, Google DeepMind video thumbnail

AI Engineer YouTube June 9, 2026 video

From Transcription to Live Music: Gemini's Audio Stack — Thor Schaeff, Google DeepMind

One API call to Gemini 3 Flash Preview: speaker labels by name, timestamps, emotion tags, language detection with English translation, and a full summary. That is the audio understanding layer that underlies everything else Thor Schaeff demos here, including speech generation directed by a "director's note" rather than

AI Engineering Model Evaluation Prompt Engineering

Open notes Watch on YouTube

RAG is dead, right?? — Kuba Rogut, Turbopuffer video thumbnail

AI Engineer YouTube June 9, 2026 video

RAG is dead, right?? — Kuba Rogut, Turbopuffer

Cursor added semantic search and measured a 24% increase in answer accuracy on their composer model, a 2.6% gain in code retention in large codebases, and a 2.2% drop in dissatisfied user requests. Those numbers look small until you factor in that semantic search does not fire on every query. Meanwhile Google search vo

AI Engineering Model Evaluation Prompt Engineering

Open notes Watch on YouTube

2026 AI Engineer Vibe Reel video thumbnail

AI Engineer YouTube June 9, 2026 video

2026 AI Engineer Vibe Reel

W are getting ready for the World's Fair in San Francisco - Jun 29 to July 2! https://ai.engineer/wf - get tickets and see schedule!

AI Engineering Model Evaluation Prompt Engineering

Open notes Watch on YouTube

GPU Cloud Deployment Without Leaving Your IDE — Audry Hsu, RunPod video thumbnail

AI Engineer YouTube June 9, 2026 video

GPU Cloud Deployment Without Leaving Your IDE — Audry Hsu, RunPod

The iteration cycle before Flash: commit, push, build a Docker image, pull it from the registry, load it onto a server, allocate a GPU, then find out if it works. Audrey Hsu demos what replacing that with a single decorator looks like — add `@flash.endpoint` to an async Python function and it deploys to GPU cloud from

AI Engineering Model Evaluation Prompt Engineering

Open notes Watch on YouTube

Why More Context Makes Your Agent Dumber and What to Do About It — Nupur Sharma, Qodo video thumbnail

AI Engineer YouTube June 8, 2026 video

Why More Context Makes Your Agent Dumber and What to Do About It — Nupur Sharma, Qodo

Give an agent your full codebase and it will attend to the start and the end, then quietly drop the middle. Nupur from Qodo calls this the U curve and builds the whole talk around it: why growing the context window did not fix the problem, and what actually does. She runs through iterative retrieval, hierarchical summa

AI Engineering Model Evaluation Prompt Engineering

Open notes Watch on YouTube

Why Eval++ Is the Next Great Compute Primitive — Sunil Pai & Matt Carrie, Cloudflare video thumbnail

AI Engineer YouTube June 8, 2026 video

Why Eval++ Is the Next Great Compute Primitive — Sunil Pai & Matt Carrie, Cloudflare

Matt Carrie and Sunil Pai from Cloudflare's agents team explain why Durable Objects turned out to be the right compute unit for AI agents: addressable, persistent, hibernating, stateful, and fast enough that 15ms London latency puts you inside a single animation frame. The Agents SDK builds on this to give resumable st

AI Engineering Model Evaluation Prompt Engineering

Open notes Watch on YouTube

Road to 5 Million Tokens: Breaking Barriers in Long Context Training — Max Ryabinin, Together AI video thumbnail

AI Engineer YouTube June 8, 2026 video

Road to 5 Million Tokens: Breaking Barriers in Long Context Training — Max Ryabinin, Together AI

Training a standard LLaMA 3B model with a 3 million token context on a single 8xH100 node fails before you even start: the model parameters alone exhaust GPU memory. Max Ryabinin from Together AI walks through the full stack of techniques needed to get there: fully sharded data parallelism, DeepSpeed Ulysses context pa

AI Engineering Model Evaluation Prompt Engineering

Open notes Watch on YouTube

Under 5 minutes to a deployed LLM endpoint — Audry Hsu, RunPod video thumbnail

AI Engineer YouTube June 7, 2026 video

Under 5 minutes to a deployed LLM endpoint — Audry Hsu, RunPod

Two failed crypto mining rigs in a basement in 2022. The founders posted on Reddit offering the GPUs for free in exchange for feedback. That is the origin of RunPod, now at $120 million in annual recurring revenue with 500,000 developers on the platform. The demo runs in under five minutes: pick a model from the Hub, c

AI Engineering Model Evaluation Prompt Engineering

Open notes Watch on YouTube

LLM Observability, Evaluation, Experimentation Platform — Dat Ngo, Arize video thumbnail

AI Engineer YouTube June 7, 2026 video

LLM Observability, Evaluation, Experimentation Platform — Dat Ngo, Arize

Your agent called tool B before tool A, and B has a dependency on A. You did not catch it because nothing in your code audits agents. The telemetry does. Dat from Arize AI walks through what observability actually means when the system you are debugging is nondeterministic and the execution path changes with every run.

AI Engineering Model Evaluation Prompt Engineering

Open notes Watch on YouTube

From MCP to Scale: Pipelines That Build Themselves — Rafael Levi, Bright Data video thumbnail

AI Engineer YouTube June 7, 2026 video

From MCP to Scale: Pipelines That Build Themselves — Rafael Levi, Bright Data

Scraping is not the hard part anymore. Maintaining scrapers is. This session shows what it looks like when an agent uses MCP to inspect a site, understand its structure, generate a production scraper, and keep that pipeline working when the site changes. Using Bright Data's MCP, APIs, and browser infrastructure, the fl

AI Engineering Model Evaluation Prompt Engineering

Open notes Watch on YouTube

Building safe Payment Infrastructure for the autonomous economy — Steve Kaliski, Stripe video thumbnail

AI Engineer YouTube June 6, 2026 video

Building safe Payment Infrastructure for the autonomous economy — Steve Kaliski, Stripe

Agents are evolving from calling free APIs to executing real transactions, creating a new challenge: how do we let software spend money autonomously without catastrophic risk? This talk presents Stripe's approach to solving the dual problems of secure credential transmission and making businesses discoverable to agents

AI Engineering Model Evaluation Prompt Engineering

Open notes Watch on YouTube

Evals Are Broken, Use Them Anyway — Ara Khan, Cline video thumbnail

AI Engineer YouTube June 6, 2026 video

Evals Are Broken, Use Them Anyway — Ara Khan, Cline

Cline started at 43% on Terminal Bench. The improvements came from container CPU and memory settings, raised timeouts, and prompt engineering techniques specific to Anthropic model families that do not transfer to Codex or Gemini. Not from switching to a better model. Ara Khan's argument is that benchmark numbers are n

AI Engineering Model Evaluation Prompt Engineering

Open notes Watch on YouTube

Building Interactive UIs in VS Code with MCP Apps — Marlene Mhangami & Liam Hampton, GitHub video thumbnail

AI Engineer YouTube June 6, 2026 video

Building Interactive UIs in VS Code with MCP Apps — Marlene Mhangami & Liam Hampton, GitHub

The demo profiles a Go app running bubble sort and Fibonacci and the result renders as an interactive flame graph directly inside the VS Code chat window. Not a link. Not a text summary. A live iframe you can scroll and query, sandboxed for the same reason you put a hamster in a cage: so it cannot chew up your VS Code

AI Engineering Model Evaluation Prompt Engineering

Open notes Watch on YouTube

Building Agent Interfaces: Lessons from Chrome DevTools (MCP) for Agents — Michael Hablich, Google video thumbnail

AI Engineer YouTube June 5, 2026 video

Building Agent Interfaces: Lessons from Chrome DevTools (MCP) for Agents — Michael Hablich, Google

Chrome DevTools MCP shipped with one tool: debug_webpage. Agents failed silently because they couldn't compose behaviors. The team decomposed it into 25 focused tools and assumed the problem was solved. It wasn't — now agents had 25 tools and no reliable way to pick the right one. Michael Hablich's talk is an honest ac

AI Engineering Model Evaluation Prompt Engineering

Open notes Watch on YouTube

Beyond Transcription: Building Voice AI That Understands Conversations — Hervé Bredin, pyannoteAI video thumbnail

AI Engineer YouTube June 5, 2026 video

Beyond Transcription: Building Voice AI That Understands Conversations — Hervé Bredin, pyannoteAI

The open ASR leaderboard reports Nvidia Parakeet at 11.4% word error rate on AMI meeting data. Hervé Bredin runs the same model on the same dataset and gets 26%. Same model, same recordings, different microphone: the leaderboard uses headset audio, he uses the table mic. Most voice AI benchmarks are measuring single sp

AI Engineering Model Evaluation Prompt Engineering

Open notes Watch on YouTube

Dark Factory: OpenClaw Ships Faster Than You Can Read the Diff — Vincent Koc, OpenClaw video thumbnail

AI Engineer YouTube June 5, 2026 video

Dark Factory: OpenClaw Ships Faster Than You Can Read the Diff — Vincent Koc, OpenClaw

OpenClaw hit 3,000 commits in a single day. Vincent Koc's commit history shows exactly when he goes to sleep and when he wakes up. He and Peter Steinberger ran roughly 60 to 70 agents between them during the great refactor: 2,700 commits, close to a million lines of code changed, 82% of the core codebase touched in one

AI Engineering Model Evaluation Prompt Engineering

Open notes Watch on YouTube

AI Engineer Melbourne 2026 Keynote Livestream | Day 2 video thumbnail

AI Engineer YouTube June 4, 2026 video

AI Engineer Melbourne 2026 Keynote Livestream | Day 2

Live from Federation Square in Melbourne, AI Engineer Melbourne 2026 brings the keynote stage to viewers online in partnership with Web Directions. This is AI Engineer’s first partner event in Australia, featuring keynote-stage sessions from one of the most thoughtfully produced developer events in the region. Watch li

AI Engineering Model Evaluation Prompt Engineering

Open notes Watch on YouTube

The Art & Science of Benchmarking Agents — Vincent Chen, Snorkel AI video thumbnail

AI Engineer YouTube June 4, 2026 video

The Art & Science of Benchmarking Agents — Vincent Chen, Snorkel AI

ARC AGI 3 launched a few weeks before this talk with every task human solvable and frontier models under 1%. That gap is the argument: our ability to measure AI has fallen behind our ability to build it, and benchmarks that actually shape the field are bets on where capabilities are going, not snapshots of where they a

AI Engineering Model Evaluation Prompt Engineering

Open notes Watch on YouTube

Text Diffusion — Brendon Dillon, Google DeepMind video thumbnail

AI Engineer YouTube June 4, 2026 video

Text Diffusion — Brendon Dillon, Google DeepMind

GPT-4o answered 40. Gemini 2.5 Flash answered 42 and stuck to it even after working through the reasoning incorrectly. The Gemini Diffusion model, considerably smaller than both, answered 60 on the first forward pass, then 49, then corrected itself to 39 once it finished reasoning. Bidirectional attention means it can

AI Engineering Model Evaluation Prompt Engineering

Open notes Watch on YouTube

SWE-rebench: Lessons from Evaluating Coding Agents — Ibragim Badertdinov, Nebius video thumbnail

AI Engineer YouTube June 4, 2026 video

SWE-rebench: Lessons from Evaluating Coding Agents — Ibragim Badertdinov, Nebius

Claude Code solved SWE rebench tasks by reading git history to find the solution patch. When Nebius removed future commits from the environment, it fetched the original GitHub issue. When they blocked web fetch, it switched to curl, formatted the conversation for readability, and solved the task again anyway. Ibragim B

AI Engineering Model Evaluation Prompt Engineering

Open notes Watch on YouTube

OpenAI News June 3, 2026 news

Introducing new capabilities to GPT-Rosalind

GPT-Rosalind advances life sciences research with enhanced biological reasoning, medicinal chemistry expertise, genomics analysis, and experimental workflow capabilities.

AI Compliance AI Red Teaming Model Evaluation

Read summary Source link

BDD, ADR, PRD, WTF: Capturing Decisions for Humans and AI Alike — Michal Cichra, Safe Intelligence video thumbnail

AI Engineer YouTube June 3, 2026 video

BDD, ADR, PRD, WTF: Capturing Decisions for Humans and AI Alike — Michal Cichra, Safe Intelligence

"One thing harder than reading AI code is reading AI tests." Mikuel from Safe Intelligence argues spec driven development leaves a loop open: you have a markdown spec, but how do you know the product actually behaves that way? His answer is Cucumber, nearly forgotten and suddenly useful again. Executable, human readabl

AI Engineering Model Evaluation Prompt Engineering

Open notes Watch on YouTube

Beyond Components: Designing Generative UI for MCP Apps — Ruben Casas, Postman video thumbnail

AI Engineer YouTube June 3, 2026 video

Beyond Components: Designing Generative UI for MCP Apps — Ruben Casas, Postman

Ruben Casas from Postman prompted a model to rewrite his blog. It built a search box with a blur animation and accessibility out of the box, without being asked. That was when he concluded the model writes better frontend code than he does. His question for the talk: if the models are this capable, why are most agent U

AI Engineering Model Evaluation Prompt Engineering

Open notes Watch on YouTube

AI Engineer Melbourne 2026 Keynote Livestream | Day 1 video thumbnail

AI Engineer YouTube June 3, 2026 video

AI Engineer Melbourne 2026 Keynote Livestream | Day 1

Live from Federation Square in Melbourne, AI Engineer Melbourne 2026 brings the keynote stage to viewers online in partnership with Web Directions. This is AI Engineer’s first partner event in Australia, featuring keynote-stage sessions from one of the most thoughtfully produced developer events in the region. Watch li

AI Engineering Model Evaluation Prompt Engineering

Open notes Watch on YouTube

Benchmarking semantic code retrieval on Claude Code — Kuba Rogut, Turbopuffer video thumbnail

AI Engineer YouTube June 3, 2026 video

Benchmarking semantic code retrieval on Claude Code — Kuba Rogut, Turbopuffer

By default, Claude Code wastes one in every three file reads. Add windowed grep and that drops to one in five. Add semantic search on top and it drops to one in eight, with file precision climbing from 65% to 87%. Kuba Rogut from Turbopuffer ran a 50-task benchmark against ContextBench to measure not whether the agent

AI Engineering Model Evaluation Prompt Engineering

Open notes Watch on YouTube

What Lies Beneath the API — Benjamin Cowen, Modal video thumbnail

AI Engineer YouTube June 2, 2026 video

What Lies Beneath the API — Benjamin Cowen, Modal

Intercom is beating their frontier API at one tenth the cost. Pinterest claims orders of magnitude. Ben Cowen from Modal argues this pattern is not the exception for maturing AI products. It is the destination. Frontier labs want their models to win at everything. You want to win at your specific business logic. Those

AI Engineering Model Evaluation Prompt Engineering

Open notes Watch on YouTube

How Lovable self-improves every hour — Benjamin Verbeek, Lovable video thumbnail

AI Engineer YouTube June 2, 2026 video

How Lovable self-improves every hour — Benjamin Verbeek, Lovable

Within the first hour of launching the vent tool, the agent filed 20 complaints about a silent file copy failure. The team checked: the tool worked fine. What the agent had caught was that filenames with a space in them silently failed to copy, a bug that never surfaced in logs. Benjamin Verbeek from Lovable built it a

AI Engineering Model Evaluation Prompt Engineering

Open notes Watch on YouTube

Google Cloud Security Blog May 29, 2026 news

Cloud CISO Perspectives: How to build an AI-ready security program for the public sector

From industrial control systems to decades-old municipal databases, here’s our CISO guidance to prep AI-ready security programs for the public sector.

Agent Security Prompt Injection Model Evaluation

Read summary Source link

New Claude Opus 4.8: 15 Things You May’ve Missed video thumbnail

AI Explained YouTube May 29, 2026 video

New Claude Opus 4.8: 15 Things You May’ve Missed

The ‘best’ generally available AI model just dropped, but there is plenty I bet you missed about what it is, how it performs, and what the release tells us. 15 highlights from the 244 page system card, plus private testing, leader interview and more. AI Insiders ($9!): https://www.patreon.com/AIExplained Chapters: 00:0

Model Evaluation AI Compliance Agent Security

Open notes Watch on YouTube

ACM CAIS May 26, 2026 - May 29, 2026 event event archive

ACM CAIS 2026

ACM CAIS 2026 is a research-focused conference on compound AI architectures, optimization, deployment, and agentic AI systems in San Jose, California.

Agent Security AI Engineering Model Evaluation

View details Open event page

Anthropic Research May 22, 2026 analysis

Project Glasswing: An initial update

Anthropic reports early Project Glasswing results using Mythos Preview with infrastructure partners and external testers, including large-scale vulnerability discovery and a cautious disclosure posture.

AI Red Teaming Agent Security Model Evaluation

Read summary Source link

Google DeepMind Blog May 15, 2026 news

Gemini 3.5: frontier intelligence with action

Gemini 3.5 is built to help you execute complex, agentic workflows.

Model Evaluation Agent Security AI Engineering

Read summary Source link

Anthropic April 29, 2026 framework

Anthropic Responsible Scaling Policy v3.2

Anthropic’s current Responsible Scaling Policy page lists v3.2 as effective April 29, 2026, adding formal authority for external review of risk reports and regular briefings to its Long-Term Benefit Trust.

AI Compliance Model Evaluation AI Red Teaming

Read summary Source link

Build & deploy AI-powered apps — Paige Bailey, Google DeepMind video thumbnail

AI Engineer YouTube April 29, 2026 video

Build & deploy AI-powered apps — Paige Bailey, Google DeepMind

Got a massive idea but stuck in the "just talking about it" phase? This session cuts the fluff and dives straight into how to build and prototype at lightning speed using AI Studio Build and Antigravity for free. It breaks down Google DeepMind's AI tech stack so viewers know exactly which tools to use, when to reach fo

AI Engineering Model Evaluation Prompt Engineering

Open notes Watch on YouTube

Everything I Learned Training Frontier Small Models — Maxime Labonne, Liquid AI video thumbnail

AI Engineer YouTube April 29, 2026 video

Everything I Learned Training Frontier Small Models — Maxime Labonne, Liquid AI

A new class of small models is emerging with the ability to reliably follow instructions and call tools while running on-device under 1 GB of memory. In this talk, we'll break down how to post-train frontier small models using the LFM2.5 recipe: on-policy preference alignment, agentic reinforcement learning, and curric

AI Engineering Model Evaluation Prompt Engineering

Open notes Watch on YouTube

One Login to Rule Them All: Cross-App Access for MCP — Garrett Galow, WorkOS video thumbnail

AI Engineer YouTube April 28, 2026 video

One Login to Rule Them All: Cross-App Access for MCP — Garrett Galow, WorkOS

Connecting a coding agent to multiple services often means facing a dozen OAuth consent screens, a dozen token lifecycles, and a dozen chances for something to break. Despite having Single Sign-On, users still find themselves signing in repeatedly. This talk explores how Cross-App Access leverages a three-way trust bet

AI Engineering Model Evaluation Prompt Engineering

Open notes Watch on YouTube

Why building eval platforms is hard — Phil Hetzel, Braintrust video thumbnail

AI Engineer YouTube April 28, 2026 video

Why building eval platforms is hard — Phil Hetzel, Braintrust

An eval platform is not just a test runner. You are building shared definitions of "good," reliable data pipelines, labelling workflows, versioning, and trust in results across many teams and model changes. This session breaks down the hidden complexity, the common failure modes, and the design principles that make eva

AI Engineering Model Evaluation Prompt Engineering

Open notes Watch on YouTube

Building your own software factory — Eric Zakariasson, Cursor video thumbnail

AI Engineer YouTube April 28, 2026 video

Building your own software factory — Eric Zakariasson, Cursor

Most of us are pair-programming with one agent and stopping there. There's a lot more on the table. This workshop is about going from one agent to many. We'll start with codebase setup, the foundational work that makes agents effective on their own. Then we'll scale up to running agents in parallel, kicking off async w

AI Engineering Model Evaluation Prompt Engineering

Open notes Watch on YouTube

Lessons from Scaling GitHub's Remote MCP Server — Sam Morrow, GitHub video thumbnail

AI Engineer YouTube April 27, 2026 video

Lessons from Scaling GitHub's Remote MCP Server — Sam Morrow, GitHub

GitHub operates one of the most heavily-utilised MCP servers in the ecosystem, with over 4 million downloads of the stdio server alone. Discover the architectural decisions, technical challenges and lessons learned while building and scaling a remote MCP server on production infrastructure. The session walks through th

AI Engineering Model Evaluation Prompt Engineering

Open notes Watch on YouTube

Bringing MCPs to the Enterprise — Karan Sampath, Anthropic video thumbnail

AI Engineer YouTube April 27, 2026 video

Bringing MCPs to the Enterprise — Karan Sampath, Anthropic

MCPs are often flaky, face multiple security vulnerabilities, and are generally hard to scale. Most enterprises struggle to use more than single digit numbers of MCPs due to issues with security, observability, and access control. In this talk, we'll explore the approaches and learnings we at Anthropic have been taking

AI Engineering Model Evaluation Prompt Engineering

Open notes Watch on YouTube

Open Models at Google DeepMind — Cassidy Hardin, Google DeepMind video thumbnail

AI Engineer YouTube April 27, 2026 video

Open Models at Google DeepMind — Cassidy Hardin, Google DeepMind

Open models are getting smaller, faster, and far more capable. In this talk, Cassidy Hardin walks through the latest advances in the Gemma family, with a focus on Gemma 4 and what it enables for developers building on-device and open-weight AI systems. She covers the architecture behind Gemma’s dense, effective, and mi

AI Engineering Model Evaluation Prompt Engineering

Open notes Watch on YouTube

Collaborative AI Engineering — Maggie Appleton, GitHub Next video thumbnail

AI Engineer YouTube April 26, 2026 video

Collaborative AI Engineering — Maggie Appleton, GitHub Next

Agentic engineering so far has been a solo story: one developer and a dozen agents moving at warp speed. But speed without thoughtful planning and team alignment is just wasting tokens. When everyone on a team is directing agents alone in their personal CLI tools with no shared context, you get duplicate work, conflict

AI Engineering Model Evaluation Prompt Engineering

Open notes Watch on YouTube

MCP = Mega Context Problem - Matt Carey video thumbnail

AI Engineer YouTube April 25, 2026 video

MCP = Mega Context Problem - Matt Carey

The best MCP server is the one you didn't have to build. At Cloudflare we have a lot of products. Our REST OpenAPI spec is over 2.3 million tokens. When teams started building MCP servers, they did what everyone does: cherry-picked important endpoints for their product, wrote some tool definitions and shipped a separat

AI Engineering Model Evaluation Prompt Engineering

Open notes Watch on YouTube

Full Walkthrough: Workflow for AI Coding from Planning to Production — Matt Pocock (@mattpocockuk ) video thumbnail

AI Engineer YouTube April 24, 2026 video

Full Walkthrough: Workflow for AI Coding from Planning to Production — Matt Pocock (@mattpocockuk )

A hands-on workshop covering the full lifecycle of AI-assisted development, from turning ambiguous requirements into agent-ready plans to running autonomous coding agents that ship production features. You'll learn to stress-test vague briefs into structured PRDs, slice work into thin "tracer bullet" vertical slices, a

AI Engineering Model Evaluation Prompt Engineering

Open notes Watch on YouTube

What Do Models Still Suck At? - Peter Gostev, Arena.ai, BullshitBench video thumbnail

AI Engineer April 24, 2026 video

What Do Models Still Suck At? - Peter Gostev, Arena.ai, BullshitBench

AI Engineer session on What Do Models Still Suck At? - Peter Gostev, Arena.ai, BullshitBench. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Model Evaluation

Open notes Watch on YouTube

GPT 5.5 Arrives, DeepSeek V4 Drops, and the Compute War Intensifies video thumbnail

AI Explained YouTube April 24, 2026 video

GPT 5.5 Arrives, DeepSeek V4 Drops, and the Compute War Intensifies

GPT 5.5 full analysis, plus DeepSeek V4 paper highlights, comparisons with Mythos, a vibe-coded game w/ GPT Image 2, and 50 data-points you wouldn’t get from just reading the headlines. https://80000hours.org/aiexplained Check out my fast-growing (!) app, free to use, and code INSIDER15 for paid tiers: https://lmcounci

Model Evaluation AI Compliance Agent Security

Open notes Watch on YouTube

Building Generative Image & Video models at Scale - Sander Dieleman (Veo and Nano Banana) video thumbnail

AI Engineer April 22, 2026 video

Building Generative Image & Video models at Scale - Sander Dieleman (Veo and Nano Banana)

AI Engineer session on Building Generative Image & Video models at Scale - Sander Dieleman (Veo and Nano Banana). It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Prompt Engineering Model Evaluation

Open notes Watch on YouTube

AIE Miami Day 2 ft. Cerebras, OpenCode, Cursor, Arize AI, and more! video thumbnail

AI Engineer YouTube April 21, 2026 video

AIE Miami Day 2 ft. Cerebras, OpenCode, Cursor, Arize AI, and more!

April 21, 2026 - all times in EST -- 9:00am - Welcome to Day 2 -- 9:10am - David House, G2i Transforming Programming Mindsets: Case Studies in Agentic Coding Adoption -- 9:35am - Sarah Chieng, Cerebras Help! We're DEEP in (latency) Debt -- 10:00am - Lech Kalinowski, CallStack Ambient Generative AI: Deploying Latent Dif

AI Engineering Model Evaluation Prompt Engineering

Open notes Watch on YouTube

Gemma, DeepMind's Family of Open Models — Omar Sanseviero, Google DeepMind video thumbnail

AI Engineer April 21, 2026 video

Gemma, DeepMind's Family of Open Models — Omar Sanseviero, Google DeepMind

AI Engineer session on Gemma, DeepMind's Family of Open Models, presented by Omar Sanseviero, Google DeepMind. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Model Evaluation

Open notes Watch on YouTube

Running LLMs on your iPhone: 40 tok/s Gemma 4 with MLX — Adrien Grondin, Locally AI video thumbnail

AI Engineer April 21, 2026 video

Running LLMs on your iPhone: 40 tok/s Gemma 4 with MLX — Adrien Grondin, Locally AI

AI Engineer session on Running LLMs on your iPhone: 40 tok/s Gemma 4 with MLX, presented by Adrien Grondin, Locally AI. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Prompt Engineering Model Evaluation

Open notes Watch on YouTube

AIE Miami Keynote & Talks ft. OpenCode. Google Deepmind, OpenAI, and more! video thumbnail

AI Engineer YouTube April 20, 2026 video

AIE Miami Keynote & Talks ft. OpenCode. Google Deepmind, OpenAI, and more!

April 20, 2026 - all times in EST -- 9:00am - Welcome to AI Engineer Miami -- 9:10am - Gabe Greenberg, G2i Opening Remarks -- 9:15am - Dax Raad, OpenCode Keynote -- 9:40am - Dexter Horthy, HumanLayer Everything We got Wrong About RPI -- 10:05am - Max Stoiber, OpenAI Coming Soon -- 10:30am - Morning Break -- 11:00am - B

AI Engineering Model Evaluation Prompt Engineering

Open notes Watch on YouTube

Frontier AI and the Future of Intelligence — Raia Hadsell, VP of Research, Google DeepMind video thumbnail

AI Engineer April 19, 2026 video

Frontier AI and the Future of Intelligence — Raia Hadsell, VP of Research, Google DeepMind

AI Engineer session on Frontier AI and the Future of Intelligence, presented by Raia Hadsell, VP of Research, Google DeepMind. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Model Evaluation

Open notes Watch on YouTube

Claude Opus 4.7 - A New Frontier, in Performance … and Drama video thumbnail

AI Explained April 17, 2026 video

Claude Opus 4.7 - A New Frontier, in Performance … and Drama

This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation AI Red Teaming Adversarial ML

Open notes Watch on YouTube

Building pi in a World of Slop — Mario Zechner video thumbnail

AI Engineer April 17, 2026 video

Building pi in a World of Slop — Mario Zechner

AI Engineer session on Building pi in a World of Slop, presented by Mario Zechner. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Prompt Engineering Model Evaluation

Open notes Watch on YouTube

$1 AI Guardrails: The Unreasonable Effectiveness of Finetuned ModernBERTs — Diego Carpentero video thumbnail

AI Engineer April 16, 2026 video

$1 AI Guardrails: The Unreasonable Effectiveness of Finetuned ModernBERTs — Diego Carpentero

AI Engineer session on $1 AI Guardrails: The Unreasonable Effectiveness of Finetuned ModernBERTs, presented by Diego Carpentero. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering AI Compliance Model Evaluation

Open notes Watch on YouTube

Let LLMs Wander: Engineering RL Environments — Stefano Fiorucci video thumbnail

AI Engineer April 10, 2026 video

Let LLMs Wander: Engineering RL Environments — Stefano Fiorucci

AI Engineer session on Let LLMs Wander: Engineering RL Environments, presented by Stefano Fiorucci. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Prompt Engineering Model Evaluation

Open notes Watch on YouTube

One Registry to Rule them All - Sonny Merla, Mauro Luchetti, & Mattia Redaelli, Quantyca video thumbnail

AI Engineer April 10, 2026 video

One Registry to Rule them All - Sonny Merla, Mauro Luchetti, & Mattia Redaelli, Quantyca

AI Engineer session on One Registry to Rule them All - Sonny Merla, Mauro Luchetti, & Mattia Redaelli, Quantyca. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Model Evaluation

Open notes Watch on YouTube

Claude Mythos: Highlights from 244-page Release video thumbnail

AI Explained April 8, 2026 video

Claude Mythos: Highlights from 244-page Release

This AI Explained video reviews a major AI development through the lens of agentic workflows and tool-use risk. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation Agent Security AI Red Teaming

Open notes Watch on YouTube

Anthropic Frontier Red Team April 7, 2026 news

Assessing Claude Mythos Preview’s cybersecurity capabilities

Claude Mythos Preview is a new general-purpose language model that is strikingly capable at computer security tasks. This post provides technical details for researchers and practitioners who want to understand exactly how we have been testing this model, and what we have found over the past month. We hope this will sh

AI Red Teaming Agent Security Model Evaluation

Read summary Source link

NIST April 7, 2026 framework

NIST AI RMF and Critical Infrastructure Profile

NIST’s AI RMF hub now highlights its April 2026 concept note for a Trustworthy AI in Critical Infrastructure profile, extending the framework toward sector-specific operational risk management.

AI Compliance Model Evaluation

Read summary Source link

Two AI Models Set to “stir government urgency”, But Will This Challenge Undo Them? video thumbnail

AI Explained March 26, 2026 video

Two AI Models Set to “stir government urgency”, But Will This Challenge Undo Them?

This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation AI Compliance AI Red Teaming Adversarial ML

Open notes Watch on YouTube

Anthropic Frontier Red Team March 6, 2026 news

Reverse engineering Claude's CVE-2026-2796 exploit

This post dives deep into how Claude wrote an exploit for one of the vulnerabilities it found in Firefox.

AI Red Teaming Agent Security Model Evaluation

Read summary Source link

What the New ChatGPT 5.4 Means for the World video thumbnail

AI Explained March 6, 2026 video

What the New ChatGPT 5.4 Means for the World

This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation AI Compliance AI Red Teaming

Open notes Watch on YouTube

Deadline Day for Autonomous AI Weapons & Mass Surveillance video thumbnail

AI Explained February 27, 2026 video

Deadline Day for Autonomous AI Weapons & Mass Surveillance

This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation AI Compliance Agent Security AI Red Teaming

Open notes Watch on YouTube

Gemini 3.1 Pro and the Downfall of Benchmarks: Welcome to the Vibe Era of AI video thumbnail

AI Explained February 20, 2026 video

Gemini 3.1 Pro and the Downfall of Benchmarks: Welcome to the Vibe Era of AI

This AI Explained video reviews a major AI development through the lens of agentic workflows and tool-use risk. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation Agent Security Adversarial ML

Open notes Watch on YouTube

The Two Best AI Models/Enemies Just Got Released Simultaneously video thumbnail

AI Explained February 6, 2026 video

The Two Best AI Models/Enemies Just Got Released Simultaneously

This AI Explained video reviews a major AI development through the lens of agentic workflows and tool-use risk. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation Agent Security Adversarial ML

Open notes Watch on YouTube

Anthropic Frontier Red Team February 5, 2026 news

LLM-discovered 0-days

AI models can now find high-severity vulnerabilities at scale. This is a moment to empower defenders. We're now using Claude to find and help fix vulnerabilities in open source software.

AI Red Teaming Agent Security Model Evaluation

Read summary Source link

Claude AI Co-founder Publishes 4 Big Claims about Near Future: Breakdown video thumbnail

AI Explained January 28, 2026 video

Claude AI Co-founder Publishes 4 Big Claims about Near Future: Breakdown

This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation AI Compliance Adversarial ML

Open notes Watch on YouTube

Identity for AI Agents - Patrick Riley & Carlos Galan, Auth0 video thumbnail

AI Engineer January 24, 2026 video

Identity for AI Agents - Patrick Riley & Carlos Galan, Auth0

AI Engineer session on Identity for AI Agents - Patrick Riley & Carlos Galan, Auth0. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Agent Security Model Evaluation

Open notes Watch on YouTube

Welcome to AIE CODE - Jed Borovik, Google DeepMind video thumbnail

AI Engineer January 24, 2026 video

Welcome to AIE CODE - Jed Borovik, Google DeepMind

AI Engineer session on Welcome to AIE CODE - Jed Borovik, Google DeepMind. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Model Evaluation

Open notes Watch on YouTube

Anthropic Frontier Red Team January 16, 2026 news

AI Models on Realistic Cyber Ranges

In a recent evaluation of AI models’ cyber capabilities, current Claude models can now succeed at multistage attacks on networks with dozens of hosts using only standard, open-source tools, instead of the custom tools needed by previous generations.

AI Red Teaming Agent Security Model Evaluation

Read summary Source link

Anthropic Frontier Red Team January 14, 2026 news

Finding Bugs with Claude and Property-based Testing

Ensuring that programs are bug-free is one of the most challenging aspects of software engineering. We developed an agent that can efficiently identify bugs in large software projects. Our agent infers general properties of code that should be true, and then applies property-based testing. After extensive manual valida

AI Red Teaming Agent Security Model Evaluation

Read summary Source link

Anthropic: Our AI just created a tool that can ‘automate all white collar work’, Me: video thumbnail

AI Explained January 14, 2026 video

Anthropic: Our AI just created a tool that can ‘automate all white collar work’, Me:

This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation AI Compliance Agent Security

Open notes Watch on YouTube

Anthropic Frontier Red Team January 8, 2026 news

Experimenting with AI to Defend Critical Infrastructure

AI could help defenders of critical infrastructure identify the vulnerabilities that attackers might exploit—and close them before they are exploited. Anthropic has partnered with Pacific Northwest National Laboratory (PNNL) to explore this defensive application of AI, demonstrating both the potential of AI-accelerated

AI Red Teaming Agent Security Model Evaluation

Read summary Source link

Defying Gravity - Kevin Hou, Google DeepMind video thumbnail

AI Engineer December 24, 2025 video

Defying Gravity - Kevin Hou, Google DeepMind

AI Engineer session on Defying Gravity - Kevin Hou, Google DeepMind. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Model Evaluation

Open notes Watch on YouTube

RL Environments at Scale — Will Brown, Prime Intellect video thumbnail

AI Engineer December 24, 2025 video

RL Environments at Scale — Will Brown, Prime Intellect

AI Engineer session on RL Environments at Scale, presented by Will Brown, Prime Intellect. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Model Evaluation

Open notes Watch on YouTube

Building in the Gemini Era — Kat Kampf & Ammaar Reshi, Google DeepMind video thumbnail

AI Engineer December 24, 2025 video

Building in the Gemini Era — Kat Kampf & Ammaar Reshi, Google DeepMind

AI Engineer session on Building in the Gemini Era, presented by Kat Kampf & Ammaar Reshi, Google DeepMind. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Prompt Engineering Model Evaluation

Open notes Watch on YouTube

Developing Taste in Coding Agents: Applied Meta Neuro-Symbolic RL — Ahmad Awais, CommandCode video thumbnail

AI Engineer December 24, 2025 video

Developing Taste in Coding Agents: Applied Meta Neuro-Symbolic RL — Ahmad Awais, CommandCode

AI Engineer session on Developing Taste in Coding Agents: Applied Meta Neuro-Symbolic RL, presented by Ahmad Awais, CommandCode. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Agent Security Model Evaluation

Open notes Watch on YouTube

Minimax M2: Building the #1 Open Model — Olive Song, MiniMax video thumbnail

AI Engineer December 24, 2025 video

Minimax M2: Building the #1 Open Model — Olive Song, MiniMax

AI Engineer session on Minimax M2: Building the #1 Open Model, presented by Olive Song, MiniMax. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Prompt Engineering Model Evaluation

Open notes Watch on YouTube

Code World Model: Building World Models for Computation — Jacob Kahn, FAIR Meta video thumbnail

AI Engineer December 24, 2025 video

Code World Model: Building World Models for Computation — Jacob Kahn, FAIR Meta

AI Engineer session on Code World Model: Building World Models for Computation, presented by Jacob Kahn, FAIR Meta. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Prompt Engineering Model Evaluation

Open notes Watch on YouTube

What the Freakiness of 2025 in AI Tells Us About 2026 video thumbnail

AI Explained December 23, 2025 video

What the Freakiness of 2025 in AI Tells Us About 2026

This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation AI Compliance Agent Security AI Red Teaming Adversarial ML

Open notes Watch on YouTube

Gemini Exponential, Demis Hassabis' ‘Proto-AGI’ coming, but … video thumbnail

AI Explained December 19, 2025 video

Gemini Exponential, Demis Hassabis' ‘Proto-AGI’ coming, but …

This AI Explained video reviews a major AI development through the lens of agentic workflows and tool-use risk. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation Agent Security Adversarial ML

Open notes Watch on YouTube

GPT 5.2: OpenAI Strikes Back video thumbnail

AI Explained December 12, 2025 video

GPT 5.2: OpenAI Strikes Back

This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation Adversarial ML

Open notes Watch on YouTube

You Are Being Told Contradictory Things About AI video thumbnail

AI Explained December 5, 2025 video

You Are Being Told Contradictory Things About AI

This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation AI Compliance AI Red Teaming

Open notes Watch on YouTube

Nano Banana Pro: But Did You Catch These 10 Details? video thumbnail

AI Explained November 20, 2025 video

Nano Banana Pro: But Did You Catch These 10 Details?

This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation Prompt Engineering Adversarial ML

Open notes Watch on YouTube

Gemini 3 Pro: Breakdown video thumbnail

AI Explained November 19, 2025 video

Gemini 3 Pro: Breakdown

This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation

Open notes Watch on YouTube

Is GPT-5.1 Really an Upgrade? But Models Can Auto-Hack Govts, so … there’s that video thumbnail

AI Explained November 14, 2025 video

Is GPT-5.1 Really an Upgrade? But Models Can Auto-Hack Govts, so … there’s that

This AI Explained video reviews a major AI development through the lens of agentic workflows and tool-use risk. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation Agent Security Prompt Engineering Prompt Injection AI Red Teaming Adversarial ML

Open notes Watch on YouTube

Bubble or No Bubble, AI Keeps Progressing (ft. Relentless Learning + Introspection) video thumbnail

AI Explained November 10, 2025 video

Bubble or No Bubble, AI Keeps Progressing (ft. Relentless Learning + Introspection)

This AI Explained video reviews a major AI development through the lens of multimodal generation and provenance. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation AI Compliance

Open notes Watch on YouTube

Did you miss these 2 AI stories? A *Real* LLM-crafted Breakthrough + Continual Learning Blocked? video thumbnail

AI Explained October 22, 2025 video

Did you miss these 2 AI stories? A Real LLM-crafted Breakthrough + Continual Learning Blocked?

This AI Explained video reviews a major AI development through the lens of agentic workflows and tool-use risk. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation Agent Security

Open notes Watch on YouTube

Sora 2 - It will only get more realistic from here video thumbnail

AI Explained October 1, 2025 video

Sora 2 - It will only get more realistic from here

This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation AI Compliance Agent Security

Open notes Watch on YouTube

OpenAI Tests if GPT-5 Can Automate Your Job - 4 Unexpected Findings video thumbnail

AI Explained September 26, 2025 video

OpenAI Tests if GPT-5 Can Automate Your Job - 4 Unexpected Findings

This AI Explained video reviews a major AI development through the lens of AI safety and model behavior. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation AI Red Teaming

Open notes Watch on YouTube

ChatGPT Can Now Call the Cops, but 'Wait till 2100 for Full Job Impact' - Altman video thumbnail

AI Explained September 16, 2025 video

ChatGPT Can Now Call the Cops, but 'Wait till 2100 for Full Job Impact' - Altman

This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation AI Compliance AI Red Teaming

Open notes Watch on YouTube

An ‘AI Bubble’? What Altman Actually said, the Facts and Nano Banana video thumbnail

AI Explained August 26, 2025 video

An ‘AI Bubble’? What Altman Actually said, the Facts and Nano Banana

This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation AI Compliance Agent Security Adversarial ML

Open notes Watch on YouTube

The Rise of Open Models in the Enterprise — Amir Haghighat, Baseten video thumbnail

AI Engineer August 24, 2025 video

The Rise of Open Models in the Enterprise — Amir Haghighat, Baseten

AI Engineer session on The Rise of Open Models in the Enterprise, presented by Amir Haghighat, Baseten. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Model Evaluation

Open notes Watch on YouTube

Real World Development with GitHub Copilot and VS Code — Harald Kirschner, Christopher Harrison video thumbnail

AI Engineer August 24, 2025 video

Real World Development with GitHub Copilot and VS Code — Harald Kirschner, Christopher Harrison

AI Engineer session on Real World Development with GitHub Copilot and VS Code, presented by Harald Kirschner, Christopher Harrison. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Model Evaluation

Open notes Watch on YouTube

What Is a Humanoid Foundation Model? An Introduction to GR00T N1 - Annika & Aastha video thumbnail

AI Engineer August 24, 2025 video

What Is a Humanoid Foundation Model? An Introduction to GR00T N1 - Annika & Aastha

AI Engineer session on What Is a Humanoid Foundation Model? An Introduction to GR00T N1 - Annika & Aastha. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Model Evaluation

Open notes Watch on YouTube

Real-time Experiments with an AI Co-Scientist - Stefania Druga, fmr. Google Deepmind video thumbnail

AI Engineer August 24, 2025 video

Real-time Experiments with an AI Co-Scientist - Stefania Druga, fmr. Google Deepmind

AI Engineer session on Real-time Experiments with an AI Co-Scientist - Stefania Druga, fmr. Google Deepmind. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Model Evaluation

Open notes Watch on YouTube

GPT-5 has Arrived video thumbnail

AI Explained August 7, 2025 video

GPT-5 has Arrived

This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation Adversarial ML

Open notes Watch on YouTube

Genie 3: The World Becomes Playable (DeepMind) video thumbnail

AI Explained August 5, 2025 video

Genie 3: The World Becomes Playable (DeepMind)

This AI Explained video reviews a major AI development through the lens of agentic workflows and tool-use risk. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation Agent Security

Open notes Watch on YouTube

Measuring AGI: Interactive Reasoning Benchmarks for ARC-AGI-3 — Greg Kamradt, ARC Prize Foundation video thumbnail

AI Engineer July 24, 2025 video

Measuring AGI: Interactive Reasoning Benchmarks for ARC-AGI-3 — Greg Kamradt, ARC Prize Foundation

AI Engineer session on Measuring AGI: Interactive Reasoning Benchmarks for ARC-AGI-3, presented by Greg Kamradt, ARC Prize Foundation. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Model Evaluation

Open notes Watch on YouTube

How to build world-class AI products — Sarah Sachs (AI lead @ Notion) & Carlos Esteban (Braintrust) video thumbnail

AI Engineer July 24, 2025 video

How to build world-class AI products — Sarah Sachs (AI lead @ Notion) & Carlos Esteban (Braintrust)

AI Engineer session on How to build world-class AI products, presented by Sarah Sachs (AI lead @ Notion) & Carlos Esteban (Braintrust). It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Prompt Engineering Model Evaluation

Open notes Watch on YouTube

Thinking Deeper in Gemini — Jack Rae, Google DeepMind video thumbnail

AI Engineer July 24, 2025 video

Thinking Deeper in Gemini — Jack Rae, Google DeepMind

AI Engineer session on Thinking Deeper in Gemini, presented by Jack Rae, Google DeepMind. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Model Evaluation

Open notes Watch on YouTube

Netflix's Big Bet: One model to rule recommendations: Yesu Feng, Netflix video thumbnail

AI Engineer July 24, 2025 video

Netflix's Big Bet: One model to rule recommendations: Yesu Feng, Netflix

AI Engineer session on Netflix's Big Bet: One model to rule recommendations: Yesu Feng, Netflix. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Model Evaluation

Open notes Watch on YouTube

The Bitter Layout or: How I Learned to Love the Model Picker — Maximillian Piras, Yutori video thumbnail

AI Engineer July 24, 2025 video

The Bitter Layout or: How I Learned to Love the Model Picker — Maximillian Piras, Yutori

AI Engineer session on The Bitter Layout or: How I Learned to Love the Model Picker, presented by Maximillian Piras, Yutori. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Model Evaluation

Open notes Watch on YouTube

How fast are LLM inference engines anyway? — Charles Frye, Modal video thumbnail

AI Engineer July 24, 2025 video

How fast are LLM inference engines anyway? — Charles Frye, Modal

AI Engineer session on How fast are LLM inference engines anyway?, presented by Charles Frye, Modal. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Model Evaluation

Open notes Watch on YouTube

RL for Autonomous Coding — Aakanksha Chowdhery, Reflection.ai video thumbnail

AI Engineer July 24, 2025 video

RL for Autonomous Coding — Aakanksha Chowdhery, Reflection.ai

AI Engineer session on RL for Autonomous Coding, presented by Aakanksha Chowdhery, Reflection.ai. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Model Evaluation

Open notes Watch on YouTube

Real world MCPs in GitHub Copilot Agent Mode — Jon Peck, Microsoft video thumbnail

AI Engineer July 24, 2025 video

Real world MCPs in GitHub Copilot Agent Mode — Jon Peck, Microsoft

AI Engineer session on Real world MCPs in GitHub Copilot Agent Mode, presented by Jon Peck, Microsoft. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Agent Security Model Evaluation

Open notes Watch on YouTube

Benchmarks Are Memes: How What We Measure Shapes AI — and Us - Alex Duffy, Every.to video thumbnail

AI Engineer July 24, 2025 video

Benchmarks Are Memes: How What We Measure Shapes AI — and Us - Alex Duffy, Every.to

AI Engineer session on Benchmarks Are Memes: How What We Measure Shapes AI, presented by and Us - Alex Duffy, Every.to. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Model Evaluation

Open notes Watch on YouTube

Vector Search Benchmark[eting] - Philipp Krenn, Elastic video thumbnail

AI Engineer July 24, 2025 video

Vector Search Benchmark[eting] - Philipp Krenn, Elastic

AI Engineer session on Vector Search Benchmark[eting] - Philipp Krenn, Elastic. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Model Evaluation

Open notes Watch on YouTube

How to Train Your Agent: Building Reliable Agents with RL — Kyle Corbitt, OpenPipe video thumbnail

AI Engineer July 24, 2025 video

How to Train Your Agent: Building Reliable Agents with RL — Kyle Corbitt, OpenPipe

AI Engineer session on How to Train Your Agent: Building Reliable Agents with RL, presented by Kyle Corbitt, OpenPipe. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Agent Security Prompt Engineering Model Evaluation

Open notes Watch on YouTube

Using OSS models to build AI apps with millions of users — Hassan El Mghari video thumbnail

AI Engineer July 24, 2025 video

Using OSS models to build AI apps with millions of users — Hassan El Mghari

AI Engineer session on Using OSS models to build AI apps with millions of users, presented by Hassan El Mghari. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Prompt Engineering Model Evaluation

Open notes Watch on YouTube

Optimizing inference for voice models in production - Philip Kiely, Baseten video thumbnail

AI Engineer July 24, 2025 video

Optimizing inference for voice models in production - Philip Kiely, Baseten

AI Engineer session on Optimizing inference for voice models in production - Philip Kiely, Baseten. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Model Evaluation

Open notes Watch on YouTube

OpenThoughts: Data Recipes for Reasoning Models — Ryan Marten, Bespoke Labs video thumbnail

AI Engineer July 24, 2025 video

OpenThoughts: Data Recipes for Reasoning Models — Ryan Marten, Bespoke Labs

AI Engineer session on OpenThoughts: Data Recipes for Reasoning Models, presented by Ryan Marten, Bespoke Labs. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Model Evaluation

Open notes Watch on YouTube

A year of Gemini progress + what comes next — Logan Kilpatrick, Google DeepMind video thumbnail

AI Engineer July 24, 2025 video

A year of Gemini progress + what comes next — Logan Kilpatrick, Google DeepMind

AI Engineer session on A year of Gemini progress + what comes next, presented by Logan Kilpatrick, Google DeepMind. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Model Evaluation

Open notes Watch on YouTube

What every AI engineer needs to know about GPUs — Charles Frye, Modal video thumbnail

AI Engineer July 24, 2025 video

What every AI engineer needs to know about GPUs — Charles Frye, Modal

AI Engineer session on What every AI engineer needs to know about GPUs, presented by Charles Frye, Modal. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Model Evaluation

Open notes Watch on YouTube

AI Engineering with the Google Gemini 2.5 Model Family - Philipp Schmid, Google DeepMind video thumbnail

AI Engineer July 24, 2025 video

AI Engineering with the Google Gemini 2.5 Model Family - Philipp Schmid, Google DeepMind

AI Engineer session on AI Engineering with the Google Gemini 2.5 Model Family - Philipp Schmid, Google DeepMind. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Prompt Engineering Model Evaluation

Open notes Watch on YouTube

How Not to Read a Headline on AI (ft. new Olympiad Gold, GPT-5 …) video thumbnail

AI Explained July 21, 2025 video

How Not to Read a Headline on AI (ft. new Olympiad Gold, GPT-5 …)

This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation AI Compliance Agent Security

Open notes Watch on YouTube

Grok 4 - 10 New Things to Know video thumbnail

AI Explained July 10, 2025 video

Grok 4 - 10 New Things to Know

This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation Prompt Engineering Adversarial ML

Open notes Watch on YouTube

The Benchmarks Game: Why It's Rigged and How You Can (Really) Win - Darius Emrani video thumbnail

AI Engineer June 24, 2025 video

The Benchmarks Game: Why It's Rigged and How You Can (Really) Win - Darius Emrani

AI Engineer session on The Benchmarks Game: Why It's Rigged and How You Can (Really) Win - Darius Emrani. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Model Evaluation

Open notes Watch on YouTube

The Demo I Wish I'd Had: OpenAI's Agents SDK... serverless! - Brook Riggio video thumbnail

AI Engineer June 24, 2025 video

The Demo I Wish I'd Had: OpenAI's Agents SDK... serverless! - Brook Riggio

AI Engineer session on The Demo I Wish I'd Had: OpenAI's Agents SDK... serverless! - Brook Riggio. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Agent Security Model Evaluation

Open notes Watch on YouTube

The Future of Qwen: A Generalist Agent Model — Junyang Lin, Alibaba Qwen video thumbnail

AI Engineer June 24, 2025 video

The Future of Qwen: A Generalist Agent Model — Junyang Lin, Alibaba Qwen

AI Engineer session on The Future of Qwen: A Generalist Agent Model, presented by Junyang Lin, Alibaba Qwen. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Agent Security Model Evaluation

Open notes Watch on YouTube

Analyzing 10,000 Sales Calls With AI In 2 Weeks — Charlie Guo video thumbnail

AI Engineer June 24, 2025 video

Analyzing 10,000 Sales Calls With AI In 2 Weeks — Charlie Guo

AI Engineer session on Analyzing 10,000 Sales Calls With AI In 2 Weeks, presented by Charlie Guo. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Model Evaluation

Open notes Watch on YouTube

When Will AI Models Blackmail You, and Why? video thumbnail

AI Explained June 24, 2025 video

When Will AI Models Blackmail You, and Why?

This AI Explained video reviews a major AI development through the lens of agentic workflows and tool-use risk. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation Agent Security Prompt Engineering AI Red Teaming

Open notes Watch on YouTube

Veo 3 for Developers — Paige Bailey, Google DeepMind video thumbnail

AI Engineer June 24, 2025 video

Veo 3 for Developers — Paige Bailey, Google DeepMind

AI Engineer session on Veo 3 for Developers, presented by Paige Bailey, Google DeepMind. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Prompt Engineering Model Evaluation

Open notes Watch on YouTube

MCP: Origins and Requests For Startups — Theodora Chu, Model Context Protocol PM, Anthropic video thumbnail

AI Engineer June 24, 2025 video

MCP: Origins and Requests For Startups — Theodora Chu, Model Context Protocol PM, Anthropic

AI Engineer session on MCP: Origins and Requests For Startups, presented by Theodora Chu, Model Context Protocol PM, Anthropic. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Agent Security Model Evaluation

Open notes Watch on YouTube

ChatGPT is poorly designed. So I fixed it video thumbnail

AI Engineer June 24, 2025 video

ChatGPT is poorly designed. So I fixed it

AI Engineer session on ChatGPT is poorly designed. So I fixed it. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Model Evaluation

Open notes Watch on YouTube

The Voice-First AI Overlay: Designing Conversational Co-Pilots - Gregory Bruss video thumbnail

AI Engineer June 24, 2025 video

The Voice-First AI Overlay: Designing Conversational Co-Pilots - Gregory Bruss

AI Engineer session on The Voice-First AI Overlay: Designing Conversational Co-Pilots - Gregory Bruss. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Model Evaluation

Open notes Watch on YouTube

Apple’s ‘AI Can’t Reason’ Claim Seen By 13M+, What You Need to Know video thumbnail

AI Explained June 12, 2025 video

Apple’s ‘AI Can’t Reason’ Claim Seen By 13M+, What You Need to Know

This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation AI Compliance Adversarial ML

Open notes Watch on YouTube

AI Accelerates: New Gemini Model + AI Unemployment Stories Analysed video thumbnail

AI Explained June 6, 2025 video

AI Accelerates: New Gemini Model + AI Unemployment Stories Analysed

This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation AI Compliance Adversarial ML

Open notes Watch on YouTube

Claude 4: Full 120 Page Breakdown … Is it the Best New Model? video thumbnail

AI Explained May 22, 2025 video

Claude 4: Full 120 Page Breakdown … Is it the Best New Model?

This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation AI Red Teaming Adversarial ML

Open notes Watch on YouTube

Google Takes No Prisoners Amid Torrent of AI Announcements video thumbnail

AI Explained May 21, 2025 video

Google Takes No Prisoners Amid Torrent of AI Announcements

This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation Adversarial ML

Open notes Watch on YouTube

AI Improves at Self-improving video thumbnail

AI Explained May 19, 2025 video

AI Improves at Self-improving

This AI Explained video reviews a major AI development through the lens of agentic workflows and tool-use risk. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation Agent Security Prompt Engineering

Open notes Watch on YouTube

"OpenAI is Not God” - The DeepSeek Documentary on Liang Wenfeng, R1 and What's Next video thumbnail

AI Explained April 27, 2025 video

"OpenAI is Not God” - The DeepSeek Documentary on Liang Wenfeng, R1 and What's Next

This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation AI Red Teaming

Open notes Watch on YouTube

o3 breaks (some) records, but AI becomes pay-to-win video thumbnail

AI Explained April 25, 2025 video

o3 breaks (some) records, but AI becomes pay-to-win

This AI Explained video reviews a major AI development through the lens of agentic workflows and tool-use risk. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation Agent Security Adversarial ML

Open notes Watch on YouTube

RAG and the MongoDB Document Model: Ben Flast video thumbnail

AI Engineer April 24, 2025 video

RAG and the MongoDB Document Model: Ben Flast

AI Engineer session on RAG and the MongoDB Document Model: Ben Flast. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Model Evaluation

Open notes Watch on YouTube

Decoding Mistral AI's Large Language Models: Devendra Chaplot video thumbnail

AI Engineer April 24, 2025 video

Decoding Mistral AI's Large Language Models: Devendra Chaplot

AI Engineer session on Decoding Mistral AI's Large Language Models: Devendra Chaplot. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Model Evaluation

Open notes Watch on YouTube

Frontier Feud: Anthropic, Google DeepMind, Meta FAIR, Thinking Machines — Barr Yaron, Amplify video thumbnail

AI Engineer April 24, 2025 video

Frontier Feud: Anthropic, Google DeepMind, Meta FAIR, Thinking Machines — Barr Yaron, Amplify

AI Engineer session on Frontier Feud: Anthropic, Google DeepMind, Meta FAIR, Thinking Machines, presented by Barr Yaron, Amplify. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Model Evaluation

Open notes Watch on YouTube

Making Open Models 10x faster and better for Modern Application Innovation: Dmytro (Dima) Dzhulgakov video thumbnail

AI Engineer April 24, 2025 video

Making Open Models 10x faster and better for Modern Application Innovation: Dmytro (Dima) Dzhulgakov

AI Engineer session on Making Open Models 10x faster and better for Modern Application Innovation: Dmytro (Dima) Dzhulgakov. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Model Evaluation

Open notes Watch on YouTube

Stateful Agents — Full Workshop with Charles Packer of Letta and MemGPT video thumbnail

AI Engineer April 24, 2025 video

Stateful Agents — Full Workshop with Charles Packer of Letta and MemGPT

AI Engineer session on Stateful Agents, presented by Full Workshop with Charles Packer of Letta and MemGPT. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Agent Security Model Evaluation

Open notes Watch on YouTube

The Model Isn’t Wrong — You’re Just Bad at Prompting video thumbnail

AI Engineer April 24, 2025 video

The Model Isn’t Wrong — You’re Just Bad at Prompting

AI Engineer session on The Model Isn’t Wrong, presented by You’re Just Bad at Prompting. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Prompt Engineering Model Evaluation

Open notes Watch on YouTube

From model weights to API endpoint with TensorRT LLM: Philip Kiely and Pankaj Gupta video thumbnail

AI Engineer April 24, 2025 video

From model weights to API endpoint with TensorRT LLM: Philip Kiely and Pankaj Gupta

AI Engineer session on From model weights to API endpoint with TensorRT LLM: Philip Kiely and Pankaj Gupta. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Model Evaluation

Open notes Watch on YouTube

Moondream: how does a tiny vision model slap so hard? — Vikhyat Korrapati video thumbnail

AI Engineer April 24, 2025 video

Moondream: how does a tiny vision model slap so hard? — Vikhyat Korrapati

AI Engineer session on Moondream: how does a tiny vision model slap so hard?, presented by Vikhyat Korrapati. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Model Evaluation

Open notes Watch on YouTube

Fixing bugs in Gemma, Llama, & Phi 3: Daniel Han video thumbnail

AI Engineer April 24, 2025 video

Fixing bugs in Gemma, Llama, & Phi 3: Daniel Han

AI Engineer session on Fixing bugs in Gemma, Llama, & Phi 3: Daniel Han. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Model Evaluation

Open notes Watch on YouTube

State Space Models for Realtime Multimodal Intelligence: Karan Goel video thumbnail

AI Engineer April 24, 2025 video

State Space Models for Realtime Multimodal Intelligence: Karan Goel

AI Engineer session on State Space Models for Realtime Multimodal Intelligence: Karan Goel. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Model Evaluation

Open notes Watch on YouTube

Unveiling the latest Gemma model advancements: Kathleen Kenealy video thumbnail

AI Engineer April 24, 2025 video

Unveiling the latest Gemma model advancements: Kathleen Kenealy

AI Engineer session on Unveiling the latest Gemma model advancements: Kathleen Kenealy. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Model Evaluation

Open notes Watch on YouTube

Multi model multimodal and multi agent innovations in Azure AI: Cedric Vidal video thumbnail

AI Engineer April 24, 2025 video

Multi model multimodal and multi agent innovations in Azure AI: Cedric Vidal

AI Engineer session on Multi model multimodal and multi agent innovations in Azure AI: Cedric Vidal. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Agent Security Model Evaluation

Open notes Watch on YouTube

How to build the world's fastest voice bot: Kwindla Hultman Kramer video thumbnail

AI Engineer April 24, 2025 video

How to build the world's fastest voice bot: Kwindla Hultman Kramer

AI Engineer session on How to build the world's fastest voice bot: Kwindla Hultman Kramer. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Prompt Engineering Model Evaluation

Open notes Watch on YouTube

How Deep Research Works - Mukund Sridhar & Aarush Selvan, Google DeepMind video thumbnail

AI Engineer April 24, 2025 video

How Deep Research Works - Mukund Sridhar & Aarush Selvan, Google DeepMind

AI Engineer session on How Deep Research Works - Mukund Sridhar & Aarush Selvan, Google DeepMind. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Agent Security Model Evaluation

Open notes Watch on YouTube

Customized, production ready inference with open source models: Dmytro (Dima) Dzhulgakov video thumbnail

AI Engineer April 24, 2025 video

Customized, production ready inference with open source models: Dmytro (Dima) Dzhulgakov

AI Engineer session on Customized, production ready inference with open source models: Dmytro (Dima) Dzhulgakov. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Model Evaluation

Open notes Watch on YouTube

Best Practices for Evaluating Large Language Model Applications with llmeval: Niklas Nielsen video thumbnail

AI Engineer April 24, 2025 video

Best Practices for Evaluating Large Language Model Applications with llmeval: Niklas Nielsen

AI Engineer session on Best Practices for Evaluating Large Language Model Applications with llmeval: Niklas Nielsen. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Model Evaluation

Open notes Watch on YouTube

System Design for Next-Gen Frontier Models — Dylan Patel, SemiAnalysis video thumbnail

AI Engineer April 24, 2025 video

System Design for Next-Gen Frontier Models — Dylan Patel, SemiAnalysis

AI Engineer session on System Design for Next-Gen Frontier Models, presented by Dylan Patel, SemiAnalysis. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Model Evaluation

Open notes Watch on YouTube

Accelerate your AI journey with Azure AI model catalog: Sharmila Chokalingam video thumbnail

AI Engineer April 24, 2025 video

Accelerate your AI journey with Azure AI model catalog: Sharmila Chokalingam

AI Engineer session on Accelerate your AI journey with Azure AI model catalog: Sharmila Chokalingam. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Model Evaluation

Open notes Watch on YouTube

Building Agents with Model Context Protocol - Full Workshop with Mahesh Murag of Anthropic video thumbnail

AI Engineer April 24, 2025 video

Building Agents with Model Context Protocol - Full Workshop with Mahesh Murag of Anthropic

AI Engineer session on Building Agents with Model Context Protocol - Full Workshop with Mahesh Murag of Anthropic. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Agent Security Prompt Engineering Model Evaluation

Open notes Watch on YouTube

Lessons from the Trenches: Building LLM Evals That Work IRL: Aparna Dhinkaran video thumbnail

AI Engineer April 24, 2025 video

Lessons from the Trenches: Building LLM Evals That Work IRL: Aparna Dhinkaran

AI Engineer session on Lessons from the Trenches: Building LLM Evals That Work IRL: Aparna Dhinkaran. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Prompt Engineering Model Evaluation

Open notes Watch on YouTube

Evaluating Domain Specific LLMs for Real World Finance — Waseem Alshikh, Writer video thumbnail

AI Engineer April 24, 2025 video

Evaluating Domain Specific LLMs for Real World Finance — Waseem Alshikh, Writer

AI Engineer session on Evaluating Domain Specific LLMs for Real World Finance, presented by Waseem Alshikh, Writer. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Model Evaluation

Open notes Watch on YouTube

How to evaluate a model for your use case: Emmanuel Turlay video thumbnail

AI Engineer April 24, 2025 video

How to evaluate a model for your use case: Emmanuel Turlay

AI Engineer session on How to evaluate a model for your use case: Emmanuel Turlay. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Model Evaluation

Open notes Watch on YouTube

[Full Workshop from Microsoft] Github Copilot - The World's Most Widely Adopted AI Developer Tool video thumbnail

AI Engineer April 24, 2025 video

[Full Workshop from Microsoft] Github Copilot - The World's Most Widely Adopted AI Developer Tool

AI Engineer session on [Full Workshop from Microsoft] Github Copilot - The World's Most Widely Adopted AI Developer Tool. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Prompt Engineering Model Evaluation

Open notes Watch on YouTube

GitHub Copilot: The World's Most Widely Adopted AI Developer Tool video thumbnail

AI Engineer April 24, 2025 video

GitHub Copilot: The World's Most Widely Adopted AI Developer Tool

AI Engineer session on GitHub Copilot: The World's Most Widely Adopted AI Developer Tool. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Prompt Engineering Model Evaluation

Open notes Watch on YouTube

Productionizing GenAI Models — Lessons from the world's best AI teams: Lukas Biewald video thumbnail

AI Engineer April 24, 2025 video

Productionizing GenAI Models — Lessons from the world's best AI teams: Lukas Biewald

AI Engineer session on Productionizing GenAI Models, presented by Lessons from the world's best AI teams: Lukas Biewald. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Model Evaluation

Open notes Watch on YouTube

WTF do people use Open Models for?? video thumbnail

AI Engineer April 24, 2025 video

WTF do people use Open Models for??

AI Engineer session on WTF do people use Open Models for??. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Model Evaluation

Open notes Watch on YouTube

Fine tune 20 Llama Models in 5 Minutes: Santosh Radha video thumbnail

AI Engineer April 24, 2025 video

Fine tune 20 Llama Models in 5 Minutes: Santosh Radha

AI Engineer session on Fine tune 20 Llama Models in 5 Minutes: Santosh Radha. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Model Evaluation

Open notes Watch on YouTube

o3 and o4-mini - they’re great, but easy to over-hype video thumbnail

AI Explained April 16, 2025 video

o3 and o4-mini - they’re great, but easy to over-hype

This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation Adversarial ML

Open notes Watch on YouTube

‘Speaking Dolphin’ to AI Data Dominance, 4.1 + Kling 2.0: 7 Updates Critically Analysed video thumbnail

AI Explained April 16, 2025 video

‘Speaking Dolphin’ to AI Data Dominance, 4.1 + Kling 2.0: 7 Updates Critically Analysed

This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation

Open notes Watch on YouTube

AI CEO: ‘Stock Crash Could Stop AI Progress’, Llama 4 Anti-climax + ‘Superintelligence in 2027’ ... video thumbnail

AI Explained April 7, 2025 video

AI CEO: ‘Stock Crash Could Stop AI Progress’, Llama 4 Anti-climax + ‘Superintelligence in 2027’ ...

This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation Prompt Engineering AI Red Teaming

Open notes Watch on YouTube

Gemini 2.5 Pro - It’s a Darn Smart Chatbot … (New Simple High Score) video thumbnail

AI Explained March 28, 2025 video

Gemini 2.5 Pro - It’s a Darn Smart Chatbot … (New Simple High Score)

This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation Prompt Engineering Adversarial ML

Open notes Watch on YouTube

OpenAI’s New ImageGen is Unexpectedly Epic … (ft. Reve, Imagen 3, Midjourney etc) video thumbnail

AI Explained March 25, 2025 video

OpenAI’s New ImageGen is Unexpectedly Epic … (ft. Reve, Imagen 3, Midjourney etc)

This AI Explained video reviews a major AI development through the lens of multimodal generation and provenance. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation Prompt Engineering

Open notes Watch on YouTube

Did AI Just Get Commoditized? Gemini 2.5, New DeepSeek V3, & Microsoft vs OpenAI video thumbnail

AI Explained March 25, 2025 video

Did AI Just Get Commoditized? Gemini 2.5, New DeepSeek V3, & Microsoft vs OpenAI

This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation AI Compliance Adversarial ML

Open notes Watch on YouTube

NIST March 24, 2025 framework

Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigations

NIST finalizes AI 100-2e2025, providing a terminology and taxonomy for adversarial machine learning across predictive and generative AI systems.

Adversarial ML Model Evaluation AI Compliance

Read summary Source link

Anthropic March 19, 2025 analysis

Progress from our Frontier Red Team

Anthropic shares lessons from frontier red teaming and discusses where models are showing early-warning signs of higher-risk cyber and biology capabilities.

AI Red Teaming Model Evaluation

Read summary Source link

Manus AI - The Calm Before the Hypestorm … (vs Deep Research + Grok 3) video thumbnail

AI Explained March 13, 2025 video

Manus AI - The Calm Before the Hypestorm … (vs Deep Research + Grok 3)

This AI Explained video reviews a major AI development through the lens of agentic workflows and tool-use risk. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation Agent Security AI Red Teaming Adversarial ML

Open notes Watch on YouTube

GPT 4.5 - not so much wow video thumbnail

AI Explained February 28, 2025 video

GPT 4.5 - not so much wow

This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation Adversarial ML

Open notes Watch on YouTube

OpenAI February 25, 2025 framework

Deep research System Card

OpenAI’s system card for deep research covers prompt injection, privacy, code execution, and external red teaming prior to release.

Model Evaluation Prompt Injection AI Compliance

Read summary Source link

Claude 3.7 is More Significant than its Name Implies (ft DeepSeek R2 + GPT 4.5 coming soon) video thumbnail

AI Explained February 25, 2025 video

Claude 3.7 is More Significant than its Name Implies (ft DeepSeek R2 + GPT 4.5 coming soon)

This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation AI Compliance Agent Security Prompt Engineering AI Red Teaming

Open notes Watch on YouTube

AGI: (gets close), Humans: ‘Who Gets to Own it?’ video thumbnail

AI Explained February 11, 2025 video

AGI: (gets close), Humans: ‘Who Gets to Own it?’

This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation AI Compliance

Open notes Watch on YouTube

Deep Research by OpenAI - The Ups and Downs vs DeepSeek R1 Search + Gemini Deep Research video thumbnail

AI Explained February 3, 2025 video

Deep Research by OpenAI - The Ups and Downs vs DeepSeek R1 Search + Gemini Deep Research

This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation Adversarial ML

Open notes Watch on YouTube

o3-mini and the “AI War” video thumbnail

AI Explained January 31, 2025 video

o3-mini and the “AI War”

This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation AI Red Teaming

Open notes Watch on YouTube

Nothing Much Happens in AI, Then Everything Does All At Once video thumbnail

AI Explained January 24, 2025 video

Nothing Much Happens in AI, Then Everything Does All At Once

This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation AI Compliance Agent Security Prompt Engineering Adversarial ML

Open notes Watch on YouTube

OpenAI January 23, 2025 framework

Operator System Card

The Operator system card documents red teaming and mitigation choices for a computer-using agent, with prompt injections listed as a central risk area.

Agent Security Model Evaluation Prompt Injection AI Compliance

Read summary Source link

Altman Expects a ‘Fast Take-off’, ‘Super-Agent’ Debuting Soon and DeepSeek R1 Out video thumbnail

AI Explained January 20, 2025 video

Altman Expects a ‘Fast Take-off’, ‘Super-Agent’ Debuting Soon and DeepSeek R1 Out

This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation AI Compliance Agent Security Adversarial ML

Open notes Watch on YouTube

OpenAI Backtracks, Gunning for Superintelligence: Altman Brings His AGI Timeline Closer - '25 to '29 video thumbnail

AI Explained January 8, 2025 video

OpenAI Backtracks, Gunning for Superintelligence: Altman Brings His AGI Timeline Closer - '25 to '29

This AI Explained video reviews a major AI development through the lens of agentic workflows and tool-use risk. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation Agent Security Adversarial ML

Open notes Watch on YouTube

o3 - wow video thumbnail

AI Explained December 20, 2024 video

o3 - wow

This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation AI Red Teaming Adversarial ML

Open notes Watch on YouTube

Never Browse Alone? Gemini 2 Live and ChatGPT Vision video thumbnail

AI Explained December 12, 2024 video

Never Browse Alone? Gemini 2 Live and ChatGPT Vision

This AI Explained video reviews a major AI development through the lens of agentic workflows and tool-use risk. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation Agent Security Adversarial ML

Open notes Watch on YouTube

Sora is Out, But is it a Distraction? video thumbnail

AI Explained December 9, 2024 video

Sora is Out, But is it a Distraction?

This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation AI Compliance

Open notes Watch on YouTube

o1 Pro Mode – ChatGPT Pro Full Analysis (plus o1 paper highlights) video thumbnail

AI Explained December 5, 2024 video

o1 Pro Mode – ChatGPT Pro Full Analysis (plus o1 paper highlights)

This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation AI Red Teaming Adversarial ML

Open notes Watch on YouTube

AI Breaks Its Silence: OpenAI’s ‘Next 12 Days’, Genie 2, and a Word of Caution video thumbnail

AI Explained December 4, 2024 video

AI Breaks Its Silence: OpenAI’s ‘Next 12 Days’, Genie 2, and a Word of Caution

This AI Explained video reviews a major AI development through the lens of agentic workflows and tool-use risk. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation Agent Security Adversarial ML

Open notes Watch on YouTube

New Google Model Ranked ‘No. 1 LLM’, But There’s a Problem video thumbnail

AI Explained November 15, 2024 video

New Google Model Ranked ‘No. 1 LLM’, But There’s a Problem

This AI Explained video reviews a major AI development through the lens of agentic workflows and tool-use risk. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation Agent Security Adversarial ML

Open notes Watch on YouTube

Leak: ‘GPT-5 exhibits diminishing returns’, Sam Altman: ‘lol’ video thumbnail

AI Explained November 10, 2024 video

Leak: ‘GPT-5 exhibits diminishing returns’, Sam Altman: ‘lol’

This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation

Open notes Watch on YouTube

ChatGPT with Search, Altman AMA video thumbnail

AI Explained October 31, 2024 video

ChatGPT with Search, Altman AMA

This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation

Open notes Watch on YouTube

The New Claude 3.5 Sonnet: Better, Yes, But Not Just in the Way You Might Think video thumbnail

AI Explained October 23, 2024 video

The New Claude 3.5 Sonnet: Better, Yes, But Not Just in the Way You Might Think

This AI Explained video reviews a major AI development through the lens of agentic workflows and tool-use risk. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation Agent Security Adversarial ML

Open notes Watch on YouTube

AI - 2024AD: 212-page Report (from this morning) Fully Read w/ Highlights video thumbnail

AI Explained October 10, 2024 video

AI - 2024AD: 212-page Report (from this morning) Fully Read w/ Highlights

This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation AI Compliance Prompt Engineering Prompt Injection

Open notes Watch on YouTube

OpenAI: ‘We Just Reached Human-level Reasoning’. video thumbnail

AI Explained October 3, 2024 video

OpenAI: ‘We Just Reached Human-level Reasoning’.

This AI Explained video reviews a major AI development through the lens of agentic workflows and tool-use risk. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation Agent Security Adversarial ML

Open notes Watch on YouTube

‘Advanced Voice’ ChatGPT Just Happened … But There's 3 Other Stories You Probably Shouldn’t Ignore video thumbnail

AI Explained September 25, 2024 video

‘Advanced Voice’ ChatGPT Just Happened … But There's 3 Other Stories You Probably Shouldn’t Ignore

This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation

Open notes Watch on YouTube

o1 - What is Going On? Why o1 is a 3rd Paradigm of Model + 10 Things You Might Not Know video thumbnail

AI Explained September 18, 2024 video

o1 - What is Going On? Why o1 is a 3rd Paradigm of Model + 10 Things You Might Not Know

This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation Prompt Engineering

Open notes Watch on YouTube

ChatGPT o1 - In-Depth Analysis and Reaction (o1-preview) video thumbnail

AI Explained September 13, 2024 video

ChatGPT o1 - In-Depth Analysis and Reaction (o1-preview)

This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation Adversarial ML

Open notes Watch on YouTube

$125B for Superintelligence? 3 Models Coming, Sutskever's Secret SSI, & Data Centers (in space)... video thumbnail

AI Explained September 4, 2024 video

$125B for Superintelligence? 3 Models Coming, Sutskever's Secret SSI, & Data Centers (in space)...

This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation AI Red Teaming

Open notes Watch on YouTube

Grok-2 Actually Out, But What If It Were 10,000x the Size? video thumbnail

AI Explained August 22, 2024 video

Grok-2 Actually Out, But What If It Were 10,000x the Size?

This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation Prompt Engineering

Open notes Watch on YouTube

Was GPT-5 Underwhelming, Or Not? OpenAI Co-founder Exits, Figure02 Arrives, Character.AI Gutted video thumbnail

AI Explained August 6, 2024 video

Was GPT-5 Underwhelming, Or Not? OpenAI Co-founder Exits, Figure02 Arrives, Character.AI Gutted

This AI Explained video reviews a major AI development through the lens of AI safety and model behavior. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation AI Red Teaming

Open notes Watch on YouTube

Llama 405b: Full 92 page Analysis, and Uncontaminated SIMPLE Benchmark Results video thumbnail

AI Explained July 24, 2024 video

Llama 405b: Full 92 page Analysis, and Uncontaminated SIMPLE Benchmark Results

This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation AI Red Teaming Adversarial ML

Open notes Watch on YouTube

GPT-4o Mini Arrives In Global IT Outage, But How ‘Mini’ Is Its Intelligence? video thumbnail

AI Explained July 19, 2024 video

GPT-4o Mini Arrives In Global IT Outage, But How ‘Mini’ Is Its Intelligence?

This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation

Open notes Watch on YouTube

How Far Can We Scale AI? Gen 3, Claude 3.5 Sonnet and AI Hype video thumbnail

AI Explained June 30, 2024 video

How Far Can We Scale AI? Gen 3, Claude 3.5 Sonnet and AI Hype

This AI Explained video reviews a major AI development through the lens of AI safety and model behavior. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation Prompt Engineering AI Red Teaming

Open notes Watch on YouTube

AI Won't Be AGI, Until It Can At Least Do This (plus 6 key ways LLMs are being upgraded) video thumbnail

AI Explained June 17, 2024 video

AI Won't Be AGI, Until It Can At Least Do This (plus 6 key ways LLMs are being upgraded)

This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation AI Compliance AI Red Teaming

Open notes Watch on YouTube

‘Everything is Going to Be Robotic’ Nvidia Promises, as AI Gets More Real video thumbnail

AI Explained June 4, 2024 video

‘Everything is Going to Be Robotic’ Nvidia Promises, as AI Gets More Real

This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation AI Compliance

Open notes Watch on YouTube

Microsoft Promises a 'Whale' for GPT-5, Anthropic Delves Inside a Model’s Mind and Altman Stumbles video thumbnail

AI Explained May 22, 2024 video

Microsoft Promises a 'Whale' for GPT-5, Anthropic Delves Inside a Model’s Mind and Altman Stumbles

This AI Explained video reviews a major AI development through the lens of AI safety and model behavior. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation AI Red Teaming

Open notes Watch on YouTube

GPT-4o - Full Breakdown + Bonus Details video thumbnail

AI Explained May 13, 2024 video

GPT-4o - Full Breakdown + Bonus Details

This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation Adversarial ML

Open notes Watch on YouTube

AI Conquers Gravity: Robo-dog, Trained by GPT-4, Stays Balanced on Rolling, Deflating Yoga Ball video thumbnail

AI Explained May 5, 2024 video

AI Conquers Gravity: Robo-dog, Trained by GPT-4, Stays Balanced on Rolling, Deflating Yoga Ball

This AI Explained video reviews a major AI development through the lens of robotics, world models, and embodied AI. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation

Open notes Watch on YouTube

New OpenAI Model 'Imminent' and AI Stakes Get Raised (plus Med Gemini, GPT 2 Chatbot and Scale AI) video thumbnail

AI Explained May 2, 2024 video

New OpenAI Model 'Imminent' and AI Stakes Get Raised (plus Med Gemini, GPT 2 Chatbot and Scale AI)

This AI Explained video reviews a major AI development through the lens of agentic workflows and tool-use risk. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation Agent Security Prompt Engineering AI Red Teaming Adversarial ML

Open notes Watch on YouTube

Storyteller: Building Multi-modal Apps with TS & ModelFusion - Lars Grammel, PhD video thumbnail

AI Engineer April 24, 2024 video

Storyteller: Building Multi-modal Apps with TS & ModelFusion - Lars Grammel, PhD

AI Engineer session on Storyteller: Building Multi-modal Apps with TS & ModelFusion - Lars Grammel, PhD. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Prompt Engineering Model Evaluation

Open notes Watch on YouTube

The Code AI Maturity Model and What It Means For You: Ado Kukic video thumbnail

AI Engineer April 24, 2024 video

The Code AI Maturity Model and What It Means For You: Ado Kukic

AI Engineer session on The Code AI Maturity Model and What It Means For You: Ado Kukic. It adds practical context for how teams are building and operating AI systems in production.

AI Engineering Model Evaluation

Open notes Watch on YouTube

‘Her’ AI, Almost Here? Llama 3, Vasa-1, and Altman ‘Plugging Into Everything You Want To Do’ video thumbnail

AI Explained April 18, 2024 video

‘Her’ AI, Almost Here? Llama 3, Vasa-1, and Altman ‘Plugging Into Everything You Want To Do’

This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation AI Compliance Agent Security AI Red Teaming Adversarial ML

Open notes Watch on YouTube

Udio, the Mysterious GPT Update, and Infinite Attention video thumbnail

AI Explained April 11, 2024 video

Udio, the Mysterious GPT Update, and Infinite Attention

This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation Adversarial ML

Open notes Watch on YouTube

Why Does OpenAI Need a 'Stargate' Supercomputer? Ft. Perplexity CEO Aravind Srinivas video thumbnail

AI Explained April 2, 2024 video

Why Does OpenAI Need a 'Stargate' Supercomputer? Ft. Perplexity CEO Aravind Srinivas

This AI Explained video reviews a major AI development through the lens of multimodal generation and provenance. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation

Open notes Watch on YouTube

5 Key Quotes: Altman, Huang and 'The Most Interesting Year' video thumbnail

AI Explained March 19, 2024 video

5 Key Quotes: Altman, Huang and 'The Most Interesting Year'

This AI Explained video reviews a major AI development through the lens of AI safety and model behavior. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation AI Red Teaming

Open notes Watch on YouTube

AI Agents Take the Wheel: Devin, SIMA, Figure 01 and The Future of Jobs video thumbnail

AI Explained March 14, 2024 video

AI Agents Take the Wheel: Devin, SIMA, Figure 01 and The Future of Jobs

This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation AI Compliance Agent Security Adversarial ML

Open notes Watch on YouTube

The New, Smartest AI: Claude 3 – Tested vs Gemini 1.5 + GPT-4 video thumbnail

AI Explained March 4, 2024 video

The New, Smartest AI: Claude 3 – Tested vs Gemini 1.5 + GPT-4

This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation Adversarial ML

Open notes Watch on YouTube

The AI 'Genie' is Out + Humanoid Robotics Step Closer video thumbnail

AI Explained February 26, 2024 video

The AI 'Genie' is Out + Humanoid Robotics Step Closer

This AI Explained video reviews a major AI development through the lens of multimodal generation and provenance. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation

Open notes Watch on YouTube

Sora - Full Analysis (with new details) video thumbnail

AI Explained February 16, 2024 video

Sora - Full Analysis (with new details)

This AI Explained video reviews a major AI development through the lens of multimodal generation and provenance. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation

Open notes Watch on YouTube

Gemini 1.5 and The Biggest Night in AI video thumbnail

AI Explained February 15, 2024 video

Gemini 1.5 and The Biggest Night in AI

This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation Adversarial ML

Open notes Watch on YouTube

Gemini Ultra - Full Review video thumbnail

AI Explained February 8, 2024 video

Gemini Ultra - Full Review

This AI Explained video reviews a major AI development through the lens of scaling and compute economics. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation Prompt Engineering Prompt Injection

Open notes Watch on YouTube

GPT-5: Everything You Need to Know So Far video thumbnail

AI Explained January 26, 2024 video

GPT-5: Everything You Need to Know So Far

This AI Explained video reviews a major AI development through the lens of model capability and AI systems in practice. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation AI Red Teaming

Open notes Watch on YouTube

Alpha Everywhere: AlphaGeometry, AlphaCodium and the Future of LLMs video thumbnail

AI Explained January 18, 2024 video

Alpha Everywhere: AlphaGeometry, AlphaCodium and the Future of LLMs

This AI Explained video reviews a major AI development through the lens of scaling and compute economics. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation

Open notes Watch on YouTube

OpenAI Flip-Flops and '10% Chance of Outperforming Humans in Every Task by 2027' - 3K AI Researchers video thumbnail

AI Explained January 12, 2024 video

OpenAI Flip-Flops and '10% Chance of Outperforming Humans in Every Task by 2027' - 3K AI Researchers

This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation AI Compliance AI Red Teaming

Open notes Watch on YouTube

AI On An Exponential? Data, Mamba, and More video thumbnail

AI Explained January 1, 2024 video

AI On An Exponential? Data, Mamba, and More

This AI Explained video reviews a major AI development through the lens of scaling and compute economics. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation Prompt Engineering

Open notes Watch on YouTube

Midjourney v6, Altman 'Age Reversal' and Gemini 2 - Christmas Edition video thumbnail

AI Explained December 22, 2023 video

Midjourney v6, Altman 'Age Reversal' and Gemini 2 - Christmas Edition

This AI Explained video reviews a major AI development through the lens of multimodal generation and provenance. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation AI Red Teaming

Open notes Watch on YouTube

A 100T Transformer Model Coming? Plus ByteDance Saga and the Mixtral Price Drop video thumbnail

AI Explained December 18, 2023 video

A 100T Transformer Model Coming? Plus ByteDance Saga and the Mixtral Price Drop

This AI Explained video reviews a major AI development through the lens of multimodal generation and provenance. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation

Open notes Watch on YouTube

Phi-2, Imagen-2, Optimus-Gen-2: Small New Models to Change the World? video thumbnail

AI Explained December 13, 2023 video

Phi-2, Imagen-2, Optimus-Gen-2: Small New Models to Change the World?

This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation Prompt Engineering

Open notes Watch on YouTube

Gemini Full Breakdown + AlphaCode 2 Bombshell video thumbnail

AI Explained December 6, 2023 video

Gemini Full Breakdown + AlphaCode 2 Bombshell

This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation

Open notes Watch on YouTube

OpenAI Insights and Training Data Shenanigans - 7 'Complicated' Developments + Guest Star video thumbnail

AI Explained December 3, 2023 video

OpenAI Insights and Training Data Shenanigans - 7 'Complicated' Developments + Guest Star

This AI Explained video reviews a major AI development through the lens of model capability and AI systems in practice. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation Prompt Engineering Prompt Injection

Open notes Watch on YouTube

Q* - Clues to the Puzzle? video thumbnail

AI Explained November 24, 2023 video

Q* - Clues to the Puzzle?

This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation AI Compliance AI Red Teaming

Open notes Watch on YouTube

Are We Back to Before? OpenAI 2.0, Inflection-2 and a Major AI Cancer Breakthrough video thumbnail

AI Explained November 22, 2023 video

Are We Back to Before? OpenAI 2.0, Inflection-2 and a Major AI Cancer Breakthrough

This AI Explained video reviews a major AI development through the lens of model capability and AI systems in practice. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation

Open notes Watch on YouTube

Altman@Microsoft, Shear@OpenAI, Chaos@Everywhere: Sutskever Regret and the Weekend That Changed AI video thumbnail

AI Explained November 20, 2023 video

Altman@Microsoft, Shear@OpenAI, Chaos@Everywhere: Sutskever Regret and the Weekend That Changed AI

This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation

Open notes Watch on YouTube

Altman Out: Reasons, Reactions and the Repercussions for the Industry video thumbnail

AI Explained November 18, 2023 video

Altman Out: Reasons, Reactions and the Repercussions for the Industry

This AI Explained video reviews a major AI development through the lens of AI safety and model behavior. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation AI Red Teaming

Open notes Watch on YouTube

GPT 4 Turbo-Charged? Plus Custom GPTS, Grok, AGI Tier List, Vision Demos, Whisper V3 and more video thumbnail

AI Explained November 9, 2023 video

GPT 4 Turbo-Charged? Plus Custom GPTS, Grok, AGI Tier List, Vision Demos, Whisper V3 and more

This AI Explained video reviews a major AI development through the lens of model capability and AI systems in practice. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation

Open notes Watch on YouTube

AI Declarations and AGI Timelines – Looking More Optimistic? video thumbnail

AI Explained November 2, 2023 video

AI Declarations and AGI Timelines – Looking More Optimistic?

This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation AI Compliance Prompt Engineering AI Red Teaming

Open notes Watch on YouTube

State of AI 2023: Highlights of 163 Page Report + Eureka Self-Improvement, MEG, Suno AI and GPT F video thumbnail

AI Explained October 22, 2023 video

State of AI 2023: Highlights of 163 Page Report + Eureka Self-Improvement, MEG, Suno AI and GPT F

This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation AI Compliance Agent Security

Open notes Watch on YouTube

Not Slowing Down: GAIA-1 to GPT Vision Tips, Nvidia B100 to Bard vs LLaVA video thumbnail

AI Explained October 13, 2023 video

Not Slowing Down: GAIA-1 to GPT Vision Tips, Nvidia B100 to Bard vs LLaVA

This AI Explained video reviews a major AI development through the lens of AI safety and model behavior. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation

Open notes Watch on YouTube

RT-X and the Dawn of Large Multimodal Models: Google Breakthrough and 160-page Report Highlights video thumbnail

AI Explained October 3, 2023 video

RT-X and the Dawn of Large Multimodal Models: Google Breakthrough and 160-page Report Highlights

This AI Explained video reviews a major AI development through the lens of multimodal generation and provenance. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation Prompt Engineering

Open notes Watch on YouTube

An Actually Big Week in AI: AutoGen, The A-Phone, Mistral 7B, GPT-Fathom and Meta Hunts CharacterAI video thumbnail

AI Explained September 29, 2023 video

An Actually Big Week in AI: AutoGen, The A-Phone, Mistral 7B, GPT-Fathom and Meta Hunts CharacterAI

This AI Explained video reviews a major AI development through the lens of agentic workflows and tool-use risk. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation Agent Security

Open notes Watch on YouTube

ChatGPT Fails Basic Logic but Now Has Vision, Wins at Chess and Prompts a Masterpiece video thumbnail

AI Explained September 25, 2023 video

ChatGPT Fails Basic Logic but Now Has Vision, Wins at Chess and Prompts a Masterpiece

This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation AI Compliance Prompt Engineering

Open notes Watch on YouTube

The New Bard and AI Images, Videos, and Translations video thumbnail

AI Explained September 20, 2023 video

The New Bard and AI Images, Videos, and Translations

This AI Explained video reviews a major AI development through the lens of scaling and compute economics. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation AI Red Teaming

Open notes Watch on YouTube

9 AI Developments: HeyGen 2.0 to AjaxGPT, Open Interpreter to NExT-GPT and Roblox AI video thumbnail

AI Explained September 15, 2023 video

9 AI Developments: HeyGen 2.0 to AjaxGPT, Open Interpreter to NExT-GPT and Roblox AI

This AI Explained video reviews a major AI development through the lens of agentic workflows and tool-use risk. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation AI Compliance Agent Security

Open notes Watch on YouTube

AGI Will Not Be A Chatbot - Autonomy, Acceleration, and Arguments Behind the Scenes video thumbnail

AI Explained September 7, 2023 video

AGI Will Not Be A Chatbot - Autonomy, Acceleration, and Arguments Behind the Scenes

This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation AI Compliance Agent Security

Open notes Watch on YouTube

SmartGPT: Major Benchmark Broken - 89.0% on MMLU + Exam's Many Errors video thumbnail

AI Explained August 28, 2023 video

SmartGPT: Major Benchmark Broken - 89.0% on MMLU + Exam's Many Errors

This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation AI Red Teaming Adversarial ML

Open notes Watch on YouTube

9 New Gemini Leaks, Code Llama and A Major AI Consciousness Paper video thumbnail

AI Explained August 24, 2023 video

9 New Gemini Leaks, Code Llama and A Major AI Consciousness Paper

This AI Explained video reviews a major AI development through the lens of AI safety and model behavior. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation

Open notes Watch on YouTube

AI Los Alamos? + New Realistic AI Avatars video thumbnail

AI Explained August 11, 2023 video

AI Los Alamos? + New Realistic AI Avatars

This AI Explained video reviews a major AI development through the lens of AI safety and model behavior. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation AI Red Teaming

Open notes Watch on YouTube

11 Major AI Developments: RT-2 to '100X GPT-4' video thumbnail

AI Explained July 30, 2023 video

11 Major AI Developments: RT-2 to '100X GPT-4'

This AI Explained video reviews a major AI development through the lens of AI safety and model behavior. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation Prompt Engineering Prompt Injection AI Red Teaming

Open notes Watch on YouTube

Llama 2: Full Breakdown video thumbnail

AI Explained July 19, 2023 video

Llama 2: Full Breakdown

This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation AI Compliance AI Red Teaming Adversarial ML

Open notes Watch on YouTube

Bad AI Predictions: Bard Upgrade, 2 Years to AI Auto-Money, OpenAI Investigation and more video thumbnail

AI Explained July 17, 2023 video

Bad AI Predictions: Bard Upgrade, 2 Years to AI Auto-Money, OpenAI Investigation and more

This AI Explained video reviews a major AI development through the lens of robotics, world models, and embodied AI. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation

Open notes Watch on YouTube

Time Until Superintelligence: 1-2 Years, or 20? Something Doesn't Add Up video thumbnail

AI Explained July 10, 2023 video

Time Until Superintelligence: 1-2 Years, or 20? Something Doesn't Add Up

This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation AI Red Teaming

Open notes Watch on YouTube

Phi-1: A 'Textbook' Model video thumbnail

AI Explained July 3, 2023 video

Phi-1: A 'Textbook' Model

This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation AI Red Teaming

Open notes Watch on YouTube

Google Gemini: AlphaGo-GPT? video thumbnail

AI Explained June 28, 2023 video

Google Gemini: AlphaGo-GPT?

This AI Explained video reviews a major AI development through the lens of agentic workflows and tool-use risk. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation Agent Security AI Red Teaming

Open notes Watch on YouTube

ChatGPT's Achilles' Heel video thumbnail

AI Explained June 25, 2023 video

ChatGPT's Achilles' Heel

This AI Explained video reviews a major AI development through the lens of scaling and compute economics. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation Prompt Engineering Prompt Injection

Open notes Watch on YouTube

Sam Altman's World Tour, in 16 Moments video thumbnail

AI Explained June 13, 2023 video

Sam Altman's World Tour, in 16 Moments

This AI Explained video reviews a major AI development through the lens of AI safety and model behavior. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation

Open notes Watch on YouTube

The AI News You Might Have Missed This Week - Zuckerberg to Falcon w/ SPQR video thumbnail

AI Explained June 11, 2023 video

The AI News You Might Have Missed This Week - Zuckerberg to Falcon w/ SPQR

This AI Explained video reviews a major AI development through the lens of multimodal generation and provenance. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation

Open notes Watch on YouTube

Orca: The Model Few Saw Coming video thumbnail

AI Explained June 7, 2023 video

Orca: The Model Few Saw Coming

This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation Adversarial ML

Open notes Watch on YouTube

The AI News You Might Have Missed This Week video thumbnail

AI Explained June 4, 2023 video

The AI News You Might Have Missed This Week

This AI Explained video reviews a major AI development through the lens of robotics, world models, and embodied AI. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation

Open notes Watch on YouTube

'Show Your Working': ChatGPT Performance Doubled w/ Process Rewards (+Synthetic Data Event Horizon) video thumbnail

AI Explained June 1, 2023 video

'Show Your Working': ChatGPT Performance Doubled w/ Process Rewards (+Synthetic Data Event Horizon)

This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation Prompt Engineering AI Red Teaming

Open notes Watch on YouTube

Hassabis, Altman and AGI Labs Unite - AI Extinction Risk Statement [ft. Sutskever, Hinton + Voyager] video thumbnail

AI Explained May 30, 2023 video

Hassabis, Altman and AGI Labs Unite - AI Extinction Risk Statement [ft. Sutskever, Hinton + Voyager]

This AI Explained video reviews a major AI development through the lens of AI safety and model behavior. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation AI Red Teaming

Open notes Watch on YouTube

12 New Code Interpreter Uses (Image to 3D, Book Scans, Multiple Datasets, Error Analysis ... ) video thumbnail

AI Explained May 22, 2023 video

12 New Code Interpreter Uses (Image to 3D, Book Scans, Multiple Datasets, Error Analysis ... )

This AI Explained video reviews a major AI development through the lens of agentic workflows and tool-use risk. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation Agent Security

Open notes Watch on YouTube

GPT 4 Got Upgraded - Code Interpreter (ft. Image Editing, MP4s, 3D Plots, Data Analytics and more!) video thumbnail

AI Explained May 20, 2023 video

GPT 4 Got Upgraded - Code Interpreter (ft. Image Editing, MP4s, 3D Plots, Data Analytics and more!)

This AI Explained video reviews a major AI development through the lens of agentic workflows and tool-use risk. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation Agent Security

Open notes Watch on YouTube

'This Could Go Quite Wrong' - Altman Testimony, GPT 5 Timeline, Self-Awareness, Drones and more video thumbnail

AI Explained May 17, 2023 video

'This Could Go Quite Wrong' - Altman Testimony, GPT 5 Timeline, Self-Awareness, Drones and more

This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation AI Compliance AI Red Teaming

Open notes Watch on YouTube

Enter PaLM 2 (New Bard): Full Breakdown - 92 Pages Read and Gemini Before GPT 5? Google I/O video thumbnail

AI Explained May 11, 2023 video

Enter PaLM 2 (New Bard): Full Breakdown - 92 Pages Read and Gemini Before GPT 5? Google I/O

This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation AI Compliance AI Red Teaming Adversarial ML

Open notes Watch on YouTube

GPT 4 is Smarter than You Think: Introducing SmartGPT video thumbnail

AI Explained May 7, 2023 video

GPT 4 is Smarter than You Think: Introducing SmartGPT

This AI Explained video reviews a major AI development through the lens of agentic workflows and tool-use risk. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation Agent Security Prompt Engineering Adversarial ML

Open notes Watch on YouTube

What's Behind the ChatGPT History Change? How You Can Benefit + The 6 New Developments This Week video thumbnail

AI Explained April 26, 2023 video

What's Behind the ChatGPT History Change? How You Can Benefit + The 6 New Developments This Week

This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation AI Compliance

Open notes Watch on YouTube

8 Signs It's The Future: Thought-to-Text, Nvidia Text-to-Video, Character AI, and P(Doom) @Ted video thumbnail

AI Explained April 20, 2023 video

8 Signs It's The Future: Thought-to-Text, Nvidia Text-to-Video, Character AI, and P(Doom) @Ted

This AI Explained video reviews a major AI development through the lens of agentic workflows and tool-use risk. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation Agent Security AI Red Teaming

Open notes Watch on YouTube

‘We Must Slow Down the Race’ – X AI, GPT 4 Can Now Do Science and Altman GPT 5 Statement video thumbnail

AI Explained April 16, 2023 video

‘We Must Slow Down the Race’ – X AI, GPT 4 Can Now Do Science and Altman GPT 5 Statement

This AI Explained video reviews a major AI development through the lens of agentic workflows and tool-use risk. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation Agent Security AI Red Teaming

Open notes Watch on YouTube

GPT 5 Will be Released 'Incrementally' - 5 Points from Brockman Statement [plus Timelines & Safety] video thumbnail

AI Explained April 13, 2023 video

GPT 5 Will be Released 'Incrementally' - 5 Points from Brockman Statement [plus Timelines & Safety]

This AI Explained video reviews a major AI development through the lens of AI safety and model behavior. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation AI Red Teaming

Open notes Watch on YouTube

Can GPT 4 Prompt Itself? MemoryGPT, AutoGPT, Jarvis, Claude-Next [10x GPT 4!] and more... video thumbnail

AI Explained April 9, 2023 video

Can GPT 4 Prompt Itself? MemoryGPT, AutoGPT, Jarvis, Claude-Next [10x GPT 4!] and more...

This AI Explained video reviews a major AI development through the lens of agentic workflows and tool-use risk. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation Agent Security Prompt Engineering

Open notes Watch on YouTube

Do We Get the $100 Trillion AI Windfall? Sam Altman's Plans, Jobs & the Falling Cost of Intelligence video thumbnail

AI Explained April 6, 2023 video

Do We Get the $100 Trillion AI Windfall? Sam Altman's Plans, Jobs & the Falling Cost of Intelligence

This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation AI Compliance

Open notes Watch on YouTube

'Pause Giant AI Experiments' - Letter Breakdown w/ Research Papers, Altman, Sutskever and more video thumbnail

AI Explained March 29, 2023 video

'Pause Giant AI Experiments' - Letter Breakdown w/ Research Papers, Altman, Sutskever and more

This AI Explained video reviews a major AI development through the lens of AI safety and model behavior. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation AI Red Teaming

Open notes Watch on YouTube

How Well Can GPT-4 See? And the 5 Upgrades That Are Next video thumbnail

AI Explained March 26, 2023 video

How Well Can GPT-4 See? And the 5 Upgrades That Are Next

This AI Explained video reviews a major AI development through the lens of scaling and compute economics. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation

Open notes Watch on YouTube

'Sparks of AGI' - Bombshell GPT-4 Paper: Fully Read w/ 15 Revelations video thumbnail

AI Explained March 23, 2023 video

'Sparks of AGI' - Bombshell GPT-4 Paper: Fully Read w/ 15 Revelations

This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation

Open notes Watch on YouTube

What's Up With Bard? 9 Examples + 6 Reasons Google Fell Behind [ft. Muse, Med-PaLM 2 and more] video thumbnail

AI Explained March 22, 2023 video

What's Up With Bard? 9 Examples + 6 Reasons Google Fell Behind [ft. Muse, Med-PaLM 2 and more]

This AI Explained video reviews a major AI development through the lens of AI safety and model behavior. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation AI Red Teaming

Open notes Watch on YouTube

Google Bard - The Full Review. Bard vs Bing [LaMDA vs GPT 4] video thumbnail

AI Explained March 21, 2023 video

Google Bard - The Full Review. Bard vs Bing [LaMDA vs GPT 4]

This AI Explained video reviews a major AI development through the lens of multimodal generation and provenance. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation Prompt Engineering

Open notes Watch on YouTube

GPT 4: 9 Revelations (not covered elsewhere) video thumbnail

AI Explained March 15, 2023 video

GPT 4: 9 Revelations (not covered elsewhere)

This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation AI Compliance Adversarial ML

Open notes Watch on YouTube

GPT 4: Full Breakdown (14 Details You May Have Missed) video thumbnail

AI Explained March 14, 2023 video

GPT 4: Full Breakdown (14 Details You May Have Missed)

This AI Explained video reviews a major AI development through the lens of AI safety and model behavior. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation AI Red Teaming

Open notes Watch on YouTube

GPT 5 is All About Data video thumbnail

AI Explained March 5, 2023 video

GPT 5 is All About Data

This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation AI Red Teaming Adversarial ML

Open notes Watch on YouTube

8 New Ways to Use Bing's Upgraded 8 [now 20] Message Limit (ft. pdfs, quizzes, tables, scenarios...) video thumbnail

AI Explained March 4, 2023 video

8 New Ways to Use Bing's Upgraded 8 [now 20] Message Limit (ft. pdfs, quizzes, tables, scenarios...)

This AI Explained video reviews a major AI development through the lens of model capability and AI systems in practice. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation Prompt Engineering AI Red Teaming

Open notes Watch on YouTube

9 of the Best Bing (GPT 4) Prompts video thumbnail

AI Explained February 23, 2023 video

9 of the Best Bing (GPT 4) Prompts

This AI Explained video reviews a major AI development through the lens of model capability and AI systems in practice. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation Prompt Engineering AI Red Teaming

Open notes Watch on YouTube

8 Ways ChatGPT 4 [Is] Better Than ChatGPT video thumbnail

AI Explained February 6, 2023 video

8 Ways ChatGPT 4 [Is] Better Than ChatGPT

This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation Adversarial ML

Open notes Watch on YouTube

GPT 4 - hype vs reality video thumbnail

AI Explained January 20, 2023 video

GPT 4 - hype vs reality

This AI Explained video reviews a major AI development through the lens of scaling and compute economics. It is useful context for AI engineering, evaluation, governance, and operational risk.

Model Evaluation

Open notes Watch on YouTube