Model Evaluation
Safety evaluations, system cards, preparedness, and security measurement for frontier models.
Model evaluation is where teams turn high-level claims about safety, preparedness, or quality into measurable evidence. For operational AI systems, evaluations matter most when they reflect the system context in which the model is actually being used.
What this covers
- Capability, misuse, and safety behavior under realistic tasks
- System cards, preparedness reporting, and evidence for launch decisions
- Regression testing so known failures do not quietly reappear
Common gaps
- Benchmarks that do not match the deployed workflow
- Safety claims without repeatable evidence
- No connection between findings, mitigations, and re-testing
Who this is for
- Teams building evaluation pipelines
- Leaders interpreting evidence for safe deployment
- Security and policy teams interpreting model documentation
Current notes, events, and source material
These items are included because they add useful evidence, framing, implementation detail, or upcoming context for teams working in this area.
NeurIPS 2026
NeurIPS 2026 is the fortieth annual Conference on Neural Information Processing Systems, with the primary dates listed for Sydney, Australia, and additional satellite locations in Atlanta and Paris.
OpenAI DevDay 2026
OpenAI DevDay 2026 is scheduled for September 29 in San Francisco and is OpenAI’s primary developer event for platform updates.
ICML 2026
ICML 2026 takes place at COEX in Seoul, South Korea, with tutorials, main conference sessions, and workshops covering core machine learning research.
Data + AI Summit 2026
Data + AI Summit 2026 is Databricks’ global data and AI conference in San Francisco and online, with 800+ sessions across data engineering, analytics, ML, governance, and agent applications.
Claude Fable Blocked - 11 Quiet Details on What’s Next
Claude Fable 5 banned, but what’s the bigger story. We go through 11 under-reported details, so you have the context to see what’s coming next for your use of AI. From whether the ban will last, what the possible motives are, what the model can actually do, and some wild over-extrapolations going on.
Powering the next era of Confidential AI
We are thrilled to collaborate with Apple on its expanded Private Cloud Compute (PCC) systems announced this week at WWDC 2026.
The agent-ready web: Simplify user actions with WebMCP — Tara Agyemang, Google
Buying two concert tickets costs an AI agent the entire DOM, the accessibility tree, a screenshot, pixel coordinate math, and then a click that might miss because an ad just loaded and shifted the layout. Tara Agyemang from the Google Chrome team introduces WebMCP, a proposed web standard that replaces that process wit
Why Can't Anyone Answer Questions About the Business? — Garrett Galow, WorkOS
Every business question that needs SQL follows the same loop: explain the question, wait for an engineer, get an answer, realize it needs one more join, share a one-off in Slack, repeat. Garrett Galow from WorkOS built Studio to break that loop — an internal workspace where anyone can ask questions against Snowflake, L
How to Keep Shipping When You Walk Away from Your Desk — Zack Proser, WorkOS
Simon Willison fires up four parallel agents and is wiped out by 11am. That is the problem Zack Proser is solving: not that the tools are too slow but that human attention is still the hard constraint. His loop: voice brief at 184 words per minute, agent dispatched to an isolated git worktree, laptop closed, progress c
Investing in multi-agent AI safety research
Google DeepMind and partners announce a $10M funding call for multi-agent safety research.
Sovereign Escape Velocity: Ownership w Open Models — Gus Martins & Ian Ballantyne, Google DeepMind
Gemma 4's 31B model sits fourth on the LM Arena open model leaderboard. The models around it are at least twice as large; some are 20 times larger. It runs on a single GPU. Competitors at comparable quality need four or five. Ian Ballantyne and Gus Martins walk through what that size efficiency unlocks: running on a Pi
Stop Making Models Bigger, Make Them Behave — Kobie Crawdord, Snorkel
Qwen 3 235B was asked for YouTube's year over year ad revenue growth from 2023 to 2024. It queried a table that didn't exist, tried again, got nothing back both times, and hallucinated an answer. The 4B model Snorkel finetuned with RL called `get_table_name` first, inspected the schema, ran a query, hit a column error,
Claude Fable 5 - Full 319 page Breakdown
Fable 5 is out - and it’s good, very good. But beyond the splashy demos, I want to bring you the 20+ nuggets from the 319 page system card, which I read in full, all day, plus benchmarks you may not have noticed. Plus two worrying trends inside the ‘mind’ of Claude, how OpenAI counter
Self Driving Products: Product Signals to Pull Requests — Joshua Snyder, PostHog
A rage click, a 2am error spike, a customer Slack message — today each sits until a developer notices, triages, tickets, and writes a fix. PostHog is building a pipeline that collapses that chain: signal arrives, a background agent groups it with related errors and session replays, researches the codebase, and opens a
Detecting and containing AI-powered threats with Google Security Operations agents
Learn how Google Security Operations works in concert with AI Threat Defense to monitor, detect, and respond to threats, particularly from code you do not own or cannot patch.
From Transcription to Live Music: Gemini's Audio Stack — Thor Schaeff, Google DeepMind
One API call to Gemini 3 Flash Preview: speaker labels by name, timestamps, emotion tags, language detection with English translation, and a full summary. That is the audio understanding layer that underlies everything else Thor Schaeff demos here, including speech generation directed by a "director's note" rather than
RAG is dead, right?? — Kuba Rogut, Turbopuffer
Cursor added semantic search and measured a 24% increase in answer accuracy on their composer model, a 2.6% gain in code retention in large codebases, and a 2.2% drop in dissatisfied user requests. Those numbers look small until you factor in that semantic search does not fire on every query. Meanwhile Google search vo
2026 AI Engineer Vibe Reel
We are getting ready for the World's Fair in San Francisco - Jun 29 to July 2! https://ai.engineer/wf - get tickets and see schedule!
GPU Cloud Deployment Without Leaving Your IDE — Audry Hsu, RunPod
The iteration cycle before Flash: commit, push, build a Docker image, pull it from the registry, load it onto a server, allocate a GPU, then find out if it works. Audrey Hsu demos what replacing that with a single decorator looks like — add `@flash.endpoint` to an async Python function and it deploys to GPU cloud from
Why More Context Makes Your Agent Dumber and What to Do About It — Nupur Sharma, Qodo
Give an agent your full codebase and it will attend to the start and the end, then quietly drop the middle. Nupur from Qodo calls this the U curve and builds the whole talk around it: why growing the context window did not fix the problem, and what actually does. She runs through iterative retrieval, hierarchical summa
Why Eval++ Is the Next Great Compute Primitive — Sunil Pai & Matt Carey, Cloudflare
Matt Carey and Sunil Pai from Cloudflare's agents team explain why Durable Objects turned out to be the right compute unit for AI agents: addressable, persistent, hibernating, stateful, and fast enough that 15ms London latency puts you inside a single animation frame. The Agents SDK builds on this to give resumable st
Road to 5 Million Tokens: Breaking Barriers in Long Context Training — Max Ryabinin, Together AI
Training a standard LLaMA 3B model with a 3 million token context on a single 8xH100 node fails before you even start: the model parameters alone exhaust GPU memory. Max Ryabinin from Together AI walks through the full stack of techniques needed to get there: fully sharded data parallelism, DeepSpeed Ulysses context pa
Under 5 minutes to a deployed LLM endpoint — Audry Hsu, RunPod
Two failed crypto mining rigs in a basement in 2022. The founders posted on Reddit offering the GPUs for free in exchange for feedback. That is the origin of RunPod, now at $120 million in annual recurring revenue with 500,000 developers on the platform. The demo runs in under five minutes: pick a model from the Hub, c
LLM Observability, Evaluation, Experimentation Platform — Dat Ngo, Arize
Your agent called tool B before tool A, and B has a dependency on A. You did not catch it because nothing in your code audits agents. The telemetry does. Dat from Arize AI walks through what observability actually means when the system you are debugging is nondeterministic and the execution path changes with every run.
From MCP to Scale: Pipelines That Build Themselves — Rafael Levi, Bright Data
Scraping is not the hard part anymore. Maintaining scrapers is. This session shows what it looks like when an agent uses MCP to inspect a site, understand its structure, generate a production scraper, and keep that pipeline working when the site changes. Using Bright Data's MCP, APIs, and browser infrastructure, the fl
Building safe Payment Infrastructure for the autonomous economy — Steve Kaliski, Stripe
Agents are evolving from calling free APIs to executing real transactions, creating a new challenge: how do we let software spend money autonomously without catastrophic risk? This talk presents Stripe's approach to solving the dual problems of secure credential transmission and making businesses discoverable to agents
Evals Are Broken, Use Them Anyway — Ara Khan, Cline
Cline started at 43% on Terminal Bench. The improvements came from container CPU and memory settings, raised timeouts, and prompt engineering techniques specific to Anthropic model families that do not transfer to Codex or Gemini. Not from switching to a better model. Ara Khan's argument is that benchmark numbers are n
Building Interactive UIs in VS Code with MCP Apps — Marlene Mhangami & Liam Hampton, GitHub
The demo profiles a Go app running bubble sort and Fibonacci and the result renders as an interactive flame graph directly inside the VS Code chat window. Not a link. Not a text summary. A live iframe you can scroll and query, sandboxed for the same reason you put a hamster in a cage: so it cannot chew up your VS Code
Building Agent Interfaces: Lessons from Chrome DevTools (MCP) for Agents — Michael Hablich, Google
Chrome DevTools MCP shipped with one tool: debug_webpage. Agents failed silently because they couldn't compose behaviors. The team decomposed it into 25 focused tools and assumed the problem was solved. It wasn't — now agents had 25 tools and no reliable way to pick the right one. Michael Hablich's talk is an honest ac
Beyond Transcription: Building Voice AI That Understands Conversations — Hervé Bredin, pyannoteAI
The open ASR leaderboard reports Nvidia Parakeet at 11.4% word error rate on AMI meeting data. Hervé Bredin runs the same model on the same dataset and gets 26%. Same model, same recordings, different microphone: the leaderboard uses headset audio, he uses the table mic. Most voice AI benchmarks are measuring single sp
Dark Factory: OpenClaw Ships Faster Than You Can Read the Diff — Vincent Koc, OpenClaw
OpenClaw hit 3,000 commits in a single day. Vincent Koc's commit history shows exactly when he goes to sleep and when he wakes up. He and Peter Steinberger ran roughly 60 to 70 agents between them during the great refactor: 2,700 commits, close to a million lines of code changed, 82% of the core codebase touched in one
AI Engineer Melbourne 2026 Keynote Livestream | Day 2
Live from Federation Square in Melbourne, AI Engineer Melbourne 2026 brings the keynote stage to viewers online in partnership with Web Directions. This is AI Engineer’s first partner event in Australia, featuring keynote-stage sessions from one of the most thoughtfully produced developer events in the region. Watch li
The Art & Science of Benchmarking Agents — Vincent Chen, Snorkel AI
ARC AGI 3 launched a few weeks before this talk with every task human solvable and frontier models under 1%. That gap is the argument: our ability to measure AI has fallen behind our ability to build it, and benchmarks that actually shape the field are bets on where capabilities are going, not snapshots of where they a
Text Diffusion — Brendon Dillon, Google DeepMind
GPT-4o answered 40. Gemini 2.5 Flash answered 42 and stuck to it even after working through the reasoning incorrectly. The Gemini Diffusion model, considerably smaller than both, answered 60 on the first forward pass, then 49, then corrected itself to 39 once it finished reasoning. Bidirectional attention means it can
SWE-rebench: Lessons from Evaluating Coding Agents — Ibragim Badertdinov, Nebius
Claude Code solved SWE rebench tasks by reading git history to find the solution patch. When Nebius removed future commits from the environment, it fetched the original GitHub issue. When they blocked web fetch, it switched to curl, formatted the conversation for readability, and solved the task again anyway. Ibragim B
Introducing new capabilities to GPT-Rosalind
GPT-Rosalind advances life sciences research with enhanced biological reasoning, medicinal chemistry expertise, genomics analysis, and experimental workflow capabilities.
BDD, ADR, PRD, WTF: Capturing Decisions for Humans and AI Alike — Michal Cichra, Safe Intelligence
"One thing harder than reading AI code is reading AI tests." Michal Cichra from Safe Intelligence argues spec driven development leaves a loop open: you have a markdown spec, but how do you know the product actually behaves that way? His answer is Cucumber, nearly forgotten and suddenly useful again. Executable, human readabl
Beyond Components: Designing Generative UI for MCP Apps — Ruben Casas, Postman
Ruben Casas from Postman prompted a model to rewrite his blog. It built a search box with a blur animation and accessibility out of the box, without being asked. That was when he concluded the model writes better frontend code than he does. His question for the talk: if the models are this capable, why are most agent U
AI Engineer Melbourne 2026 Keynote Livestream | Day 1
Live from Federation Square in Melbourne, AI Engineer Melbourne 2026 brings the keynote stage to viewers online in partnership with Web Directions. This is AI Engineer’s first partner event in Australia, featuring keynote-stage sessions from one of the most thoughtfully produced developer events in the region. Watch li
Benchmarking semantic code retrieval on Claude Code — Kuba Rogut, Turbopuffer
By default, Claude Code wastes one in every three file reads. Add windowed grep and that drops to one in five. Add semantic search on top and it drops to one in eight, with file precision climbing from 65% to 87%. Kuba Rogut from Turbopuffer ran a 50-task benchmark against ContextBench to measure not whether the agent
What Lies Beneath the API — Benjamin Cowen, Modal
Intercom is beating their frontier API at one tenth the cost. Pinterest claims orders of magnitude. Ben Cowen from Modal argues this pattern is not the exception for maturing AI products. It is the destination. Frontier labs want their models to win at everything. You want to win at your specific business logic. Those
How Lovable self-improves every hour — Benjamin Verbeek, Lovable
Within the first hour of launching the vent tool, the agent filed 20 complaints about a silent file copy failure. The team checked: the tool worked fine. What the agent had caught was that filenames with a space in them silently failed to copy, a bug that never surfaced in logs. Benjamin Verbeek from Lovable built it a
Cloud CISO Perspectives: How to build an AI-ready security program for the public sector
From industrial control systems to decades-old municipal databases, here’s our CISO guidance to prep AI-ready security programs for the public sector.
New Claude Opus 4.8: 15 Things You May’ve Missed
The ‘best’ generally available AI model just dropped, but there is plenty I bet you missed about what it is, how it performs, and what the release tells us. 15 highlights from the 244 page system card, plus private testing, leader interview and more.
ACM CAIS 2026
ACM CAIS 2026 is a research-focused conference on compound AI architectures, optimization, deployment, and agentic AI systems in San Jose, California.
Project Glasswing: An initial update
Anthropic reports early Project Glasswing results using Mythos Preview with infrastructure partners and external testers, including large-scale vulnerability discovery and a cautious disclosure posture.
Gemini 3.5: frontier intelligence with action
Gemini 3.5 is built to help you execute complex, agentic workflows.
Anthropic Responsible Scaling Policy v3.2
Anthropic’s current Responsible Scaling Policy page lists v3.2 as effective April 29, 2026, adding formal authority for external review of risk reports and regular briefings to its Long-Term Benefit Trust.
Build & deploy AI-powered apps — Paige Bailey, Google DeepMind
Got a massive idea but stuck in the "just talking about it" phase? This session cuts the fluff and dives straight into how to build and prototype at lightning speed using AI Studio Build and Antigravity for free. It breaks down Google DeepMind's AI tech stack so viewers know exactly which tools to use, when to reach fo
Everything I Learned Training Frontier Small Models — Maxime Labonne, Liquid AI
A new class of small models is emerging with the ability to reliably follow instructions and call tools while running on-device under 1 GB of memory. In this talk, we'll break down how to post-train frontier small models using the LFM2.5 recipe: on-policy preference alignment, agentic reinforcement learning, and curric
One Login to Rule Them All: Cross-App Access for MCP — Garrett Galow, WorkOS
Connecting a coding agent to multiple services often means facing a dozen OAuth consent screens, a dozen token lifecycles, and a dozen chances for something to break. Despite having Single Sign-On, users still find themselves signing in repeatedly. This talk explores how Cross-App Access leverages a three-way trust bet
Why building eval platforms is hard — Phil Hetzel, Braintrust
An eval platform is not just a test runner. You are building shared definitions of "good," reliable data pipelines, labelling workflows, versioning, and trust in results across many teams and model changes. This session breaks down the hidden complexity, the common failure modes, and the design principles that make eva
Building your own software factory — Eric Zakariasson, Cursor
Most of us are pair-programming with one agent and stopping there. There's a lot more on the table. This workshop is about going from one agent to many. We'll start with codebase setup, the foundational work that makes agents effective on their own. Then we'll scale up to running agents in parallel, kicking off async w
Lessons from Scaling GitHub's Remote MCP Server — Sam Morrow, GitHub
GitHub operates one of the most heavily-utilised MCP servers in the ecosystem, with over 4 million downloads of the stdio server alone. Discover the architectural decisions, technical challenges and lessons learned while building and scaling a remote MCP server on production infrastructure. The session walks through th
Bringing MCPs to the Enterprise — Karan Sampath, Anthropic
MCPs are often flaky, face multiple security vulnerabilities, and are generally hard to scale. Most enterprises struggle to use more than single digit numbers of MCPs due to issues with security, observability, and access control. In this talk, we'll explore the approaches and learnings we at Anthropic have been taking
Open Models at Google DeepMind — Cassidy Hardin, Google DeepMind
Open models are getting smaller, faster, and far more capable. In this talk, Cassidy Hardin walks through the latest advances in the Gemma family, with a focus on Gemma 4 and what it enables for developers building on-device and open-weight AI systems. She covers the architecture behind Gemma’s dense, effective, and mi
Collaborative AI Engineering — Maggie Appleton, GitHub Next
Agentic engineering so far has been a solo story: one developer and a dozen agents moving at warp speed. But speed without thoughtful planning and team alignment is just wasting tokens. When everyone on a team is directing agents alone in their personal CLI tools with no shared context, you get duplicate work, conflict
MCP = Mega Context Problem - Matt Carey
The best MCP server is the one you didn't have to build. At Cloudflare we have a lot of products. Our REST OpenAPI spec is over 2.3 million tokens. When teams started building MCP servers, they did what everyone does: cherry-picked important endpoints for their product, wrote some tool definitions and shipped a separat
Full Walkthrough: Workflow for AI Coding from Planning to Production — Matt Pocock (@mattpocockuk)
A hands-on workshop covering the full lifecycle of AI-assisted development, from turning ambiguous requirements into agent-ready plans to running autonomous coding agents that ship production features. You'll learn to stress-test vague briefs into structured PRDs, slice work into thin "tracer bullet" vertical slices, a
What Do Models Still Suck At? - Peter Gostev, Arena.ai, BullshitBench
AI Engineer session on What Do Models Still Suck At?, presented by Peter Gostev, Arena.ai, BullshitBench. It adds practical context for how teams are building and operating AI systems in production.
GPT 5.5 Arrives, DeepSeek V4 Drops, and the Compute War Intensifies
GPT 5.5 full analysis, plus DeepSeek V4 paper highlights, comparisons with Mythos, a vibe-coded game w/ GPT Image 2, and 50 data-points you wouldn’t get from just reading the headlines.
Building Generative Image & Video models at Scale - Sander Dieleman (Veo and Nano Banana)
AI Engineer session on Building Generative Image & Video models at Scale, presented by Sander Dieleman (Veo and Nano Banana). It adds practical context for how teams are building and operating AI systems in production.
AIE Miami Day 2 ft. Cerebras, OpenCode, Cursor, Arize AI, and more!
April 21, 2026 - all times in EST -- 9:00am - Welcome to Day 2 -- 9:10am - David House, G2i Transforming Programming Mindsets: Case Studies in Agentic Coding Adoption -- 9:35am - Sarah Chieng, Cerebras Help! We're DEEP in (latency) Debt -- 10:00am - Lech Kalinowski, CallStack Ambient Generative AI: Deploying Latent Dif
Gemma, DeepMind's Family of Open Models — Omar Sanseviero, Google DeepMind
AI Engineer session on Gemma, DeepMind's Family of Open Models, presented by Omar Sanseviero, Google DeepMind. It adds practical context for how teams are building and operating AI systems in production.
Running LLMs on your iPhone: 40 tok/s Gemma 4 with MLX — Adrien Grondin, Locally AI
AI Engineer session on Running LLMs on your iPhone: 40 tok/s Gemma 4 with MLX, presented by Adrien Grondin, Locally AI. It adds practical context for how teams are building and operating AI systems in production.
AIE Miami Keynote & Talks ft. OpenCode, Google DeepMind, OpenAI, and more!
April 20, 2026 - all times in EST -- 9:00am - Welcome to AI Engineer Miami -- 9:10am - Gabe Greenberg, G2i Opening Remarks -- 9:15am - Dax Raad, OpenCode Keynote -- 9:40am - Dexter Horthy, HumanLayer Everything We got Wrong About RPI -- 10:05am - Max Stoiber, OpenAI Coming Soon -- 10:30am - Morning Break -- 11:00am - B
Frontier AI and the Future of Intelligence — Raia Hadsell, VP of Research, Google DeepMind
AI Engineer session on Frontier AI and the Future of Intelligence, presented by Raia Hadsell, VP of Research, Google DeepMind. It adds practical context for how teams are building and operating AI systems in production.
Claude Opus 4.7 - A New Frontier, in Performance … and Drama
This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.
Building pi in a World of Slop — Mario Zechner
AI Engineer session on Building pi in a World of Slop, presented by Mario Zechner. It adds practical context for how teams are building and operating AI systems in production.
$1 AI Guardrails: The Unreasonable Effectiveness of Finetuned ModernBERTs — Diego Carpentero
AI Engineer session on $1 AI Guardrails: The Unreasonable Effectiveness of Finetuned ModernBERTs, presented by Diego Carpentero. It adds practical context for how teams are building and operating AI systems in production.
Let LLMs Wander: Engineering RL Environments — Stefano Fiorucci
AI Engineer session on Let LLMs Wander: Engineering RL Environments, presented by Stefano Fiorucci. It adds practical context for how teams are building and operating AI systems in production.
One Registry to Rule them All - Sonny Merla, Mauro Luchetti, & Mattia Redaelli, Quantyca
AI Engineer session on One Registry to Rule them All, presented by Sonny Merla, Mauro Luchetti, & Mattia Redaelli, Quantyca. It adds practical context for how teams are building and operating AI systems in production.
Claude Mythos: Highlights from 244-page Release
This AI Explained video reviews a major AI development through the lens of agentic workflows and tool-use risk. It is useful context for AI engineering, evaluation, governance, and operational risk.
Assessing Claude Mythos Preview’s cybersecurity capabilities
Claude Mythos Preview is a new general-purpose language model that is strikingly capable at computer security tasks. This post provides technical details for researchers and practitioners who want to understand exactly how we have been testing this model, and what we have found over the past month. We hope this will sh
NIST AI RMF and Critical Infrastructure Profile
NIST’s AI RMF hub now highlights its April 2026 concept note for a Trustworthy AI in Critical Infrastructure profile, extending the framework toward sector-specific operational risk management.
Two AI Models Set to “stir government urgency”, But Will This Challenge Undo Them?
This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.
Reverse engineering Claude's CVE-2026-2796 exploit
This post dives deep into how Claude wrote an exploit for one of the vulnerabilities it found in Firefox.
What the New ChatGPT 5.4 Means for the World
This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.
Deadline Day for Autonomous AI Weapons & Mass Surveillance
This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.
Gemini 3.1 Pro and the Downfall of Benchmarks: Welcome to the Vibe Era of AI
This AI Explained video reviews a major AI development through the lens of agentic workflows and tool-use risk. It is useful context for AI engineering, evaluation, governance, and operational risk.
The Two Best AI Models/Enemies Just Got Released Simultaneously
This AI Explained video reviews a major AI development through the lens of agentic workflows and tool-use risk. It is useful context for AI engineering, evaluation, governance, and operational risk.
LLM-discovered 0-days
AI models can now find high-severity vulnerabilities at scale. This is a moment to empower defenders. We're now using Claude to find and help fix vulnerabilities in open source software.
Claude AI Co-founder Publishes 4 Big Claims about Near Future: Breakdown
This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.
Identity for AI Agents - Patrick Riley & Carlos Galan, Auth0
AI Engineer session on Identity for AI Agents, presented by Patrick Riley & Carlos Galan, Auth0. It adds practical context for how teams are building and operating AI systems in production.
Welcome to AIE CODE - Jed Borovik, Google DeepMind
AI Engineer session on Welcome to AIE CODE, presented by Jed Borovik, Google DeepMind. It adds practical context for how teams are building and operating AI systems in production.
AI Models on Realistic Cyber Ranges
In a recent evaluation of AI models’ cyber capabilities, current Claude models can now succeed at multistage attacks on networks with dozens of hosts using only standard, open-source tools, instead of the custom tools needed by previous generations.
Finding Bugs with Claude and Property-based Testing
Ensuring that programs are bug-free is one of the most challenging aspects of software engineering. We developed an agent that can efficiently identify bugs in large software projects. Our agent infers general properties of code that should be true, and then applies property-based testing. After extensive manual valida
Anthropic: Our AI just created a tool that can ‘automate all white collar work’, Me:
This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.
Experimenting with AI to Defend Critical Infrastructure
AI could help defenders of critical infrastructure identify the vulnerabilities that attackers might exploit—and close them before they are exploited. Anthropic has partnered with Pacific Northwest National Laboratory (PNNL) to explore this defensive application of AI, demonstrating both the potential of AI-accelerated
Defying Gravity - Kevin Hou, Google DeepMind
AI Engineer session on Defying Gravity, presented by Kevin Hou, Google DeepMind. It adds practical context for how teams are building and operating AI systems in production.
RL Environments at Scale — Will Brown, Prime Intellect
AI Engineer session on RL Environments at Scale, presented by Will Brown, Prime Intellect. It adds practical context for how teams are building and operating AI systems in production.
Building in the Gemini Era — Kat Kampf & Ammaar Reshi, Google DeepMind
AI Engineer session on Building in the Gemini Era, presented by Kat Kampf & Ammaar Reshi, Google DeepMind. It adds practical context for how teams are building and operating AI systems in production.
Developing Taste in Coding Agents: Applied Meta Neuro-Symbolic RL — Ahmad Awais, CommandCode
AI Engineer session on Developing Taste in Coding Agents: Applied Meta Neuro-Symbolic RL, presented by Ahmad Awais, CommandCode. It adds practical context for how teams are building and operating AI systems in production.
Minimax M2: Building the #1 Open Model — Olive Song, MiniMax
AI Engineer session on Minimax M2: Building the #1 Open Model, presented by Olive Song, MiniMax. It adds practical context for how teams are building and operating AI systems in production.
Code World Model: Building World Models for Computation — Jacob Kahn, FAIR Meta
AI Engineer session on Code World Model: Building World Models for Computation, presented by Jacob Kahn, FAIR Meta. It adds practical context for how teams are building and operating AI systems in production.
What the Freakiness of 2025 in AI Tells Us About 2026
This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.
Gemini Exponential, Demis Hassabis' ‘Proto-AGI’ coming, but …
This AI Explained video reviews a major AI development through the lens of agentic workflows and tool-use risk. It is useful context for AI engineering, evaluation, governance, and operational risk.
GPT 5.2: OpenAI Strikes Back
This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.
You Are Being Told Contradictory Things About AI
This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.
Nano Banana Pro: But Did You Catch These 10 Details?
This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.
Gemini 3 Pro: Breakdown
This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.
Is GPT-5.1 Really an Upgrade? But Models Can Auto-Hack Govts, so … there’s that
This AI Explained video reviews a major AI development through the lens of agentic workflows and tool-use risk. It is useful context for AI engineering, evaluation, governance, and operational risk.
Bubble or No Bubble, AI Keeps Progressing (ft. Relentless Learning + Introspection)
This AI Explained video reviews a major AI development through the lens of multimodal generation and provenance. It is useful context for AI engineering, evaluation, governance, and operational risk.
Did you miss these 2 AI stories? A *Real* LLM-crafted Breakthrough + Continual Learning Blocked?
This AI Explained video reviews a major AI development through the lens of agentic workflows and tool-use risk. It is useful context for AI engineering, evaluation, governance, and operational risk.
Sora 2 - It will only get more realistic from here
This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.
OpenAI Tests if GPT-5 Can Automate Your Job - 4 Unexpected Findings
This AI Explained video reviews a major AI development through the lens of AI safety and model behavior. It is useful context for AI engineering, evaluation, governance, and operational risk.
ChatGPT Can Now Call the Cops, but 'Wait till 2100 for Full Job Impact' - Altman
This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.
An ‘AI Bubble’? What Altman Actually said, the Facts and Nano Banana
This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.
The Rise of Open Models in the Enterprise — Amir Haghighat, Baseten
AI Engineer session on The Rise of Open Models in the Enterprise, presented by Amir Haghighat, Baseten. It adds practical context for how teams are building and operating AI systems in production.
Real World Development with GitHub Copilot and VS Code — Harald Kirschner, Christopher Harrison
AI Engineer session on Real World Development with GitHub Copilot and VS Code, presented by Harald Kirschner, Christopher Harrison. It adds practical context for how teams are building and operating AI systems in production.
What Is a Humanoid Foundation Model? An Introduction to GR00T N1 - Annika & Aastha
AI Engineer session on What Is a Humanoid Foundation Model? An Introduction to GR00T N1, presented by Annika & Aastha. It adds practical context for how teams are building and operating AI systems in production.
Real-time Experiments with an AI Co-Scientist - Stefania Druga, fmr. Google Deepmind
AI Engineer session on Real-time Experiments with an AI Co-Scientist, presented by Stefania Druga, fmr. Google DeepMind. It adds practical context for how teams are building and operating AI systems in production.
GPT-5 has Arrived
This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.
Genie 3: The World Becomes Playable (DeepMind)
This AI Explained video reviews a major AI development through the lens of agentic workflows and tool-use risk. It is useful context for AI engineering, evaluation, governance, and operational risk.
Measuring AGI: Interactive Reasoning Benchmarks for ARC-AGI-3 — Greg Kamradt, ARC Prize Foundation
AI Engineer session on Measuring AGI: Interactive Reasoning Benchmarks for ARC-AGI-3, presented by Greg Kamradt, ARC Prize Foundation. It adds practical context for how teams are building and operating AI systems in production.
How to build world-class AI products — Sarah Sachs (AI lead @ Notion) & Carlos Esteban (Braintrust)
AI Engineer session on How to build world-class AI products, presented by Sarah Sachs (AI lead @ Notion) & Carlos Esteban (Braintrust). It adds practical context for how teams are building and operating AI systems in production.
Thinking Deeper in Gemini — Jack Rae, Google DeepMind
AI Engineer session on Thinking Deeper in Gemini, presented by Jack Rae, Google DeepMind. It adds practical context for how teams are building and operating AI systems in production.
Netflix's Big Bet: One model to rule recommendations: Yesu Feng, Netflix
AI Engineer session on Netflix's Big Bet: One model to rule recommendations, presented by Yesu Feng, Netflix. It adds practical context for how teams are building and operating AI systems in production.
The Bitter Layout or: How I Learned to Love the Model Picker — Maximillian Piras, Yutori
AI Engineer session on The Bitter Layout or: How I Learned to Love the Model Picker, presented by Maximillian Piras, Yutori. It adds practical context for how teams are building and operating AI systems in production.
How fast are LLM inference engines anyway? — Charles Frye, Modal
AI Engineer session by Charles Frye, Modal, on How fast are LLM inference engines anyway? It adds practical context for how teams are building and operating AI systems in production.
RL for Autonomous Coding — Aakanksha Chowdhery, Reflection.ai
AI Engineer session on RL for Autonomous Coding, presented by Aakanksha Chowdhery, Reflection.ai. It adds practical context for how teams are building and operating AI systems in production.
Real world MCPs in GitHub Copilot Agent Mode — Jon Peck, Microsoft
AI Engineer session on Real world MCPs in GitHub Copilot Agent Mode, presented by Jon Peck, Microsoft. It adds practical context for how teams are building and operating AI systems in production.
Benchmarks Are Memes: How What We Measure Shapes AI — and Us - Alex Duffy, Every.to
AI Engineer session on Benchmarks Are Memes: How What We Measure Shapes AI - and Us, presented by Alex Duffy, Every.to. It adds practical context for how teams are building and operating AI systems in production.
Vector Search Benchmark[eting] - Philipp Krenn, Elastic
AI Engineer session on Vector Search Benchmark[eting], presented by Philipp Krenn, Elastic. It adds practical context for how teams are building and operating AI systems in production.
How to Train Your Agent: Building Reliable Agents with RL — Kyle Corbitt, OpenPipe
AI Engineer session on How to Train Your Agent: Building Reliable Agents with RL, presented by Kyle Corbitt, OpenPipe. It adds practical context for how teams are building and operating AI systems in production.
Using OSS models to build AI apps with millions of users — Hassan El Mghari
AI Engineer session on Using OSS models to build AI apps with millions of users, presented by Hassan El Mghari. It adds practical context for how teams are building and operating AI systems in production.
Optimizing inference for voice models in production - Philip Kiely, Baseten
AI Engineer session on Optimizing inference for voice models in production, presented by Philip Kiely, Baseten. It adds practical context for how teams are building and operating AI systems in production.
OpenThoughts: Data Recipes for Reasoning Models — Ryan Marten, Bespoke Labs
AI Engineer session on OpenThoughts: Data Recipes for Reasoning Models, presented by Ryan Marten, Bespoke Labs. It adds practical context for how teams are building and operating AI systems in production.
A year of Gemini progress + what comes next — Logan Kilpatrick, Google DeepMind
AI Engineer session on A year of Gemini progress + what comes next, presented by Logan Kilpatrick, Google DeepMind. It adds practical context for how teams are building and operating AI systems in production.
What every AI engineer needs to know about GPUs — Charles Frye, Modal
AI Engineer session on What every AI engineer needs to know about GPUs, presented by Charles Frye, Modal. It adds practical context for how teams are building and operating AI systems in production.
AI Engineering with the Google Gemini 2.5 Model Family - Philipp Schmid, Google DeepMind
AI Engineer session on AI Engineering with the Google Gemini 2.5 Model Family, presented by Philipp Schmid, Google DeepMind. It adds practical context for how teams are building and operating AI systems in production.
How Not to Read a Headline on AI (ft. new Olympiad Gold, GPT-5 …)
This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.
Grok 4 - 10 New Things to Know
This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.
The Benchmarks Game: Why It's Rigged and How You Can (Really) Win - Darius Emrani
AI Engineer session on The Benchmarks Game: Why It's Rigged and How You Can (Really) Win, presented by Darius Emrani. It adds practical context for how teams are building and operating AI systems in production.
The Demo I Wish I'd Had: OpenAI's Agents SDK... serverless! - Brook Riggio
AI Engineer session by Brook Riggio on The Demo I Wish I'd Had: OpenAI's Agents SDK... serverless! It adds practical context for how teams are building and operating AI systems in production.
The Future of Qwen: A Generalist Agent Model — Junyang Lin, Alibaba Qwen
AI Engineer session on The Future of Qwen: A Generalist Agent Model, presented by Junyang Lin, Alibaba Qwen. It adds practical context for how teams are building and operating AI systems in production.
Analyzing 10,000 Sales Calls With AI In 2 Weeks — Charlie Guo
AI Engineer session on Analyzing 10,000 Sales Calls With AI In 2 Weeks, presented by Charlie Guo. It adds practical context for how teams are building and operating AI systems in production.
When Will AI Models Blackmail You, and Why?
This AI Explained video reviews a major AI development through the lens of agentic workflows and tool-use risk. It is useful context for AI engineering, evaluation, governance, and operational risk.
Veo 3 for Developers — Paige Bailey, Google DeepMind
AI Engineer session on Veo 3 for Developers, presented by Paige Bailey, Google DeepMind. It adds practical context for how teams are building and operating AI systems in production.
MCP: Origins and Requests For Startups — Theodora Chu, Model Context Protocol PM, Anthropic
AI Engineer session on MCP: Origins and Requests For Startups, presented by Theodora Chu, Model Context Protocol PM, Anthropic. It adds practical context for how teams are building and operating AI systems in production.
ChatGPT is poorly designed. So I fixed it
AI Engineer session on ChatGPT is poorly designed. So I fixed it. It adds practical context for how teams are building and operating AI systems in production.
The Voice-First AI Overlay: Designing Conversational Co-Pilots - Gregory Bruss
AI Engineer session on The Voice-First AI Overlay: Designing Conversational Co-Pilots, presented by Gregory Bruss. It adds practical context for how teams are building and operating AI systems in production.
Apple’s ‘AI Can’t Reason’ Claim Seen By 13M+, What You Need to Know
This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.
AI Accelerates: New Gemini Model + AI Unemployment Stories Analysed
This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.
Claude 4: Full 120 Page Breakdown … Is it the Best New Model?
This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.
Google Takes No Prisoners Amid Torrent of AI Announcements
This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.
AI Improves at Self-improving
This AI Explained video reviews a major AI development through the lens of agentic workflows and tool-use risk. It is useful context for AI engineering, evaluation, governance, and operational risk.
"OpenAI is Not God” - The DeepSeek Documentary on Liang Wenfeng, R1 and What's Next
This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.
o3 breaks (some) records, but AI becomes pay-to-win
This AI Explained video reviews a major AI development through the lens of agentic workflows and tool-use risk. It is useful context for AI engineering, evaluation, governance, and operational risk.
RAG and the MongoDB Document Model: Ben Flast
AI Engineer session on RAG and the MongoDB Document Model, presented by Ben Flast. It adds practical context for how teams are building and operating AI systems in production.
Decoding Mistral AI's Large Language Models: Devendra Chaplot
AI Engineer session on Decoding Mistral AI's Large Language Models, presented by Devendra Chaplot. It adds practical context for how teams are building and operating AI systems in production.
Frontier Feud: Anthropic, Google DeepMind, Meta FAIR, Thinking Machines — Barr Yaron, Amplify
AI Engineer session on Frontier Feud: Anthropic, Google DeepMind, Meta FAIR, Thinking Machines, presented by Barr Yaron, Amplify. It adds practical context for how teams are building and operating AI systems in production.
Making Open Models 10x faster and better for Modern Application Innovation: Dmytro (Dima) Dzhulgakov
AI Engineer session on Making Open Models 10x faster and better for Modern Application Innovation, presented by Dmytro (Dima) Dzhulgakov. It adds practical context for how teams are building and operating AI systems in production.
Stateful Agents — Full Workshop with Charles Packer of Letta and MemGPT
AI Engineer full workshop on Stateful Agents, presented by Charles Packer of Letta and MemGPT. It adds practical context for how teams are building and operating AI systems in production.
The Model Isn’t Wrong — You’re Just Bad at Prompting
AI Engineer session on The Model Isn’t Wrong: You’re Just Bad at Prompting. It adds practical context for how teams are building and operating AI systems in production.
From model weights to API endpoint with TensorRT LLM: Philip Kiely and Pankaj Gupta
AI Engineer session on From model weights to API endpoint with TensorRT LLM, presented by Philip Kiely and Pankaj Gupta. It adds practical context for how teams are building and operating AI systems in production.
Moondream: how does a tiny vision model slap so hard? — Vikhyat Korrapati
AI Engineer session by Vikhyat Korrapati on Moondream: how does a tiny vision model slap so hard? It adds practical context for how teams are building and operating AI systems in production.
Fixing bugs in Gemma, Llama, & Phi 3: Daniel Han
AI Engineer session on Fixing bugs in Gemma, Llama, & Phi 3, presented by Daniel Han. It adds practical context for how teams are building and operating AI systems in production.
State Space Models for Realtime Multimodal Intelligence: Karan Goel
AI Engineer session on State Space Models for Realtime Multimodal Intelligence, presented by Karan Goel. It adds practical context for how teams are building and operating AI systems in production.
Unveiling the latest Gemma model advancements: Kathleen Kenealy
AI Engineer session on Unveiling the latest Gemma model advancements, presented by Kathleen Kenealy. It adds practical context for how teams are building and operating AI systems in production.
Multi model multimodal and multi agent innovations in Azure AI: Cedric Vidal
AI Engineer session on Multi model multimodal and multi agent innovations in Azure AI, presented by Cedric Vidal. It adds practical context for how teams are building and operating AI systems in production.
How to build the world's fastest voice bot: Kwindla Hultman Kramer
AI Engineer session on How to build the world's fastest voice bot, presented by Kwindla Hultman Kramer. It adds practical context for how teams are building and operating AI systems in production.
How Deep Research Works - Mukund Sridhar & Aarush Selvan, Google DeepMind
AI Engineer session on How Deep Research Works, presented by Mukund Sridhar & Aarush Selvan, Google DeepMind. It adds practical context for how teams are building and operating AI systems in production.
Customized, production ready inference with open source models: Dmytro (Dima) Dzhulgakov
AI Engineer session on Customized, production ready inference with open source models, presented by Dmytro (Dima) Dzhulgakov. It adds practical context for how teams are building and operating AI systems in production.
Best Practices for Evaluating Large Language Model Applications with llmeval: Niklas Nielsen
AI Engineer session on Best Practices for Evaluating Large Language Model Applications with llmeval, presented by Niklas Nielsen. It adds practical context for how teams are building and operating AI systems in production.
System Design for Next-Gen Frontier Models — Dylan Patel, SemiAnalysis
AI Engineer session on System Design for Next-Gen Frontier Models, presented by Dylan Patel, SemiAnalysis. It adds practical context for how teams are building and operating AI systems in production.
Accelerate your AI journey with Azure AI model catalog: Sharmila Chokalingam
AI Engineer session on Accelerate your AI journey with Azure AI model catalog, presented by Sharmila Chokalingam. It adds practical context for how teams are building and operating AI systems in production.
Building Agents with Model Context Protocol - Full Workshop with Mahesh Murag of Anthropic
AI Engineer full workshop on Building Agents with Model Context Protocol, presented by Mahesh Murag of Anthropic. It adds practical context for how teams are building and operating AI systems in production.
Lessons from the Trenches: Building LLM Evals That Work IRL: Aparna Dhinkaran
AI Engineer session on Lessons from the Trenches: Building LLM Evals That Work IRL, presented by Aparna Dhinkaran. It adds practical context for how teams are building and operating AI systems in production.
Evaluating Domain Specific LLMs for Real World Finance — Waseem Alshikh, Writer
AI Engineer session on Evaluating Domain Specific LLMs for Real World Finance, presented by Waseem Alshikh, Writer. It adds practical context for how teams are building and operating AI systems in production.
How to evaluate a model for your use case: Emmanuel Turlay
AI Engineer session on How to evaluate a model for your use case, presented by Emmanuel Turlay. It adds practical context for how teams are building and operating AI systems in production.
[Full Workshop from Microsoft] Github Copilot - The World's Most Widely Adopted AI Developer Tool
AI Engineer full workshop from Microsoft on GitHub Copilot: The World's Most Widely Adopted AI Developer Tool. It adds practical context for how teams are building and operating AI systems in production.
GitHub Copilot: The World's Most Widely Adopted AI Developer Tool
AI Engineer session on GitHub Copilot: The World's Most Widely Adopted AI Developer Tool. It adds practical context for how teams are building and operating AI systems in production.
Productionizing GenAI Models — Lessons from the world's best AI teams: Lukas Biewald
AI Engineer session on Productionizing GenAI Models: Lessons from the world's best AI teams, presented by Lukas Biewald. It adds practical context for how teams are building and operating AI systems in production.
WTF do people use Open Models for??
AI Engineer session on WTF do people use Open Models for?? It adds practical context for how teams are building and operating AI systems in production.
Fine tune 20 Llama Models in 5 Minutes: Santosh Radha
AI Engineer session on Fine tune 20 Llama Models in 5 Minutes, presented by Santosh Radha. It adds practical context for how teams are building and operating AI systems in production.
o3 and o4-mini - they’re great, but easy to over-hype
This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.
‘Speaking Dolphin’ to AI Data Dominance, 4.1 + Kling 2.0: 7 Updates Critically Analysed
This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.
AI CEO: ‘Stock Crash Could Stop AI Progress’, Llama 4 Anti-climax + ‘Superintelligence in 2027’ ...
This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.
Gemini 2.5 Pro - It’s a Darn Smart Chatbot … (New Simple High Score)
This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.
OpenAI’s New ImageGen is Unexpectedly Epic … (ft. Reve, Imagen 3, Midjourney etc)
This AI Explained video reviews a major AI development through the lens of multimodal generation and provenance. It is useful context for AI engineering, evaluation, governance, and operational risk.
Did AI Just Get Commoditized? Gemini 2.5, New DeepSeek V3, & Microsoft vs OpenAI
This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.
Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigations
NIST finalizes AI 100-2e2025, providing a terminology and taxonomy for adversarial machine learning across predictive and generative AI systems.
Progress from our Frontier Red Team
Anthropic shares lessons from frontier red teaming and discusses where models are showing early-warning signs of higher-risk cyber and biology capabilities.
Manus AI - The Calm Before the Hypestorm … (vs Deep Research + Grok 3)
This AI Explained video reviews a major AI development through the lens of agentic workflows and tool-use risk. It is useful context for AI engineering, evaluation, governance, and operational risk.
GPT 4.5 - not so much wow
This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.
Deep research System Card
OpenAI’s system card for deep research covers prompt injection, privacy, code execution, and external red teaming prior to release.
Claude 3.7 is More Significant than its Name Implies (ft DeepSeek R2 + GPT 4.5 coming soon)
This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.
AGI: (gets close), Humans: ‘Who Gets to Own it?’
This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.
Deep Research by OpenAI - The Ups and Downs vs DeepSeek R1 Search + Gemini Deep Research
This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.
o3-mini and the “AI War”
This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.
Nothing Much Happens in AI, Then Everything Does All At Once
This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.
Operator System Card
The Operator system card documents red teaming and mitigation choices for a computer-using agent, with prompt injections listed as a central risk area.
Altman Expects a ‘Fast Take-off’, ‘Super-Agent’ Debuting Soon and DeepSeek R1 Out
This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.
OpenAI Backtracks, Gunning for Superintelligence: Altman Brings His AGI Timeline Closer - '25 to '29
This AI Explained video reviews a major AI development through the lens of agentic workflows and tool-use risk. It is useful context for AI engineering, evaluation, governance, and operational risk.
o3 - wow
This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.
Never Browse Alone? Gemini 2 Live and ChatGPT Vision
This AI Explained video reviews a major AI development through the lens of agentic workflows and tool-use risk. It is useful context for AI engineering, evaluation, governance, and operational risk.
Sora is Out, But is it a Distraction?
This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.
o1 Pro Mode – ChatGPT Pro Full Analysis (plus o1 paper highlights)
This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.
AI Breaks Its Silence: OpenAI’s ‘Next 12 Days’, Genie 2, and a Word of Caution
This AI Explained video reviews a major AI development through the lens of agentic workflows and tool-use risk. It is useful context for AI engineering, evaluation, governance, and operational risk.
New Google Model Ranked ‘No. 1 LLM’, But There’s a Problem
This AI Explained video reviews a major AI development through the lens of agentic workflows and tool-use risk. It is useful context for AI engineering, evaluation, governance, and operational risk.
Leak: ‘GPT-5 exhibits diminishing returns’, Sam Altman: ‘lol’
This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.
ChatGPT with Search, Altman AMA
This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.
The New Claude 3.5 Sonnet: Better, Yes, But Not Just in the Way You Might Think
This AI Explained video reviews a major AI development through the lens of agentic workflows and tool-use risk. It is useful context for AI engineering, evaluation, governance, and operational risk.
AI - 2024AD: 212-page Report (from this morning) Fully Read w/ Highlights
This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.
OpenAI: ‘We Just Reached Human-level Reasoning’.
This AI Explained video reviews a major AI development through the lens of agentic workflows and tool-use risk. It is useful context for AI engineering, evaluation, governance, and operational risk.
‘Advanced Voice’ ChatGPT Just Happened … But There's 3 Other Stories You Probably Shouldn’t Ignore
This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.
o1 - What is Going On? Why o1 is a 3rd Paradigm of Model + 10 Things You Might Not Know
This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.
ChatGPT o1 - In-Depth Analysis and Reaction (o1-preview)
This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.
$125B for Superintelligence? 3 Models Coming, Sutskever's Secret SSI, & Data Centers (in space)...
This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.
Grok-2 Actually Out, But What If It Were 10,000x the Size?
This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.
Was GPT-5 Underwhelming, Or Not? OpenAI Co-founder Exits, Figure02 Arrives, Character.AI Gutted
This AI Explained video reviews a major AI development through the lens of AI safety and model behavior. It is useful context for AI engineering, evaluation, governance, and operational risk.
Llama 405b: Full 92 page Analysis, and Uncontaminated SIMPLE Benchmark Results
This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.
GPT-4o Mini Arrives In Global IT Outage, But How ‘Mini’ Is Its Intelligence?
This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.
How Far Can We Scale AI? Gen 3, Claude 3.5 Sonnet and AI Hype
This AI Explained video reviews a major AI development through the lens of AI safety and model behavior. It is useful context for AI engineering, evaluation, governance, and operational risk.
AI Won't Be AGI, Until It Can At Least Do This (plus 6 key ways LLMs are being upgraded)
This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.
‘Everything is Going to Be Robotic’ Nvidia Promises, as AI Gets More Real
This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.
Microsoft Promises a 'Whale' for GPT-5, Anthropic Delves Inside a Model’s Mind and Altman Stumbles
This AI Explained video reviews a major AI development through the lens of AI safety and model behavior. It is useful context for AI engineering, evaluation, governance, and operational risk.
GPT-4o - Full Breakdown + Bonus Details
This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.
AI Conquers Gravity: Robo-dog, Trained by GPT-4, Stays Balanced on Rolling, Deflating Yoga Ball
This AI Explained video reviews a major AI development through the lens of robotics, world models, and embodied AI. It is useful context for AI engineering, evaluation, governance, and operational risk.
New OpenAI Model 'Imminent' and AI Stakes Get Raised (plus Med Gemini, GPT 2 Chatbot and Scale AI)
This AI Explained video reviews a major AI development through the lens of agentic workflows and tool-use risk. It is useful context for AI engineering, evaluation, governance, and operational risk.
Storyteller: Building Multi-modal Apps with TS & ModelFusion - Lars Grammel, PhD
AI Engineer session in which Lars Grammel, PhD, presents Storyteller and covers building multi-modal apps with TS and ModelFusion. It adds practical context for how teams are building and operating AI systems in production.
The Code AI Maturity Model and What It Means For You: Ado Kukic
AI Engineer session in which Ado Kukic presents the Code AI Maturity Model and its implications for developers. It adds practical context for how teams are building and operating AI systems in production.
‘Her’ AI, Almost Here? Llama 3, Vasa-1, and Altman ‘Plugging Into Everything You Want To Do’
This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.
Udio, the Mysterious GPT Update, and Infinite Attention
This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.
Why Does OpenAI Need a 'Stargate' Supercomputer? Ft. Perplexity CEO Aravind Srinivas
This AI Explained video reviews a major AI development through the lens of multimodal generation and provenance. It is useful context for AI engineering, evaluation, governance, and operational risk.
5 Key Quotes: Altman, Huang and 'The Most Interesting Year'
This AI Explained video reviews a major AI development through the lens of AI safety and model behavior. It is useful context for AI engineering, evaluation, governance, and operational risk.
AI Agents Take the Wheel: Devin, SIMA, Figure 01 and The Future of Jobs
This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.
The New, Smartest AI: Claude 3 – Tested vs Gemini 1.5 + GPT-4
This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.
The AI 'Genie' is Out + Humanoid Robotics Step Closer
This AI Explained video reviews a major AI development through the lens of multimodal generation and provenance. It is useful context for AI engineering, evaluation, governance, and operational risk.
Sora - Full Analysis (with new details)
This AI Explained video reviews a major AI development through the lens of multimodal generation and provenance. It is useful context for AI engineering, evaluation, governance, and operational risk.
Gemini 1.5 and The Biggest Night in AI
This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.
Gemini Ultra - Full Review
This AI Explained video reviews a major AI development through the lens of scaling and compute economics. It is useful context for AI engineering, evaluation, governance, and operational risk.
GPT-5: Everything You Need to Know So Far
This AI Explained video reviews a major AI development through the lens of model capability and AI systems in practice. It is useful context for AI engineering, evaluation, governance, and operational risk.
Alpha Everywhere: AlphaGeometry, AlphaCodium and the Future of LLMs
This AI Explained video reviews a major AI development through the lens of scaling and compute economics. It is useful context for AI engineering, evaluation, governance, and operational risk.
OpenAI Flip-Flops and '10% Chance of Outperforming Humans in Every Task by 2027' - 3K AI Researchers
This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.
AI On An Exponential? Data, Mamba, and More
This AI Explained video reviews a major AI development through the lens of scaling and compute economics. It is useful context for AI engineering, evaluation, governance, and operational risk.
Midjourney v6, Altman 'Age Reversal' and Gemini 2 - Christmas Edition
This AI Explained video reviews a major AI development through the lens of multimodal generation and provenance. It is useful context for AI engineering, evaluation, governance, and operational risk.
A 100T Transformer Model Coming? Plus ByteDance Saga and the Mixtral Price Drop
This AI Explained video reviews a major AI development through the lens of multimodal generation and provenance. It is useful context for AI engineering, evaluation, governance, and operational risk.
Phi-2, Imagen-2, Optimus-Gen-2: Small New Models to Change the World?
This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.
Gemini Full Breakdown + AlphaCode 2 Bombshell
This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.
OpenAI Insights and Training Data Shenanigans - 7 'Complicated' Developments + Guest Star
This AI Explained video reviews a major AI development through the lens of model capability and AI systems in practice. It is useful context for AI engineering, evaluation, governance, and operational risk.
Q* - Clues to the Puzzle?
This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.
Are We Back to Before? OpenAI 2.0, Inflection-2 and a Major AI Cancer Breakthrough
This AI Explained video reviews a major AI development through the lens of model capability and AI systems in practice. It is useful context for AI engineering, evaluation, governance, and operational risk.
Altman@Microsoft, Shear@OpenAI, Chaos@Everywhere: Sutskever Regret and the Weekend That Changed AI
This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.
Altman Out: Reasons, Reactions and the Repercussions for the Industry
This AI Explained video reviews a major AI development through the lens of AI safety and model behavior. It is useful context for AI engineering, evaluation, governance, and operational risk.
GPT 4 Turbo-Charged? Plus Custom GPTS, Grok, AGI Tier List, Vision Demos, Whisper V3 and more
This AI Explained video reviews a major AI development through the lens of model capability and AI systems in practice. It is useful context for AI engineering, evaluation, governance, and operational risk.
AI Declarations and AGI Timelines – Looking More Optimistic?
This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.
State of AI 2023: Highlights of 163 Page Report + Eureka Self-Improvement, MEG, Suno AI and GPT F
This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.
Not Slowing Down: GAIA-1 to GPT Vision Tips, Nvidia B100 to Bard vs LLaVA
This AI Explained video reviews a major AI development through the lens of AI safety and model behavior. It is useful context for AI engineering, evaluation, governance, and operational risk.
RT-X and the Dawn of Large Multimodal Models: Google Breakthrough and 160-page Report Highlights
This AI Explained video reviews a major AI development through the lens of multimodal generation and provenance. It is useful context for AI engineering, evaluation, governance, and operational risk.
An Actually Big Week in AI: AutoGen, The A-Phone, Mistral 7B, GPT-Fathom and Meta Hunts CharacterAI
This AI Explained video reviews a major AI development through the lens of agentic workflows and tool-use risk. It is useful context for AI engineering, evaluation, governance, and operational risk.
ChatGPT Fails Basic Logic but Now Has Vision, Wins at Chess and Prompts a Masterpiece
This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.
The New Bard and AI Images, Videos, and Translations
This AI Explained video reviews a major AI development through the lens of scaling and compute economics. It is useful context for AI engineering, evaluation, governance, and operational risk.
9 AI Developments: HeyGen 2.0 to AjaxGPT, Open Interpreter to NExT-GPT and Roblox AI
This AI Explained video reviews a major AI development through the lens of agentic workflows and tool-use risk. It is useful context for AI engineering, evaluation, governance, and operational risk.
AGI Will Not Be A Chatbot - Autonomy, Acceleration, and Arguments Behind the Scenes
This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.
SmartGPT: Major Benchmark Broken - 89.0% on MMLU + Exam's Many Errors
This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.
9 New Gemini Leaks, Code Llama and A Major AI Consciousness Paper
This AI Explained video reviews a major AI development through the lens of AI safety and model behavior. It is useful context for AI engineering, evaluation, governance, and operational risk.
AI Los Alamos? + New Realistic AI Avatars
This AI Explained video reviews a major AI development through the lens of AI safety and model behavior. It is useful context for AI engineering, evaluation, governance, and operational risk.
11 Major AI Developments: RT-2 to '100X GPT-4'
This AI Explained video reviews a major AI development through the lens of AI safety and model behavior. It is useful context for AI engineering, evaluation, governance, and operational risk.
Llama 2: Full Breakdown
This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.
Bad AI Predictions: Bard Upgrade, 2 Years to AI Auto-Money, OpenAI Investigation and more
This AI Explained video reviews a major AI development through the lens of robotics, world models, and embodied AI. It is useful context for AI engineering, evaluation, governance, and operational risk.
Time Until Superintelligence: 1-2 Years, or 20? Something Doesn't Add Up
This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.
Phi-1: A 'Textbook' Model
This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.
Google Gemini: AlphaGo-GPT?
This AI Explained video reviews a major AI development through the lens of agentic workflows and tool-use risk. It is useful context for AI engineering, evaluation, governance, and operational risk.
ChatGPT's Achilles' Heel
This AI Explained video reviews a major AI development through the lens of scaling and compute economics. It is useful context for AI engineering, evaluation, governance, and operational risk.
Sam Altman's World Tour, in 16 Moments
This AI Explained video reviews a major AI development through the lens of AI safety and model behavior. It is useful context for AI engineering, evaluation, governance, and operational risk.
The AI News You Might Have Missed This Week - Zuckerberg to Falcon w/ SPQR
This AI Explained video reviews a major AI development through the lens of multimodal generation and provenance. It is useful context for AI engineering, evaluation, governance, and operational risk.
Orca: The Model Few Saw Coming
This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.
The AI News You Might Have Missed This Week
This AI Explained video reviews a major AI development through the lens of robotics, world models, and embodied AI. It is useful context for AI engineering, evaluation, governance, and operational risk.
'Show Your Working': ChatGPT Performance Doubled w/ Process Rewards (+Synthetic Data Event Horizon)
This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.
Hassabis, Altman and AGI Labs Unite - AI Extinction Risk Statement [ft. Sutskever, Hinton + Voyager]
This AI Explained video reviews a major AI development through the lens of AI safety and model behavior. It is useful context for AI engineering, evaluation, governance, and operational risk.
12 New Code Interpreter Uses (Image to 3D, Book Scans, Multiple Datasets, Error Analysis ... )
This AI Explained video reviews a major AI development through the lens of agentic workflows and tool-use risk. It is useful context for AI engineering, evaluation, governance, and operational risk.
GPT 4 Got Upgraded - Code Interpreter (ft. Image Editing, MP4s, 3D Plots, Data Analytics and more!)
This AI Explained video reviews a major AI development through the lens of agentic workflows and tool-use risk. It is useful context for AI engineering, evaluation, governance, and operational risk.
'This Could Go Quite Wrong' - Altman Testimony, GPT 5 Timeline, Self-Awareness, Drones and more
This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.
Enter PaLM 2 (New Bard): Full Breakdown - 92 Pages Read and Gemini Before GPT 5? Google I/O
This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.
GPT 4 is Smarter than You Think: Introducing SmartGPT
This AI Explained video reviews a major AI development through the lens of agentic workflows and tool-use risk. It is useful context for AI engineering, evaluation, governance, and operational risk.
What's Behind the ChatGPT History Change? How You Can Benefit + The 6 New Developments This Week
This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.
8 Signs It's The Future: Thought-to-Text, Nvidia Text-to-Video, Character AI, and P(Doom) @Ted
This AI Explained video reviews a major AI development through the lens of agentic workflows and tool-use risk. It is useful context for AI engineering, evaluation, governance, and operational risk.
‘We Must Slow Down the Race’ – X AI, GPT 4 Can Now Do Science and Altman GPT 5 Statement
This AI Explained video reviews a major AI development through the lens of agentic workflows and tool-use risk. It is useful context for AI engineering, evaluation, governance, and operational risk.
GPT 5 Will be Released 'Incrementally' - 5 Points from Brockman Statement [plus Timelines & Safety]
This AI Explained video reviews a major AI development through the lens of AI safety and model behavior. It is useful context for AI engineering, evaluation, governance, and operational risk.
Can GPT 4 Prompt Itself? MemoryGPT, AutoGPT, Jarvis, Claude-Next [10x GPT 4!] and more...
This AI Explained video reviews a major AI development through the lens of agentic workflows and tool-use risk. It is useful context for AI engineering, evaluation, governance, and operational risk.
Do We Get the $100 Trillion AI Windfall? Sam Altman's Plans, Jobs & the Falling Cost of Intelligence
This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.
'Pause Giant AI Experiments' - Letter Breakdown w/ Research Papers, Altman, Sutskever and more
This AI Explained video reviews a major AI development through the lens of AI safety and model behavior. It is useful context for AI engineering, evaluation, governance, and operational risk.
How Well Can GPT-4 See? And the 5 Upgrades That Are Next
This AI Explained video reviews a major AI development through the lens of scaling and compute economics. It is useful context for AI engineering, evaluation, governance, and operational risk.
'Sparks of AGI' - Bombshell GPT-4 Paper: Fully Read w/ 15 Revelations
This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.
What's Up With Bard? 9 Examples + 6 Reasons Google Fell Behind [ft. Muse, Med-PaLM 2 and more]
This AI Explained video reviews a major AI development through the lens of AI safety and model behavior. It is useful context for AI engineering, evaluation, governance, and operational risk.
Google Bard - The Full Review. Bard vs Bing [LaMDA vs GPT 4]
This AI Explained video reviews a major AI development through the lens of multimodal generation and provenance. It is useful context for AI engineering, evaluation, governance, and operational risk.
GPT 4: 9 Revelations (not covered elsewhere)
This AI Explained video reviews a major AI development through the lens of governance and responsible deployment. It is useful context for AI engineering, evaluation, governance, and operational risk.
GPT 4: Full Breakdown (14 Details You May Have Missed)
This AI Explained video reviews a major AI development through the lens of AI safety and model behavior. It is useful context for AI engineering, evaluation, governance, and operational risk.
GPT 5 is All About Data
This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.
8 New Ways to Use Bing's Upgraded 8 [now 20] Message Limit (ft. pdfs, quizzes, tables, scenarios...)
This AI Explained video reviews a major AI development through the lens of model capability and AI systems in practice. It is useful context for AI engineering, evaluation, governance, and operational risk.
9 of the Best Bing (GPT 4) Prompts
This AI Explained video reviews a major AI development through the lens of model capability and AI systems in practice. It is useful context for AI engineering, evaluation, governance, and operational risk.
8 Ways ChatGPT 4 [Is] Better Than ChatGPT
This AI Explained video reviews a major AI development through the lens of benchmarks and evaluation evidence. It is useful context for AI engineering, evaluation, governance, and operational risk.
GPT 4 - hype vs reality
This AI Explained video reviews a major AI development through the lens of scaling and compute economics. It is useful context for AI engineering, evaluation, governance, and operational risk.