Why This Matters
The Eval Framework Blind Spot
97M+ · Python + TypeScript SDKs combined
40% · Gartner, Aug 2025
14.2% · Zhang et al., ICML 2025 Spotlight
Standing on Shoulders
What the Research Confirms
ICML 2025 SPOTLIGHT · PENN STATE / DUKE
"Which Agent Causes Task Failures and When?"
LAYER 4 · CHAIN ATTRIBUTION
MICROSOFT AI RED TEAM · WHITEPAPER 2025
Taxonomy of Failure Modes in Agentic AI Systems
SECURITY FRAMING · ALL LAYERS
arXiv · FEBRUARY 2026
"MCP Tool Descriptions Are Smelly!"
LAYER 1 · TOOL SELECTION
arXiv · SEPTEMBER 2025
Diagnosing Failure Root Causes in Agentic Platforms
LAYER 4 · ROOT CAUSE GAP
NDSS 2026 · INJECAGENT / TOOLHIJACKER
Indirect Prompt Injection Benchmarks
LAYER 3 · INJECTION SECURITY
NIST · FEBRUARY 2026
AI Agent Standards Initiative
STANDARDS · REGULATORY SIGNAL