Part 1 established the blind spot: existing eval frameworks were built for RAG, not tool execution, and the gap is both systematic and unaddressed. This part maps the 19 failure modes across four layers — and proposes three evaluation primitives to close it.
A 4-Layer Failure Taxonomy for
Agentic AI Tool Calls
Most agentic failures aren't model failures — they're architecture failures concentrated at the tool execution layer. The taxonomy below organises 19 failure types across four layers, ordered by when in the execution chain they occur.
Click each layer to explore the specific failure modes and their detection difficulty ratings.
Detection difficulty reflects ease of catching failures before they reach end users. Silent = no error thrown.
Three Evaluation Primitives
That Don't Exist Yet
The Viola Conseil methodology is built around three primitives that address the white space directly. Together, they constitute a framework that doesn't exist today as a unified methodology.