
AI agents are supposed to use our APIs, so why do they so often fail or work inefficiently? The answer lies not in the intelligence of the models but in a fundamental mismatch: APIs are built for machines to call, yet designed so that humans can understand and integrate them. They are not designed for machines with limited context windows (or "comprehension capacity"). In this article, I'll show you why this is a problem.
Two Perspectives, One Challenge
In discussions about AI integration, two seemingly contradictory viewpoints keep emerging. But to understand them, we first need to look more closely at what this is all about:
The Context: API Operations and Tools for AI Systems
You have a REST API with its operations – individual endpoints like GET /stations, POST /bookings, or DELETE /bookings/{id} as in the Train Travel API. Each operation does something specific: read, create, update, or delete a resource.
On the other hand, you have AI systems or agents that are supposed to complete tasks for you. For them to interact with your API, they need tools – functions they can call. In the Model Context Protocol (MCP), these are called "Tools", in other frameworks also "Functions" or "Actions". Simply put: A tool is what enables an AI to do something.
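To make this concrete, here is a minimal sketch of what such a tool definition can look like. The structure (name, description, input schema) follows the JSON-Schema style used by MCP and most function-calling APIs; the concrete `search_stations` tool and its parameters are a hypothetical example, not part of any real API:

```python
# A minimal, hypothetical tool definition in the JSON-Schema style used
# by MCP and most function-calling APIs. The tool name and parameters
# are illustrative, not taken from a real API.
search_stations_tool = {
    "name": "search_stations",
    "description": "Search for train stations by name or country code.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "search": {
                "type": "string",
                "description": "Search term to filter stations",
            },
            "country": {
                "type": "string",
                "description": "ISO country code to filter stations",
            },
        },
        "required": ["search"],
    },
}
```

Everything the model knows about this tool is this description: it has no other way to learn when and how to call it.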
So the central question is: How do you connect these two worlds? How do you turn API operations into AI tools? And it's precisely with this question that the two seemingly contradictory perspectives emerge.
Perspective 1: "1:1 Mapping Doesn't Work"
This perspective emphasizes how important it is not to simply expose API operations as MCP tools 1:1. If you create a separate tool for each API operation, performance suffers and the agents get confused. An agent is then presented with 50 tools and has to figure out on its own how to combine them into something meaningful (and valuable).
In any case, meaningful API descriptions are the foundation for derived tools. It's precisely in this context that the question behind the second perspective comes into play.
Perspective 2: "Why Not Just Use the API Directly?"
Why do you even need dedicated MCP servers and tool abstractions when you already have a well-documented API specification? Couldn't the AI simply use the existing API operations directly? Or automatically generate the tools from the OpenAPI definition?
This perspective therefore aims precisely at mapping API operations directly and 1:1 to AI tools, and thus pursuing exactly the approach that the other perspective sees as disadvantageous.
The Apparent Contradiction
So the two perspectives seem to contradict each other – but they don't. They just illuminate different aspects of the same underlying problem.
The crux lies deeper: Cognitive Load on AI Agents. Classic REST APIs are designed for human developers – they are resource-oriented (/stations, /trips, /bookings) and offer flexible, reusable CRUD operations.
This is perfect for humans: We read documentation, build mental models, and intuitively understand that "booking a train journey" in the Train Travel API means: first search for stations (GET /stations), then find connections (GET /trips), then book (POST /bookings).
For AI agents, this is a massive cognitive burden. They have no persistent mental model, no "understanding" of the domain, and a severely limited context window. Every API operation costs valuable tokens. Every decision about which operation to execute next is a potential source of error.
What is Cognitive Load for AI Agents?
Large Language Models (LLMs) have a fundamental problem: a limited context window. Every piece of information you give an LLM costs tokens. And the more tokens, the worse the performance.
This isn't just a theoretical limit. Research by Liu et al. (2024) shows the "Lost in the Middle" problem: LLMs can reliably retrieve information at the beginning and end of a long context, but performance degrades significantly when the relevant information sits in the middle. This mirrors how we humans remember the beginning and end of a long list better than the middle (Serial Position Effect).
Even more problematic: The more information in the context window, the greater the risk of Context Rot – the model's ability to correctly retrieve information decreases.
Practical Implications for Tool Calling:
- Every tool description costs valuable tokens.
- Too many tools lead to worse agent decisions.
- Too little description leads to incorrect tool usage.
- Finding the sweet spot is difficult.
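The first point is easy to underestimate, so here is a back-of-the-envelope sketch of the cost. The four-characters-per-token heuristic is a common rough approximation (a real tokenizer would give exact numbers), and the tool count and description length are hypothetical:

```python
# Rough context-window cost of tool descriptions.
# Assumption: ~4 characters per token, a common approximation for
# English text; a real tokenizer would give exact numbers.
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

# Hypothetical setup: 50 tools, each with a 300-character description.
descriptions = ["A tool description padded out to three hundred chars.".ljust(300)] * 50
total = sum(estimate_tokens(d) for d in descriptions)
print(total)  # 3750 tokens, re-sent with every message that lists the tools
```

Fifty 1:1 tools can easily consume several thousand tokens before the conversation even starts, and these descriptions are retransmitted on every turn.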
The Problem with REST APIs: Train Travel API as an Example
Let's look at the Train Travel API – an OpenAPI definition deliberately developed as a best-practice example to replace the classic Pet Store:
```yaml
paths:
  /stations:
    get:
      summary: Get a list of train stations
      operationId: get-stations
      parameters:
        - name: search
          in: query
          description: Search term to filter stations
          schema:
            type: string
        - name: country
          in: query
          description: Filter stations by country code
          schema:
            type: string
  /trips:
    get:
      summary: Get available train trips
      operationId: get-trips
      parameters:
        - name: origin
          in: query
          required: true
          description: The ID of the origin station
          schema:
            type: string
            format: uuid
        - name: destination
          in: query
          required: true
          description: The ID of the destination station
          schema:
            type: string
            format: uuid
        - name: date
          in: query
          required: true
          description: The date and time of the trip
          schema:
            type: string
            format: date-time
  /bookings:
    get:
      summary: List existing bookings
      operationId: get-bookings
    post:
      summary: Create a booking
      operationId: create-booking
  /bookings/{bookingId}:
    get:
      summary: Get a booking
      operationId: get-booking
    delete:
      summary: Delete a booking
      operationId: delete-booking
  /bookings/{bookingId}/payment:
    post:
      summary: Pay for a Booking
      operationId: create-booking-payment
```

What's the Problem?
The API is resource-oriented: /stations, /trips, /bookings, /payments. For human developers who can read documentation and build mental models of entities, this is completely fine. But an AI agent only sees isolated operations without business context.
This leads to the common question:
Why not simply expose all operations as tools?
The answer lies in the different design principles:
Classic APIs (Resource-First) are designed for humans:
- Orientation towards entities: /stations, /trips, /bookings
- CRUD operations without explicit business context
- Assumption: Developers read documentation and understand relationships
- Flexible and reusable
- Humans have unlimited "context window" (brain + notes + IDE + experience)
AI Agents need Use-Case-First:
- Orientation towards concrete problems: book_train_trip, cancel_booking
- Business value is explicit in the description
- Assumption: An agent must understand from the description alone when to use the tool
- Focused on specific workflows
- AI agents have limited context window and no persistent mental models
The Concrete Challenge
When a user says "I want to book a train from Chemnitz to Paris", the agent must:
- Understand that it needs to orchestrate multiple API calls
- Derive the correct sequence and dependencies between the calls:
  - First find stations in Chemnitz (GET /stations?search=Chemnitz)
  - Then find stations in Paris (GET /stations?search=Paris)
  - Extract IDs from both responses
  - Search for available connections (GET /trips?origin={chemnitz_id}&destination={paris_id}&date=2024-02-01)
  - Create a booking for the selected trip (POST /bookings)
  - Process payment for the booking (POST /bookings/{booking_id}/payment)
- Reconstruct business intent from technical operations
This is massive cognitive work – and this is exactly where the crux lies.
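Written out as code, the steps above look roughly like this. This is a sketch under assumptions: `api` stands for any HTTP helper with get/post methods, the paths mirror the Train Travel API, and the response shapes are invented for illustration. Today, the agent has to improvise this entire sequence itself, token by token:

```python
# The workflow above, written out explicitly as deterministic code.
# `api` is a hypothetical HTTP helper with get/post methods; the paths
# mirror the Train Travel API, but the response shapes are assumed.
def book_train_trip_raw(api, origin_city: str, dest_city: str, date: str) -> dict:
    # 1./2. Resolve both cities to station IDs
    origin = api.get("/stations", {"search": origin_city})["data"][0]["id"]
    dest = api.get("/stations", {"search": dest_city})["data"][0]["id"]
    # 3. Find an available connection
    trips = api.get("/trips", {"origin": origin, "destination": dest, "date": date})
    trip_id = trips["data"][0]["id"]
    # 4. Create the booking
    booking = api.post("/bookings", {"trip_id": trip_id})
    # 5. Pay for the booking
    api.post(f"/bookings/{booking['id']}/payment", {"amount": booking["price"]})
    return booking
```

Every one of these five steps is a decision point at which an agent picking operations one by one can go wrong.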
The Misunderstanding: Isn't Flexibility Good?
At this point, this question may arise:
But if I give the LLM many individual operations, it has more flexibility and can adapt to different situations. Isn't that better than rigid, predefined tools?
The short answer: "No."
Here lies a fundamental misunderstanding about how LLMs work and what they can accomplish. Let's look at this more closely:
Scenario 1: Many Individual API Operations as Tools
The agent gets 50 tools for individual API operations. For the request "Book a train journey from Chemnitz to Paris", the LLM must:
- Identify the right operations from many options
- Derive the correct sequence
- Understand dependencies between calls
- Implement error handling itself
- Decide at each step what comes next
However, LLMs work based on probabilities and not deterministically. So it may be that an AI system makes the "right" decisions and chooses the "right" path, but next time decides completely differently. So what happens in reality?
- The agent chooses the wrong operation.
- The sequence is not optimal.
- Errors are not handled.
- The workflow breaks off in the middle.
The process is not reproducible – the same request can therefore also produce different results.
Scenario 2: A Use-Case-Oriented Tool book_train_trip
The agent gets a tool that internally orchestrates 5-7 API calls with deterministic logic. For the request "Book train journey", the LLM only needs to:
- Decide: "Does book_train_trip match the request?" (Yes/No)
- Determine the right parameters (start, destination, etc.)

Here, it's no longer the path to the goal that is subject to probabilities, but only the concrete specification of the goal. This means:
- The implementation has been tested and works reliably.
- Error handling is centrally implemented.
The LLM focuses on the business decision, not on technical details. Thus, the process is reproducible – the same input also leads to the same result.
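A minimal sketch of what such a tool implementation could look like. The `api` helper, the `BookingError` type, and the response shapes are assumptions for illustration; the point is that sequencing, validation, and error handling live in ordinary, testable code instead of in the LLM's sampling:

```python
# Sketch of a use-case-oriented tool: one entry point, with the
# orchestration and error handling implemented deterministically inside.
# `api`, `BookingError`, and the response shapes are illustrative assumptions.
class BookingError(Exception):
    pass

def book_train_trip(api, origin_city: str, dest_city: str, date: str) -> dict:
    """Book a train trip end to end. The LLM only supplies the parameters."""
    origins = api.get("/stations", {"search": origin_city})["data"]
    if not origins:
        raise BookingError(f"No station found for {origin_city!r}")
    dests = api.get("/stations", {"search": dest_city})["data"]
    if not dests:
        raise BookingError(f"No station found for {dest_city!r}")
    trips = api.get("/trips", {
        "origin": origins[0]["id"],
        "destination": dests[0]["id"],
        "date": date,
    })["data"]
    if not trips:
        raise BookingError("No connection available on that date")
    booking = api.post("/bookings", {"trip_id": trips[0]["id"]})
    api.post(f"/bookings/{booking['id']}/payment", {"amount": booking["price"]})
    return booking
```

The tool either succeeds or raises a well-defined error the agent can report back, and the same input always walks the same path.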
The Core Message: Shift Complexity from Probabilistic LLM to Deterministic Code
The complexity doesn't disappear – it just moves to where it belongs:
- Deterministic Logic (in the tool implementation): Sequence, error handling, validation, transaction management
- Business Decisions (in the LLM): "Does the user want to buy?", "Which tool fits?", "Which parameters make sense?"
The LLM can then focus on what it does well: Understanding what the user wants and deciding which tool delivers the desired business value – not on how this value is technically implemented. In other words: You want the LLM to decide "We should book a train journey" and not "Now I call GET /stations, then I take the ID from the response, then POST /bookings with this ID ..."
Factors of Cognitive Load
- Number of Operations: The Train Travel API requires 5 separate operations for the "book train journey" workflow.
- Missing Intention: No information about "Why do I need this operation?"
- Implicit Dependencies: Which calls must be made in which order?
- Token Waste: Every operation needs a description in the context window – and with every additional message in the conversation, all tool descriptions are sent again.
That's why simple 1:1 mapping often doesn't work well with classic REST APIs – unless the API was already designed with AI agents in mind.
Classics as Metaphors
Let me illustrate the problem with two examples you might know:
The Linux Birthday Cake
A popular meme from the Linux community shows a "birthday cake" with the headline "Happy Birthday! Compile your own cake" – an allusion to the fact that under Linux you often have to compile software yourself from source code instead of using ready-made packages. What happens when we transfer this principle to API design?

Bad Design (Atomic Operations):
The agent must orchestrate 11 separate tools:
- get_flour() (Get flour from the cupboard)
- get_eggs() (Get eggs from the fridge)
- get_sugar() (Get sugar)
- get_butter() (Get butter)
- mix_ingredients() (Mix ingredients)
- preheat_oven() (Preheat oven)
- bake_cake() (Bake cake for 40 minutes)
- cool_cake() (Let the cake cool)
- prepare_frosting() (Prepare frosting)
- decorate_cake() (Decorate cake)
- add_candles() (Add candles)
The Problem: The agent must derive the correct sequence, understand dependencies (preheat oven before baking!), and orchestrate the complete workflow itself. That's massive cognitive load!
Good Design (Use Case-Oriented):
A single tool: serve_birthday_cake(age, flavor)
What it does:
- The cake is already baked
- The appropriate number of candles is on the cake
- It is ready to serve
The Advantage: Clear intention. Minimal cognitive load. The agent immediately knows when to use this tool.
The IKEA Job Interview
Another example is Canary Pete's cartoon of a job interview at IKEA: at the start of the interview, the applicant is asked to assemble a chair and then take a seat.

The Problem: Imagine you come to a job interview and first have to assemble the chair before you can sit down. You get a box with individual parts, instructions in pictograms, and an Allen key. Only when the chair is assembled can the actual interview begin. That's absurd – no one would expect that from an applicant. The focus should be on the interview, not on furniture assembly.
The Transfer to APIs: But that's exactly what happens when we confront AI agents with resource-oriented APIs. Instead of a tool for conducting the interview, the agent gets:
- Order chair parts
- Open package
- Read instructions
- Assemble legs
- Attach seat
- ... (11 individual steps)
- Start conversation
The agent must build the entire "infrastructure" itself before it can tackle the actual task.
The Better Approach: A use-case-oriented tool like conduct_interview(candidate_name, position) abstracts all these details away. The chair is already there. The room is prepared. The interview can begin immediately. The agent can focus on the essentials – just like an applicant should focus on the conversation, not on furniture assembly.
But why do we expect this from AI agents with APIs – or simply accept this situation?
Conclusion: A Fundamental Design Problem – and Why It Matters Now
The Problem is Real and Relevant
We've seen why classic REST APIs confuse and overwhelm AI agents – and it's not a theoretical problem. If you connect an AI agent to your API today, it will likely:
- Potentially choose the wrong operations from dozens
- Execute workflows in suboptimal order
- Choose different paths for the same requests
- Perhaps fail more often than succeed
This isn't due to the quality of the models. Claude, GPT-5, and other modern LLMs are impressively powerful. But they're fighting against a fundamental design mismatch: APIs were built for humans who read documentation, build mental models, and work with unlimited "context window" (brain + IDE + experience).
Why Act Now?
AI integration of APIs is not some distant future scenario – it's happening now. Companies are developing MCP tools and building AI agents that are supposed to use their internal and external APIs. The question is no longer "whether", but "how well".
Sure, the models will continue to evolve, and we can assume that their ability to deal with the challenges described in this article will improve as well. But we already have the means to approach this differently today, and to make "life" a little easier for our agents.
And here lies the opportunity: You don't have to choose between "AI-friendly" and "developer-friendly". The principles that make APIs better for AI agents – clear business context, explicit use cases, documented workflows – also make them more intuitive and easier to use for human developers.
AI agents will certainly "appreciate" that too, thanking us with lower token consumption and better, more reproducible results.
The Core Message to Take Away
Shift complexity from the probabilistic LLM to deterministic code.
This isn't a plea against flexibility or for rigid, inflexible systems. It's a plea for clear responsibilities:
- The LLM understands the user's intention and chooses the appropriate tool.
- The deterministic code reliably executes the complex orchestration.
- Users get reproducible, reliable results.



