LIMITATIONS AND ISSUES WITH USING THE LLM API IN WORKPLACE AGENT SYSTEMS
WHEN ORGANIZATIONS BUILD ENTERPRISE-WIDE WORKPLACE AGENT SYSTEMS, THERE ARE A NUMBER OF LIMITATIONS THAT MAKE IT DIFFICULT TO MEET THE NEEDS OF COMPLEX AND INTRICATE ENTERPRISE ENVIRONMENTS USING SIMPLY THE LLM API.
This analysis comprehensively examines the structural limitations that prevent LLM wrapper services from providing agentic capabilities, including token limits, context window constraints, numerical capabilities, and the complexity of function calls, along with technical limitations. In particular, it analyzes the obstacles that remain even with the additional API capabilities provided by major LLM providers such as OpenAI, Claude, and Google Gemini, and the need for agentic architectures in enterprise environments.
BASIC TECHNICAL LIMITATIONS OF THE LLM API
1. context window constraints and information processing limitations
LLM processes the entered text in tokens. These tokens can be entire words, parts of words, or individual characters. Generally speaking, one token is equivalent to about four characters in English, but this varies depending on the language structure and the model's tokenizer. This tokenization process itself is one of the structural causes that can lead to disconnection or loss of precise information.
In addition, all LLMs have predefined token limits. For example, OpenAI's GPT-4o supports a context window of up to 128k tokens, but its output is limited to 4k tokens. Anthropic's Claude can also accept up to 200k tokens, but the output is limited to the same 4k token limit.
These limitations of the context window hinder contextual reasoning in long documents or complex conversations, and are a major source of inaccuracies in the consistency and precision of information. In particular, conditions, figures, dates, and exceptions presented earlier in the document are often not accurately remembered or consistently reflected by the model at the time of output.
To solve these problems, search augmentation techniques such as Retrieval-Augmented Generation (RAG) and hybrid approaches that combine LLM and external tools are required. Simply expanding the size of the context window has its limitations, and increasing the quality and contextual relevance of information is more effective.
2. Limitations of inference and accuracy
LLM IS CURRENTLY A LIMITATION IN ENTERPRISE ENVIRONMENTS THAT REQUIRE PRECISE CALCULATIONS OR COMPLEX LOGICAL REASONING. THE LLM API ALONE IS NOT SUFFICIENTLY RELIABLE, ESPECIALLY WHEN PERFORMING TASKS WHERE ACCURACY IS ESSENTIAL, SUCH AS WORKING WITH NUMBERS, FINANCIAL ANALYSIS, STATISTICAL PROCESSING, AND ENGINEERING CALCULATIONS. THIS IS WHERE A COMPLEMENTARY APPROACH IS NEEDED.
IN FACT, SEVERAL RESEARCHERS HAVE POINTED TO THE HIGH RATE OF INCORRECT ANSWERS FOR ARITHMETIC AND MATH OPERATIONS PERFORMED BY LLMS. FURTHERMORE, WHEN DEALING WITH STRUCTURED DATA SUCH AS TABULAR OR CSV, LLMS TEND TO TREAT THIS DATA AS A SIMPLE CONTINUOUS STREAM OF TOKENS AND DO NOT HAVE A CLEAR UNDERSTANDING OF THE SPATIAL RELATIONSHIPS BETWEEN CELLS. THIS CAN LEAD TO ERRORS AND INACCURACIES IN OPERATIONS SUCH AS COMPUTING, COMPARING, AND SORTING BETWEEN DATA. AS A RESULT, A SEPARATE TECHNICAL APPROACH IS ESSENTIAL IN ENVIRONMENTS THAT REQUIRE PRECISE PROCESSING OF SUCH STRUCTURED DATA.
FURTHERMORE, ERRORS IN THE LLM'S STEP-BY-STEP REASONING PROCESS HAVE A TENDENCY TO PROPAGATE CASCADINGLY. THIS ERROR PROPAGATION IS ESPECIALLY DANGEROUS IN REAL-WORLD SETTINGS. FOR EXAMPLE, WHEN DEALING WITH COMPLEX BUSINESS LOGIC, A SMALL MISTAKE IN THE EARLY STAGES CAN COMPROMISE THE INTEGRITY OF THE ENTIRE PROCESS.
3. complexity of function calls and tool integrations
OpenAI's function call feature allows you to describe a function and its arguments in JSON format at the prompt. However, the model does not actually call or execute the function directly; instead, it returns the function it should call and the arguments to use in JSON. This can lead to hallucinations where the model creates additional arguments that don't actually exist.
CONNECTING TO EXTERNAL TOOLS VIA FUNCTION CALLS IS A MULTI-STEP PROCESS. FIRST, THE USER QUESTION MUST BE SENT AS A REQUEST TO THE MODEL, THE MODEL DETERMINES WHICH FUNCTION TO CALL, THE JSON OUTPUT MUST BE PARSED TO ACTUALLY EXECUTE THE FUNCTION, AND THEN THE REQUEST MUST BE REPEATED BY FEEDING THE RESULTS OF THE FUNCTION EXECUTION BACK INTO THE MODEL, COMPLICATING THE PROCESS. THIS MULTI-STEP PROCESS CREATES A SIGNIFICANT DEVELOPMENT AND MAINTENANCE BURDEN WHEN BUILDING INTEGRATIONS WITH VARIOUS ENTERPRISE TOOLS IN AN ENTERPRISE-WIDE SYSTEM.
The Model Context Protocol (MCP) is an innovative protocol that enables AI agents to connect to external tools and data in a standardized way, but it has a number of challenges and limitations when implemented in real-world agent systems. Furthermore, MCP is still a nascent technology and requires a cautious approach as it is not fully standardized and needs significant improvements in terms of performance optimization and ecosystem maturity before it can be adopted at scale by enterprises. A similar approach is Openai's previously released Response API.
One of the serious issues with Model Context Protocol (MCP), including latency issues, is the Tool Poisoning Attack, which works by covertly inserting malicious instructions within MCP tooltips, which remain invisible to the user but exposed to the AI model. For example, a tool that appears to the user to simply "add two numbers together" may actually be designed to access a user's sensitive configuration files, SSH private keys, etc. The key problem with these attacks is that the user interface doesn't clearly show the discrepancy between what the tool actually does and what the AI model recognizes. Even more concerning is the fact that even though some MCP clients require explicit user authorization to install a tool, MCP's packaged or server-based architecture allows for a form of "rug pull," meaning that a malicious server operator can change the tool description even after a user has authorized the tool, creating the possibility that a tool that initially appears safe could later contain malicious commands.To mitigate this threat, it is very important to use only authorized and validated MCP packages, and it is also recommended that changes to the tool's definition and metadata be disabled, or that an integrity verification scheme be in place to track changes. To mitigate this threat, it is very important to use only authorized and validated MCP packages. It is also recommended to disable changes to the tool's definitions and metadata, or have an integrity verification scheme in place to track changes.
MCP is still a nascent technology, and it's hard to imagine a mature standard that can be applied universally to all AI systems. In particular, in its early days, it didn't even have a defined authentication specification, and while OAuth-based authentication was later introduced, its implementation complexity and lack of consistency in practice led to frustration among many developers. In the absence of a clearly defined authentication method, each MCP server had to handle authentication in its own way, leading to serious security issues where some servers could access sensitive data without authentication.
Currently, MCPs offer a number of security guidelines to enhance security, such as OAuth authentication, Principle of Least Privilege, and Input Validation, and the protocol specifications are evolving rapidly. Nevertheless, it has been consistently pointed out that for enterprises to adopt MCPs at scale, they need to have sufficient confidence and stability in the maturity of the ecosystem, consistency of implementation, performance optimization, and long-term maintenance.
Structural differences between LLM Wrapper vs Agentic Architecture
Inherent limitations of LLM Wrapper
An LLM wrapper is a tool or application built on top of a pre-trained language model, such as GPT-4. These wrappers are often designed around chat interfaces that take in human-readable text and generate responses, and most commercial LLM-based public services follow this structure.
THE PROBLEM IS THAT THESE WRAPPERS SIMPLY CONNECT INPUTS AND OUTPUTS, LIMITING THE ABILITY TO TAKE ADVANTAGE OF THE MORE COMPLEX AND MEANINGFUL FEATURES THAT LLMS CAN PROVIDE. THIS IS ESPECIALLY TRUE IN REAL-WORLD BUSINESS ENVIRONMENTS THAT REQUIRE MULTI-STEP WORKFLOWS, CONTEXTUALIZED, STATE-BASED ACTION PROCESSING, AND INTEGRATION WITH EXTERNAL SYSTEMS. THESE SIMPLE INTERFACES ALSO HINDER THE USE OF LLMS IN COMPLEX DATA PIPELINES WHERE THEY SHOULD BE USED AS AN INTERMEDIATE PROCESSING STEP.
An agentic architectureorganizes the LLM into functional units, separating each task into independent agents, allowing for the systematic implementation of different levels of AI capabilities, from simple tasks to complex process automation. This architecture is well suited for collaboration between multiple agents, as well as for performing granular roles within functional units, such as document analysis, number crunching, and condition-based branching. Each agent is designed to be optimized for a specific task and can be combined as needed to form complex business logic. This allows for greater flexibility and scalability than structures that delegate all functions to a single model, while also increasing the consistency and accuracy of results.
However, these features cannot be implemented with a simple LLM wrapper service, but require complex agent design and systematic system integration. Therefore, in an enterprise environment, a structured approach based on an Agentic AI architecture is required, rather than a simple wrapper-level service.
COMPLEXITY ISSUES WITH LLM FRAMEWORKS
THE COMPLEXITY OF THE LLM FRAMEWORK FURTHER HIGHLIGHTS THE LIMITATIONS OF THE WRAPPER STRUCTURE. MOST AI APPLICATIONS REQUIRE MULTI-LAYERED FUNCTIONALITY THAT GOES BEYOND SIMPLE RESPONSE GENERATION TO INCLUDE STRUCTURED PROCESSING, INTEGRATION WITH EXTERNAL SYSTEMS, DATA CLEANSING AND ENRICHMENT, STATE-BASED REASONING, AND SECURITY CONTROLS, REQUIRING A FRAMEWORK OR CUSTOM ARCHITECTURE THAT ALLOWS FOR MORE FLEXIBLE AND PURPOSEFUL DESIGN.
FURTHERMORE, AGENTIC DESIGN IS OFTEN REQUIRED TO PERFORM BASIC FUNCTIONS, SUCH AS SIMPLE QUESTION AND ANSWER, UNIT TASKS SUCH AS SUMMARIZING DOCUMENTS, INTERPRETING DOCUMENTS IN HETEROGENEOUS FORMATS, AND PERFORMING NUMERICAL PRECISION CALCULATIONS. FOR EXAMPLE, EXTRACTING METRICS FROM A GIVEN DOCUMENT, CALCULATING THEM NUMERICALLY, AND THEN PERFORMING VARIOUS FOLLOW-UP ACTIONS BASED ON THE RESULTS IS DIFFICULT TO IMPLEMENT IN A SINGLE LLM CALL. IN OTHER WORDS, IT REQUIRES NOT ONLY SIMPLE MULTI-AGENT COLLABORATION, BUT ALSO A SOPHISTICATED AGENT STRUCTURE AT THE FUNCTIONAL LEVEL.
Conclusion
WHEN BUILDING ENTERPRISE-WIDE WORKPLACE AGENT SYSTEMS, ORGANIZATIONS FACE A NUMBER OF LIMITATIONS AND OBSTACLES WHEN USING LLM APIS ALONE. TOKEN LIMITS AND CONTEXT WINDOW CONSTRAINTS MAKE IT DIFFICULT TO PROCESS LARGE AMOUNTS OF ENTERPRISE DATA, AND LIMITED MATH REASONING CAPABILITIES CREATE RELIABILITY ISSUES FOR TASKS THAT REQUIRE PRECISE CALCULATIONS. DATA SECURITY AND PRIVACY CONCERNS ARE CRIPPLING CONSTRAINTS IN ENVIRONMENTS DEALING WITH SENSITIVE CORPORATE INFORMATION, AND DEPENDENCY ISSUES ON CERTAIN LLM SERVICES CAN THREATEN BUSINESS CONTINUITY.
AI BUSINESS AUTOMATION FOR ENTERPRISE OPERATIONS
Get an introduction to Outcode, enterprise use cases, and expert advice all in one place