In the field of artificial intelligence (AI) development, creating distributed multi-agent software offers both opportunities and challenges. Our recent project explores the use of frameworks like LangGraph, Swarm, and AutoGen to build a cohesive network of interacting agents. Central to this system is a principal service agent developed with LangGraph, which manages user interactions and coordinates subsidiary agents across different frameworks.
Despite appearing straightforward, the unpredictable nature of Large Language Models (LLMs) and the distinct behaviors of various frameworks pose significant challenges. This project aims to develop robust, scalable workflows that maximize LLM capabilities while tackling issues like non-determinism, inter-agent communication, observability, state and error management.
Our experiments reveal four key pillars for reliable agentic workflows: state management, observability, streaming, and robust error handling. To create effective multi-agent systems, let’s explore the design elements of each pillar and the insights we’ve gained from this project.
The architecture comprises three services: a principal service agent built with LangGraph that handles user interaction and orchestration (service 1), a Coder subgraph built with OpenAI Swarm (service 2), and a Reviewer subgraph built with AutoGen (service 3).
For a task like "sort numbers," service 1 initiates the request, service 2 writes the code, and service 3 reviews it, with revisions continuing until service 1 is satisfied or an iteration limit is reached.
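To make the control flow concrete, here is a minimal sketch of that revision loop. The functions coder_subgraph and reviewer_subgraph are illustrative stand-ins for the Swarm and AutoGen subgraph invocations, not real framework APIs.

# Hedged sketch of the principal agent's revision loop.
def coder_subgraph(task: str, feedback: str | None = None) -> str:
    # Placeholder for the OpenAI Swarm coder subgraph call.
    return f"# code for: {task} (feedback: {feedback})"

def reviewer_subgraph(task: str, draft: str) -> dict:
    # Placeholder for the AutoGen reviewer subgraph call.
    return {"approved": True, "comments": ""}

MAX_REVISIONS = 3  # the constraint that stops an endless revise/review cycle

def principal_agent(task: str) -> str:
    draft = coder_subgraph(task)                  # service 2 writes the code
    for _ in range(MAX_REVISIONS):
        review = reviewer_subgraph(task, draft)   # service 3 reviews the draft
        if review["approved"]:                    # service 1 is satisfied
            return draft
        draft = coder_subgraph(task, feedback=review["comments"])
    return draft                                  # best effort once constrained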
In multi-framework applications, effective state management is crucial for ensuring seamless execution, data consistency, and efficient resource utilization across subgraphs. The principal agent coordinates multiple subgraphs built with different frameworks, such as LangGraph, AutoGen, and OpenAI Swarm, which introduces challenges, especially in tracking performance and managing state propagation.
By harmonizing framework differences with custom state channels, version-aware state propagation, and centralized memory, we ensure efficient state management and enable granular performance tracking in multi-framework systems.
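As a minimal sketch, a unified state schema with custom channels might look like the following. It assumes a LangGraph-style TypedDict state with reducer annotations; the field names are illustrative, not part of any framework.

from operator import add
from typing import Annotated, TypedDict

# Hedged sketch of a shared state schema with custom channels.
# Each subgraph reads and writes only its own channels; the metrics channel
# uses an additive reducer so updates from different subgraphs are merged
# rather than overwritten.
class UnifiedState(TypedDict):
    task: str                                      # the user request being processed
    coder_output: str                              # latest draft from the coder subgraph
    reviewer_feedback: str                         # latest critique from the reviewer subgraph
    revision_count: int                            # version-aware propagation: which revision this state reflects
    subgraph_metrics: Annotated[list[dict], add]   # per-subgraph performance records, appended by each subgraph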
Problems: Multi-framework multi-agent systems pose several observability challenges: each framework emits its own traces, logs, and metrics, so there is no coherent picture of end-to-end behavior.
Solution: When moving from a single-framework to a multi-framework multi-agent system (MF-MAS), observability at the framework level is no longer enough; we need a unified view of the system that integrates tracing, logging, and subgraph performance metrics.
Objective: Align all agents and subgraphs under a single trace in your Application Performance Monitoring (APM) or observability tool, such as OpenTelemetry (OTEL).
High-level concept
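At a high level, the principal agent injects its trace context into every request it sends to a subgraph, and each subgraph extracts that context before starting its own spans, so all work rolls up under one trace. A minimal sketch using the OpenTelemetry Python API follows; the carrier field and span names are illustrative choices for this example.

from opentelemetry import trace
from opentelemetry.propagate import inject, extract

tracer = trace.get_tracer("multi_framework_demo")

def principal_agent_dispatch(subgraph_request: dict) -> dict:
    # Principal agent side: open a span and inject its trace context into the
    # outgoing request so the subgraph can continue the same trace.
    with tracer.start_as_current_span("principal_agent.dispatch"):
        carrier: dict = {}
        inject(carrier)                              # writes traceparent data into the carrier
        subgraph_request["otel_context"] = carrier
        return subgraph_request

def subgraph_handler(subgraph_request: dict) -> None:
    # Subgraph side: extract the propagated context and parent new spans on it,
    # so this work appears under the principal agent's global trace.
    ctx = extract(subgraph_request.get("otel_context", {}))
    with tracer.start_as_current_span("reviewer_subgraph.run", context=ctx):
        ...  # framework-specific agent execution goes here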
Objective: Ensure all logs from different agents/subgraphs can be tied together using a common transaction ID or correlation key. Unified logging is particularly critical in large systems where multiple microservices, agents, or subgraphs handle portions of a single task.
Principal agent A logs:
[transaction_id=12345] Starting request for user 6789...
Subgraph B logs:
[transaction_id=12345] Received request from principal agent A. Fetching data...
The logs can be aggregated by the transaction_id=12345, showing a continuous timeline across multiple services or agents.
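One lightweight way to achieve this in Python, sketched below, is to carry the correlation key in a contextvar and attach it to every log record with a logging.Filter. The transaction_id field name is our convention for this example, not a standard.

import logging
from contextvars import ContextVar

# Correlation key shared by every agent/subgraph handling the same request.
transaction_id: ContextVar[str] = ContextVar("transaction_id", default="-")

class TransactionFilter(logging.Filter):
    # Attach the current transaction_id to every record so all log lines
    # produced for one request can be aggregated across agents.
    def filter(self, record: logging.LogRecord) -> bool:
        record.transaction_id = transaction_id.get()
        return True

logging.basicConfig(format="[transaction_id=%(transaction_id)s] %(message)s", level=logging.INFO)
logger = logging.getLogger("agents")
logger.addFilter(TransactionFilter())

transaction_id.set("12345")
logger.info("Starting request for user 6789...")                            # principal agent A
logger.info("Received request from principal agent A. Fetching data...")    # subgraph B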
Objective: Provide the principal agent (and system operators) with detailed performance data for each subgraph’s execution. In a user-facing application, partial or final performance data can be displayed, indicating how the system spent time (particularly useful in advanced troubleshooting scenarios).
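As a simple illustration, each subgraph invocation can be wrapped to record its wall-clock duration and report it back to the principal agent. The decorator, metric fields, and the shared metrics list are assumptions made for this sketch.

import time
from functools import wraps

subgraph_metrics: list[dict] = []  # collected by the principal agent, surfaced to operators or users

def timed_subgraph(name: str):
    # Decorator that records how long each subgraph spends on its part of the task.
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                subgraph_metrics.append(
                    {"subgraph": name, "duration_s": round(time.perf_counter() - start, 3)}
                )
        return wrapper
    return decorator

@timed_subgraph("reviewer")
def reviewer_subgraph(draft: str) -> str:
    time.sleep(0.1)  # stand-in for the real reviewer subgraph call
    return "looks good"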
When switching from a single-framework multi-agent system (such as LangGraph) to a multi-framework multi-agent system, we end up with fragmented traces. We needed to unify them under a single global trace ID, which we did using OpenTelemetry and LangSmith.
By consistently sharing trace context, unified logging information, and performance metrics, a multi-framework application can deliver end-to-end visibility, simplify debugging, and improve reliability. This holistic approach ensures each piece of the puzzle is linked together, revealing how requests flow and how they perform across distributed agents.
In LLM-driven systems, streaming allows partial outputs to be delivered in real time, enhancing user experience, especially for long or complex tasks. The challenge arises when events or messages from agents in different subgraphs need to be bubbled up through a central orchestrator to the end user. Due to differences in frameworks, custom engineering solutions are needed for real-time streaming.
Level 0 (no streaming): The user gets a complete response after processing. Simple but slow for complex tasks.
Level 1 (synthetic progress): Handcrafted updates like "Working on it…" give users a sense of progress but lack real-time LLM outputs.
Level 2 (subgraph updates): Intermediate agents send updates (e.g., “Processing…”) without streaming LLM outputs, useful for transparency.
Level 3 (full streaming): Real-time token generation from the LLM streamed to the user, offering the most interactive experience.
We recommend implementing streaming across frameworks using a multi-stage event propagation flow, sketched below.
To enhance user experience, consider streaming based on system capabilities, from synthetic updates to real-time LLM-token streaming. Even without full streaming support, incremental updates or progress messages can provide valuable feedback. The goal is to align each layer of the system with the chosen level of streaming.
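One way to implement the multi-stage event propagation flow mentioned above is to have every subgraph push typed events onto a shared queue that the principal agent drains and forwards to the user. The sketch below uses asyncio; the event fields mirror the streaming levels described earlier and are otherwise our own convention.

import asyncio

async def reviewer_subgraph(events: asyncio.Queue) -> None:
    # Level 2: subgraph-level progress updates, no raw LLM tokens.
    await events.put({"level": 2, "source": "reviewer", "text": "Processing..."})
    await asyncio.sleep(0.1)  # stand-in for the real framework call
    await events.put({"level": 2, "source": "reviewer", "text": "Review complete."})

async def principal_agent() -> None:
    # The orchestrator drains the shared queue and bubbles events up to the user
    # as they arrive, regardless of which framework produced them.
    events: asyncio.Queue = asyncio.Queue()
    worker = asyncio.create_task(reviewer_subgraph(events))
    while not (worker.done() and events.empty()):
        try:
            event = await asyncio.wait_for(events.get(), timeout=0.05)
            print(f"[{event['source']}] {event['text']}")  # stream to the end user
        except asyncio.TimeoutError:
            continue

asyncio.run(principal_agent())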
Multi-framework agentic applications integrating LangGraph, AutoGen, and OpenAI Swarm face challenges in error handling because each framework represents errors differently. Errors raised in tools like the Coder or Reviewer must navigate these differences and cascade through interconnected systems to reach the end user without losing context or clarity. The key error types likely to arise are listed below.
LLM misinformation
When an LLM generates incorrect responses (hallucinations) in one framework, these can propagate to tools or agents in other frameworks, amplifying misinformation across the system.
Tool call issues
Incorrect tool invocations, missing calls, or malformed parameters can block workflows across frameworks. For example, if OpenAI Swarm relies on AutoGen's Reviewer Subgraph, a failed tool call can break the entire chain.
Convergence failures
Systems may fail to reach a solution, for example when the coder and reviewer loop through revisions indefinitely without ever satisfying the principal agent's acceptance criteria.
Environment mismatches
Incompatible versions of libraries, APIs, or runtime environments across frameworks can cause silent failures or crashes, especially when inter-framework dependencies are critical.
Rate limits and resource exhaustion
Multiple agents calling shared services simultaneously can trigger rate limits or resource exhaustion, disrupting workflows across frameworks.
To mitigate these, a robust error-handling strategy is needed to normalize error data while maintaining the core issue's meaning. At the principal agent node, errors should be translated into user-friendly messages that explain the root cause and suggest next steps, ensuring a seamless and intuitive user experience.
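A minimal sketch of such normalization follows; the error categories, field names, and suggested next steps are chosen for illustration rather than taken from any of the frameworks.

from dataclasses import dataclass

@dataclass
class NormalizedError:
    framework: str     # which subgraph/framework raised it (e.g., "autogen", "swarm")
    category: str      # e.g., "llm_misinformation", "tool_call", "convergence",
                       # "environment", or "rate_limit"
    detail: str        # original error text, preserved so the root cause is not lost
    user_message: str  # friendly explanation of what went wrong

def to_user_message(err: NormalizedError) -> str:
    # At the principal agent node, translate the normalized error into something
    # actionable for the end user instead of surfacing a framework stack trace.
    suggestions = {
        "tool_call": "Please retry; a request to a downstream tool was malformed or failed.",
        "rate_limit": "The system is temporarily overloaded; please try again shortly.",
    }
    next_step = suggestions.get(err.category, "Please retry or rephrase your request.")
    return f"{err.user_message} ({err.framework}: {err.detail}). {next_step}"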
Our work offers early insights into the operational challenges developers may face when building agentic systems over a network. A defining trait of multi-agent software is its ability to operate seamlessly across an expanding ecosystem of agent frameworks, such as LangGraph, AutoGen, Swarm, CrewAI, and others. This interoperability empowers developers to integrate and build upon agentic applications created with different frameworks—a capability that is crucial for achieving the vision of an Internet of Agents.