NVIDIA Agentic Smart Router on Dell Enterprise Hub: A Deep Dive into Architecture, Design, and Framework
Rethinking Enterprise AI Architecture with a Multi‑Framework Approach
Modern enterprise AI application deployments face two foundational architectural challenges.

First, organizations need a practical way to leverage multiple large language models (LLMs) while balancing three competing priorities: performance, cost, and accuracy. As model catalogs expand and workloads diversify, choosing a single model, or manually orchestrating several, quickly becomes inefficient and costly.
Second, most enterprise AI systems are still built around a single-framework paradigm. While convenient at first, this approach often forces development teams into rigid patterns where architectural consistency outweighs choosing the best tool for each task. As applications evolve in scale and complexity, these constraints become bottlenecks, limiting innovation and adaptability.
Multi‑Framework, Agent‑Oriented Design with NVIDIA NAT
In this blog, we explore an application architecture designed to address both challenges using the NVIDIA NeMo Agent Toolkit (NAT). NAT promotes a framework‑agnostic approach, enabling teams to integrate specialized tools without being locked into one AI development ecosystem. Instead of centralizing all capabilities under a single framework, NAT allows developers to compose best-of-breed components, each selected for its strengths:
- LangChain / LangGraph: Ideal for agent orchestration, workflow state management, and implementing complex decision logic with fine‑grained control over how agents coordinate and communicate.
- LlamaIndex: Optimized for document indexing, semantic search, and retrieval‑augmented generation (RAG). It provides advanced query engines and highly efficient context‑retrieval mechanisms essential for enterprise knowledge workflows.
Solving the Multi‑Model Challenge with NVIDIA’s LLM Router
To address the first dilemma—how to efficiently leverage multiple LLMs—we integrate NVIDIA’s LLM Router. This component intelligently selects the optimal model at runtime based on factors such as:
- Task difficulty
- Latency requirements
- Cost constraints
- Accuracy needs
The LLM Router dynamically matches each request with the most suitable model, delivering the right trade‑off without manual tuning or static rules. This ensures organizations get maximum value from heterogeneous model catalogs, from lightweight task‑specific models to high‑end frontier models.
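To make the routing idea concrete, here is a minimal sketch of complexity-based model selection. This is an illustrative assumption only: the real LLM Router uses a trained classifier served behind an API, not hand-written rules, and the model-tier names and thresholds below are invented for demonstration.

```python
# Illustrative sketch: route a prompt to a model tier by rough complexity
# signals. The actual LLM Router uses a trained classifier, not these rules;
# marker words, the length threshold, and tier names are assumptions.

REASONING_MARKERS = ("prove", "derive", "step by step", "analyze", "why")

def route(prompt: str) -> str:
    """Pick a model tier for a prompt based on crude complexity signals."""
    text = prompt.lower()
    # Long prompts or explicit reasoning requests go to the expensive model.
    if len(text.split()) > 200 or any(m in text for m in REASONING_MARKERS):
        return "reasoning-model"
    # Everything else stays on the cheaper conversational model.
    return "conversational-model"

print(route("Hi, what are your hours?"))
print(route("Prove that the sum of two even numbers is even."))
```

Even this toy version captures the core trade-off: simple requests never pay the latency and cost of a heavyweight model, while hard requests are not starved of reasoning capacity.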
System Architecture
The Multi‑Framework Smart Router application is built on a layered architecture designed for scalability, modularity, and intelligent model selection. The diagram below represents the overall system topology and how each component collaborates to deliver a robust enterprise AI workflow.
This architecture demonstrates how NVIDIA NAT powers scalable multi‑model systems through flexible orchestration and dynamic routing.
- At the top of the stack, a LangChain‑based supervisory agent—backed by Llama‑3.3‑70B‑Instruct—acts as the system’s control plane. It manages conversational context, decides when tools should be invoked, and coordinates the execution flow across downstream components.
- To enhance domain‑specific accuracy, the system incorporates a retrieval layer, enabling the agent to ground responses in relevant enterprise knowledge via retrieval‑augmented generation (RAG).
- For intelligent model selection, the application integrates NVIDIA’s LLM Router Blueprint, which routes incoming requests between conversational models and high‑reasoning models. This ensures the system balances performance, cost, and accuracy by choosing the optimal model for each task in real time.
- Finally, Arize Phoenix provides full end‑to‑end observability. It offers insights into agent behavior, routing decisions, RAG performance, and system latency—ensuring the application is production‑ready, debuggable, and highly extensible.
Together, these layers form a cohesive architecture that delivers flexible, efficient, and enterprise‑grade AI applications.
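The layered flow above can be sketched as a small pipeline. Every component here is a plain-Python stand-in, an assumption for illustration, not the actual LangChain agent, LlamaIndex tool, or LLM Router client; the tool-selection heuristics are likewise invented.

```python
# Minimal sketch of the layered flow: supervisor -> (RAG | router) -> answer.
# All components are stand-in functions, not the real LangChain/NAT objects.

def retrieve_context(query: str) -> str:
    # Stand-in for the LlamaIndex RAG tool.
    return f"[retrieved docs for: {query}]"

def llm_router(query: str) -> str:
    # Stand-in for the LLM Router's model selection (invented heuristic).
    return "reasoning-model" if "explain" in query.lower() else "chat-model"

def supervisor(query: str) -> dict:
    """Stand-in for the supervisory agent: decides which downstream
    component handles the request, then returns the routing decision."""
    if "docs" in query.lower():  # crude tool-selection heuristic
        return {"tool": "rag", "context": retrieve_context(query)}
    return {"tool": "llm", "model": llm_router(query)}

print(supervisor("Search the docs for the refund policy"))
print(supervisor("Explain transformers"))
```

The point of the sketch is the separation of concerns: the supervisor only decides *which* layer acts, while retrieval and model selection stay independently swappable, which is exactly what the multi‑framework design buys you.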
Integrating NeMo Agent Toolkit with LLM Router
Let's first look at the NVIDIA LLM Router. The NVIDIA LLM Router Blueprint enables intelligent prompt routing by selecting the most appropriate LLM based on task complexity. It balances reasoning quality, latency, and cost by avoiding unnecessary use of large, resource-intensive models for simple requests.
LLM Router Integration with NeMo Agent Toolkit
The NAT Smart Router leverages a sophisticated plugin system that enables extensibility without compromising the core architecture. The integration is implemented at the toolkit level, making LLM Router functionality available to all workflows and agent implementations.
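The general pattern behind such a plugin system is a registry that maps a config-level type name to a provider class. The sketch below shows that pattern in plain Python; the decorator, registry, and class names are illustrative assumptions, not the toolkit's actual API.

```python
# Sketch of a decorator-based plugin registry, the general pattern behind
# provider systems like NAT's. Decorator and class names are illustrative
# assumptions, not the toolkit's actual API.

PROVIDERS: dict[str, type] = {}

def register_provider(type_name: str):
    """Map a config `_type` string to a provider class."""
    def decorator(cls):
        PROVIDERS[type_name] = cls
        return cls
    return decorator

@register_provider("nim")
class NimProvider:
    def __init__(self, model_name: str, **kwargs):
        self.model_name = model_name

@register_provider("llm_router")
class LlmRouterProvider:
    """Same interface as NimProvider, plus routing-specific settings."""
    def __init__(self, base_url: str, policy: str, **kwargs):
        self.base_url = base_url
        self.policy = policy

def build_llm(config: dict):
    # Look up the class by the `_type` key; this mirrors how a YAML entry
    # such as `_type: llm_router` resolves to the registered provider.
    cfg = dict(config)
    cls = PROVIDERS[cfg.pop("_type")]
    return cls(**cfg)

router = build_llm({"_type": "llm_router",
                    "base_url": "http://router-host", "policy": "task_router"})
print(type(router).__name__)  # LlmRouterProvider
```

Because every provider is reached through the same registry and interface, workflows stay unaware of which concrete LLM backs them, which is what makes the router a drop-in capability rather than a core-code change.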
A custom LLM provider named llm_router is registered within the toolkit's plugin system. This provider implements the standard LLM interface while adding router-specific functionality. Developers can reference the provider in any workflow configuration file (config.yml), making it seamlessly accessible across the entire application. The example configuration lives at:

NeMo-Agent-Toolkit/examples/frameworks/multi_frameworks_llm_router/src/nat_multi_frameworks_llm_router/configs/
```yaml
# Example config.yml: configuration file that you can modify to customize the application
general:
  use_uvloop: true
  telemetry:
    logging:
      console:
        _type: console
        level: WARN
      file:
        _type: file
        path: /tmp/multi_frameworks_llm_router.log
        level: DEBUG
    tracing:
      phoenix:
        _type: phoenix
        endpoint: http://phoenix:6006/v1/traces
        project: multi_frameworks_llm_router

functions:
  llama_index_rag2:
    _type: llama_index_rag2
    llm_name: nim_llm
    model_name: meta/llama-3.3-70b-instruct
    embedding_name: nim_embedder
    data_dir: ./examples/frameworks/multi_frameworks_llm_router/README.md
  llm_router_tool:
    _type: llm_router_tool
    llm_name: llm_router

llms:
  nim_llm:
    _type: nim
    model_name: meta/llama-3.3-70b-instruct
    temperature: 0.0
  llm_router:
    _type: llm_router
    api_key: 'XXX'
    base_url: http://ROUTER-CONTROLLER-HOST-IP
    policy: task_router
    routing_strategy: triton

embedders:
  nim_embedder:
    _type: nim
    model_name: nvidia/nv-embedqa-e5-v5
    truncate: END

workflow:
  _type: multi_frameworks_llm_router
  llm: nim_llm
  data_dir: ./examples/frameworks/multi_frameworks_llm_router/README.md
  rag_tool: llama_index_rag2
  llm_router_tool: llm_router_tool
```
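Notice how the config wires components together purely by name: the workflow points at `nim_llm` in `llms` and at `llama_index_rag2` in `functions`. The sketch below shows that name-resolution step with the config reduced to a plain Python dict (an assumption made to keep the example dependency-free; the resolver function is illustrative, not NAT's loader).

```python
# Sketch of how names in the config cross-reference sections: the workflow
# refers to entries in `llms` and `functions` by key. The config is reduced
# to a plain dict, and resolve_workflow() is illustrative, not NAT's loader.

config = {
    "llms": {
        "nim_llm": {"_type": "nim", "model_name": "meta/llama-3.3-70b-instruct"},
        "llm_router": {"_type": "llm_router", "policy": "task_router"},
    },
    "functions": {
        "llama_index_rag2": {"_type": "llama_index_rag2", "llm_name": "nim_llm"},
        "llm_router_tool": {"_type": "llm_router_tool", "llm_name": "llm_router"},
    },
    "workflow": {"llm": "nim_llm", "rag_tool": "llama_index_rag2",
                 "llm_router_tool": "llm_router_tool"},
}

def resolve_workflow(cfg: dict) -> dict:
    """Replace the workflow's name references with the actual config blocks."""
    wf = cfg["workflow"]
    return {
        "llm": cfg["llms"][wf["llm"]],
        "rag_tool": cfg["functions"][wf["rag_tool"]],
        "llm_router_tool": cfg["functions"][wf["llm_router_tool"]],
    }

resolved = resolve_workflow(config)
print(resolved["llm"]["model_name"])
print(resolved["rag_tool"]["_type"])
```

This indirection is what makes the router swappable: pointing the workflow at a different `llms` entry changes the backing model without touching any workflow code.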
Conclusion: A Blueprint for Modern AI Systems
As AI systems evolve and new models emerge, the ability to dynamically route each request to the most suitable model will become essential. The NAT Smart Router lays this foundation by enabling a flexible, multi‑model architecture rather than relying on a single, monolithic LLM.
- Its plugin‑based design makes routing a native capability: a custom llm_router provider extends the standard LLM interface at the toolkit layer, allowing any workflow to enable routing through simple configuration.
- By combining LangChain for agent orchestration, LlamaIndex for retrieval‑augmented generation, and NVIDIA’s LLM Router for intelligent model selection, the system delivers higher accuracy, better performance, and lower cost without architectural trade‑offs.
- For organizations operating AI at scale, this architecture offers a practical, production‑ready path forward, demonstrating that advanced capabilities and operational efficiency can coexist in a unified, multi‑framework design.
- The NVIDIA Agentic Smart Router application is available today in the Dell Enterprise Hub's App Catalog; deploy it on-premises with just a few Helm commands: https://dell.huggingface.co/authenticated/apps/agentic-smart-router
- Learn more about the NVIDIA NeMo Agent Toolkit here: https://developer.nvidia.com/nemo-agent-toolkit