NVIDIA Agentic Smart Router on Dell Enterprise Hub: A Deep Dive into Architecture, Design, and Framework

Community Article Published February 3, 2026

Rethinking Enterprise AI Architecture with a Multi‑Framework Approach

Modern enterprise AI application deployments face two foundational architectural challenges:

  • First, organizations need a practical way to leverage multiple large language models (LLMs) while balancing three competing priorities: performance, cost, and accuracy. As model catalogs expand and workloads diversify, choosing a single model—or manually orchestrating several—quickly becomes inefficient and costly.

  • Second, most enterprise AI systems are still built around a single-framework paradigm. While convenient at first, this approach often forces development teams into rigid patterns where architectural consistency outweighs choosing the best tool for each task. As applications evolve in scale and complexity, these constraints become bottlenecks, limiting innovation and adaptability.

Multi‑Framework, Agent‑Oriented Design with NVIDIA NAT

In this blog, we explore an application architecture designed to address both challenges using the NVIDIA NeMo Agent Toolkit (NAT). NAT promotes a framework‑agnostic approach, enabling teams to integrate specialized tools without being locked into one AI development ecosystem. Instead of centralizing all capabilities under a single framework, NAT allows developers to compose best-of-breed components, each selected for its strengths:

  • LangChain / LangGraph: Ideal for agent orchestration, workflow state management, and implementing complex decision logic with fine‑grained control over how agents coordinate and communicate.

  • LlamaIndex: Optimized for document indexing, semantic search, and retrieval‑augmented generation (RAG). It provides advanced query engines and highly efficient context‑retrieval mechanisms essential for enterprise knowledge workflows.
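As a concrete (if toy) illustration of this composition, the plain-Python sketch below hides framework-specific components behind a single `Tool` interface so the orchestrator never depends on where a tool came from. All class names are illustrative stand-ins, not real NAT, LangChain, or LlamaIndex APIs:

```python
from typing import Protocol

class Tool(Protocol):
    """Common interface every framework-specific component is adapted to."""
    name: str
    def run(self, query: str) -> str: ...

class LlamaIndexRAGAdapter:
    """Stand-in for a LlamaIndex query engine wrapped as a tool."""
    name = "rag"
    def run(self, query: str) -> str:
        return f"[rag] grounded answer for: {query}"

class ChatAdapter:
    """Stand-in for a plain conversational tool."""
    name = "chat"
    def run(self, query: str) -> str:
        return f"[chat] direct answer for: {query}"

class Orchestrator:
    """Stand-in for a LangChain/LangGraph agent coordinating tools."""
    def __init__(self, tools: list[Tool]):
        self.tools = {t.name: t for t in tools}
    def handle(self, query: str) -> str:
        # Trivial decision logic: route document questions to the retriever.
        tool = self.tools["rag" if "docs" in query else "chat"]
        return tool.run(query)

agent = Orchestrator([LlamaIndexRAGAdapter(), ChatAdapter()])
print(agent.handle("what do the docs say about routing?"))  # handled by "rag"
print(agent.handle("hello"))                                # handled by "chat"
```

Because each adapter only has to satisfy `run()`, swapping a LlamaIndex retriever for another framework's retriever never touches the orchestration layer.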

Solving the Multi‑Model Challenge with NVIDIA’s LLM Router

To address the first dilemma—how to efficiently leverage multiple LLMs—we integrate NVIDIA’s LLM Router. This component intelligently selects the optimal model at runtime based on factors such as:

  • Task difficulty
  • Latency requirements
  • Cost constraints
  • Accuracy needs

The LLM Router dynamically matches each request with the most suitable model, delivering the right trade‑off without manual tuning or static rules. This ensures organizations get maximum value from heterogeneous model catalogs, from lightweight task‑specific models to high‑end frontier models.
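To make that trade-off concrete, here is a self-contained Python sketch of constraint-based model selection. The catalog, prices, and thresholds are invented for illustration; they are not NVIDIA's actual routing policy:

```python
# Hypothetical model catalog: cost per 1K tokens, typical latency (s), accuracy score.
CATALOG = {
    "small-chat-8b":      {"cost": 0.02, "latency": 0.3, "accuracy": 0.70},
    "llama-3.3-70b":      {"cost": 0.20, "latency": 1.2, "accuracy": 0.88},
    "frontier-reasoning": {"cost": 1.50, "latency": 4.0, "accuracy": 0.97},
}

def pick_model(difficulty: float, max_latency: float, budget: float) -> str:
    """Pick the cheapest model accurate enough for the task that also fits
    the latency and cost constraints; fall back to the most accurate model."""
    viable = [
        (spec["cost"], name)
        for name, spec in CATALOG.items()
        if spec["accuracy"] >= difficulty
        and spec["latency"] <= max_latency
        and spec["cost"] <= budget
    ]
    if viable:
        return min(viable)[1]  # cheapest model that satisfies all constraints
    return max(CATALOG, key=lambda n: CATALOG[n]["accuracy"])

print(pick_model(difficulty=0.60, max_latency=1.0, budget=0.10))  # small-chat-8b
print(pick_model(difficulty=0.95, max_latency=5.0, budget=2.00))  # frontier-reasoning
```

A simple request stays on the lightweight model; only requests that genuinely demand frontier-level accuracy pay frontier-level cost and latency.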

System Architecture

The Multi‑Framework Smart Router application is built on a layered architecture designed for scalability, modularity, and intelligent model selection. The diagram below represents the overall system topology and how each component collaborates to deliver a robust enterprise AI workflow.

[Figure: overall system topology of the Multi‑Framework Smart Router application]

This architecture demonstrates how NVIDIA NAT powers scalable multi‑model systems through flexible orchestration and dynamic routing.

  • At the top of the stack, a LangChain‑based supervisory agent—backed by Llama‑3.3‑70B‑Instruct—acts as the system’s control plane. It manages conversational context, decides when tools should be invoked, and coordinates the execution flow across downstream components.
  • To enhance domain‑specific accuracy, the system incorporates a retrieval layer, enabling the agent to ground responses in relevant enterprise knowledge via retrieval‑augmented generation (RAG).
  • For intelligent model selection, the application integrates NVIDIA’s LLM Router Blueprint, which routes incoming requests between conversational models and high‑reasoning models. This ensures the system balances performance, cost, and accuracy by choosing the optimal model for each task in real time.
  • Finally, Arize Phoenix provides full end‑to‑end observability. It offers insights into agent behavior, routing decisions, RAG performance, and system latency—ensuring the application is production‑ready, debuggable, and highly extensible.
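The four layers above can be sketched end to end in a few lines of plain Python. Here the `span()` helper stands in for Phoenix tracing, and the retriever and router are trivial stubs; everything is illustrative:

```python
import time
from contextlib import contextmanager

TRACES = []  # stand-in for Arize Phoenix spans

@contextmanager
def span(name: str):
    """Record a (name, duration) span, mimicking tracing instrumentation."""
    start = time.perf_counter()
    try:
        yield
    finally:
        TRACES.append((name, time.perf_counter() - start))

def retrieve(query: str) -> str:
    """Stub retrieval layer (RAG)."""
    return f"context for '{query}'"

def route(query: str) -> str:
    """Stub router: long prompts go to the high-reasoning model."""
    return "reasoning-model" if len(query.split()) > 8 else "chat-model"

def supervisor(query: str) -> str:
    """Control plane: ground the query, route it, then generate a response."""
    with span("rag"):
        context = retrieve(query)
    with span("router"):
        model = route(query)
    with span("generate"):
        return f"{model} answered using {context}"

print(supervisor("summarize our VPN policy"))
print(TRACES)  # every layer leaves an observable span
```

The point is structural: each layer is a separate, observable step, so routing decisions and RAG performance can be inspected independently.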

[Figure: layered application architecture]

Together, these layers form a cohesive architecture that delivers flexible, efficient, and enterprise‑grade AI applications.

Integrating NeMo Agent Toolkit with LLM Router

Let’s first understand the NVIDIA LLM Router. The NVIDIA LLM Router Blueprint enables intelligent prompt routing by selecting the most appropriate LLM for each request based on task complexity. It balances reasoning quality, latency, and cost by avoiding unnecessary use of large, resource‑intensive models for simple requests.
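The configuration later in this post points the router at a `task_router` policy. Conceptually, routing by task type looks like the sketch below; the keyword heuristic is a stand-in for the Blueprint's actual prompt classifier, and the model mapping is hypothetical:

```python
# Stand-in for the Blueprint's prompt classifier: in the real system a served
# classification model labels each prompt; here a keyword heuristic suffices.
REASONING_MARKERS = ("prove", "derive", "step by step", "optimize", "debug")

def classify(prompt: str) -> str:
    p = prompt.lower()
    return "reasoning" if any(m in p for m in REASONING_MARKERS) else "chat"

# Hypothetical task_router policy: task type -> model.
POLICY = {
    "chat": "meta/llama-3.1-8b-instruct",
    "reasoning": "meta/llama-3.3-70b-instruct",
}

def route_prompt(prompt: str) -> str:
    return POLICY[classify(prompt)]

print(route_prompt("What are your opening hours?"))               # small chat model
print(route_prompt("Prove this loop terminates, step by step."))  # reasoning model
```

Simple conversational traffic never touches the expensive model, which is where the cost and latency savings come from.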

[Figure: NVIDIA LLM Router Blueprint request flow]

LLM Router Integration with the NeMo Agent Toolkit

The NAT Smart Router leverages a sophisticated plugin system that enables extensibility without compromising the core architecture. The integration is implemented at the toolkit level, making LLM Router functionality available to all workflows and agent implementations.

[Figure: LLM Router integration with the NeMo Agent Toolkit]

A custom LLM provider named llm_router is registered within the toolkit's plugin system.
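The snippet below is a plain-Python approximation of that registration pattern, not NAT's actual API: a registry decorator exposes a provider under the name `llm_router`, and the provider satisfies a common `generate()` interface while hiding the routing details underneath:

```python
from typing import Callable

# Stand-in for the toolkit's plugin registry.
PROVIDERS: dict[str, Callable] = {}

def register_provider(name: str):
    """Decorator mimicking how a plugin system exposes new LLM providers."""
    def wrap(factory):
        PROVIDERS[name] = factory
        return factory
    return wrap

class LLMClient:
    """The 'standard LLM interface' every provider must satisfy."""
    def generate(self, prompt: str) -> str:
        raise NotImplementedError

@register_provider("llm_router")
class LLMRouterClient(LLMClient):
    """Implements the standard interface while adding routing underneath."""
    def __init__(self, base_url: str, policy: str = "task_router"):
        self.base_url, self.policy = base_url, policy
    def generate(self, prompt: str) -> str:
        # A real client would POST to base_url and let the router pick a model.
        return f"routed via {self.policy}: {prompt}"

# Workflows refer to the provider purely by name, exactly as config.yml does.
client = PROVIDERS["llm_router"](base_url="http://router-controller")
print(client.generate("hello"))  # routed via task_router: hello
```

Because the router client honors the same interface as any other LLM provider, workflows can switch to routed inference by changing a name in configuration rather than any code.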

[Figure: llm_router provider registration in the plugin system]

This provider implements the standard LLM interface while adding router‑specific functionality. Developers can reference the provider in any workflow configuration file (config.yml), making it seamlessly accessible across the entire application. The example configuration lives at:

 NeMo-Agent-Toolkit/examples/frameworks/multi_frameworks_llm_router/src/nat_multi_frameworks_llm_router/configs/ 

# Example config.yml: a configuration file that you can modify to customize the application

general:
  use_uvloop: true
  telemetry:
    logging:
      console:
        _type: console
        level: WARN
      file:
        _type: file
        path: /tmp/multi_frameworks_llm_router.log
        level: DEBUG
    tracing:
      phoenix:
        _type: phoenix
        endpoint: http://phoenix:6006/v1/traces
        project: multi_frameworks_llm_router

functions:
  llama_index_rag2:
    _type: llama_index_rag2
    llm_name: nim_llm
    model_name: meta/llama-3.3-70b-instruct
    embedding_name: nim_embedder
    data_dir: ./examples/frameworks/multi_frameworks_llm_router/README.md
  llm_router_tool:
    _type: llm_router_tool
    llm_name: llm_router

llms:
  nim_llm:
    _type: nim
    model_name: meta/llama-3.3-70b-instruct
    temperature: 0.0
  llm_router:
    _type: llm_router
    api_key: 'XXX'
    base_url: http://ROUTER-CONTROLLER-HOST-IP 
    policy: task_router
    routing_strategy: triton

embedders:
  nim_embedder:
    _type: nim
    model_name: nvidia/nv-embedqa-e5-v5
    truncate: END

workflow:
  _type: multi_frameworks_llm_router
  llm: nim_llm
  data_dir: ./examples/frameworks/multi_frameworks_llm_router/README.md
  rag_tool: llama_index_rag2
  llm_router_tool: llm_router_tool
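One detail worth noticing in the config above: components reference each other purely by name (the workflow points at `nim_llm`, and `llm_router_tool` points at the `llm_router` provider). The plain-Python sanity check below mirrors that reference graph as a dict (no YAML parser assumed) and verifies that every referenced name is actually defined:

```python
# Inlined mirror of the config's reference graph.
CONFIG = {
    "functions": {
        "llama_index_rag2": {"llm_name": "nim_llm", "embedding_name": "nim_embedder"},
        "llm_router_tool": {"llm_name": "llm_router"},
    },
    "llms": {"nim_llm": {}, "llm_router": {}},
    "embedders": {"nim_embedder": {}},
    "workflow": {"llm": "nim_llm", "rag_tool": "llama_index_rag2",
                 "llm_router_tool": "llm_router_tool"},
}

def dangling_references(cfg: dict) -> list:
    """Return names that are referenced somewhere but never defined."""
    missing = []
    for fn in cfg["functions"].values():
        if fn.get("llm_name") not in cfg["llms"]:
            missing.append(fn["llm_name"])
        if "embedding_name" in fn and fn["embedding_name"] not in cfg["embedders"]:
            missing.append(fn["embedding_name"])
    wf = cfg["workflow"]
    if wf["llm"] not in cfg["llms"]:
        missing.append(wf["llm"])
    for key in ("rag_tool", "llm_router_tool"):
        if wf[key] not in cfg["functions"]:
            missing.append(wf[key])
    return missing

print(dangling_references(CONFIG))  # [] -- every name resolves
```

This name-based indirection is what makes routing a configuration concern: pointing a function's `llm_name` at `llm_router` instead of `nim_llm` enables routed inference without touching workflow code.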

    

Conclusion: A Blueprint for Modern AI Systems

As AI systems evolve and new models emerge, the ability to dynamically route each request to the most suitable model will become essential. The NAT Smart Router lays this foundation by enabling a flexible, multi‑model architecture rather than relying on a single, monolithic LLM.

  • Its plugin‑based design makes routing a native capability: a custom llm_router provider extends the standard LLM interface at the toolkit layer, allowing any workflow to enable routing through simple configuration.
  • By combining LangChain for agent orchestration, LlamaIndex for retrieval‑augmented generation, and NVIDIA’s LLM Router for intelligent model selection, the system delivers higher accuracy, better performance, and lower cost without architectural trade‑offs.
  • For organizations operating AI at scale, this architecture offers a practical, production‑ready path forward, demonstrating that advanced capabilities and operational efficiency can coexist in a unified, multi‑framework design.
  • The NVIDIA Agentic Smart Router application is available today in the Dell Enterprise Hub's App Catalog; deploy it on‑premises with just a few Helm commands: https://dell.huggingface.co/authenticated/apps/agentic-smart-router
  • Learn more about the NVIDIA NeMo Agent Toolkit here: https://developer.nvidia.com/nemo-agent-toolkit
