AIDP Neural Cloud: Distributed LLM Inference on Decentralized GPU Networks

Live Paper

Author: Matthew Karsten (Purple Squirrel Networks)
Date: February 2026
License: MIT

Related Resources

Resource             Link
Model                purple-squirrel-r1
Model (GGUF)         purple-squirrel-r1-gguf
Model (Multichain)   purple-squirrel-r1-multichain
Training Data        purple-squirrel-training
Companion Paper      AIDP Video Forge
Live Paper           aidp-neural-cloud.pages.dev
GitHub               ExpertVagabond

Abstract

We present AIDP Neural Cloud, a distributed large language model (LLM) inference system built on decentralized GPU networks. Our approach leverages geographically distributed GPU nodes to provide OpenAI-compatible LLM inference with significant improvements in both cost and latency. Through intelligent load balancing and a fault-tolerant architecture, we achieve 47% lower inference cost and 28% lower median latency than a centralized provider (OpenAI GPT-4o-mini). The system scales to 50 requests per second with automatic failover, making decentralized GPU compute viable for production LLM deployments.

Key Results

Metric               AIDP Neural Cloud   OpenAI GPT-4o-mini   Improvement
p50 Latency          180ms               250ms                28% faster
Cost per 1M tokens   $0.08               $0.15                47% cheaper
Throughput           50 req/s            N/A                  Scalable

Architecture

+---------------------------------------------------------+
|                  Neural Cloud                           |
+---------------------------------------------------------+
|  API Gateway                                            |
|  +-- /v1/chat/completions (OpenAI-compatible)           |
+---------------------------------------------------------+
|  Load Balancer                                          |
|  +-- Health checks -> Route to fastest node             |
+---------------------------------------------------------+
|  AIDP GPU Workers (N nodes)                             |
|  +-- vLLM inference engine                              |
|  +-- Continuous batching                                |
|  +-- PagedAttention for KV cache                        |
+---------------------------------------------------------+
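The routing layer in the diagram can be sketched as a minimal latency-based selector: periodically health-check each worker, then send the next request to the fastest healthy node. The `NodeStatus` type, `pick_node` function, and node URLs below are illustrative assumptions, not the production implementation.

```python
from dataclasses import dataclass

@dataclass
class NodeStatus:
    """Health-check result for one GPU worker (illustrative)."""
    url: str
    healthy: bool
    latency_ms: float

def pick_node(nodes: list[NodeStatus]) -> NodeStatus:
    """Route to the fastest healthy node, as in the diagram above."""
    healthy = [n for n in nodes if n.healthy]
    if not healthy:
        raise RuntimeError("no healthy GPU workers available")
    return min(healthy, key=lambda n: n.latency_ms)

nodes = [
    NodeStatus("https://gpu-1.example", True, 42.0),
    NodeStatus("https://gpu-2.example", False, 15.0),  # failed health check
    NodeStatus("https://gpu-3.example", True, 31.0),
]
print(pick_node(nodes).url)  # fastest node that passed its health check
```

A real deployment would refresh `latency_ms` from rolling health checks and evict nodes that miss consecutive probes; this sketch only shows the selection step.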

Quick Start

import openai

# Point the official OpenAI client at the Neural Cloud gateway;
# the API is OpenAI-compatible, so no other code changes are needed.
client = openai.OpenAI(
    base_url="https://neural-cloud.aidp.store/v1",
    api_key="your-api-key"
)

response = client.chat.completions.create(
    model="purple-squirrel-r1",
    messages=[
        {"role": "user", "content": "Explain decentralized GPU compute"}
    ]
)
print(response.choices[0].message.content)

Benchmark Results

Latency Comparison

Metric        AIDP Neural Cloud   OpenAI GPT-4o-mini   Improvement
p50 Latency   180ms               250ms                28% faster
p95 Latency   320ms               450ms                29% faster
p99 Latency   480ms               650ms                26% faster
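The p50/p95/p99 figures above are standard latency percentiles over per-request timings. As a minimal sketch of how such figures are computed (the sample list here is made up, and this uses the simple nearest-rank method rather than whatever the benchmark harness actually used):

```python
def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile of a list of latency samples (ms)."""
    ordered = sorted(samples)
    # Nearest-rank: clamp the p-th rank into the valid index range.
    k = max(0, min(len(ordered) - 1, round(p / 100 * len(ordered)) - 1))
    return ordered[k]

latencies = [120, 150, 180, 180, 190, 210, 320, 480, 200, 170]
for p in (50, 95, 99):
    print(f"p{p}: {percentile(latencies, p)}ms")
```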

Cost Analysis

Usage              AIDP Neural Cloud   OpenAI GPT-4o-mini   Annual Savings
1M tokens/month    $0.08/month         $0.15/month          $0.84/year
10M tokens/month   $0.80/month         $1.50/month          $8.40/year
120M tokens/year   $9.60/year          $18.00/year          $8.40/year
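The annual-savings column follows directly from the per-million-token price difference. A quick sanity check (the two rates come from the table above; the helper function is illustrative):

```python
AIDP_PER_M = 0.08    # $ per 1M tokens (from the table above)
OPENAI_PER_M = 0.15  # $ per 1M tokens

def annual_savings(tokens_per_month_millions: float) -> float:
    """Yearly savings in dollars at the cheaper rate."""
    monthly = (OPENAI_PER_M - AIDP_PER_M) * tokens_per_month_millions
    return round(monthly * 12, 2)

print(annual_savings(1))   # 1M tokens/month
print(annual_savings(10))  # 10M tokens/month
```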

Throughput Scalability

Concurrent Users   Requests/Second   Average Latency   Error Rate
1                  5.2               180ms             0%
10                 32.1              195ms             0%
50                 50.3              285ms             0.2%

Technical Contributions

  1. Distributed Architecture: Novel load balancing system routing requests across decentralized GPU nodes
  2. Cost Efficiency: 47% reduction in inference costs through decentralized resource pooling
  3. Fault Tolerance: Automatic failover with sub-second recovery when nodes go offline
  4. OpenAI Compatibility: Drop-in replacement API enabling zero-code migration
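Failover (contribution 3) happens server-side, but clients can also guard against transient node loss. A minimal retry wrapper with exponential backoff, assuming nothing beyond the standard library (the retry count and backoff values are illustrative, not the system's actual recovery parameters):

```python
import time

def with_failover(call, retries: int = 3, backoff_s: float = 0.2):
    """Retry a request callable, backing off between attempts."""
    last_exc = None
    for attempt in range(retries):
        try:
            return call()
        except Exception as exc:  # in practice, catch the SDK's error types
            last_exc = exc
            time.sleep(backoff_s * (2 ** attempt))  # exponential backoff
    raise last_exc

# Demonstration with a stub that fails once, then succeeds:
state = {"calls": 0}
def flaky():
    state["calls"] += 1
    if state["calls"] == 1:
        raise ConnectionError("node offline")
    return "ok"

print(with_failover(flaky))  # recovers after the first failure
```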

Citation

@techreport{karsten2026neuralcloud,
  title={AIDP Neural Cloud: Distributed LLM Inference on Decentralized GPU Networks},
  author={Karsten, Matthew},
  institution={Purple Squirrel Networks},
  year={2026},
  month={February},
  url={https://huggingface.co/purplesquirrelnetworks/aidp-neural-cloud-paper}
}

Built by Purple Squirrel Networks
