Hub Architecture
The Orchestration Core of the Planetary Compute Network
The Forge Hub is the central orchestration service responsible for distributing, coordinating, securing, and validating all compute workloads across the planetary compute fabric. It is the control plane of the system — scheduling tasks, managing Agents, aggregating results, enforcing quotas, and guaranteeing reproducibility.
The Hub never executes compute.
Its purpose is to make compute globally coordinated, deterministic, and highly scalable.
1. Roles & Responsibilities
The Hub provides the following core functions:
• Request Handling
Receives client API calls, validates payloads, and authenticates API keys.
• Shard Planning
Splits workloads into executable units (shards) appropriate for distributed Agents.
• Scheduling & Dispatch
Assigns shards to Agents based on:
- CPU/GPU capability
- historical reliability
- network performance
- load balancing
- project quota
- fairness scoring
• Transport Management
Maintains QUIC sessions with thousands of Agents, multiplexing streams efficiently.
• Verification
Runs redundant shard checks to detect errors, misbehaving Agents, or inconsistent outputs.
• Reduction
Aggregates partial results into deterministic final outputs.
• State & Metadata
Stores job state, timing, routing, Agent health, internal metrics.
• Provider Credits & Accounting
Calculates consumption, cost, and payouts for compute providers.
• Explicit Boundaries
- Hub never executes compute.
- Hub never requires inbound connections to Agents.
- Hub never stores raw provider secrets.
2. Internal Architecture
┌──────────────────────────┐
│ Client/API │
└─────────────┬────────────┘
↓
┌─────────────────────────────────┐
│ Hub Ingress │
│ - auth │
│ - payload validation │
└─────────────────┬───────────────┘
↓
┌────────────────────┐
│ Shard Planner │
│ (model-specific) │
└──────────┬─────────┘
↓
┌─────────────────┐
│ Scheduler │
└───────┬─────────┘
↓
┌────────────────────┐
│ QUIC Dispatcher │
└──────────┬─────────┘
↓
Agents Pool
↓
┌────────────────────┐
│ Aggregation │
└──────────┬─────────┘
↓
┌────────────────────┐
│ Final Response │
└────────────────────┘
The Hub integrates multiple internal subsystems:
- Ingress Layer
- Shard Planner
- Scheduler
- QUIC Dispatcher
- Verification Engine
- Reduction/Aggregation Layer
- Storage and Metadata Services
- Telemetry Pipeline
3. Shard Planning (Workload Decomposition)
The Hub transforms a client request into distributed tasks.
Examples:
- Monte Carlo → iteration blocks
- BLAS → tile matrix multiply
- FFmpeg → time-based video segments
- Climate PCA → ensemble members
- CAT modeling → track perturbations
Shard planners are adapter-specific modules that:
- compute shard boundaries
- estimate cost
- prepare shard payloads
- ensure determinism
- attach seed offsets
- enforce project quota
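As a concrete illustration, the Monte Carlo case above can be sketched as a planner that cuts a job into fixed-size iteration blocks, each carrying a deterministic seed offset. This is a minimal sketch, not the Hub's actual planner; the `Shard` fields and the seed-offset scheme are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Shard:
    shard_id: int
    start_iter: int   # first iteration covered by this block
    n_iters: int      # block size (last block may be smaller)
    seed_offset: int  # deterministic RNG offset for reproducibility

def plan_monte_carlo(total_iters: int, block_size: int, base_seed: int) -> list[Shard]:
    """Split a Monte Carlo job into fixed-size iteration blocks.

    The same (total_iters, block_size, base_seed) always yields the
    same plan, which is what makes the job reproducible end to end.
    """
    shards = []
    for i, start in enumerate(range(0, total_iters, block_size)):
        n = min(block_size, total_iters - start)
        shards.append(Shard(shard_id=i, start_iter=start,
                            n_iters=n, seed_offset=base_seed + i))
    return shards
```

Because the plan is a pure function of the request, a restarted Hub can regenerate the identical shard set without persisting it.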
4. Scheduling Model
The Scheduler optimizes shard placement based on:
- Agent latency
- CPU/GPU capability
- historical completion rate
- failure patterns
- geographic distribution
- fairness and credit weighting
Scheduling is rolling, adaptive, and non-blocking, meaning:
- Agents may drop in or out mid-job
- Hub automatically reassigns shards
- Real-time scoring redistributes work
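The scoring criteria above can be combined into a single weighted placement score, recomputed as Agents report in. The fields and weights below are illustrative assumptions, not the Hub's actual formula:

```python
def score_agent(agent: dict, weights: dict) -> float:
    """Weighted placement score for one Agent: higher is better.
    Reward reliability and capability; penalize latency and load."""
    return (
        weights["reliability"] * agent["completion_rate"]
        + weights["capability"] * agent["relative_flops"]
        - weights["latency"] * agent["rtt_ms"] / 100.0
        - weights["load"] * agent["queue_depth"]
    )

def pick_agent(agents: list[dict], weights: dict) -> dict:
    """Greedy placement: assign the next shard to the best-scoring Agent."""
    return max(agents, key=lambda a: score_agent(a, weights))
```

Because the score is recomputed continuously, a previously attractive Agent that starts timing out simply stops winning placements, which is how rolling redistribution emerges without explicit job-wide replanning.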
5. Transport Layer (QUIC Integration)
The Hub maintains thousands of concurrent QUIC channels:
- each Agent gets its own session
- shards are streamed as independent QUIC streams
- results are returned via bidirectional streams
- loss recovery and congestion control handled automatically
QUIC enables:
- low latency
- stable throughput under packet loss
- efficient multiplexing
- no TCP head-of-line blocking
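A full QUIC stack is beyond a short sketch, but the multiplexing pattern above can be illustrated with asyncio tasks standing in for independent streams: one task per shard, so a slow or lost shard never delays its siblings. All names here are illustrative stand-ins, not real Hub APIs.

```python
import asyncio

async def send_shard(agent_id: str, shard_id: int) -> tuple[int, str]:
    # Stand-in for opening a stream, sending the shard payload,
    # and awaiting the result on the same bidirectional stream.
    await asyncio.sleep(0.01)
    return shard_id, f"result-from-{agent_id}"

async def dispatch(agent_id: str, shard_ids: list[int]) -> dict[int, str]:
    # One concurrent task per shard mirrors one QUIC stream per shard:
    # streams complete independently, with no head-of-line blocking.
    results = await asyncio.gather(*(send_shard(agent_id, s) for s in shard_ids))
    return dict(results)
```

In the real transport layer, loss recovery and congestion control for all of an Agent's streams are handled by the single underlying QUIC session.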
6. Result Verification
The Hub supports several verification strategies:
• Redundant Minority Shards
A random 5–10% of shards are duplicated onto independent Agents and their outputs cross-checked.
• Statistical Consistency Checks
Used for Monte Carlo workloads.
• Structural Verification
Matrix multiplication tile validation, dimension checks, etc.
• Media Checks
Segment alignment, timestamp boundary correctness.
Agents returning suspicious results are:
- downscored,
- sandboxed with reduced workload,
- or removed from future scheduling.
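The redundant-minority strategy can be sketched as two steps: deterministically pick a small fraction of shards to duplicate, then compare the duplicated outputs and flag deviating Agents. This is an illustrative sketch under assumed data shapes, not the Hub's verification engine:

```python
import random
from collections import Counter

def plan_redundancy(shard_ids: list[int], fraction: float = 0.07, seed: int = 0) -> set[int]:
    """Pick roughly 5-10% of shards to run on a second Agent."""
    rng = random.Random(seed)  # seeded, so the choice is reproducible
    k = max(1, round(len(shard_ids) * fraction))
    return set(rng.sample(shard_ids, k))

def cross_check(results: dict[int, list[tuple[str, object]]]) -> set[str]:
    """results maps shard_id -> [(agent_id, output), ...].

    With 3+ copies, flag Agents that deviate from the majority output;
    with only 2 disagreeing copies we cannot tell who is wrong, so flag
    both and let the shard be re-executed elsewhere.
    """
    suspects = set()
    for outputs in results.values():
        counts = Counter(out for _, out in outputs)
        if len(counts) <= 1:
            continue  # all copies agree
        majority, n = counts.most_common(1)[0]
        if n > len(outputs) // 2:
            suspects.update(a for a, out in outputs if out != majority)
        else:
            suspects.update(a for a, _ in outputs)
    return suspects
```

Flagged Agents then feed back into the scheduler's health scoring, which is what drives the downscoring and removal described above.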
7. Reduction & Aggregation
The Aggregation Layer performs:
- summation of partial contributions
- merging of histograms
- variance and quantile estimation
- tensor/tile stitching
- video/audio concatenation
- PCA reconstruction
Aggregation is deterministic and stable:
- floating-point consistency
- predictable iteration order
- reproducible outputs for identical seeds
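For the summation case, determinism hinges on ordering: floating-point addition is not associative, so partial results must be reduced in a fixed order regardless of when they arrive. A minimal sketch (fixed shard-id order plus Kahan compensation; illustrative, not the Hub's reduction code):

```python
def deterministic_sum(partials: dict[int, float]) -> float:
    """Sum partial results in shard-id order with Kahan compensation,
    so the float output is bit-identical regardless of arrival order."""
    total = 0.0
    c = 0.0  # running compensation for lost low-order bits
    for shard_id in sorted(partials):  # fixed iteration order is the key
        y = partials[shard_id] - c
        t = total + y
        c = (t - total) - y
        total = t
    return total
```

The same principle (reduce in shard order, never in arrival order) applies to histogram merging and tile stitching as well.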
8. State Management & Metadata
The Hub maintains:
- job registry
- shard state
- Agent state
- timing profiles
- compute cost and credits
- model-specific metadata
Backed by:
- KV for lightweight state
- VMem for intermediate numeric state
- Blob for large objects
9. Reliability & Fault Recovery
The Hub is resilient to:
- Agent disappearance
- partial results
- network failures
- long-tail slow Agents
Mechanisms:
- shard timeout detection
- dynamic rebalancing
- health scoring
- cross-shard recovery
- auto-retry
No single Agent failure can compromise a job.
10. Security Considerations
- Agents never receive raw credentials
- All communication uses QUIC TLS
- Blob access uses signed URLs
- All API calls authenticated
- Project isolation enforced
- Full audit logging available
11. Observability
The Hub emits:
- shard execution logs
- latency distribution
- throughput statistics
- Agent health metrics
- scheduler decisions
- verify-step failures
- per-job billing metrics
This enables operators to:
- debug workloads
- identify slow or unreliable Agents
- monitor cost and compute time
- ensure reproducibility
