Hub Architecture

The Orchestration Core of the Planetary Compute Network

The Forge Hub is the central orchestration service responsible for distributing, coordinating, securing, and validating all compute workloads across the planetary compute fabric. It is the control plane of the system — scheduling tasks, managing Agents, aggregating results, enforcing quotas, and guaranteeing reproducibility.

The Hub never executes compute.
Its purpose is to make compute globally coordinated, deterministic, and highly scalable.


1. Roles & Responsibilities

The Hub provides the following core functions:

• Request Handling

Receives client API calls, validates payloads, and authenticates API keys.

• Shard Planning

Splits workloads into executable units (shards) appropriate for distributed Agents.

• Scheduling & Dispatch

Assigns shards to Agents based on:

  • CPU/GPU capability
  • historical reliability
  • network performance
  • load balancing
  • project quota
  • fairness scoring

• Transport Management

Maintains QUIC sessions with thousands of Agents, multiplexing streams efficiently.

• Verification

Runs redundant shard checks to detect errors, misbehaving Agents, or inconsistent outputs.

• Reduction

Aggregates partial results into deterministic final outputs.

• State & Metadata

Stores job state, timing, routing, Agent health, and internal metrics.

• Provider Credits & Accounting

Calculates consumption, cost, and payouts for compute providers.

• Explicit Boundaries

  • Hub never executes compute.
  • Hub never requires inbound connections to Agents.
  • Hub never stores raw provider secrets.

2. Internal Architecture

      ┌──────────────────────────┐
      │        Client/API        │
      └─────────────┬────────────┘
                    ▼
    ┌─────────────────────────────────┐
    │          Hub Ingress            │
    │  - auth                         │
    │  - payload validation           │
    └─────────────────┬───────────────┘
                      ▼
           ┌────────────────────┐
           │   Shard Planner    │
           │ (model-specific)   │
           └──────────┬─────────┘
                      ▼
             ┌─────────────────┐
             │   Scheduler     │
             └───────┬─────────┘
                     ▼
         ┌────────────────────┐
         │   QUIC Dispatcher  │
         └──────────┬─────────┘
                    ▼
               Agents Pool
                    ▼
         ┌────────────────────┐
         │    Aggregation     │
         └──────────┬─────────┘
                    ▼
         ┌────────────────────┐
         │   Final Response   │
         └────────────────────┘

The Hub integrates multiple internal subsystems:

  • Ingress Layer
  • Shard Planner
  • Scheduler
  • QUIC Dispatcher
  • Verification Engine
  • Reduction/Aggregation Layer
  • Storage and Metadata Services
  • Telemetry Pipeline

3. Shard Planning (Workload Decomposition)

The Hub transforms a client request into distributed tasks.

Examples:

  • Monte Carlo → iteration blocks
  • BLAS → matrix multiplication tiles
  • FFmpeg → time-based video segments
  • Climate PCA → ensemble members
  • CAT modeling → track perturbations

Shard planners are adapter-specific modules that:

  • compute shard boundaries
  • estimate cost
  • prepare shard payloads
  • ensure determinism
  • attach seed offsets
  • enforce project quota
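
As an illustration of these responsibilities, here is a minimal sketch of a Monte Carlo shard planner. The `Shard` type and `plan_monte_carlo` function are hypothetical names for this example, not part of the Hub's actual API; the sketch shows only the block-splitting and seed-offset steps described above.

```python
from dataclasses import dataclass

@dataclass
class Shard:
    """One executable unit: an iteration block with a deterministic seed offset."""
    index: int
    start: int   # first iteration covered by this block
    count: int   # number of iterations in this block
    seed: int    # base_seed + shard index keeps runs reproducible

def plan_monte_carlo(total_iterations: int, block_size: int, base_seed: int) -> list[Shard]:
    """Split a Monte Carlo job into fixed-size iteration blocks."""
    shards: list[Shard] = []
    start = 0
    while start < total_iterations:
        count = min(block_size, total_iterations - start)
        shards.append(Shard(index=len(shards), start=start, count=count,
                            seed=base_seed + len(shards)))
        start += count
    return shards
```

Because every shard carries an explicit seed offset, re-running the planner with the same inputs yields byte-identical shard payloads, which is what makes later verification and aggregation deterministic.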

4. Scheduling Model

The Scheduler optimizes shard placement based on:

  • Agent latency
  • CPU/GPU capability
  • historical completion rate
  • failure patterns
  • geographic distribution
  • fairness and credit weighting

Scheduling is rolling, adaptive, and non-blocking, meaning:

  • Agents may drop in or out mid-job
  • Hub automatically reassigns shards
  • Real-time scoring redistributes work
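
The placement criteria above can be folded into a single score per Agent. The weights and normalisation below are illustrative assumptions, not the Hub's actual tuning; the point is that the Scheduler ranks Agents on a blend of latency, reliability, and fairness rather than any single metric.

```python
def placement_score(agent: dict,
                    w_latency: float = 0.4,
                    w_reliability: float = 0.4,
                    w_fairness: float = 0.2) -> float:
    """Higher is better. Lower RTTs and lighter recent load score higher."""
    latency_score = 1.0 / (1.0 + agent["latency_ms"] / 100.0)
    reliability = agent["completion_rate"]       # historical rate in [0, 1]
    fairness = 1.0 - agent["recent_load_share"]  # prefer under-used Agents
    return (w_latency * latency_score
            + w_reliability * reliability
            + w_fairness * fairness)

def pick_agent(agents: list[dict]) -> dict:
    """Choose the best-scoring Agent for the next shard."""
    return max(agents, key=placement_score)
```

Because scoring is recomputed continuously, an Agent whose reliability or load changes mid-job naturally receives more or fewer shards on the next dispatch round.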

5. Transport Layer (QUIC Integration)

Hub maintains thousands of concurrent QUIC channels:

  • each Agent gets its own session
  • shards are streamed as independent QUIC streams
  • results are returned via bidirectional streams
  • loss recovery and congestion control handled automatically

QUIC enables:

  • low latency
  • stable throughput under packet loss
  • efficient multiplexing
  • no TCP head-of-line blocking
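
The session-per-Agent, stream-per-shard model can be sketched conceptually with asyncio queues standing in for QUIC streams. A real implementation would use an actual QUIC library; this sketch only shows the multiplexing shape: each Agent has an independent session, shards fan out across sessions, and results flow back on a shared return path.

```python
import asyncio

async def agent_session(agent_id: str, shard_stream: asyncio.Queue,
                        results: asyncio.Queue) -> None:
    """Stand-in for one QUIC session: shards arrive on an independent
    stream; results return on a bidirectional stream."""
    while True:
        shard = await shard_stream.get()
        if shard is None:  # session close marker
            break
        results.put_nowait((agent_id, shard, shard * 2))  # placeholder compute

async def dispatch(shards: list[int], agent_ids: list[str]) -> list[tuple]:
    results: asyncio.Queue = asyncio.Queue()
    streams = {a: asyncio.Queue() for a in agent_ids}
    sessions = [asyncio.create_task(agent_session(a, q, results))
                for a, q in streams.items()]
    # Round-robin shards across sessions, mimicking stream multiplexing.
    for i, shard in enumerate(shards):
        streams[agent_ids[i % len(agent_ids)]].put_nowait(shard)
    for q in streams.values():
        q.put_nowait(None)
    await asyncio.gather(*sessions)
    return [results.get_nowait() for _ in range(results.qsize())]
```

Because each shard rides its own stream, a stalled shard on one session cannot block delivery on any other, which is the property QUIC's stream independence provides over TCP.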

6. Result Verification

The Hub supports several verification strategies:

• Redundant Minority Shards

A 5–10% sample of shards is duplicated on independent Agents and the outputs compared.

• Statistical Consistency Checks

Used for Monte Carlo workloads.

• Structural Verification

Matrix multiplication tile validation, dimension checks, etc.

• Media Checks

Segment alignment, timestamp boundary correctness.

Agents returning suspicious results are:

  • downscored,
  • sandboxed with reduced workload,
  • or removed from future scheduling.
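
A minimal sketch of the redundant-minority strategy, under assumed names (`select_redundant`, `cross_check`) and an assumed scalar result type: a deterministic sample of shards is duplicated, and any shard whose duplicate disagrees beyond tolerance flags its Agent for downscoring.

```python
import random

def select_redundant(shard_ids: list[int], fraction: float = 0.07,
                     seed: int = 0) -> list[int]:
    """Pick a deterministic ~5-10% sample of shards to duplicate."""
    rng = random.Random(seed)  # seeded so the sample is reproducible
    k = max(1, round(len(shard_ids) * fraction))
    return sorted(rng.sample(shard_ids, k))

def cross_check(primary: dict[int, float], duplicate: dict[int, float],
                tol: float = 1e-9) -> list[int]:
    """Return shard ids whose duplicated results disagree beyond tolerance."""
    return [sid for sid in duplicate
            if abs(primary[sid] - duplicate[sid]) > tol]
```

Disagreeing shards are rescheduled, and the Agents involved feed into the downscoring path described above.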

7. Reduction & Aggregation

The Aggregation Layer performs:

  • summation of partial contributions
  • merging of histograms
  • variance and quantile estimation
  • tensor/tile stitching
  • video/audio concatenation
  • PCA reconstruction

Aggregation is deterministic and stable:

  • floating-point consistency
  • predictable iteration order
  • reproducible outputs for identical seeds
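
The determinism guarantee hinges on a fixed reduction order. A sketch of the summation case (`reduce_partials` is a hypothetical name): partial results are sorted by shard index before summing, so the floating-point result is identical run-to-run no matter what order Agents returned their shards in.

```python
import math

def reduce_partials(partials: dict[int, float]) -> float:
    """Sum partial contributions in shard-index order.

    Sorting fixes the operation order, so floating-point rounding is
    identical across runs; math.fsum also avoids intermediate rounding
    error in the accumulation itself.
    """
    ordered = [partials[i] for i in sorted(partials)]
    return math.fsum(ordered)
```

The same index-ordering principle applies to the other reductions listed (histogram merges, tile stitching, segment concatenation): arrival order never influences the output.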

8. State Management & Metadata

Hub maintains:

  • job registry
  • shard state
  • Agent state
  • timing profiles
  • compute cost and credits
  • model-specific metadata

Backed by:

  • KV for lightweight state
  • VMem for intermediate numeric state
  • Blob for large objects
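
As a rough illustration of how state might be routed across the three stores, here is a toy routing function. The size threshold and the `route_state` name are assumptions for the sketch, not documented Hub behavior.

```python
def route_state(key: str, value: bytes, numeric: bool = False) -> str:
    """Pick a backing store for a piece of job state.

    Illustrative policy: large objects go to Blob, intermediate numeric
    state to VMem, and everything small and structural to KV.
    """
    if len(value) > 1_000_000:  # assumed threshold for "large"
        return "blob"
    if numeric:
        return "vmem"
    return "kv"
```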

9. Reliability & Fault Recovery

Hub is resilient to:

  • Agent disappearance
  • partial results
  • network failures
  • long-tail slow Agents

Mechanisms:

  • shard timeout detection
  • dynamic rebalancing
  • health scoring
  • cross-shard recovery
  • auto-retry

No single Agent failure can compromise a job.


10. Security Considerations

  • Agents never receive raw credentials
  • All communication uses QUIC TLS
  • Blob access uses signed URLs
  • All API calls authenticated
  • Project isolation enforced
  • Full audit logging available

11. Observability

Hub emits:

  • shard execution logs
  • latency distribution
  • throughput statistics
  • Agent health metrics
  • scheduler decisions
  • verify-step failures
  • per-job billing metrics

This enables operators to:

  • debug workloads
  • identify slow or unreliable Agents
  • monitor cost and compute time
  • ensure reproducibility
