
Blob Storage Architecture

Large-Object Storage for Planetary Compute Workloads

Forge Blob is the large-scale binary object storage layer of the Forge Pool architecture.
It provides efficient storage and retrieval of:

  • model inputs
  • media segments
  • scientific datasets
  • intermediate artifacts
  • video/audio chunks
  • adapter-specific binary payloads

Blob Storage is optimized for workloads requiring MB–GB scale data, complementing KV (small metadata) and VMem (medium-size numeric memory).


1. Purpose & Role in the Architecture

Blob Storage:

  • stores large binary objects
  • enables distributed adapters (FFmpeg, BLAS, PCA, CAT)
  • supports chunked upload/download
  • ensures deterministic availability
  • integrates directly with Hub routing
  • minimizes memory pressure on Agents
  • supports streaming and partial range reads

It is the persistent binary backbone of the planetary compute network.


2. Architecture Overview

Forge Blob consists of:

1. Object Store

Backed by a pluggable storage backend (S3, MinIO, GCS, Azure, filesystem).

2. Blob Gateway

Hub-facing REST interface for:

  • write operations
  • read operations
  • multi-part upload
  • signed URLs
  • range reads

3. Chunk Manager

Splits large files into deterministic chunks for distributed processing.
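Deterministic chunking can be sketched as fixed-size splitting with a per-chunk checksum; identical input always yields identical chunk boundaries and hashes, so any node can verify a chunk independently. The 8 MiB chunk size and field names are assumptions for illustration:

```python
import hashlib

CHUNK_SIZE = 8 * 1024 * 1024  # hypothetical default chunk size (8 MiB)


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[dict]:
    """Split a payload into fixed-size chunks, each paired with its SHA-256.

    Fixed-size splitting is deterministic: the same bytes always produce
    the same (index, offset, size, hash) tuples.
    """
    chunks = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        chunks.append({
            "index": offset // chunk_size,
            "offset": offset,
            "size": len(chunk),
            "sha256": hashlib.sha256(chunk).hexdigest(),
        })
    return chunks
```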

4. Metadata Index

Stores:

  • object size
  • hash checksums
  • content-type
  • adapter association
  • lifecycle/expiration

5. Security Layer

Per-project access controls with signed URLs.


3. Blob Lifecycle

  1. Client uploads object (single or multi-part)
  2. Hub records metadata + checksum
  3. Object is stored in Backend Store
  4. Adapters request chunks through signed URLs
  5. Agents stream chunk data via QUIC
  6. Results reference Blob IDs, not raw data
  7. Blob may auto-expire according to lifecycle policy

This keeps distributed compute memory-efficient: Agents never load full objects unless required.
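Steps 1–3 and 6 of the lifecycle can be sketched with a toy in-memory store. The class and method names are illustrative, and the Blob ID here is a simplified truncation of the content hash (the real scheme also mixes in project scope and entropy):

```python
import hashlib


class InMemoryBlobStore:
    """Toy sketch of the upload lifecycle: store bytes, record
    metadata + checksum, and return a Blob ID that results reference
    instead of raw data. Not the real Hub API."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}
        self._meta: dict[str, dict] = {}

    def upload(self, object_name: str, data: bytes, content_type: str) -> str:
        checksum = hashlib.sha256(data).hexdigest()
        blob_id = checksum[:16]  # toy ID; real IDs also include scope + entropy
        self._objects[blob_id] = data
        self._meta[blob_id] = {
            "object_name": object_name,
            "content_type": content_type,
            "size": len(data),
            "sha256": checksum,
        }
        return blob_id

    def read_range(self, blob_id: str, start: int, end: int) -> bytes:
        # Partial reads (inclusive end), so consumers never load the full object
        return self._objects[blob_id][start:end + 1]
```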


4. API Endpoints

POST /v1/blob/upload

```json
{
  "object_name": "video/segment_00013.ts",
  "content_type": "video/mp2t"
}
```

Returns a signed URL for uploading.
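A sketch of building this request from Python using only the standard library; the request is constructed but not sent, and the Hub base URL is a placeholder:

```python
import json
import urllib.request


def build_upload_request(hub: str, object_name: str, content_type: str) -> urllib.request.Request:
    """Build (but do not send) the POST /v1/blob/upload request.

    Field names follow the documented payload; `hub` is a placeholder
    base URL supplied by the caller.
    """
    body = json.dumps({
        "object_name": object_name,
        "content_type": content_type,
    }).encode()
    return urllib.request.Request(
        f"{hub}/v1/blob/upload",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The caller would then `urlopen` this request, read the signed URL from the response, and PUT the object bytes to that URL.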


GET /v1/blob/{id}

Provides metadata and access instructions.


GET /v1/blob/download?id=...

Returns a signed URL for download.


Range Requests

Agents or adapters may request:

Range: bytes=2000000-4000000

This enables partial consumption of large datasets.
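A minimal helper for turning a chunk index into the corresponding Range header; the 8 MiB default chunk size is an assumption, and the byte range is inclusive per HTTP semantics:

```python
def range_header_for_chunk(index: int, chunk_size: int = 8 * 1024 * 1024) -> dict:
    """Build the HTTP Range header for chunk `index`.

    HTTP byte ranges are inclusive, so chunk 0 of size 100 is bytes 0-99.
    """
    start = index * chunk_size
    end = start + chunk_size - 1
    return {"Range": f"bytes={start}-{end}"}
```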


5. Deterministic Storage Guarantees

Forge Blob enforces:

  • content-hash verification
  • object immutability
  • consistent regional replication (optional)
  • identity-bound access
  • audit logging

Blob IDs are globally unique and derived from project scope and content hash, with system-level entropy to prevent collisions.
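One plausible reading of that derivation, as a sketch only: hash the project scope and content hash together with fresh entropy. The hash function, truncation length, and entropy width below are all assumptions, not the actual Forge scheme:

```python
import hashlib
import secrets


def derive_blob_id(project: str, content_hash: str) -> str:
    """Sketch of a Blob ID: project scope + content hash + random entropy.

    The entropy term keeps IDs unique even when identical content is
    uploaded twice within the same project.
    """
    entropy = secrets.token_hex(8)  # assumed 64 bits of system entropy
    scoped = f"{project}:{content_hash}:{entropy}"
    return hashlib.sha256(scoped.encode()).hexdigest()[:32]
```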


6. Blob Usage Across Adapters

FFmpeg Adapter

  • segment ingestion
  • chunked transcoding
  • deterministic stitching

BLAS / Scientific Compute

  • matrix tiles
  • dataset partitions
  • PCA coefficient storage

PCA / Climate

  • anomaly fields
  • eigenvector chunks

ForgeCAT UTC

  • cyclone track files
  • ensemble members

ETA / Logistics

  • historical baseline files
  • weather-conditioning datasets

Blob Storage is deeply integrated with the entire compute ecosystem.


7. Performance Expectations

Operation            Typical Runtime
Metadata fetch       1–3 ms
Chunk download       10–40 ms
Signed URL issue     <1 ms
Multi-part commit    5–20 ms

Actual throughput depends on the backend storage layer.


8. Security Model

Blob Storage enforces:

  • per-project ACLs
  • time-limited signed URLs
  • mandatory TLS
  • identity verification at Hub
  • blocklist/allowlist support

Agents never receive raw credentials; all access is delegated via signed URLs.


9. Limitations

  • Not suitable for very small data (use KV)
  • Not intended for in-memory numeric structures (use VMem)
  • Uploading very large files depends on backend throughput
  • Cross-region transfer may incur latency

Related Documentation