architecture

Coordinator (Rust) talks to nodes (Python) over gRPC + mTLS. Nodes run vLLM and execute sandboxed tasks. Nodes only talk to the coordinator.

coordinator
 |     |     |
node  node  node

coordinator

Tracks nodes, queues tasks, dispatches work, verifies results.

gRPC

service PhageCoordinator {
  rpc Register(NodeInfo)
      returns (RegistrationResponse);
  rpc Heartbeat(HeartbeatRequest)
      returns (HeartbeatResponse);
  rpc FetchTask(TaskRequest)
      returns (TaskAssignment);
  rpc SubmitResult(TaskResult)
      returns (ResultAck);
}

Register: node sends GPU info, gets node ID + mTLS cert.

Heartbeat: every 10s. GPU util, VRAM, temp, loaded model. Three consecutive misses (about 30s of silence) = dead.

FetchTask: node pulls work. Coordinator picks based on VRAM and loaded model.

SubmitResult: output + sandbox attestation. Verified before accepted.
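The heartbeat rule above can be sketched as a small liveness tracker. This is an illustrative sketch, not the coordinator's actual code (which is Rust): the class and method names are hypothetical, and it assumes a "miss" means one full 10s interval elapsed with no heartbeat, so a node is treated as dead once 30s have passed since the last one.

```python
import time

HEARTBEAT_INTERVAL_S = 10  # from the doc: heartbeat every 10s
MAX_MISSED = 3             # from the doc: three misses = dead


class LivenessTracker:
    """Hypothetical helper: remembers when each node last sent a heartbeat."""

    def __init__(self, clock=time.monotonic):
        # The clock is injectable so the logic can be exercised without sleeping.
        self._clock = clock
        self._last_seen = {}

    def record_heartbeat(self, node_id):
        """Call whenever a Heartbeat RPC arrives from node_id."""
        self._last_seen[node_id] = self._clock()

    def missed(self, node_id):
        """Whole heartbeat intervals elapsed since the node's last heartbeat."""
        elapsed = self._clock() - self._last_seen[node_id]
        return int(elapsed // HEARTBEAT_INTERVAL_S)

    def is_dead(self, node_id):
        # Assumption: exactly three elapsed intervals already counts as dead.
        return self.missed(node_id) >= MAX_MISSED
```

A fresh heartbeat resets the count, so a node that recovers before the third miss stays alive.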

node daemon

phage-node does four things:

  1. Manages local vLLM (start, monitor, model loading)
  2. Runs tasks in gVisor sandbox (no network, read-only root)
  3. Reports metrics via heartbeat
  4. Pulls and executes tasks in a loop
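Step 4 can be sketched as a plain pull loop. This is a minimal sketch under stated assumptions: the three callables are hypothetical stand-ins for the FetchTask RPC, the gVisor sandbox run, and the SubmitResult RPC. A real daemon would presumably wait and retry when no task is available, and would send heartbeats concurrently; here the loop just stops so the example terminates.

```python
def run_node_loop(fetch_task, run_in_sandbox, submit_result, max_iterations=None):
    """Pull tasks and execute them until none are left or the limit is reached.

    fetch_task()                      -> task, or None when no work is available
    run_in_sandbox(task)              -> (output, attestation)
    submit_result(task, output, att)  -> None
    All three are hypothetical stand-ins, not the real phage-node API.
    """
    completed = 0
    while max_iterations is None or completed < max_iterations:
        task = fetch_task()
        if task is None:
            # Assumption for this sketch: stop instead of sleeping and retrying.
            break
        output, attestation = run_in_sandbox(task)
        submit_result(task, output, attestation)
        completed += 1
    return completed
```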

task lifecycle

submitted > queued > dispatched > running > verified

Tasks include a prompt, optional files, and a verifier (shell command, exit 0 = pass). For best-of-N, the same task goes to N nodes. Coordinator keeps the passing result with the lowest token count.
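The best-of-N rule can be sketched as follows. The `Candidate` type and its field names are hypothetical. Two behaviours are assumptions the text does not specify: returning `None` when no result passes, and keeping the earliest candidate on a token-count tie.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """One node's result for a best-of-N task (hypothetical shape)."""
    node_id: str
    output: str
    token_count: int
    verifier_exit_code: int  # exit 0 = pass, as in the doc


def pick_best(candidates: Sequence[Candidate]) -> Optional[Candidate]:
    """Return the passing candidate with the lowest token count, if any."""
    passing = [c for c in candidates if c.verifier_exit_code == 0]
    if not passing:
        return None  # assumption: the doc does not say what happens here
    # min() keeps the first of several equal-minimum candidates.
    return min(passing, key=lambda c: c.token_count)
```

For example, with three results of 120, 80, and 95 tokens where the 80-token one fails its verifier, the 95-token result is kept.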

model serving

One vLLM instance per GPU. Sticky routing: once a node loads a model, it keeps getting tasks for that model. The coordinator won't assign a model that doesn't fit in VRAM.

model                    vram
Qwen2.5-Coder-7B         ~14 GB
DeepSeek-Coder-V2-Lite   ~32 GB
Qwen2.5-72B-AWQ          ~40 GB
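One way the coordinator could combine sticky routing with the VRAM check when a node calls FetchTask is sketched below. The VRAM figures come from the table above; everything else is an assumption: the names, the use of total rather than free VRAM, and the fallback of handing out the oldest fitting task when nothing in the queue matches the loaded model.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Approximate VRAM requirements in GB, copied from the table above.
MODEL_VRAM_GB = {
    "Qwen2.5-Coder-7B": 14,
    "DeepSeek-Coder-V2-Lite": 32,
    "Qwen2.5-72B-AWQ": 40,
}


@dataclass(frozen=True)
class Task:
    task_id: str
    model: str


def pick_task(node_vram_gb: int, loaded_model: Optional[str],
              queue: Sequence[Task]) -> Optional[Task]:
    """Choose a queued task for the node that is asking for work."""
    # Never hand out a task whose model does not fit on the node.
    fits = [t for t in queue if MODEL_VRAM_GB[t.model] <= node_vram_gb]
    # Sticky routing: prefer a task for the model the node already has loaded.
    for task in fits:
        if task.model == loaded_model:
            return task
    # Assumed fallback: otherwise take the oldest task that fits.
    return fits[0] if fits else None
```

With this sketch a 24 GB node can only ever receive Qwen2.5-Coder-7B tasks, while a 48 GB node with DeepSeek-Coder-V2-Lite loaded keeps receiving tasks for it as long as any are queued.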

security

Transport: mTLS. Coordinator runs its own CA. Nodes get certs at registration.

Sandbox: gVisor. No network. Read-only root. Writable scratch only. Time limit enforced.

Attestation: signed blob per result. Hashes of sandbox config, input, output. Signed by node's mTLS key.
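The attestation blob might be assembled and checked roughly like this. It is a sketch under assumptions: SHA-256 as the hash, a canonical JSON payload, and hypothetical field names. Signing is passed in as a callable because the real signature is made with the node's mTLS key. The HMAC pair at the bottom is only a stand-in so the example runs with the standard library; it is not the scheme the system uses.

```python
import hashlib
import hmac
import json


def _payload(sandbox_config: bytes, task_input: bytes, output: bytes) -> dict:
    """Hashes of sandbox config, input, and output (field names are hypothetical)."""
    return {
        "sandbox_config_sha256": hashlib.sha256(sandbox_config).hexdigest(),
        "input_sha256": hashlib.sha256(task_input).hexdigest(),
        "output_sha256": hashlib.sha256(output).hexdigest(),
    }


def _message(payload: dict) -> bytes:
    # Sorted keys and fixed separators give a stable byte string to sign.
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def build_attestation(sandbox_config, task_input, output, sign):
    """Node side: sign(message: bytes) -> bytes is supplied by the caller."""
    payload = _payload(sandbox_config, task_input, output)
    return {"payload": payload, "signature": sign(_message(payload)).hex()}


def verify_attestation(attestation, sandbox_config, task_input, output, verify):
    """Coordinator side: recompute the hashes, then check the signature."""
    expected = _payload(sandbox_config, task_input, output)
    if attestation["payload"] != expected:
        return False
    signature = bytes.fromhex(attestation["signature"])
    return verify(_message(expected), signature)


# Stand-in signer/verifier for demonstration only (NOT the node's mTLS key).
_DEMO_KEY = b"demo-key"


def demo_sign(message: bytes) -> bytes:
    return hmac.new(_DEMO_KEY, message, hashlib.sha256).digest()


def demo_verify(message: bytes, signature: bytes) -> bool:
    return hmac.compare_digest(demo_sign(message), signature)
```

Recomputing the hashes on the coordinator side means a result whose output was altered after signing fails verification even if the signature itself is valid.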

API

POST /api/tasks       submit a task
GET  /api/tasks/{id}  status + results
GET  /api/status      nodes, vram, queue
GET  /api/nodes       node list
WS   /api/feed        live events
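A client call against the first two endpoints might look like the sketch below. Only the paths and methods come from the list above. The base URL, the JSON body, and its field names (`prompt`, `files`, `verifier`, mirroring the task description) are assumptions, since the request schema is not documented here. The functions only build the requests; sending them is left to the caller.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # hypothetical address of the coordinator


def build_submit_request(prompt, verifier, files=None, base_url=BASE_URL):
    """Build POST /api/tasks. The body layout is an assumption, not a documented schema."""
    body = {"prompt": prompt, "verifier": verifier, "files": files or []}
    return urllib.request.Request(
        url=f"{base_url}/api/tasks",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_status_request(task_id, base_url=BASE_URL):
    """Build GET /api/tasks/{id} for status + results."""
    return urllib.request.Request(url=f"{base_url}/api/tasks/{task_id}", method="GET")


# To actually send a request (requires a running coordinator):
#     with urllib.request.urlopen(build_submit_request("...", "pytest -q")) as resp:
#         print(resp.read().decode("utf-8"))
```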