Coordinator (Rust) talks to nodes (Python) over gRPC + mTLS. Nodes run vLLM and execute sandboxed tasks. Nodes only talk to the coordinator.
      coordinator
      |    |    |
    node node node

Coordinator: tracks nodes, queues tasks, dispatches work, verifies results.
    service PhageCoordinator {
      rpc Register(NodeInfo) returns (RegistrationResponse);
      rpc Heartbeat(HeartbeatRequest) returns (HeartbeatResponse);
      rpc FetchTask(TaskRequest) returns (TaskAssignment);
      rpc SubmitResult(TaskResult) returns (ResultAck);
    }

Register: node sends GPU info, gets node ID + mTLS cert.
Heartbeat: every 10s. GPU util, VRAM, temp, loaded model. Three misses = dead.
FetchTask: node pulls work. Coordinator picks based on VRAM and loaded model.
SubmitResult: output + sandbox attestation. Verified before accepted.
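The heartbeat rule above (every 10s, three misses = dead) could be tracked roughly as follows. This is a minimal sketch, not the coordinator's actual code: `NodeLiveness` and its methods are hypothetical names, and counting a "miss" as one whole 10-second interval without a heartbeat is an assumption.

```python
from dataclasses import dataclass

HEARTBEAT_INTERVAL_S = 10.0  # from the text: one heartbeat every 10s
MAX_MISSED = 3               # from the text: three misses = dead


@dataclass
class NodeLiveness:
    """Hypothetical helper: remembers when a node last sent a heartbeat."""

    last_seen: float  # timestamp (seconds) of the most recent heartbeat

    def record_heartbeat(self, now: float) -> None:
        # Any heartbeat resets the miss counter implicitly.
        self.last_seen = now

    def missed(self, now: float) -> int:
        # Assumption: a miss is one whole interval elapsed with no heartbeat.
        return int((now - self.last_seen) // HEARTBEAT_INTERVAL_S)

    def is_dead(self, now: float) -> bool:
        return self.missed(now) >= MAX_MISSED
```

Passing `now` in explicitly (for example from `time.monotonic()`) keeps the check easy to test and avoids wall-clock jumps.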
phage-node does four things:
    submitted > queued > dispatched > running > verified

Tasks include a prompt, optional files, and a verifier (shell command, exit 0 = pass). For best-of-N, the same task goes to N nodes. Coordinator keeps the passing result with the lowest token count.
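The best-of-N rule can be sketched in a few lines. The `Candidate` type and `pick_best` function are hypothetical, and breaking ties by arrival order is an assumption; the text only says the passing result with the lowest token count wins.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record of one node's result for a best-of-N task."""

    node_id: str
    verifier_exit_code: int  # exit 0 = pass, per the text
    token_count: int


def pick_best(candidates: Sequence[Candidate]) -> Optional[Candidate]:
    # Keep only results whose verifier command exited with 0.
    passing = [c for c in candidates if c.verifier_exit_code == 0]
    if not passing:
        return None  # no node produced a passing result
    # Lowest token count wins; min() returns the earliest candidate on ties.
    return min(passing, key=lambda c: c.token_count)
```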
One vLLM instance per GPU. Sticky routing: once a node loads a model, it keeps getting tasks for that model. The coordinator won't assign a model that doesn't fit in VRAM.
| Model | VRAM |
|---|---|
| Qwen2.5-Coder-7B | ~14 GB |
| DeepSeek-Coder-V2-Lite | ~32 GB |
| Qwen2.5-72B-AWQ | ~40 GB |
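A rough sketch of how sticky routing and the VRAM check might combine when a node calls FetchTask. The names `QueuedTask` and `pick_task` are hypothetical, the VRAM figures are the approximate values from the table, and comparing against the node's total VRAM (rather than free VRAM) is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Approximate VRAM needs in GB, copied from the table above.
MODEL_VRAM_GB = {
    "Qwen2.5-Coder-7B": 14,
    "DeepSeek-Coder-V2-Lite": 32,
    "Qwen2.5-72B-AWQ": 40,
}


@dataclass(frozen=True)
class QueuedTask:
    """Hypothetical queue entry: a task and the model it needs."""

    task_id: str
    model: str


def pick_task(
    queue: Sequence[QueuedTask],
    node_vram_gb: float,
    loaded_model: Optional[str],
) -> Optional[QueuedTask]:
    # Never assign a model that does not fit; unknown models are treated as too large.
    fits = [
        t for t in queue
        if MODEL_VRAM_GB.get(t.model, float("inf")) <= node_vram_gb
    ]
    # Sticky routing: prefer the oldest task for the model already loaded.
    for task in fits:
        if task.model == loaded_model:
            return task
    # Otherwise fall back to the oldest task that fits (forces a model load).
    return fits[0] if fits else None
```

With a 24 GB node and nothing loaded, only the 7B model fits, so 72B and DeepSeek tasks stay queued for larger nodes.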
Transport: mTLS. Coordinator runs its own CA. Nodes get certs at registration.
Sandbox: gVisor. No network. Read-only root. Writable scratch only. Time limit enforced.
Attestation: signed blob per result. Hashes of sandbox config, input, output. Signed by node's mTLS key.
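To illustrate what the attestation blob might cover, here is a sketch that hashes the three inputs and serializes them deterministically. The field names, the choice of SHA-256, and the JSON encoding are all assumptions; the text only says the blob holds hashes of sandbox config, input, and output. The signing step with the node's mTLS key is deliberately left out, since it depends on the crypto library in use.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def attestation_payload(
    sandbox_config: bytes, task_input: bytes, task_output: bytes
) -> bytes:
    """Build the bytes a node would sign (hypothetical layout)."""
    fields = {
        "sandbox_config_sha256": sha256_hex(sandbox_config),
        "input_sha256": sha256_hex(task_input),
        "output_sha256": sha256_hex(task_output),
    }
    # Sorted keys and fixed separators give identical bytes for identical
    # inputs, so the coordinator can rebuild the payload and check the signature.
    return json.dumps(fields, sort_keys=True, separators=(",", ":")).encode("utf-8")
```

The coordinator already knows the task input and receives the output, so it can recompute two of the three hashes itself before checking the signature.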
    POST /api/tasks         submit a task
    GET  /api/tasks/{id}    status + results
    GET  /api/status        nodes, vram, queue
    GET  /api/nodes         node list
    WS   /api/feed          live events
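Submitting a task over the HTTP API might look like the sketch below. The request schema is not documented here, so the JSON field names `prompt` and `verifier`, the JSON content type, and the base URL are all assumptions. The function only builds the request; it does not send it.

```python
import json
import urllib.request


def build_submit_request(
    base_url: str, prompt: str, verifier: str
) -> urllib.request.Request:
    """Build a POST /api/tasks request (hypothetical body layout)."""
    body = json.dumps({"prompt": prompt, "verifier": verifier}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/tasks",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; the task could then be polled at `GET /api/tasks/{id}`.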