DigitalOcean — Droplet Compute / AI Infrastructure
- Own the compute control plane for the Droplet platform — ~80 HTTP and gRPC endpoints across 11 services that provision, schedule, and manage the capacity and lifecycle of a multi-tenant fleet of 2M+ active instances and 500K+ users, sustaining 650 RPS (56M req/day).
- Built capacity controls that keep the fleet ahead of demand — proactive quota and capacity-limit signals for enterprise customers (including GPU / AI-infrastructure capacity), surfacing risk before it impacts workloads.
- Led the migration of Droplet infrastructure from direct PostgreSQL/CockroachDB access to a centralized gRPC control plane on Kubernetes, decoupling provisioning and lifecycle services and improving fleet-wide agility, isolation, and fault tolerance.
- Lead P0/P1 incident response across the compute estate: root-cause analysis, postmortems, and metrics-based alerting (Prometheus, Grafana, OpenSearch) that have reduced MTTR.