Staff Engineer, Distributed Storage and HPC & AI Infrastructure
Togetherai
San Francisco · $250k – $300k · 6d ago
About the role
In this role, you will design and deliver multi-petabyte storage systems purpose-built for the world’s largest AI training and inference workloads. You’ll architect high-performance parallel filesystems and object stores, evaluate and integrate cutting-edge technologies such as WekaFS, Ceph, and Lustre, and drive aggressive cost optimization, routinely achieving 30-50% savings through intelligent tiering, lifecycle policies, capacity forecasting, and right-sizing.
You will also build Kubernetes-native storage operators and self-service platforms that provide automated provisioning, strict multi-tenancy, performance isolation, and quota enforcement at cluster scale. Day-to-day, you’ll optimize end-to-end data paths for 10-50 GB/s per node, design multi-tier caching architectures, implement intelligent prefetching and model-weight distribution, and tune parallel filesystems for AI worklo
More at Togetherai
- Technical Account Manager (TAM), GPU Cluster · San Francisco
- Manager, Infrastructure Strategy & Operations · San Francisco
- Customer Support Engineer (Inference) · San Francisco, CA · $160k – $230k
- Senior Technical Recruiter, AI/ML Research · San Francisco · $165k – $210k
- Engineering Manager, Site Reliability Engineering · San Francisco
- Junior Technical Program Manager, Infrastructure Operations · San Francisco · $150k – $175k