Member of Technical Staff (AI Infrastructure Engineer)
Perplexity
San Francisco · Full-time · 18d ago
About the role
We are looking for an AI Infrastructure Engineer to join our growing team. We work with Kubernetes, Slurm, Python, C++, and PyTorch, primarily on AWS. As an AI Infrastructure Engineer, you will partner closely with our Inference and Research teams to build, deploy, and optimize our large-scale AI training and inference clusters.
RESPONSIBILITIES
- Design, deploy, and maintain scalable Kubernetes clusters for AI model inference and training workloads
- Manage and optimize Slurm-based HPC environments for distributed training of large language models
- Develop robust APIs and orchestration systems for both training pipelines and inference services
- Implement resource scheduling and job management systems across heterogeneous compute environments
- Benchmark system performance, diagnose bottlenecks, and implement improvements across both training and inference infrastructure
- Build monitoring, alerting, and observability solutions tailored to ML workloads running on Kubernetes
More at Perplexity
- Member of Technical Staff (Software Engineer, Computer Monetization) · San Francisco
- Member of Technical Staff (Software Engineer, Monetization) · San Francisco
- Member of Technical Staff (Software Engineer, Computer) · San Francisco
- Product Manager (Builder) · San Francisco
- Engineering Manager (Agents) · San Francisco
- Engineering Manager (AI Research & Model Training) · San Francisco