
Software Engineer, Inference - Performance Optimization

OpenAI

San Francisco · Full-time · 5d ago

About the role

About the Team

Our team analyzes inference stack performance across the application, model, and fleet layers to identify bottlenecks and drive faster, cheaper inference. We combine systems profiling, benchmarking, and analysis to understand where time and cost are spent, then turn that understanding into performance optimizations and into models that project performance and capacity needs for future launches.

About the Role

In this role, you will model inference performance across the application, model, and fleet layers with higher fidelity. You will build cost-to-serve estimates from microbenchmarks and create tools that help cross-functional teams reason about latency, capacity, utilization, and cost tradeoffs.

In this role, you will:

- Build and refine performance models that translate microbenchmark results into cost-to-serve estimates.
- Analyze inference workloads end to end across applications, models, and fleet infrastructure.
- Enhance tooling to identify bottlenecks across la
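As an illustrative aside, translating microbenchmark results into a cost-to-serve estimate can be as simple as dividing replica cost by benchmarked throughput. The sketch below is hypothetical: the function name, GPU count, hourly rate, and throughput figures are assumptions for illustration, not OpenAI internals.

```python
# Hypothetical sketch: turn a microbenchmark throughput number into a
# cost-to-serve estimate. All names and figures are illustrative.

def cost_per_million_tokens(tokens_per_second: float,
                            gpus_per_replica: int,
                            gpu_hourly_cost: float) -> float:
    """Dollars to serve one million tokens at the benchmarked throughput."""
    replica_hourly_cost = gpus_per_replica * gpu_hourly_cost  # $/hour
    tokens_per_hour = tokens_per_second * 3600.0
    return replica_hourly_cost / tokens_per_hour * 1_000_000

# Example: a replica of 8 GPUs at $2/GPU-hour sustaining 10,000 tok/s
estimate = cost_per_million_tokens(10_000, 8, 2.0)
print(f"${estimate:.2f} per 1M tokens")  # ≈ $0.44
```

Real models of this kind would add utilization, batching, and latency-SLO terms, but the core tradeoff (throughput vs. fleet cost) reduces to arithmetic like this.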
