Software Engineer, Inference - Performance Optimization
OpenAI
San Francisco · Full-time · 5d ago
About the Team
Our team analyzes inference stack performance across the application, model, and fleet layers to identify bottlenecks and drive faster, cheaper inference. We combine systems profiling, benchmarking, and analysis to understand where time and cost are spent, then turn that understanding into performance optimizations and models that project performance and capacity needs for future launches.
About the Role
In this role, you will model inference performance across the application, model, and fleet layers with increasing fidelity. You will build cost-to-serve estimates from microbenchmarks and create tools that help cross-functional teams reason about tradeoffs among latency, capacity, utilization, and cost.
In this role, you will:
- Build and refine performance models that translate microbenchmark results into cost-to-serve estimates.
- Analyze inference workloads end to end across applications, models, and fleet infrastructure.
- Enhance tooling to identify bottlenecks across layers.
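To give a flavor of the first responsibility above, here is a minimal, hypothetical sketch of how microbenchmark results might feed a cost-to-serve estimate. All names and numbers are illustrative assumptions, not OpenAI's actual model or data.

```python
# Hypothetical sketch: turning a microbenchmark result (throughput per GPU)
# plus hardware cost into a cost-to-serve estimate. Illustrative only.

def cost_per_million_tokens(tokens_per_sec_per_gpu: float,
                            gpu_cost_per_hour: float,
                            utilization: float = 1.0) -> float:
    """Estimate serving cost (USD) per 1M tokens.

    tokens_per_sec_per_gpu: measured microbenchmark throughput.
    gpu_cost_per_hour: fully loaded hourly cost of one GPU.
    utilization: fraction of peak throughput achieved in the fleet.
    """
    effective_tps = tokens_per_sec_per_gpu * utilization
    tokens_per_hour = effective_tps * 3600
    return gpu_cost_per_hour / tokens_per_hour * 1_000_000

# Example: 2,000 tok/s per GPU at $2.50/hr and 60% fleet utilization
estimate = cost_per_million_tokens(2000, 2.50, utilization=0.6)
print(round(estimate, 3))
```

A real model would layer in batching effects, prefill vs. decode costs, and per-application traffic mixes; the point is that benchmark throughput and utilization assumptions compose directly into a dollars-per-token figure.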