Member of Technical Staff (Data Scientist, Evals)
Perplexity
San Francisco | Full-time | 6d ago
About the role
Perplexity serves tens of millions of users daily with reliable, high-quality answers grounded in an LLM-first search engine and our specialized data sources. We aim to use the latest models as they are released, but the intelligence frontier is a jagged one, and popular benchmarks do not effectively cover our use cases. In this role, you will build specialized evals to improve answer quality across Perplexity, covering search-based LLM answers and other scenarios popular with our users.
Responsibilities
- Architect and maintain automated evaluation pipelines to assess answer quality across Perplexity's products, ensuring high standards for accuracy and helpfulness
- Design evaluation sets and methods specifically to measure the impact of tool calls (particularly web search retrieval) on the final answer's quality
- Develop VLM-based solutions to programmatically evaluate how final answers render visually across different platforms and devices
- Continuously review public b