Member of Engineering (Pre-training and inference fault tolerance)
poolside
About this role
The Member of Engineering for Pre-training and Inference Fault Tolerance at Poolside works to improve the reliability and fault tolerance of Large Language Models (LLMs) during distributed training. Responsibilities include troubleshooting hardware issues, minimizing GPU idle time, developing recovery tools, and improving checkpointing performance, with high-quality code in Python, C/C++, and CUDA. The position requires a strong engineering background, a solid understanding of distributed systems and LLM fundamentals, and proficiency with frameworks like Torch. Candidates should have strong algorithmic skills and be comfortable working in Linux environments.
About poolside
poolside.ai
Poolside is a foundation model company dedicated to infusing intelligence into the workplace, with the mission of driving abundance for humanity through the development of artificial general intelligence. By engaging in cutting-edge research, Poolside aims to transform frontier research into practical operational intelligence solutions. The company focuses on making advanced AI tools accessible across various domains of work.
Recent company news
AI startup Poolside teams up with CoreWeave on 2GW data center in Texas
Oct 16, 2025
Nvidia to invest up to $1 billion in AI startup Poolside, Bloomberg News reports
Oct 30, 2025
How poolside pioneers AI assisted software development on AWS
Apr 25, 2025
CoreWeave Partners with Poolside to Deliver AI Cloud Services
Oct 16, 2025
Nvidia to Invest Up to $1 Billion in AI Startup Poolside
Oct 30, 2025
About poolside
Headquarters
San Francisco, CA
Company Size
201-500 employees
Founded
2018
Industry
Technology
Glassdoor Rating
4.2 / 5
Leadership Team
Sarah Johnson
Chief Executive Officer
Michael Chen
Chief Technology Officer
Emily Williams
VP of Engineering
David Rodriguez
VP of Product
Jessica Thompson
Chief Financial Officer
Andrew Park
VP of Sales
Salary
$69k – $93k
per year
Similar Jobs
Head of Inference Kernels
Etched
Senior Deep Learning Architect, LLM Inference
NVIDIA
Senior Deep Learning Software Engineer, Inference and Model Optimization
NVIDIA
Software Engineer, LLM Inference
NVIDIA
Manager, Large Language Model Inference
NVIDIA