Principal, Customer Reliability Engineer
Crusoe (7 hours ago)
About this role
Crusoe is seeking a Principal Customer Reliability Engineer to define and implement the technical reliability strategy for its cloud infrastructure, which is built for AI/ML workloads. The role spans systems architecture, incident management, and influence over the company's engineering roadmap, with a focus on high-performance GPU infrastructure, Kubernetes, and distributed systems.
Required Skills
- Kubernetes
- Linux Internals
- InfiniBand
- RDMA
- GPU Performance Engineering
- Distributed Systems
- Cloud Infrastructure
- Reliability Engineering
- Systems Architecture
- AI/ML
About Crusoe
crusoe.ai
Crusoe is a leading provider of next-generation AI infrastructure that focuses on renewable-powered cloud computing solutions. By employing an energy-first approach, Crusoe enables businesses to deploy AI workloads at scale while ensuring reliable performance and round-the-clock support. The company is committed to advancing sustainable technology, making it a strategic partner for organizations looking to enhance their AI capabilities in an environmentally conscious manner.
Similar Jobs
Senior Systems Engineer - AI Infrastructure
Clockwork.io (1 month ago)
Systems Engineer - AI Infrastructure
Clockwork.io (30 days ago)
Principal Software Engineer – Scale-Up Networking (GPU-Centric)
Hewlett Packard Enterprise (23 days ago)
Senior AI Performance and Efficiency Engineer
NVIDIA (1 month ago)
Engineering Manager, HPC Kubernetes Platform
NorthMark Strategies (3 months ago)
Site Reliability Engineer, AI/ML Infrastructure
Boson AI (25 days ago)