Operations Engineering Manager, Fleet Reliability
CoreWeave (2 days ago)
About this role
CoreWeave is a cloud platform specialized for AI, providing infrastructure, tools, and expertise to facilitate AI development and scaling. The company serves AI labs, startups, and global enterprises, focusing on performance and technical support to accelerate AI breakthroughs.
Required Skills
- Automation
- Observability
- Incident Management
- Change Management
- Networking
- Server Hardware
- Process Improvement
- Leadership
- Reliability Engineering
- Troubleshooting
About CoreWeave
coreweave.com
CoreWeave is a cloud provider purpose-built for GPU-accelerated AI and high-performance compute workloads, positioning itself as "The Essential Cloud for AI." It offers on-demand and dedicated GPU infrastructure (bare metal, virtual machines, and Kubernetes), high-performance networking and storage, and managed services to support large-scale training, inference, and graphics rendering. CoreWeave emphasizes performance, cost-efficiency, and operational support so enterprises and research teams can deploy and scale AI workloads with predictable performance and security.