Distinguished Engineer, GPU Fleet Operations Automation
NVIDIA (1 month ago)
About this role
A senior technology leader responsible for defining and driving DGX Cloud strategy for GPU fleet lifecycle, health, observability, utilization monitoring, and remediation across multiple environments. The role guides technical strategy and architecture, collaborates with customers and partners, and leads full software and system lifecycle efforts to deliver highly available accelerated computing infrastructure. It focuses on implementing auto-remediation and operational excellence for enterprise, public cloud, and high-security deployments.
Required Skills
- Cloud Infrastructure
- Fleet Management
- Observability
- Auto-Remediation
- Kubernetes
- Bare Metal
- Virtualization
- Containerization
- AI/ML Platforms
- Architecture
+2 more
Qualifications
- BS/MS or higher in Systems or Software Engineering
About NVIDIA
nvidia.com
NVIDIA invents the GPU and drives advances in AI, HPC, gaming, creative design, autonomous vehicles, and robotics.