Product Manager, Health Automation and Resilience
NVIDIA (1 month ago)
About this role
This is a technical Product Manager role on NVIDIA DGX Cloud, responsible for defining the product vision for Health Automation and Resilience across GPU fleets. The role focuses on building foundational automation and resilience capabilities that ensure consistent performance of large AI clusters and improve the provider and customer experience. You will work closely with engineering, cloud partners, and operators to drive product direction and delivery across observability, telemetry, and distributed systems.
Required Skills
- Health Automation
- Resilience Engineering
- Fault Detection
- Failure Classification
- Repair Automation
- Telemetry
- Observability
- Distributed Systems
- Cloud Infrastructure
- Product Strategy
Qualifications
- Bachelor’s Degree in Computer Science or Engineering
About NVIDIA
nvidia.com
NVIDIA invented the GPU and drives advances in AI, HPC, gaming, creative design, autonomous vehicles, and robotics.