Senior Site Reliability Engineer - HPC
NVIDIA (1 day ago)
About this role
NVIDIA is seeking a Senior Site Reliability Engineer to help build and operate its global service platform, with a focus on ensuring high availability and operational excellence. The role involves designing scalable solutions in hybrid multi-cloud environments, automating infrastructure provisioning, and collaborating across teams to maintain critical systems that support NVIDIA's innovative technologies in AI and high-performance computing.
Required Skills
- Kubernetes
- IaC
- Monitoring
- Scripting
- Observability
- Cloud Infrastructure
- Capacity Planning
- Automation
- Reliability Engineering
- Data-Driven Operations
About NVIDIA
nvidia.com
NVIDIA invented the GPU and drives advances in AI, HPC, gaming, creative design, autonomous vehicles, and robotics.
Similar Jobs
Site Reliability Engineer
AWE plc (14 days ago)
Site Reliability Engineer - Deployment & Monitoring
Workday (8 days ago)
Site Reliability Engineer
BRKZ (3 months ago)
Principal Site Reliability Engineer
Zscaler (16 days ago)
Site Reliability Engineer
DevRev (1 month ago)
Sr. Site Reliability Engineer III (6367)
MetroStar (16 days ago)