NVIDIA

Senior Site Reliability Engineer - Observability and Telemetry Platform

NVIDIA (1 month ago)

United States, California, Santa Clara, CA | Onsite | Full Time | Senior | $176,000 - $333,500 | Engineering

About this role

A Site Reliability Engineer at NVIDIA ensures the high availability and efficient operation of large-scale GPU cloud services. The role focuses on designing and improving production systems and observability platforms to support performance, capacity planning, and developer velocity. It emphasizes automation, reliability engineering practices, and continuous system improvement in a collaborative engineering environment.

Required Skills

  • Observability
  • Telemetry
  • Monitoring
  • Logging
  • Alerting
  • Automation
  • Capacity Management
  • Linux
  • Networking
  • Containers

Qualifications

  • BS in Computer Science or a related field

About NVIDIA

nvidia.com

NVIDIA invented the GPU and drives advances in AI, HPC, gaming, creative design, autonomous vehicles, and robotics.
