Senior Site Reliability Engineer, Observability
NVIDIA (1 month ago)
About this role
This Site Reliability Engineer role at NVIDIA focuses on the global telemetry and observability backbone for AI and data platforms. The role sits within the data and observability teams that support large-scale AI, data, and platform services, and it contributes to the design and evolution of NVIDIA’s telemetry systems. The position operates at the intersection of AI infrastructure and platform engineering, supporting visibility across metrics, logs, traces, and profiling data.
Required Skills
- Observability
- Prometheus
- Thanos
- Mimir
- Loki
- OpenSearch
- Tempo
- Jaeger
- OpenTelemetry
- Python
+18 more
Qualifications
- Bachelor's Degree in Computer Science or Related Field
About NVIDIA
nvidia.com
NVIDIA invents the GPU and drives advances in AI, HPC, gaming, creative design, autonomous vehicles, and robotics.
Similar Jobs
Observability Engineer
LSEG (4 months ago)
Senior Site Reliability Engineer (Observability)
Iterable (2 months ago)
Observability Engineer, Grafana & Azure
NTT DATA (21 days ago)
Observability Engineer (Cloud Engineer)
Fair Isaac Corporation (2 months ago)
Observability & FinOps Engineer
Leonardo (1 month ago)
Observability Engineer
LSEG (1 month ago)