Nebius

Senior Site Reliability Engineer (Compute Node Team)

Nebius (8 days ago)

Hybrid | Full Time | Senior | $159,164 - $213,802 (estimated) | Site Reliability Engineering

About this role

A Senior Site Reliability Engineer on the Compute Node team at Nebius will help design and operate the cluster scheduling layers and compute nodes that run virtual machines across regions. The role centers on Linux systems engineering, virtualization and operational reliability, working close to the operating system and hypervisor. The engineer will shape how reliability and observability are embedded into the compute platform to support Nebius AI Cloud.

Required Skills

  • Linux
  • Kernel
  • Virtualization
  • QEMU/KVM
  • Containerization
  • Cgroups
  • Observability
  • Monitoring
  • Incident Response
  • Debugging

+2 more

About Nebius

nebius.com

Nebius is a cloud platform for AI explorers that provides GPU‑accelerated infrastructure to build, tune, and run machine learning models and applications. It offers access to top‑tier NVIDIA GPUs and tooling designed to maximize efficiency and performance for training, fine‑tuning, and inference. Nebius focuses on simplifying ML workflows so researchers, developers, and teams can iterate faster without managing hardware.
