Senior Site Reliability Engineer (Compute Node Team)
Nebius (8 days ago)
About this role
A Senior Site Reliability Engineer on the Compute Node team at Nebius will help design and operate the cluster scheduling layers and compute nodes that run virtual machines across regions. The role centers on Linux systems engineering, virtualization and operational reliability, with work that sits close to the operating system and hypervisor. The engineer will shape how reliability and observability are built into the compute platform that supports Nebius AI Cloud.
Required Skills
- Linux
- Kernel
- Virtualization
- QEMU/KVM
- Containerization
- Cgroups
- Observability
- Monitoring
- Incident Response
- Debugging
About Nebius
nebius.com
Nebius is a cloud platform for AI explorers that provides GPU-accelerated infrastructure to build, tune, and run machine learning models and applications. It offers access to top-tier NVIDIA GPUs and tooling designed to maximize efficiency and performance for training, fine-tuning, and inference. Nebius focuses on simplifying ML workflows so researchers, developers, and teams can iterate faster without managing hardware.