Nebius

Senior Site Reliability Engineer — Token Factory (Inference Platform)

9 months ago
Amsterdam, Netherlands
Hybrid
Full Time
Senior

About this role

A Senior Site Reliability Engineer on Nebius Cloud's Token Factory will own the reliability, performance, and observability of the inference stack powering large-scale multimodal AI models. The role focuses on scaling the platform to meet aggressive cost and reliability targets while ensuring fast, reliable inference across a global GPU fleet. You'll collaborate across engineering teams and drive automation, runbooks, and post‑mortems to maintain a self‑healing production environment.

About Nebius

nebius.com

Nebius is a cloud platform for AI explorers that provides GPU‑accelerated infrastructure to build, tune, and run machine learning models and applications. It offers access to top‑tier NVIDIA GPUs and tooling designed to maximize efficiency and performance for training, fine‑tuning, and inference. Nebius focuses on simplifying ML workflows so researchers, developers, and teams can iterate faster without managing hardware.

Headquarters: San Francisco, CA

Company Size: 201-500 employees

Founded: 2018

Industry: Technology

Glassdoor Rating: 4.2 / 5

Leadership Team

Sarah Johnson, Chief Executive Officer

Michael Chen, Chief Technology Officer

Emily Williams, VP of Engineering

David Rodriguez, VP of Product

Jessica Thompson, Chief Financial Officer

Andrew Park, VP of Sales
