HPC System Engineer
Nebius (1 month ago)
About this role
The Systems Engineer (Cloudmeter) will support benchmarking and evaluation of GPU platforms for machine learning and AI workloads, enabling data-driven optimization of hardware and software stacks. The role collaborates with hardware and development teams to perform acceptance testing, run experiments across diverse GPU configurations, and guide platform decisions for performance and scalability. The position contributes to next-generation hardware development by validating performance, stability, and compatibility of GPU clusters.
Required Skills
- Unix/Linux
- Python
- Bash
- CUDA
- NCCL
- Drivers
- Troubleshooting
- Docker
- Kubernetes
- GPU Benchmarking
+6 more
About Nebius
nebius.com
Nebius is a cloud platform for AI explorers that provides GPU‑accelerated infrastructure to build, tune, and run machine learning models and applications. It offers access to top‑tier NVIDIA GPUs and tooling designed to maximize efficiency and performance for training, fine‑tuning, and inference. Nebius focuses on simplifying ML workflows so researchers, developers, and teams can iterate faster without managing hardware.