Systems Engineer - AI Infrastructure
Clockwork.io (6 days ago)
About this role
A Senior Systems Software Engineer role at Clockwork Systems, focused on building infrastructure for fault-tolerant, high-performance distributed GPU training. The role sits at the intersection of GPU systems, high-speed networking, and distributed coordination, and involves designing and implementing systems that run at scale. It emphasizes digging into internals to understand failure modes and delivering robust, production-grade solutions.
Required Skills
- C/C++
- Concurrency
- Memory Models
- Distributed Systems
- CUDA
- GPU Programming
- PyTorch Internals
- NCCL
- RDMA
- InfiniBand
+5 more
About Clockwork.io
clockwork.io
Clockwork is a digital product studio that designs and builds web and mobile applications, blending product strategy, UX/UI design, and engineering. They partner with startups and enterprises to accelerate development and scale products using modern tech stacks and iterative delivery. Clockwork emphasizes craftsmanship, measurable outcomes, and close collaboration to turn ideas into polished, production-ready experiences.