Senior Site Reliability Engineer, Compute
Crusoe (1 month ago)
About this role
A Senior Site Reliability Engineer, Compute at Crusoe is responsible for enhancing the company’s AI-first cloud infrastructure by optimizing and supporting virtualization and performance at the kernel level. This role focuses on developing automation and monitoring tools, scaling virtualization stacks using technologies like KVM and QEMU, and resolving performance issues in AI and HPC workloads. The engineer collaborates closely with Linux kernel and hardware teams to improve system performance, manage resource allocation, and implement enhancements for emerging compute hardware.
Required Skills
- Compute Infrastructure
- Site Reliability Engineering
- Linux Kernel Internals
- Virtualization Technologies
- Performance Optimization
- Automation Tools
- Observability Tools
- System-Level Debugging
- Infrastructure as Code
- CI/CD Practices
Qualifications
- 5+ years of professional experience in Compute SRE, Linux system engineering, or compute infrastructure roles.
- Strong proficiency in Linux kernel internals, with exposure to scheduler, memory allocation, and driver subsystems.
- Experience with virtualization architectures and technologies such as KVM, Xen, QEMU, or VMware.
- Familiarity with SmartNICs/DPUs (e.g., NVIDIA CX6/7, BlueField-3) and kernel bypass techniques.
- Expert-level skills in at least one programming language: Go, C, or Rust.
- Experience with system-level debugging, including kdump, kexec, and kernel panic analysis.
- Proficiency in Infrastructure as Code tooling and CI/CD practices for bare-metal or cloud infrastructure.
- Strong understanding of compute scheduling, resource management, and high-throughput networking.
About Crusoe
crusoe.ai
Crusoe is a leading provider of next-generation AI infrastructure that focuses on renewable-powered cloud computing solutions. By employing an energy-first approach, Crusoe enables businesses to deploy AI workloads at scale while ensuring reliable performance and round-the-clock support. The company is committed to advancing sustainable technology, making it a strategic partner for organizations looking to enhance their AI capabilities in an environmentally conscious manner.