Senior+ Site Reliability Engineer
Crusoe (1 month ago)
About this role
A Senior Site Reliability Engineer at Crusoe is responsible for ensuring the stability, resilience, and performance of the company's GPU cloud platform. The role involves collaborating with cross-functional teams to define availability metrics, participating in incident response and post-incident reviews, and improving operational procedures to minimize manual interventions. Candidates should have over 5 years of experience in cloud operations, knowledge of monitoring tools like Prometheus and Grafana, and a commitment to operational excellence within a fast-paced, distributed systems environment.
Required Skills
- Cloud Operations
- Site Reliability Engineering
- Incident Response
- Reliability Metrics
- Automation Development
- Tooling
- Infrastructure Monitoring
- Communication Skills
- Problem Solving
- Continuous Improvement
About Crusoe
crusoe.ai
Crusoe is a leading provider of next-generation AI infrastructure that focuses on renewable-powered cloud computing solutions. By employing an energy-first approach, Crusoe enables businesses to deploy AI workloads at scale while ensuring reliable performance and round-the-clock support. The company is committed to advancing sustainable technology, making it a strategic partner for organizations looking to enhance their AI capabilities in an environmentally conscious manner.