Crusoe

Senior+ Site Reliability Engineer

Crusoe (1 month ago)

San Francisco, CA, United States | Onsite | Full Time | Senior | $124,563 - $167,502 (estimated) | Cloud Engineering

About this role

A Senior Site Reliability Engineer at Crusoe is responsible for ensuring the stability, resilience, and performance of the company's GPU cloud platform. The role involves collaborating with cross-functional teams to define availability metrics, participating in incident response and post-incident reviews, and improving operational procedures to minimize manual interventions. Candidates should have over 5 years of experience in cloud operations, knowledge of monitoring tools like Prometheus and Grafana, and a commitment to operational excellence within a fast-paced, distributed systems environment.

Required Skills

  • Cloud Operations
  • Site Reliability Engineering
  • Incident Response
  • Reliability Metrics
  • Automation Development
  • Tooling
  • Infrastructure Monitoring
  • Communication Skills
  • Problem Solving
  • Continuous Improvement

About Crusoe

crusoe.ai

Crusoe is a leading provider of next-generation AI infrastructure that focuses on renewable-powered cloud computing solutions. By employing an energy-first approach, Crusoe enables businesses to deploy AI workloads at scale while ensuring reliable performance and round-the-clock support. The company is committed to advancing sustainable technology, making it a strategic partner for organizations looking to enhance their AI capabilities in an environmentally conscious manner.
