Lambda

Senior HPC Operations Engineer

Lambda (2 months ago)

Hybrid | Full Time | Senior | $207,000 - $401,000 | Engineering

About this role

The Senior HPC Operations Engineer at Lambda remotely deploys and configures large-scale HPC clusters built for AI workloads. The role covers manual and automated installation of operating systems, software, and networking components, as well as troubleshooting in collaboration with on-site teams. The engineer also mentors junior team members, maintains Standard Operating Procedures, and contributes to operational efficiency improvements while keeping current with HPC/AI technologies. The position requires extensive experience in HPC cluster management, strong technical skills in network fabrics, and familiarity with workload scheduling and orchestration systems such as Slurm and Kubernetes.


Required Skills

  • HPC Clusters
  • Remotely Deploy
  • Operating Systems
  • Firmware
  • Software
  • Networking
  • Troubleshooting
  • Standard Operating Procedures
  • Mentoring
  • AI Technologies


Qualifications

  • Bachelor's degree in EE, CS, Physics, Mathematics, or equivalent work experience

About Lambda

lambda.ai

Lambda is a cloud computing platform specializing in AI infrastructure and is referred to as the "Superintelligence Cloud." It offers gigawatt-scale GPU cloud services with on-demand and reserved NVIDIA GPUs, designed specifically for AI training and inference. Lambda's solutions include private cloud options, one-click clusters for streamlined AI training, and orchestration tools for efficient workload management. The company serves AI teams that need scalable infrastructure to develop and deploy advanced machine learning applications.
