TensorWave

ML Cluster Operations Engineer

4 months ago
Las Vegas, Nevada
Hybrid
Full Time
Senior

About this role

The ML Cluster Operations Engineer at TensorWave is responsible for managing and optimizing containerized Slurm and Kubernetes solutions for distributed machine learning workloads. This senior-level role requires extensive experience in HPC and cloud infrastructure and focuses on cluster health, uptime, automated node management, and performance profiling. The engineer will collaborate with the development team to implement CI automation, establish best practices for job execution at scale, and mentor other ML engineers. Key technologies include Slurm, Kubernetes, Python, and distributed ML frameworks such as PyTorch.

Qualifications

5+ years of experience in cloud infrastructure, HPC, or machine learning roles
Significant hands-on experience with Slurm in production HPC/ML environments
Strong knowledge of distributed ML languages and frameworks, such as Python, PyTorch, Megatron, c10d, MPI
Deep understanding of security, compliance, and resilience in containerized workloads
3+ years of hands-on Kubernetes experience

About TensorWave

tensorwave.com

TensorWave is a cloud computing platform specializing in artificial intelligence (AI) and high-performance computing (HPC) services powered by AMD Instinct™ GPUs. The company provides a scalable and memory-optimized infrastructure designed to facilitate the deployment and management of demanding AI workloads, including low-latency inference and large language models. With a focus on efficiency and cost-effectiveness, TensorWave's offerings include bare-metal solutions and managed inference services, tailored to meet the needs of enterprises looking to harness the power of next-generation AI technologies.

Headquarters: San Francisco, CA
Company Size: 201-500 employees
Founded: 2018
Industry: Technology
Glassdoor Rating: 4.2 / 5

Leadership Team

Sarah Johnson

Chief Executive Officer

Michael Chen

Chief Technology Officer

Emily Williams

VP of Engineering

David Rodriguez

VP of Product

Jessica Thompson

Chief Financial Officer

Andrew Park

VP of Sales
