Cohere

Staff Software Engineer, GPU Infrastructure (HPC)

Cohere (2 months ago)

Hybrid | Full Time | Senior | $160,258 - $214,982 (estimated) | Product

About this role

As a Staff Software Engineer in GPU Infrastructure (HPC) at Cohere, you will design and operate ML-optimized, Kubernetes-based GPU/TPU superclusters across multiple clouds, with a focus on stability, scalability, and performance for AI workloads. Your responsibilities include working with cloud providers to optimize infrastructure, troubleshooting complex issues, and building self-service tools that let researchers manage their own AI training jobs. You will work closely with AI researchers to design and implement improvements to machine learning infrastructure, and you will champion best practices such as automation and observability.

Required Skills

  • ML Infrastructure
  • Kubernetes
  • HPC Infrastructure
  • GPU Clusters
  • Distributed Training
  • Python
  • Go
  • Linux
  • RDMA Networking
  • Performance Optimization

About Cohere

cohere.com

Cohere is an enterprise AI company that builds large language models, including its Command, Embed, and Rerank model families, and offers them to businesses for building AI-powered applications.