Staff Software Engineer, GPU Infrastructure (HPC)
Cohere
About this role
As a Staff Software Engineer in GPU Infrastructure (HPC) at Cohere, you will design and operate ML-optimized, Kubernetes-based GPU/TPU superclusters across multiple clouds, focusing on stability, scalability, and performance for AI workloads. Your responsibilities include optimizing infrastructure with cloud providers, troubleshooting complex issues, and enabling researchers with self-service tools to manage their AI training jobs. You will collaborate closely with AI researchers to innovate and implement solutions that enhance machine learning infrastructure while championing best practices like automation and observability.
About Cohere
cohere.com
Cohere is an AI company that builds large language models and enterprise AI platforms for businesses and developers.
Recent company news
Nvidia-Backed Cohere Forms AI Alliance With Telecom Firm BCE
15 hours ago
Enterprise AI startup Cohere tops revenue target as momentum builds to IPO: Investor memo
1 month ago
Cohere joins Aston Martin Aramco as Official Generative AI Partner to help accelerate AI innovation
2 weeks ago
The AI Model Race May Have Slowed Down for Cohere
Nov 17, 2025
Cohere Technologies drives ahead with innovation vision
1 week ago
About Cohere
Headquarters
San Francisco, CA
Company Size
201-500 employees
Founded
2018
Industry
Technology
Glassdoor Rating
4.2 / 5
Salary
$160k – $215k
per year
Similar Jobs
AI and ML HPC Cluster Engineer
NVIDIA
HPC Kubernetes Architect
NorthMark Strategies
Engineering Manager, HPC Kubernetes Platform
NorthMark Strategies
Engineering Manager, Internal GPU and HPC Computing Clusters
NVIDIA
Lead Solution Architect
Hewlett Packard Enterprise
Senior HPC Developer - GPU and Networking
Clockwork.io