Member of Technical Staff, Model Efficiency
Cohere
About this role
A Member of Technical Staff, Model Efficiency at Cohere improves the performance of large language models (LLMs) by implementing optimizations that reduce inference latency and increase throughput. The role involves deep technical work across the inference stack: diagnosing bottlenecks and collaborating with modeling and systems teams to deploy performance improvements. Candidates should have strong programming skills in C++ or Python, experience with LLM inference ecosystems, and a background in performance optimization, particularly on GPUs and distributed systems.
About Cohere
cohere.com
Cohere is an AI company that builds large language models and enterprise AI platforms for businesses and developers.
Recent company news
Nvidia-Backed Cohere Forms AI Alliance With Telecom Firm BCE
15 hours ago
Enterprise AI startup Cohere tops revenue target as momentum builds to IPO: Investor memo
1 month ago
Cohere joins Aston Martin Aramco as Official Generative AI Partner to help accelerate AI innovation
2 weeks ago
The AI Model Race May Have Slowed Down for Cohere
Nov 17, 2025
Cohere Technologies drives ahead with innovation vision
1 week ago
About Cohere
Headquarters
Toronto, ON (with an office in San Francisco, CA)
Company Size
201-500 employees
Founded
2019
Industry
Technology
Glassdoor Rating
4.2 / 5
Salary
$206k – $274k
per year