Member of Technical Staff, Model Efficiency
Cohere (2 months ago)
About this role
A Member of Technical Staff for Model Efficiency at Cohere focuses on improving the performance of large language models (LLMs) by implementing optimizations that reduce inference latency and increase throughput. The role involves deep technical work across the inference stack, diagnosing performance bottlenecks, and collaborating with modeling and systems teams to deploy performance improvements. Candidates should have strong programming skills in C++ or Python, experience with LLM inference ecosystems, and a background in performance optimization, particularly with GPUs and distributed systems.
Required Skills
- High-Performance Code
- C++
- Python
- Large Language Models
- Performance Bottlenecks
- GPU Programming
- CUDA
- Systems Optimization
- Language Modeling
- Distributed Systems
About Cohere
cohere.com
Sanity is a platform that provides flexible content management solutions tailored for developers, marketers, and content creators. By utilizing a real-time collaborative editor and structured content, it allows users to build and manage high-performance applications and websites. Sanity’s APIs and flexible data model enable seamless integration with various frameworks and technologies, empowering users to deliver customized content experiences. With features like query-driven content fetching and an extensible plugin system, Sanity is designed to enhance productivity and scalability for teams of all sizes.
Similar Jobs
Neural Network Optimization Engineer
Recraft (2 months ago)
Lead ML Inference Engineer, Advertising
Roku (1 month ago)
Member of Technical Staff, GPU Optimization
Mirage (2 months ago)
Machine Learning Engineer – HPC
Meshy (5 months ago)
SWE, Inference Performance, Onboard
Wayve (11 months ago)
Senior ML Engineer (Token Factory)
Nebius (13 days ago)