Manager, Large Language Model Inference
NVIDIA (1 month ago)
About this role
This is a hands-on Engineering Manager role at NVIDIA, leading the development of next-generation LLM/VLM inference software for the TensorRT platform. The role combines technical ownership with people leadership to architect and ship production-grade inference runtimes across enterprise and edge GPUs, working closely with researchers, GPU architects, and cross-functional teams to accelerate AI deployment and performance.
Required Skills
- Kernel Development
- Runtime Optimization
- C++
- Python
- CUDA
- GPU Architecture
- Performance Tuning
- LLM Inference
- API Design
- Team Leadership
Qualifications
- MS or PhD in Computer Science, Computer Engineering, AI, or a related field, or equivalent experience
About NVIDIA
nvidia.com
NVIDIA invented the GPU and drives advances in AI, HPC, gaming, creative design, autonomous vehicles, and robotics.