Senior Software Engineer - Model Performance
Inference (24 days ago)
About this role
Inference.net is seeking a technical expert to optimize and accelerate AI inference systems using GPU and CUDA technologies. The role involves deep technical work across the full inference stack, aiming to reduce the latency and cost of large language model serving while increasing throughput. It offers an opportunity to work on cutting-edge AI infrastructure in a collaborative startup environment.
Required Skills
- CUDA
- GPU Programming
- Inference Optimization
- PyTorch
- TensorRT
- Quantization
- Speculative Decoding
- GPU Profiling
- Model Serving
- Distributed Inference
About Inference
Inference.net is a platform specializing in AI inference solutions, enabling businesses to train and host custom large language models tailored to their specific needs. The company offers a range of services, including serverless API and batch inference capabilities, designed to deliver better performance and cost-efficiency than traditional providers. With a focus on reducing latency and enhancing model accuracy, Inference.net helps organizations apply AI across modalities such as text, image, and video. Their mission is to provide high-quality, reliable AI solutions that streamline deployment and drive operational excellence for their clients.
More jobs at Inference
Similar Jobs
Staff AI Software Engineer - Edge Model Optimization & Deployment
Field AI (12 days ago)
Member of Technical Staff - Efficient ML
Moonlake AI (2 months ago)
Neural Network Optimization Engineer
Recraft (2 months ago)
Member of Technical Staff, GPU Optimization
Mirage (3 months ago)
Machine Learning Engineer – HPC
Meshy (5 months ago)
Principal Machine Learning Engineer
BJAK (3 days ago)