d-Matrix

Machine Learning Intern - Dynamic KV-Cache Modeling for Efficient LLM Inference


Remote · Internship · $43,428 - $61,624 (estimated) · R&D - SW Kernels & Workloads

About this role

d-Matrix is seeking a Machine Learning Intern to develop a dynamic Key-Value (KV) cache solution for Large Language Model (LLM) inference, focusing on improving memory utilization and execution efficiency on d-Matrix hardware. The role involves modeling within PyTorch and researching existing inference mechanisms.
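For context, the dynamic KV cache mentioned in the role can be sketched in PyTorch roughly as follows. This is a minimal illustrative sketch with hypothetical class and function names, not d-Matrix's actual implementation: the cache grows along the sequence dimension each decode step so that keys and values for the prefix are reused rather than recomputed.

```python
# Minimal sketch of a dynamic KV cache for autoregressive decoding.
# Hypothetical names/shapes for illustration only.
import torch

class KVCache:
    """Stores keys/values with shape (batch, heads, seq, head_dim),
    appending one step at a time along the sequence dimension."""
    def __init__(self):
        self.k = None
        self.v = None

    def update(self, k_new, v_new):
        # Append the current step's keys/values instead of recomputing
        # attention inputs for the entire prefix.
        if self.k is None:
            self.k, self.v = k_new, v_new
        else:
            self.k = torch.cat([self.k, k_new], dim=2)
            self.v = torch.cat([self.v, v_new], dim=2)
        return self.k, self.v

def decode_step(q, cache, k_new, v_new):
    # One decode step: the newest query attends over all cached keys.
    k, v = cache.update(k_new, v_new)
    scores = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5
    return torch.softmax(scores, dim=-1) @ v
```

Because the cache stores one entry per generated token, its memory footprint grows linearly with sequence length, which is why managing it dynamically matters for inference efficiency.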


Required Skills

  • PyTorch
  • Deep Learning
  • CUDA
  • Python
  • Model Optimization
  • Memory Management
  • Hardware Acceleration
  • Tensor
  • Compute Graphs
  • Inference Optimization

About d-Matrix

www.d-matrix.ai

d-Matrix is building a generative AI inference platform, Corsair™, designed for ultra-low latency and high throughput in data centers. Its integrated memory-compute technology enables speeds of 60,000 tokens per second at 1ms latency for advanced models. With a focus on scalability, d-Matrix's products serve a wide range of enterprise needs, advancing the accessibility and performance of AI technologies. d-Matrix is also committed to sustainability, letting organizations achieve high performance while minimizing energy consumption.
