d-Matrix

Machine Learning Intern - Dynamic KV-Cache Modeling for Efficient LLM Inference


Remote · Internship · $43,428 - $61,624 (estimated) · R&D - SW Kernels & Workloads

About this role

d-Matrix is seeking a Machine Learning Intern to develop a dynamic Key-Value (KV) cache solution for Large Language Model (LLM) inference, focusing on improving memory utilization and execution efficiency on d-Matrix hardware. The role involves modeling within PyTorch and researching existing inference mechanisms.
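For context, the dynamic KV cache mentioned in the role can be sketched in PyTorch roughly as follows. This is a minimal illustrative sketch with hypothetical class and function names, not d-Matrix's actual implementation: the cache grows along the sequence dimension each decode step so that keys and values for the prefix are reused rather than recomputed.

```python
# Minimal sketch of a dynamic KV cache for autoregressive decoding.
# Hypothetical names/shapes for illustration only.
import torch

class KVCache:
    """Stores keys/values with shape (batch, heads, seq, head_dim),
    appending one step at a time along the sequence dimension."""
    def __init__(self):
        self.k = None
        self.v = None

    def update(self, k_new, v_new):
        # Append the current step's keys/values instead of recomputing
        # attention inputs for the entire prefix.
        if self.k is None:
            self.k, self.v = k_new, v_new
        else:
            self.k = torch.cat([self.k, k_new], dim=2)
            self.v = torch.cat([self.v, v_new], dim=2)
        return self.k, self.v

def decode_step(q, cache, k_new, v_new):
    # One decode step: the newest query attends over all cached keys.
    k, v = cache.update(k_new, v_new)
    scores = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5
    return torch.softmax(scores, dim=-1) @ v
```

Because the cache stores one entry per generated token, its memory footprint grows linearly with sequence length, which is why managing it dynamically matters for inference efficiency.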


Required Skills

  • PyTorch
  • Deep Learning
  • CUDA
  • Python
  • Model Optimization
  • Memory Management
  • Hardware Acceleration
  • Tensor
  • Compute Graphs
  • Inference Optimization

About d-Matrix

www.d-matrix.ai

d-Matrix is building a generative AI inference platform, Corsair™, designed for ultra-low latency and high throughput in data centers. Its integrated memory-compute technology enables speeds of 60,000 tokens per second at 1ms latency for advanced models. With a focus on scalability, d-Matrix's products serve a wide range of enterprise needs, advancing the accessibility and performance of AI technologies. d-Matrix is also committed to sustainability, letting organizations achieve high performance while minimizing energy consumption.
