Site Reliability Engineer, AI/ML Infrastructure
Boson AI (6 days ago)
About this role
The Site Reliability Engineer manages and optimizes GPU clusters, including HPC infrastructure with large-scale storage and networking. The role involves automation, troubleshooting, and working closely with engineering and research teams to ensure system reliability and scalability.
Required Skills
- Linux
- Kubernetes
- Ceph
- Python
- Bash
- Infrastructure as Code
- Ansible
- Terraform
- GitOps
- GPUDirect
About Boson AI
boson.ai
Boson AI builds conversational and audio-generation AI focused on making interaction with machines "as easy, natural and fun as talking to a human." Its platform offers high‑fidelity, open‑source voice synthesis and multi‑speaker dialog generation, plus promptable audio (including sound effects) and emotional voice rendering. Boson provides APIs, demos and developer tools so teams can embed natural spoken interfaces into products. The company targets developers and businesses creating conversational experiences across products and platforms.
Similar Jobs
Senior Storage and Networking Product Engineer
NVIDIA (1 month ago)
Senior AI and ML HPC Cluster Engineer
NVIDIA (1 month ago)
HPC Solutions Architect
Lavendo (18 days ago)
Senior Storage Production Engineer - DGX Cloud
NVIDIA (1 month ago)
Site Reliability Engineer, UK
Partly (2 months ago)
Senior HPC DevOps Engineer
NVIDIA (2 months ago)