Site Reliability Engineer, AI/ML Infrastructure
Boson AI (6 days ago)
About this role
A Senior Site Reliability Engineer at Boson AI is responsible for managing and optimizing a high-performance GPU cluster and keeping its infrastructure running reliably and efficiently. The role involves automation, troubleshooting, capacity planning, and working closely with research and engineering teams to support advanced AI and HPC workloads.
Required Skills
- Linux
- Kubernetes
- Ceph
- Python
- Bash
- Terraform
- Helm
- RDMA
- TensorFlow
- PyTorch
About Boson AI
boson.ai
Boson AI builds conversational and audio-generation AI focused on making interaction with machines "as easy, natural and fun as talking to a human." Their platform offers high‑fidelity, open‑source voice synthesis and multi‑speaker dialog generation, plus promptable audio (including sound effects) and emotional voice rendering. Boson provides APIs, demos and developer tools so teams can embed natural spoken interfaces into products. The company targets developers and businesses creating conversational experiences across products and platforms.
Similar Jobs
Senior Advance Computer Software & Platform Engineer
Leonardo (1 month ago)
Site Reliability Engineer - AI & ML Infrastructure (Kubernetes & Terraform)
Deepgram (6 days ago)
Senior HPC Cluster Engineer
Nebius (1 year ago)
Senior Systems Engineer - AI Infrastructure
Clockwork.io (26 days ago)
AI and ML HPC Cluster Engineer
NVIDIA (1 month ago)
Senior HPC and AI Networking Performance Research and Analysis Engineer
NVIDIA (1 month ago)