Boson AI

Site Reliability Engineer, AI/ML Infrastructure

Boson AI (22 days ago)

Santa Clara, CA | Onsite | Full Time | Senior | $160,769 - $217,668 (estimated) | Site Reliability Engineering

About this role

The Senior Site Reliability Engineer oversees and scales a large GPU-based HPC cluster in the Toronto datacenter, including NVIDIA H100/A100 resources, multi-petabyte Ceph storage, and high-speed networking. The role spans the full lifecycle of infrastructure from planning and deployment to reliability and performance optimization. It involves close collaboration with engineering and research teams to support their workloads and future capacity needs. The position also includes evaluating and integrating new technologies as the environment grows.

Required Skills

  • HPC Operations
  • Cluster Management
  • Infrastructure As Code
  • Linux Administration
  • Kubernetes
  • Container Orchestration
  • Ceph Storage
  • Performance Monitoring
  • Troubleshooting
  • Automation Development

About Boson AI

boson.ai

Boson AI builds conversational and audio-generation AI focused on making interaction with machines "as easy, natural and fun as talking to a human." Its platform offers high‑fidelity, open‑source voice synthesis and multi‑speaker dialog generation, plus promptable audio (including sound effects) and emotional voice rendering. Boson AI provides APIs, demos, and developer tools so teams can embed natural spoken interfaces into their products. The company targets developers and businesses building conversational experiences across products and platforms.
