Site Reliability Engineer, AI/ML Infrastructure

Boson AI

2 months ago

Toronto, Ontario

Onsite

Full Time

Senior

About this role

This Senior Site Reliability Engineer role focuses on operating and scaling a large GPU-based HPC datacenter in Toronto. The position spans the full lifecycle of high-performance infrastructure, from planning and deployment to ongoing reliability and performance optimization. The engineer will partner closely with ML and research teams to ensure the cluster meets evolving compute and storage needs, and will evaluate new technologies as the environment grows.

Qualifications

5+ years of SRE or HPC operations experience

About Boson AI

boson.ai

Boson AI builds conversational and audio-generation AI focused on making interaction with machines "as easy, natural and fun as talking to a human." Its platform offers high‑fidelity, open‑source voice synthesis and multi‑speaker dialog generation, plus promptable audio (including sound effects) and emotional voice rendering. Boson provides APIs, demos, and developer tools so teams can embed natural spoken interfaces into their products. The company targets developers and businesses building conversational experiences across products and platforms.

Recent company news

MarkTechPost

Boson AI Introduces Higgs Audio Understanding and Higgs Audio Generation: An Advanced AI Solution with Real-Time Audio Reasoning and Expressive Speech Synthesis for Enterprise Applications

Apr 10, 2025