Principal Cluster Engineer, Training Infrastructure
verda
About this role
Verda is hiring a Principal Cluster Engineer to own and evolve our InfiniBand-connected GPU training infrastructure, building and operating large-scale AI and HPC clusters for next-generation machine learning workloads. You will collaborate with ML researchers, cloud platform teams, datacenter operations, and procurement to ensure the GPU infrastructure is fast, reliable, and scalable. The role emphasizes architecture, automation, observability, and defining technical standards to translate customer and product requirements into robust infrastructure capabilities.
About verda
verda.com
Verda (formerly DataCrunch) is a European AI cloud provider that offers on-demand GPU instances, autoscaling clusters, managed endpoints, and serverless inference to host and deploy models in production. It supplies self-service instances and clusters powered by the latest NVIDIA hardware (B200, H200, H100, A100, L40S, RTX series) and tooling to start, stop, or hibernate via dashboard or API for cost-efficient, high-performance deployments. Verda is ISO 27001-certified, GDPR-compliant, runs on 100% renewable energy, and provides engineer support via in-dashboard chat. Backed by in-house AI R&D, it targets AI teams seeking secure, EU-based GPU infrastructure and managed inference.
Headquarters
San Francisco, CA
Company Size
201-500 employees
Founded
2018
Industry
Technology
Glassdoor Rating
4.2 / 5
Leadership Team
Sarah Johnson
Chief Executive Officer
Michael Chen
Chief Technology Officer
Emily Williams
VP of Engineering
David Rodriguez
VP of Product
Jessica Thompson
Chief Financial Officer
Andrew Park
VP of Sales