Senior Reliability Engineer - Technology
Truelogic Software (1 month ago)
About this role
The Senior Reliability Engineer at Truelogic is responsible for enhancing the reliability and observability of distributed systems running on AWS and Kubernetes. This role focuses on analyzing system behavior in production, implementing automated operational responses, and collaborating with engineering teams to improve resilience through metrics, alerts, and incident management. The engineer also maintains core platform components, evolves observability practices, and applies SRE principles to ensure high operational standards.
Required Skills
- Site Reliability Engineering
- Observability Strategies
- System Behavior Analysis
- AWS CDK
- Kubernetes Operations
- SLIs
- SLOs Definition
- Automated Responses
- Incident Investigation
- CI/CD Pipelines
Qualifications
- 5+ years of experience in Site Reliability Engineering, Platform Engineering, or Infrastructure roles
- Experience with AWS services such as VPC, IAM, RDS, MSK, S3, and CloudWatch
- Experience with Kubernetes components like Helm, RBAC, and ServiceAccounts
- Fluency in Python
- Experience with Infrastructure-as-Code using AWS CDK, CDK8s, or equivalent frameworks
- Strong understanding of Prometheus, Grafana, alert tuning, alert fatigue reduction, and incident-driven monitoring improvements
- Experience designing reusable infrastructure or observability patterns
- Experience supporting Spark on Kubernetes, Argo, or Kafka-based batch pipelines (nice to have)
About Truelogic Software
www.truelogic.io
Truelogic Software is a leading nearshore software development company focused on providing tailored solutions for American businesses. With over 21 years of experience, the company specializes in staff augmentation, managed teams, and consulting, effectively integrating elite tech and creative talent from Latin America into client teams. Truelogic is dedicated to accelerating digital transformation through agile methodologies, innovative design, and data-driven insights, serving a wide array of industries including finance, healthcare, and entertainment. Their commitment to quality and collaboration has made them a preferred partner for organizations ranging from startups to Fortune 500 companies.