Senior Site Reliability Engineer (Colombia)

May 27 | Leading company | Colombia

EPAM is a leading global provider of digital platform engineering and development services. We are committed to having a positive impact on our customers, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multi-national teams, contribute to a myriad of innovative projects that deliver the most creative and cutting-edge solutions, and have an opportunity to continuously learn and grow. No matter where you are located, you will join a dedicated, creative, and diverse community that will help you discover your fullest potential.
We are looking for a skilled Senior Site Reliability Engineer to deliver advanced support and reliability engineering for critical cloud-based systems. The role focuses on ensuring reliability, performance and observability across AWS environments, with strong emphasis on Kubernetes, advanced monitoring, database expertise and distributed systems such as Kafka. The position involves incident response, proactive reliability improvements, automation and collaboration with engineering teams to strengthen system resilience.
Responsibilities
Design, implement and maintain observability for AWS Cloud and Kubernetes workloads using Prometheus, Grafana, OpenTelemetry, Fluent Bit, OpenSearch, CloudWatch, CloudTrail, Athena and other modern tooling
Monitor and troubleshoot EKS, Aurora RDS (Postgres) and other AWS infrastructure at an advanced level
Implement automated remediations and self-healing mechanisms
Participate in incident response, root-cause analysis and postmortems
Implement security measures impacting cluster reliability (IAM, network policies, Config)
Support and maintain current AWS infrastructure
Collaborate with L3 teams to escalate, troubleshoot and resolve operational issues
Requirements
3+ years of experience in site reliability engineering or advanced support roles
Expert-level proficiency in Grafana, Prometheus and OpenSearch
Expertise in OpenTelemetry, Fluent Bit, Cloud

