Apr 19 | Leading company | Colombia
Apply on Kit Empleo: kitempleo.com.co/empleo/18dn5f
EPAM is a leading provider of digital platform engineering and development services. We are committed to having a positive impact on our customers, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multinational teams, contribute to a wide range of innovative projects that deliver creative, cutting-edge solutions, and have the opportunity to continuously learn and grow. No matter where you are located, you will join a dedicated, creative, and diverse community that will help you reach your fullest potential.
We just launched services for our client in Azure, and service health is our top priority. As we build our brand through reliable, high-performing services, we are seeking a Senior SRE who can immediately contribute to incident response, troubleshooting, and the ongoing improvement of our cloud reliability. This is a hands-on role for someone who thrives in high-stakes environments, can operate with minimal SRE process maturity, and is passionate about both firefighting and building for the future.
Responsibilities
Develop and automate operational processes to improve system reliability, scalability, and performance
Collaborate with development and operations teams to embed reliability best practices into the SDLC
Rapidly respond to and resolve service incidents in our Azure environment, minimizing downtime and customer impact
Lead root cause analysis and post-incident reviews, driving actionable improvements
Design, implement, and maintain robust monitoring, alerting, and observability solutions for all critical services
Proactively identify and address reliability risks before they impact customers
Help establish and mature SRE practices, including incident management, blameless postmortems, and SLO/SLI definition
Mentor and upskill team members in SRE principles and Azure best practices
Analyze trends in incidents and outages to drive long-term improvements
Champion a culture of re
📌 Senior Site Reliability Engineer (Colombia)
🏢 Leading company
📍 Colombia