Talent Job Seeker

Senior Site Reliability Engineer

About the position

Job Title: Senior Site Reliability Engineer (SRE)

Experience: 5+ years

Location: Mexico/LATAM

Engagement Type: Full-time or contract, fully remote

Job Description:

We are seeking a skilled Senior Site Reliability Engineer (SRE) to join our offshore team. In this role, you will be responsible for ensuring the reliability, performance, and scalability of our critical systems. You'll develop automation, build monitoring solutions, lead incident response, and work closely with engineering teams to implement infrastructure as code, CI/CD, and cloud-native tools.

Job Responsibilities:

● Maintain the reliability, availability, and performance of critical systems

● Develop and maintain automation scripts and tools to streamline operations

● Develop and maintain monitoring dashboards and alerts

● Lead incident response, conduct post-mortem analysis, and implement preventative measures

● Optimize system performance and scalability

● Implement and maintain security best practices

● Create and maintain comprehensive system and process documentation

● Participate in on-call rotations for 24/7 critical system support

Must Haves:

● Kubernetes (hands-on experience) – managing and deploying workloads

● AWS Cloud Platform – deep understanding and production experience

● Infrastructure as Code (IaC) – using tools like Terraform (or CloudFormation/Ansible)

● Scripting/Programming – Proficiency in Python or Go

● Monitoring & Alerting – Experience with Prometheus, Grafana

● CI/CD Pipelines – Jenkins, GitLab CI, or similar

● Incident Management – Proven experience in responding to and analyzing outages

● Linux Systems & Networking – Strong fundamentals

Good to Haves:

● ArgoCD, Linkerd, Karpenter, or other Kubernetes-related tools

● Logging tools – Loki, ELK Stack

● Security best practices – Cloud and container security knowledge

● Leadership/Mentorship – Experience guiding junior engineers

● Post-mortem writing & RCA – Comfortable documenting incidents and learnings

● Experience in distributed systems or high-availability architectures

Recruitment Process:

● AI-based online screening test

● Assignment

● 2 client interviews

● CEO Discussion

● Offer: Successful candidates will receive an offer to join the team.

Soft Skills:

● Excellent verbal and written communication skills in English (required)

● Strong problem-solving ability with a customer-first mindset

● Accountability – Takes ownership of reliability and incident outcomes

● Demonstrated ability to work independently in high-pressure, multitasking environments

● Passion for supporting and helping others


Place of work

Talent Job Seeker
Mexico

About the company

Identify the best talent with Talent Job Seeker



Job ID: 9607332 / Ref: fd8813fc2c3b961c1650b1fd9f211244
