Senior Site Reliability Engineer
Software Engineering
Pune, Maharashtra, India
Job Title: Senior DevOps Engineer (SRE Focus)
Experience: 7 to 10 Years
Location: Pune Kharadi (5 Days Work from Office)
About the Role
We are looking for a highly skilled Senior DevOps Engineer with strong Site Reliability Engineering (SRE) experience to join our growing engineering team. The ideal candidate should have hands-on expertise in AWS cloud infrastructure, Kubernetes, Infrastructure as Code, CI/CD automation, and production reliability engineering.
This role requires someone who can design, automate, deploy, monitor, and optimize large-scale cloud-native environments while ensuring high availability, scalability, security, and operational excellence.
Key Responsibilities
- Design, deploy, and manage highly available cloud infrastructure on AWS.
- Build and maintain scalable Kubernetes environments.
- Implement Infrastructure as Code using Terraform and CloudFormation.
- Develop and manage CI/CD pipelines using Jenkins.
- Drive automation across infrastructure provisioning, deployments, monitoring, and incident management.
- Work closely with engineering teams to improve system reliability, performance, and scalability.
- Manage production incidents, root cause analysis, and postmortems.
- Implement monitoring, alerting, logging, and observability solutions.
- Optimize cloud infrastructure for cost, security, and performance.
- Support containerized applications and microservices architectures.
- Maintain and improve deployment workflows using Git and Bitbucket.
- Contribute to SRE best practices including SLIs, SLOs, and error budgets.
Mandatory Skills
- 7 to 10 years of DevOps / SRE experience.
- Strong hands-on experience with AWS.
- Production-grade Kubernetes (K8s) experience.
- Expertise in Terraform.
- Experience with AWS CloudFormation.
- Strong Jenkins pipeline creation and administration experience.
- CI/CD implementation and automation expertise.
- Linux administration and troubleshooting.
- Shell Scripting, Bash, or Python.
- Monitoring and logging tools such as ELK, Prometheus, Grafana, CloudWatch, etc.
- Experience handling production environments and critical incidents.
Preferred Skills
- Site Reliability Engineering (SRE) experience.
- Bitbucket administration and pipeline integration.
- Docker and containerization technologies.
- Azure cloud exposure.
- Security, compliance, and infrastructure governance experience.
- Experience working in Agile environments.
Ideal Candidate Profile
- Strong hands-on engineer, not a people manager.
- Experience supporting large-scale production environments.
- Proven expertise in reliability engineering and operational excellence.
- Startup or product company experience preferred.
- Excellent troubleshooting and problem-solving skills.
- Strong communication and stakeholder management abilities.
Nice to Have
- AWS Certifications.
- Kubernetes Certifications.
- Azure exposure.
- Experience in healthcare, fintech, SaaS, or product-based organizations.