Lead Software Engineer, Cloud Site Reliability (SRE)

Icertis

Software Engineering

Pune, Maharashtra, India

Posted on May 6, 2026

Role Responsibilities:

  • Lead 24x7 NOC operations, including mandatory rotational shifts, to ensure system availability and SLA adherence

  • Act as Major Incident Manager (P1/P2 incidents), driving triage, war room coordination, and stakeholder communication

  • Implement and enhance observability practices across logs, metrics, and traces

  • Work with tools like Datadog and Azure Monitor for monitoring and alerting

  • Drive proactive monitoring, alert tuning, anomaly detection, and AIOps initiatives

  • Manage Azure infrastructure and AKS clusters, including troubleshooting, scaling, and performance tuning

  • Build automation and self-healing workflows using Terraform, ARM, Helm, Power Automate, and scripting

  • Collaborate with engineering teams to improve reliability, deployment pipelines, and cloud-native architecture

  • Develop dashboards and reports using Power BI and ServiceNow

  • Handle Monthly Business Reviews and leadership reporting

  • Mentor team members and drive process standardization and operational excellence


Icertis is the global leader in AI-powered contract intelligence. The Icertis platform revolutionizes contract management, equipping customers with powerful insights and automation to grow revenue, control costs, mitigate risk, and ensure compliance - the pillars of business success. Today, more than one third of the Fortune 100 trust Icertis to realize the full intent of millions of commercial agreements in 90+ countries.


Who we are: Icertis is the only contract intelligence platform companies trust to keep them out in front, now and in the future. Our unwavering commitment to contract intelligence is grounded in our FORTE values—Fairness, Openness, Respect, Teamwork and Execution—which guide all our interactions with employees, customers, partners, and stakeholders. Because in our mission to be the contract intelligence platform of the world, we believe how we get there is as important as the destination.

Icertis, Inc. provides Equal Employment Opportunity to all employees and applicants for employment without regard to race, color, religion, gender identity or expression, sex, sexual orientation, national origin, age, disability, genetic information, marital status, amnesty, or status as a covered veteran in accordance with applicable federal, state and local laws. Icertis, Inc. complies with applicable state and local laws governing non-discrimination in employment in every location in which the company has facilities. If you are in need of accommodation or special assistance to navigate our website or to complete your application, please send an e-mail with your request to careers@icertis.com or get in touch with your recruiter.



About the CloudOps Team: The CloudOps team is responsible for the availability, reliability, performance, monitoring, emergency response, and capacity planning of Icertis SaaS applications and related services. CloudOps executes infrastructure and access provisioning, upgrades, deployments, and change management to drive faster time to market. This team plays a critical role in building and executing the cloud strategy for the company, driving architectural improvements to enhance scalability and optimize overall cost.

Preferred Skills:

  • Experience in multi-cloud environments (AWS/GCP)

  • Exposure to AIOps / predictive monitoring / self-healing systems

  • Azure / Kubernetes certifications


Required Skills:

  • 7–12 years of experience in CloudOps / SRE / NOC environments (24x7 operations)

  • Strong expertise in Azure Infrastructure (VMs, Networking, Storage)

  • Hands-on experience with Azure Kubernetes Service (AKS), Kubernetes, Docker

  • Strong experience with monitoring and observability tools (Datadog, Azure Monitor, Prometheus, Grafana)

  • Proven experience in Incident Management / Major Incident Handling and monthly reporting

  • Experience with Infrastructure as Code (Terraform, ARM templates, Helm)

  • Scripting skills in PowerShell, Python, or Bash

  • Experience with ServiceNow (Incident, Problem, Change modules and dashboards)

  • Strong reporting and analytics experience using Power BI, plus exposure to tools like Power Automate

  • Good understanding of distributed systems and cloud-native architecture

  • Excellent communication, leadership, and problem-solving skills