Gurugram, Haryana, India
Finance & Banking
Full-Time
Accolite
Overview
Job Title: Site Reliability Engineer (SRE)
Job Type: Full-time
Role Overview
As a Site Reliability Engineer, you will be responsible for managing and optimizing our production systems, with a focus on automation, scalability, and system performance. You'll work closely with development, operations, and other teams to maintain high availability and efficient workflows. The ideal candidate will have hands-on experience with cloud platforms, container orchestration, and infrastructure automation.
Key Responsibilities
- Manage and optimize cloud infrastructure using AWS/GCP to ensure system reliability, scalability, and security.
- Develop and maintain infrastructure as code (IaC) using Terraform to automate provisioning and management of resources.
- Administer containerized environments using Docker and Kubernetes for deployment and scaling of applications.
- Ensure high availability and reliability of services, implementing monitoring, alerting, and logging solutions.
- Collaborate with developers to ensure applications are optimized for reliability, performance, and scalability.
- Automate the deployment pipeline using ArgoCD for continuous delivery of applications.
- Identify, troubleshoot, and resolve performance issues across the infrastructure and applications.
- Develop and maintain scripts using Python or other languages for automation and management tasks.
- Participate in incident response, perform root cause analysis, and drive post-mortem processes.
- Maintain and enhance CI/CD pipelines to improve deployment speed and reliability.
- Work with cross-functional teams to define and meet service-level objectives (SLOs) and service-level agreements (SLAs).
Qualifications
- Bachelor’s degree in Computer Science, Information Systems, or a related field, or equivalent experience.
- 3-8+ years of experience in a Site Reliability Engineer (SRE) role or similar.
- Strong experience with cloud platforms like AWS and GCP.
- Hands-on expertise with Linux systems administration.
- Experience with Kubernetes for container orchestration and scaling.
- Proficiency in Docker for containerization.
- Solid understanding of monitoring and observability practices, including alerting and troubleshooting systems.
- Strong experience in Terraform for Infrastructure-as-Code (IaC).
- Proficiency in Python for automation and scripting tasks.
- Experience with ArgoCD for continuous deployment and managing application releases.
- Experience with CI/CD tools like Jenkins, GitLab CI, or CircleCI.
- Familiarity with Prometheus, Grafana, and other monitoring and alerting tools.
Benefits
- Be part of a dynamic, innovative team working on cutting-edge technologies.
- Opportunities for professional growth and career advancement.
- Competitive salary and benefits package.
- Flexible work hours and remote work options.