Full-Time
SurveySparrow

Overview
Job Title: Cloud Engineer
Location: Chennai (Work from Office)
Job Summary:
We are seeking a highly skilled Cloud Engineer to maintain the performance, stability, and reliability of our production systems. The ideal candidate should have strong troubleshooting, monitoring, and problem-solving skills, along with a deep understanding of cloud-based environments. This role involves close collaboration with DevOps, SRE, and product engineering teams to enhance system scalability, resilience, and efficiency.
Key Responsibilities:
Production Support & Monitoring:
- Continuously monitor production environments to ensure system uptime, performance, and stability.
- Diagnose and resolve application, infrastructure, and network-related issues in real time.
- Conduct root cause analysis (RCA) for recurring incidents and implement long-term fixes.
- Utilize monitoring and observability tools such as Prometheus, Grafana, New Relic, and ELK Stack to track system health and detect anomalies.
- Implement and improve alerting, logging, and monitoring strategies to proactively identify and resolve issues.
Incident Management & Collaboration:
- Provide Tier 2/Tier 3 support, troubleshoot complex issues, and escalate as needed.
- Participate in on-call rotations to promptly address critical incidents and minimize system downtime.
- Collaborate with SRE, DevOps, and engineering teams to enhance system reliability and scalability.
- Offer technical guidance and mentorship to junior engineers, fostering a culture of continuous learning.
- Maintain comprehensive documentation of issues, troubleshooting steps, and best practices for future reference.
Automation & Process Improvement:
- Identify and automate recurring operational tasks to improve efficiency and reduce manual intervention.
- Deploy hotfixes, patches, and updates to enhance system security and optimize performance.
- Improve incident response processes to minimize escalations and reduce resolution times.
- Work closely with development teams to understand new feature functionalities and ensure seamless deployments.
Requirements:
- 1-4 years of experience in production support, cloud operations, or system administration.
- Hands-on experience with cloud platforms such as AWS, Azure, or GCP and troubleshooting cloud-based applications.
- Strong problem-solving and analytical skills with a proactive approach.
- Proficiency in application monitoring tools like Prometheus, Grafana, and New Relic.
- Experience with incident management and ticketing systems (e.g., JIRA, Zendesk).
- Working knowledge of databases.
- Familiarity with scripting languages and automation tools is a plus.
- Excellent communication and collaboration skills to effectively work with global teams.