Noida, Uttar Pradesh, India
Information Technology
Full-Time
Redpin
Overview
We are seeking a skilled and motivated Cloud Engineer (Observability) to join our team and enhance the reliability, performance, and visibility of our applications. The ideal candidate will have hands-on experience with observability tools like Prometheus, Grafana, OpenTelemetry, and Coralogix, as well as a deep understanding of application performance monitoring (APM) for modern cloud environments. You will play a crucial role in designing and implementing monitoring and observability strategies for Java-based backend services, React-based frontends, and containerized applications deployed on AWS EC2 and Kubernetes platforms.
Responsibilities
- Design and implement robust observability solutions using Prometheus, Grafana, OpenTelemetry, and Coralogix for real-time monitoring and logging.
- Configure and maintain Application Performance Monitoring (APM) solutions to optimize the performance of Java and React applications.
- Build and maintain custom dashboards, alerts, and metrics to monitor the health, performance, and availability of applications and infrastructure.
- Collaborate with development and SRE teams to integrate observability best practices into CI/CD pipelines and deployment workflows.
- Establish logging, tracing, and metrics collection standards to ensure end-to-end visibility of distributed systems.
- Work on anomaly detection and root cause analysis using observability data and tools.
- Automate incident detection and response processes to reduce Mean Time to Detection (MTTD) and Mean Time to Resolution (MTTR).
- Continuously optimize monitoring strategies to minimize overhead while maintaining comprehensive visibility.
- Troubleshoot complex performance issues across application layers, including backend, frontend, and infrastructure.
- Ensure compliance with security and operational best practices when handling observability data.
Requirements
- 3-7 years of experience in a similar role with a focus on observability and monitoring.
- Proficiency with tools like Prometheus, Grafana, OpenTelemetry, and APM solutions.
- Strong knowledge of cloud environments, particularly AWS, including services such as EC2, EKS, CloudWatch, and RDS.
- Hands-on experience monitoring and optimizing Java backend services and React-based frontend applications.
- Expertise in monitoring containerized applications using Kubernetes, Docker, or equivalent platforms.
- Knowledge of distributed tracing and metrics collection to analyze complex application behaviors.
- Experience implementing log and trace instrumentation and using aggregation and analysis tools to centralize data and extract actionable insights.
- Familiarity with CI/CD pipelines and integrating observability workflows into automated deployments.
- Excellent problem-solving skills, with the ability to identify and resolve performance bottlenecks in large-scale systems.
- Strong communication skills to collaborate across teams and translate technical insights into actionable recommendations.
- Experience implementing Coralogix observability solutions for AWS environments.
- Exposure to DevOps and SRE principles, including incident management and reliability engineering.
- Knowledge of DORA metrics to monitor and improve deployment practices.
- Hands-on scripting or development skills in Python, Bash, or similar languages.