Chennai, Tamil Nadu, India
Information Technology
Other
Infosys Limited

Overview
Key Responsibilities:
- Improve the reliability, quality, and time to market of our suite of products/applications
- Define suitable system metrics with SLOs and SLIs, and set up observability mechanisms to track them
- Define error budgets in line with the SLOs
- Define strategy and set up High Availability and Load Balancer based architecture
- Drive a metrics-driven culture and software delivery process, using data to measure overall system quality and reliability
- Balance feature development speed and reliability with well-defined service level objectives
- Provide primary operational support and engineering for products/applications
- Partner with solution architects and development teams to improve service reliability
- Participate in system design, infrastructure management, and capacity planning
- Participate in optimizing code, automating operational tasks, and reducing toil
- Provide solutions for performance management, disaster recovery, monitoring, and observability
- Work with business users to understand issues, develop root cause analyses, and work with the development team on enhancements/fixes
- Work with distributed traces to visualize the entire workflow and analyze the cause of problems/incidents
- Improve security and performance of infrastructure and applications
- Support, improve, and implement infrastructure as code
- Define, evangelize, and maintain SRE best practices
- Design and implement DevSecOps best practices
- Improve automation, including the system's self-healing capability
Technical Requirements:
- As a Site Reliability Engineer, you will play a critical role in supporting application developers by providing expert guidance on application and infrastructure best practices from a reliability perspective
- Your role covers the entire life cycle of a product/application
- Your primary focus will be automation, observability, reliability, and release management with CI/CD, with an emphasis on solving operations issues
- At least 3 years of SRE experience in large programs, with a focus on release engineering, observability tasks, and reliability
- Must have a good understanding of Site Reliability Engineering (SRE) and release management processes
- Should possess strong analytical and troubleshooting skills
- Should be a strong team player who enjoys collaborating with different people and profiles, shares knowledge, and strives for continuous development and learning
- Excellent communication and leadership skills
Preferred Skills:
Technology->DevOps->Continuous integration - Others