Kolkata, West Bengal, India
Information Technology
Full-Time
Infosys
Overview
As a Site Reliability Engineer, you will play a critical role in supporting application developers by providing expert guidance on application and infrastructure best practices from a reliability perspective. Your role covers the entire life cycle of a product or application. Your primary focus will be automation, observability, reliability, and release management with CI/CD, with an emphasis on solving operational issues.
At least 3 years of SRE experience in large programs, with a focus on release engineering, observability tasks, and reliability.
Must have a good understanding of Site Reliability Engineering (SRE) and release management processes.
Should possess strong analytical and troubleshooting skills.
Should be a strong team player who enjoys collaborating with different people and profiles, shares knowledge, and strives for continuous development and learning.
Excellent communication and leadership skills.
Improve reliability, quality, and time-to-market of our suite of products/applications.
Define suitable system metrics with SLOs/SLIs and set up observability mechanisms to track them.
Define error budgets as per the SLOs.
Define the strategy for, and set up, high-availability and load-balancer-based architecture.
Drive a metrics-driven culture and software delivery process, using data to measure overall system quality and reliability.
Balance feature development speed and reliability with well-defined service level objectives.
Provide primary operational support and engineering for products/applications.
Partner with solution architects and development teams to improve service reliability.
Participate in system design, infrastructure management, and capacity planning.
Participate in optimizing code, automating operational tasks, and reducing toil.
Provide solutions for performance management, disaster recovery, monitoring, and observability.
Work with business users to understand issues, develop root cause analyses, and work with the development team on enhancements/fixes.
Work with distributed traces to visualize the entire workflow and analyze the cause of problems/incidents.
Improve the security and performance of infrastructure and applications.
Support, improve, and implement infrastructure as code.
Define, evangelize, and maintain SRE best practices.
Design and implement DevSecOps best practices.
Improve automation, including the system's self-healing capability.