Overview
L1 Engineer (1-2 Years)
● Monitor Kubernetes system health using tools such as Prometheus, Grafana, or OpenShift's built-in monitoring tools.
● Handle common user queries, provide basic troubleshooting, and resolve low-complexity issues (e.g., pod failures, application issues).
● Log tickets, classify issues, and prioritize tasks according to SLAs.
● Monitor alerts and logs, acknowledge incidents, and escalate to L2/L3 when required.
● Respond to system alerts about resource utilization (CPU, memory, disk), pod restarts, and other environment-related issues.
● Perform basic health checks on Kubernetes/OpenShift clusters, pods, and applications.
● Follow standardized troubleshooting guides and workflows, and document resolution steps.
● Provide feedback on recurring issues to the L2/L3 teams so they can refine procedures and troubleshooting guides.
● Assist developers, system administrators, and other users with Kubernetes/OpenShift-related queries.
● Offer guidance on accessing the platform, using kubectl or oc commands, and interacting with Kubernetes/OpenShift resources.

L2 Engineer (2-5 Years)
● Respond to incidents and service requests from the L1 team or other stakeholders.
● Work with the L1 team to diagnose issues and escalate complex cases.
● Onboard microservices/container images onto the Kubernetes platform.
● Perform upgrades and patch management for Kubernetes/OpenShift environments and supporting components such as Ingress and the Registry.
● Ensure proper configuration of networking, storage, and security policies within the cluster.
● Perform routine checks for high availability, scaling, and backup.
● Investigate and resolve issues related to pod failures, container issues, networking, resource limits, pod scheduling, image pulling, etc.
● Review logs and metrics to identify potential issues before they escalate.
● Assist in conducting root cause analysis for recurring incidents and suggest improvements.
● Create and maintain documentation covering troubleshooting steps, cluster configuration, and best practices.
● Create and update runbooks for issues encountered.
● Ensure high availability of clusters and implement auto-scaling as necessary.
● Optimize resource usage and cost.
● Implement context-based routing as and when required.
● Set up alerts and take action based on them.
● Engage the required specialized teams (e.g., Network, Dev, L3).
● Work with OEM support to ensure resolution of reported issues.
● Perform problem management through analysis of incident records, operational logs, etc.
● Remediate security violations identified by the customer's security team.
● Assist in implementing and enforcing security policies and standards, and ensure compliance with industry regulations.

L3 Engineer (5+ Years)
● Tune the performance of Kubernetes clusters.
● Provide deep technical expertise to resolve complex issues not addressed by the L2 team.
● Lead troubleshooting of Kubernetes/OpenShift cluster failures, networking issues, and performance bottlenecks.
● Address issues related to multi-cluster setups, persistent storage, and integrations with other services.
● Design and optimize Kubernetes/OpenShift architectures for scalability, high availability, and fault tolerance.
● Recommend and implement improvements to cluster performance, security, and reliability.
● Manage multi-cloud or hybrid cloud Kubernetes environments and optimize cross-platform operations.
● Ensure compliance with industry standards, best practices, and internal security guidelines.
● Implement advanced automation for provisioning, scaling, and managing Kubernetes/OpenShift environments using tools such as Terraform and Ansible.
● Design and optimize CI/CD pipeline configurations for Kubernetes/OpenShift environments.
● Take the lead on major incidents and outages, perform root cause analysis (RCA), and propose long-term solutions to prevent recurrence.
● Provide detailed post-incident reports and collaborate with teams to ensure timely resolution.
● Conduct quarterly health check audits.
● Optimize clusters based on health check findings.
● Participate in the planning and redesign of the Kubernetes architecture.
● Check compatibility of Kubernetes components and create upgrade/migration plans.

Certifications (Good to Have)
● Certified Kubernetes Administrator
● Red Hat OpenShift Administrator
Job Type: Full-time
Pay: Up to ₹2,000,000.00 per year
Benefits:
- Provident Fund
Schedule:
- Day shift
Work Location: In person