Hyderabad, Telangana, India
Information Technology
Full-Time
Nanonets
Overview
Responsibilities
- Manage and optimize Kubernetes clusters (EKS) and Karpenter for efficient resource allocation and scaling.
- Improve performance and cost efficiency of GPU-heavy deep learning workloads.
- Maintain and enhance CI/CD pipelines using Jenkins and ArgoCD for seamless deployments.
- Optimize and manage AWS services (EKS, EC2, RDS, S3, OpenSearch, IAM, SQS, etc.) for reliable infrastructure.
- Enhance observability and incident response with Prometheus, ELK Stack, and Grafana to reduce downtime.
Requirements
- Expertise in Kubernetes (EKS, Karpenter) and AWS services (mandatory).
- Experience in MLOps, including managing GPU workloads and deploying ML/LLM models (mandatory).
- Strong cloud cost optimization skills for resource-efficient scaling (preferred).
- Proficiency in CI/CD tools like Jenkins and ArgoCD (preferred).
- Strong troubleshooting, problem-solving, and observability skills with monitoring tools (preferred).
Tech Stack
- Kubernetes for deployments.
- Jenkins for CI/CD.
- AWS: EKS, RDS, S3, EC2, Lambda, CloudFront, ECR, etc.
- Cassandra DB and RDS.
- Prometheus for monitoring.
- Golang for API and other microservices.
- Python for machine learning (TensorFlow, PyTorch).
- React for front end.
- ELK for logging.