Job Description
Location: Ahmedabad
Employment Type: Full-Time
About the Role
We are seeking a Senior DevOps & MLOps Engineer to lead and optimize our cloud infrastructure, CI/CD pipelines, and ML model deployment strategies. The role focuses on the scalability, reliability, and security of both software and machine learning workloads, using AWS services and best practices. The ideal candidate has hands-on experience with AWS, Kubernetes, Terraform, CI/CD automation, and MLOps frameworks.
Key Responsibilities
DevOps Responsibilities:
- Architect, implement, and manage AWS cloud infrastructure following security and cost-efficiency best practices.
- Design and optimize CI/CD pipelines for microservices, APIs, and full-stack applications.
- Automate cloud resource provisioning using Terraform, CloudFormation, or AWS CDK.
- Implement Kubernetes (EKS), Docker, and container orchestration strategies for scalable deployments.
- Develop and enforce infrastructure-as-code (IaC) and version-controlled infrastructure strategies.
- Monitor and optimize cloud workloads using AWS CloudWatch, Prometheus, Grafana, and ELK Stack.
- Enforce security best practices through IAM policies, security groups, and compliance frameworks (SOC 2, ISO 27001, HIPAA, etc.).
- Automate backup, disaster recovery, and high-availability strategies across AWS services.
MLOps Responsibilities:
- Architect and deploy end-to-end MLOps pipelines for ML model training, validation, deployment, and monitoring.
- Manage ML model lifecycle with MLflow, Kubeflow, SageMaker, or TFX.
- Optimize model training and inference workloads using AWS SageMaker, Lambda, Step Functions, and batch processing.
- Automate feature engineering pipelines with AWS Glue, Apache Spark, or Airflow.
- Implement real-time model serving strategies using SageMaker, TensorFlow Serving, TorchServe, or Triton Inference Server.
- Design and enforce CI/CD for ML models, integrating with version control tools (Git, DVC).
- Develop monitoring frameworks to track model drift, data drift, and performance degradation.
- Ensure compliance with data governance and security best practices in ML workflows.
Required Skills & Qualifications
- AWS certification (preferred): AWS Certified Solutions Architect, AWS Certified DevOps Engineer, or AWS Certified Machine Learning – Specialty.
- 5+ years of experience in DevOps, MLOps, or Cloud Infrastructure Engineering.
- Strong expertise in AWS services such as EKS, EC2, S3, Lambda, Step Functions, RDS, DynamoDB, SageMaker, IAM, CloudFormation, and VPC networking.
- Hands-on experience with Kubernetes, Docker, Helm, and container security.
- Strong programming skills in Python, Go, or Bash for automation and scripting.
- Experience with CI/CD tools such as Jenkins, GitHub Actions, GitLab CI/CD, ArgoCD, or CircleCI.
- Deep understanding of Terraform, Ansible, or Pulumi for infrastructure automation.
- Expertise in logging and monitoring tools such as Prometheus, Grafana, ELK Stack, or AWS CloudWatch.
- Hands-on experience with MLOps frameworks (Kubeflow, MLflow, TensorFlow Extended, or AWS SageMaker Pipelines).
- Experience with Apache Airflow, Prefect, or Dagster for workflow automation.
- Strong knowledge of security best practices, IAM, and compliance in AWS environments.
- Experience with API Gateway, FastAPI, Flask, or GraphQL for model serving.
Preferred Qualifications
- Experience with serverless architectures (AWS Lambda, Step Functions, Fargate, API Gateway).
- Familiarity with big data tools (Apache Spark, Kafka, Dask, Snowflake, Databricks).
- Exposure to AIOps and monitoring automation for cloud environments.
- Strong problem-solving, communication, and collaboration skills.
Pay: ₹589,495.55 - ₹1,758,878.81 per year
Schedule:
- Day shift
Work Location: In person
Expected Start Date: 18/03/2025