Responsibilities
- Design, develop, and maintain scalable and efficient data pipelines using industry-standard tools and technologies.
- Implement and optimize ETL processes to ensure data quality, reliability, and performance.
- Collaborate with data scientists, analysts, and other stakeholders to understand data requirements and deliver robust data solutions.
- Utilize Apache Spark for large-scale data processing and analysis.
- Manage containerized applications using Docker and orchestrate them with Kubernetes.
- Automate workflows and data pipelines using Apache Airflow.
- Leverage Azure cloud services for data storage, processing, and analytics.
- Design and maintain SQL Server databases, ensuring high availability and performance.
- Implement data quality checks using Great Expectations to ensure data accuracy and consistency.
- Contribute to the overall system design and architecture, ensuring scalability and maintainability.
- Troubleshoot and resolve data-related issues, ensuring minimal downtime and data loss.
- Stay current with emerging trends and technologies in data engineering and propose new solutions to enhance the data infrastructure.
Requirements
- 6+ years of experience in data engineering or a related field.
- Proficient in Python for data manipulation and automation.
- Hands-on experience with Apache Spark for big data processing.
- Extensive experience with Docker for containerization and Kubernetes for orchestration.
- Proficient with Apache Airflow for workflow automation.
- Strong experience with Azure cloud services (e.g., Azure Data Lake, Azure SQL Database, Azure Databricks).
- Expertise in SQL Server databases, including design, optimization, and management.
- Familiarity with data quality tools such as Great Expectations.
- Solid understanding of system design principles and best practices.
- Excellent problem-solving skills and attention to detail.
- Strong communication skills and the ability to work collaboratively in a team environment.
Nice to Have
- Knowledge of Laboratory Information Management Systems (LIMS) databases.
- Experience with data science tools and techniques, including Scikit-learn, decision tree algorithms, similarity matching algorithms, SHAP values, and anomaly detection.
Job Type: Full-time
Pay: ₹446,732.35 - ₹1,738,642.18 per year
Benefits:
- Health insurance
- Leave encashment
- Provident Fund
- Work from home
Schedule:
- Day shift
Work Location: In person