Job Description: Senior Data Engineer
Position: Senior Data Engineer (GCP)
Experience Required: 7+ years of relevant experience
Location: Bangalore
Notice Period: Immediate
Role Overview:
We are seeking a highly skilled Senior Data Engineer with a strong background in PySpark development, distributed systems, and cloud platforms (GCP/AWS/Azure). The ideal candidate will design, build, and optimize scalable data pipelines, enabling the integration and consumption of data for machine learning models. This role demands expertise in coding, data processing, and cloud computing technologies, along with proficiency in workflow scheduling tools and big data platforms.
Key Responsibilities:
- Data Preparation & Optimization:
  - Clean, prepare, and optimize data at scale for ingestion and consumption by machine learning models.
  - Perform data profiling and analysis to design scalable solutions.
- Data Architecture & Pipeline Development:
  - Lead the implementation of new data management projects and restructure the current data architecture.
  - Design and develop reusable components, frameworks, and libraries for machine learning products.
- Workflow Automation:
  - Implement complex automated workflows using tools such as Airflow, Kubeflow, or other workflow schedulers.
  - Build and maintain continuous integration, test-driven development, and production deployment frameworks.
- Collaboration & Mentorship:
  - Conduct collaborative reviews of designs, code, and test plans to uphold data engineering standards.
  - Mentor and guide junior engineers in adopting best practices.
- Problem-Solving:
  - Troubleshoot complex data issues and perform root cause analysis.
  - Identify and resolve issues concerning data quality and management.
- Stakeholder Collaboration:
  - Design and implement product features in collaboration with business and technology stakeholders.
  - Communicate effectively with team members and stakeholders, both verbally and in writing.
Qualifications & Skills:
Mandatory Skills:
- Programming & Development:
  - Strong expertise in PySpark, Spark (Scala/Python/Java), and Scala.
  - Proficiency in Python and shell scripting.
  - Strong understanding of data structures and algorithms.
- Cloud Platforms:
  - Extensive experience with Google Cloud Platform (GCP) tools: BigQuery, Dataflow, Dataproc, AI Building Blocks, Looker, Cloud Data Fusion, and Dataprep.
  - Exposure to other cloud platforms (AWS, Azure) and their respective data services.
- Data Engineering Tools:
  - Proficiency in relational SQL and GitHub for source control.
  - Hands-on experience with workflow scheduling tools such as Airflow or Kubeflow.
- Big Data Technologies:
  - Hands-on experience with Databricks, distributed platforms (e.g., Cloudera Hadoop, AWS EMR, Azure HDInsight), and query engines such as Presto.
Desirable Skills:
- Familiarity with Kubernetes, Google Cloud Functions, and machine learning or deep learning concepts.
- Experience in data modeling, data warehousing, and streaming architectures.
- Knowledge of microservices and serverless architecture design.
Experience Requirements:
- GCP Exposure: Minimum 3 years (up to 7 years).
- Spark/PySpark Expertise: Minimum 5 years (up to 9 years).
- Relational SQL, Shell Scripting, Python: Minimum 4 years (up to 8 years).
- Workflow Scheduling Tools (Airflow/Kubeflow): Minimum 3 years (up to 7 years).
- Scala, Databricks, Kubernetes, Google Cloud Functions: 2+ years preferred.
Soft Skills:
- Strong analytical and problem-solving skills.
- Excellent communication and collaboration abilities.
- Proactive and able to work in a fast-paced environment.
Join us to innovate and build next-generation data engineering solutions!
Job Types: Full-time, Permanent
Pay: ₹3,000,000.00 - ₹3,500,000.00 per year
Benefits:
- Health insurance
- Provident Fund
Schedule:
- Day shift
- Monday to Friday
Application Question(s):
- Are you comfortable with working from the office 4 days a week?
Experience:
- Total work experience: 7 years (Required)
Location:
- Bangalore, Karnataka (Required)
Work Location: In person
Expected Start Date: 03/02/2025