Overview
Job Title: Data Lake Developer (Data Engineer)
Location: Remote
Years of Experience: 5-10
Job Overview:
We are looking for a skilled and motivated Senior Data Engineer to join our dynamic team. This is an exciting opportunity for a data-driven professional with 5-10 years of experience to lead data architecture, integration, and pipeline development initiatives. You will work on scalable data solutions, ensuring high-performance data flows across cloud and on-premises environments to support analytics, machine learning, and business intelligence efforts.
Key Responsibilities:
● Design, develop, and maintain scalable data pipelines for ETL/ELT to process large datasets from diverse sources.
● Architect and optimize data workflows and storage solutions for performance, scalability, and reliability.
● Develop and manage data storage solutions across cloud-based (AWS, GCP, Azure) and on-premises environments.
● Build and maintain data lakes, data warehouses, and real-time streaming architectures to enable efficient querying and analytics.
● Optimize SQL queries and database performance for fast and efficient data retrieval.
● Lead and mentor junior data engineers, providing guidance on best practices, debugging, and technical problem-solving.
● Ensure data security, integrity, and compliance with industry standards and best practices.
● Troubleshoot data issues, monitor pipeline performance, and proactively resolve discrepancies.
● Participate in data architecture discussions, contributing to the continuous improvement of the data engineering landscape.
● Work closely with data scientists, analysts, and cross-functional teams to gather and implement data requirements.
Required Skills & Experience:
Technical Skills:
● 5-10 years of experience in data engineering with expertise in data lake architectures.
● Proficiency in programming languages: Python, Java, or Scala.
● Strong experience with cloud platforms: AWS, GCP, or Azure.
● Expertise in cloud-based data engineering tools, such as:
○ AWS: Redshift, Athena, Glue, EMR, Kinesis
○ GCP: BigQuery, Dataproc
○ Azure: Synapse Analytics, Data Factory
○ Other: Snowflake, Apache Spark, Apache Kafka
● ETL & Data Pipeline Tools: Experience with Apache Airflow, dbt, or similar tools.
● Distributed Computing & Big Data Technologies: Strong understanding of Hadoop, Spark, Kafka, Flink, or Redpanda.
● Database Expertise:
○ SQL & NoSQL Databases: MySQL, PostgreSQL, MongoDB, DynamoDB, ClickHouse
● Frameworks & Libraries:
○ Django REST Framework (DRF), FastAPI, Flask, Celery, RabbitMQ, Redpanda
● Containerization & CI/CD:
○ Experience with Docker, Kubernetes (kubectl), and CI/CD pipelines is a plus.
● Version Control & DevOps Tools:
○ Git, Apache Airflow, Snowflake, VS Code, Postman
Soft Skills:
● Strong problem-solving and analytical skills with a keen focus on performance optimization.
● Excellent communication and collaboration skills in cross-functional environments.
● Ability to work in an agile environment, ensuring timely delivery of scalable data solutions.
If you're a data-driven problem solver with a passion for building scalable and reliable data architectures, we’d love to hear from you!
Interested candidates: please send your CV to sales@cosmoops.com
Job Types: Full-time, Permanent, Contractual / Temporary
Contract length: 6 months
Schedule:
- Monday to Friday
Work Location: Remote