
Overview
This position is primarily responsible for designing, developing, and maintaining robust ETL/ELT pipelines for data ingestion, transformation, and storage, as well as for the design and development of broader data solutions.
The candidate will join a team responsible for ensuring the availability, reliability, and performance of data systems, and will design, develop, and maintain scalable data pipelines and infrastructure on cloud platforms (Azure, AWS, or GCP).
The candidate will also work with a collaborative team that drives client performance by combining data-driven insights with strategic thinking to solve client business challenges. Strong organizational, critical-thinking, and communication skills are essential, as the role involves regular interaction with a range of stakeholders.
Responsibilities:
- Design, build, and optimize data ingestion pipelines using Azure Data Factory, AWS Glue, or Google Cloud Data Fusion (a minimal sketch of this kind of pipeline follows this list)
- Ensure reliable extraction, transformation, and loading of large datasets from various sources
- Implement and manage data storage and processing solutions such as Databricks, ensuring high performance and scalability
- Collaborate with database administrators and data architects to optimize database schemas
- Develop and maintain complex BI reporting systems to provide actionable insights
- Work closely with business stakeholders to understand requirements and translate them into technical specifications
- Monitor data quality and integrity across pipelines
- Maintain comprehensive documentation for data processes, pipelines, and reporting systems
- Collaborate with cross-functional teams to streamline data operations
- Stay updated on emerging data engineering technologies and best practices to continually enhance system performance
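For a concrete, if simplified, picture of the pipeline work described above, the sketch below shows a minimal extract-transform-load step in Python using pandas and SQLAlchemy. The file name, column names, table name, and database URL are hypothetical placeholders for illustration, not details of this role or its stack.

```python
# Minimal ETL sketch: extract a CSV, clean and enrich it, load it into a SQL table.
# "orders.csv", the column names, and the SQLite URL are hypothetical placeholders.
import pandas as pd
from sqlalchemy import create_engine

def extract(path: str) -> pd.DataFrame:
    # Extract: read raw records from a source file.
    return pd.read_csv(path, parse_dates=["order_date"])

def transform(df: pd.DataFrame) -> pd.DataFrame:
    # Transform: drop incomplete rows and derive a revenue column.
    df = df.dropna(subset=["order_id", "quantity", "unit_price"])
    return df.assign(revenue=df["quantity"] * df["unit_price"])

def load(df: pd.DataFrame, table: str, url: str) -> None:
    # Load: append the cleaned rows to a relational table.
    engine = create_engine(url)
    df.to_sql(table, engine, if_exists="append", index=False)

if __name__ == "__main__":
    load(transform(extract("orders.csv")), "fact_orders", "sqlite:///warehouse.db")
```

In production, orchestration services such as Azure Data Factory, AWS Glue, or Google Cloud Data Fusion schedule, parameterize, and monitor steps like these across many sources.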
The ideal candidate should have the following skills and experience:
Technical Qualifications:
- Experience with ETL tools on major cloud platforms, such as Azure Data Factory and Azure Databricks; AWS Glue, Amazon EMR, and Databricks on AWS; or Google Cloud Data Fusion and Databricks on Google Cloud
- Experience with SQL for data manipulation and querying
- Experience with at least one BI tool, such as Power BI, Tableau, or Looker
- Experience with relational databases (e.g., SQL Server, MySQL, PostgreSQL) and NoSQL databases (e.g., MongoDB)
- Proficiency in Python programming
- Proficiency in developing and maintaining data visualization and reporting dashboards
- Familiarity with data warehousing solutions and big data frameworks such as Apache Spark (see the brief example after this list)
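As a rough illustration of how the SQL, Python, and Spark skills above combine in practice, the snippet below computes a daily active-user aggregate twice: once with the PySpark DataFrame API and once in Spark SQL. The input paths, view name, and column names are invented for the example.

```python
# PySpark sketch: the same daily aggregate via the DataFrame API and via Spark SQL.
# The S3 paths and column names are invented for illustration.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("daily_active_users").getOrCreate()

events = spark.read.parquet("s3://example-bucket/events/")  # hypothetical source

# DataFrame API: distinct users per day and country.
daily = (
    events.withColumn("day", F.to_date("event_ts"))
    .groupBy("day", "country")
    .agg(F.countDistinct("user_id").alias("active_users"))
)

# Equivalent query expressed in Spark SQL.
events.createOrReplaceTempView("events")
daily_sql = spark.sql("""
    SELECT to_date(event_ts) AS day, country,
           COUNT(DISTINCT user_id) AS active_users
    FROM events
    GROUP BY to_date(event_ts), country
""")

# Write the result as a partitioned table for downstream BI dashboards.
daily.write.mode("overwrite").partitionBy("day").parquet("s3://example-bucket/marts/dau/")
```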
Personal Skills:
- Strong analytical skills: able to read business requirements, analyze problems, and propose solutions that ensure successful implementation
- Ability to identify alternatives and find the optimal way to implement a solution
- Ability to follow through and ensure the correct logic is applied
- Ability to quickly learn new concepts, tools, and software
- Ability to work in a team environment
- Ability to manage time across multiple tasks and juggle competing deadlines
- Effective written and verbal communication skills
Education and Work Experience:
- Background in Computer Science, Information Technology, Data Science, or a related scientific discipline is preferred
- A minimum of 3 years of total experience, including at least 2 years of relevant experience in data engineering and data pipeline development
- Certification in Databricks, Azure Data Engineering, or a related data technology is an added advantage