Overview
Infusion 51a is looking for a talented and detail-oriented Data Engineer to help us build the infrastructure that powers our data-driven decision-making process. The ideal candidate will be responsible for designing, building, and maintaining scalable data pipelines and storage systems that ensure data is accessible, reliable, and optimized for performance. You’ll work closely with our analytics, marketing, and product teams to deliver insights that drive business growth and efficiency.
As a Data Engineer, you will play a critical role in transforming raw data into actionable insights and will help build the architecture that enables us to unlock the full potential of our data. If you are passionate about data, problem-solving, and working in a collaborative, fast-paced environment, we’d love to hear from you!
Key Responsibilities:
• Data Pipeline Development:
  • Design, develop, and maintain robust, scalable, and high-performance data pipelines to collect, process, and store large volumes of data from various sources.
  • Implement data integration solutions to consolidate data from multiple platforms and systems (CRM, web analytics, social media, marketing automation, etc.).
  • Ensure data accuracy and consistency across systems by establishing monitoring and validation protocols.
• Database Management & Optimization:
  • Create and manage large-scale databases and data warehouses optimized for performance and reliability.
  • Build and maintain data storage solutions (SQL, NoSQL) and ensure that they are secure and well-structured for efficient querying and reporting.
  • Optimize queries and databases for speed and scalability.
• Collaboration with Cross-functional Teams:
  • Work closely with the analytics, marketing, and product teams to understand their data needs and deliver scalable data solutions.
  • Assist the data science and analytics teams by providing the clean, well-organized datasets they need to generate insights.
  • Collaborate with stakeholders to design and implement data models that support business intelligence and analytics needs.
• Data Governance & Security:
  • Implement data governance best practices to ensure data integrity, security, and compliance with applicable regulations.
  • Establish and maintain data quality standards, and monitor data pipelines for issues related to performance, security, and data loss.
• Automation & Reporting:
  • Automate repetitive data processes to increase efficiency and reduce manual intervention.
  • Design and implement tools for data monitoring and alerting to ensure the reliability of data systems.
  • Support the development of real-time reporting tools that provide key business insights.
Skills & Qualifications:
• Bachelor’s degree in Computer Science, Engineering, Information Systems, or a related field.
• 3+ years of experience in data engineering or a similar role.
• Strong experience with SQL and NoSQL databases (PostgreSQL, MySQL, MongoDB, etc.).
• Proficiency with big data technologies such as Hadoop, Spark, and data warehouse solutions like Redshift, Snowflake, or Google BigQuery.
• Experience with ETL (Extract, Transform, Load) processes and data integration tools (e.g., Apache NiFi, Talend, or similar).
• Familiarity with cloud platforms (AWS, GCP, Azure) and cloud-based data storage and processing solutions.
• Knowledge of data modeling, data architecture, and data governance best practices.
• Strong programming skills in Python, Java, or Scala.
• Excellent problem-solving skills and attention to detail.
• Strong collaboration and communication skills, with the ability to work with technical and non-technical stakeholders.
Bonus Qualifications:
• Experience with marketing or e-commerce data pipelines and analytics.
• Familiarity with tools like Airflow for workflow automation.
• Understanding of machine learning models and AI technologies.
What’s in it for you?
• Cutting-Edge Tech: Work with the latest tools and technologies in data engineering and analytics.
• Impactful Work: Play a key role in driving data-driven decisions across the company.
• Collaborative Culture: Join a supportive and dynamic team that values collaboration and innovation.
• Career Growth: As our data needs grow, so will your opportunities for advancement.
• Competitive Salary & Benefits: We offer a competitive salary, performance-based bonuses, and a comprehensive benefits package.
Test Project: Data Engineer
Objective:
We want to assess your ability to design, build, and optimize data pipelines, ensuring that data is well-structured and easily accessible for analysis. For this project, you’ll be tasked with developing a data pipeline and performing some basic data transformations to simulate real-world data handling at Brands by Infusion.
Project Requirements:
1. Data Pipeline Creation:
   • Goal: Build a data pipeline to extract, transform, and load (ETL) marketing data into a database.
   • Data Source: You’ll be provided with CSV files that contain marketing data such as social media engagement, email campaign performance, and web traffic metrics.
   • Process (a minimal illustrative sketch follows this item):
     • Write a script to extract data from the CSV files.
     • Transform the data by:
       • Cleaning up any missing or incorrect values.
       • Normalizing the data (e.g., ensuring consistent date formats, unit conversions).
       • Aggregating some data (e.g., summarizing daily traffic metrics by week).
     • Load the cleaned and transformed data into a database (you can choose SQL or NoSQL).
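For illustration only, here is a minimal Python sketch of one possible ETL flow using pandas and SQLite. The file name (web_traffic.csv), the column names (date, sessions), and the choice of SQLite are placeholder assumptions, not details of the provided dataset; any equivalent stack is equally acceptable as long as the extract, transform, and load steps are clearly separated.

import sqlite3

import pandas as pd

def run_pipeline(csv_path: str = "web_traffic.csv", db_path: str = "marketing.db") -> None:
    # Extract: read the raw CSV file (placeholder name and columns).
    df = pd.read_csv(csv_path)

    # Transform: drop rows missing required fields, normalize dates,
    # and aggregate daily sessions into weekly totals.
    df = df.dropna(subset=["date", "sessions"])
    df["date"] = pd.to_datetime(df["date"], errors="coerce")
    df = df.dropna(subset=["date"])
    weekly = (
        df.set_index("date")
          .resample("W")["sessions"]
          .sum()
          .reset_index()
          .rename(columns={"date": "week_start", "sessions": "total_sessions"})
    )

    # Load: write the aggregated table into a SQLite database.
    with sqlite3.connect(db_path) as conn:
        weekly.to_sql("weekly_web_traffic", conn, if_exists="replace", index=False)

if __name__ == "__main__":
    run_pipeline()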
2. Database Design:
   • Goal: Design and implement a database schema that efficiently stores the marketing data for easy querying (an illustrative schema sketch follows this item).
   • Requirements:
     • Develop a schema for the database that supports efficient querying of:
       • Daily and weekly social media engagement trends.
       • Email open and click-through rates across multiple campaigns.
       • Web traffic metrics such as sessions, bounce rates, and conversion rates.
     • The schema should support future data expansion (e.g., adding new campaigns, platforms).
     • Document your rationale for the schema design in a brief README file.
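As one possible starting point, the sketch below creates a small set of relational tables in SQLite. The table and column names (campaigns, email_metrics, web_traffic, social_engagement) are assumptions about the provided CSVs rather than a required design; your own schema and the reasoning behind it belong in the README.

import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS campaigns (
    campaign_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    platform    TEXT NOT NULL              -- e.g. 'email', 'instagram', 'web'
);

CREATE TABLE IF NOT EXISTS email_metrics (
    campaign_id INTEGER NOT NULL REFERENCES campaigns(campaign_id),
    sent_date   TEXT NOT NULL,             -- ISO-8601 date
    sends       INTEGER NOT NULL,
    opens       INTEGER NOT NULL,
    clicks      INTEGER NOT NULL
);

CREATE TABLE IF NOT EXISTS web_traffic (
    traffic_date TEXT NOT NULL,
    source       TEXT NOT NULL,            -- organic / paid / referral
    sessions     INTEGER NOT NULL,
    bounces      INTEGER NOT NULL,
    conversions  INTEGER NOT NULL
);

CREATE TABLE IF NOT EXISTS social_engagement (
    platform        TEXT NOT NULL,
    engagement_date TEXT NOT NULL,         -- ISO-8601 date
    likes           INTEGER NOT NULL,
    shares          INTEGER NOT NULL,
    comments        INTEGER NOT NULL
);
"""

# Apply the schema to a local SQLite file; new platforms or campaigns can be
# added as rows, and new metric tables can be added without disturbing these.
with sqlite3.connect("marketing.db") as conn:
    conn.executescript(SCHEMA)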
3. Data Querying:
   • Goal: Write a few sample queries to demonstrate how the data can be retrieved and analyzed (sample queries are sketched after this item).
   • Requirements:
     • Write SQL (or NoSQL) queries that:
       • Fetch weekly trends in social media engagement.
       • Retrieve the top 3 email campaigns by click-through rate.
       • Summarize web traffic by traffic source (e.g., organic, paid, referral).
     • Include these queries in your project submission, either as a SQL script or explained in your README file.
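By way of example, the sketch below runs three sample queries against the hypothetical tables from the schema sketch above; adjust table and column names to match your own design.

import sqlite3

QUERIES = {
    # Weekly social media engagement trend (year-week buckets).
    "weekly_social_engagement": """
        SELECT strftime('%Y-%W', engagement_date)  AS week,
               SUM(likes + shares + comments)      AS total_engagement
        FROM social_engagement
        GROUP BY week
        ORDER BY week
    """,
    # Top 3 email campaigns by click-through rate.
    "top3_campaigns_by_ctr": """
        SELECT c.name,
               1.0 * SUM(e.clicks) / SUM(e.sends)  AS click_through_rate
        FROM email_metrics e
        JOIN campaigns c ON c.campaign_id = e.campaign_id
        GROUP BY c.name
        ORDER BY click_through_rate DESC
        LIMIT 3
    """,
    # Web traffic summarized by traffic source.
    "traffic_by_source": """
        SELECT source,
               SUM(sessions)    AS sessions,
               SUM(conversions) AS conversions
        FROM web_traffic
        GROUP BY source
    """,
}

with sqlite3.connect("marketing.db") as conn:
    for name, sql in QUERIES.items():
        print(name, conn.execute(sql).fetchall())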
4. Bonus (Optional):
   • Automation: Set up a simple job scheduler (e.g., using Airflow or cron) to automate the pipeline so that it runs daily (a minimal scheduling sketch follows below).
   • Data Visualization: If you’d like, create a simple dashboard (using a tool like Tableau, Power BI, or Google Data Studio) to visualize the data and present trends.
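For the automation bonus, here is a minimal scheduling sketch assuming Apache Airflow 2.x and assuming the run_pipeline() function from the earlier ETL sketch lives in an importable module named pipeline (both are illustrative assumptions). A one-line cron entry that invokes the script daily would serve the same purpose.

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

from pipeline import run_pipeline  # hypothetical module containing the ETL sketch

# Run the full ETL pipeline once per day.
with DAG(
    dag_id="marketing_etl_daily",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    PythonOperator(task_id="run_etl", python_callable=run_pipeline)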
Submission Guidelines:
• Deliverables:
  • A GitHub repository with:
    • Your data pipeline code, including the script(s) for the ETL process.
    • Database schema and sample queries.
    • A README file with instructions on how to set up and run the project, along with any assumptions or design decisions you made.
  • (Optional) If you created a dashboard, provide screenshots or a link to the live dashboard.
• Timeline: You have [insert time frame here] to complete the project.
• Evaluation Criteria:
  • Clean, well-documented code with clear instructions for running the project.
  • Efficiency and scalability of the data pipeline.
  • Quality of the database design and the ability to handle future data.
  • Demonstration of thoughtful data handling and transformations.
  • Bonus points for automation or data visualization.