
Overview
Catalyst Clinical Research provides customizable solutions to the biopharmaceutical and biotechnology industries through Catalyst Oncology, a full-service oncology CRO, and through Catalyst Flex, which delivers multi-therapeutic global functional and CRO services. The company's customer-centric, flexible service model, innovative technology, expert team members, and global presence advance clinical studies. Visit CatalystCR.com.
The Data Engineer is a key member of the Data Engineering Team, responsible for tasks across all aspects of the DataOps lifecycle. The products and services you build will enable analytical applications, AI products, and integrations with enterprise systems. You, along with your teammates, will work with internal and external stakeholders to turn requirements into solutions that drive better decision-making throughout the organization. You will develop marts, warehouses, models, and logic that contain and distill the intricate inner workings of Catalyst through data, and you will work with the Associate Director, Enterprise Data Architecture to ensure this fits within the overall analytics framework.
Responsibilities:
- Design, build, and maintain scalable data pipelines using Databricks Delta and structured streaming with Delta Live Tables (see the pipeline sketch after this list).
- Manage Unity Catalog for efficient data governance across multiple domains, regions, and end users (a governance sketch follows this list).
- Develop and manage transformations in dbt and Databricks using a medallion (multi-hop) architecture, ensuring best practices in code quality and data modeling.
- Manage DAG workflows using orchestrators such as Airflow or Databricks Workflows to optimize data processing tasks (an orchestration sketch follows this list).
- Build upon our CI/CD strategies in GitLab/GitHub Actions for automated testing and deployment of data artifacts.
- Collaborate with cross-functional teams across different regions to ensure highly available, high-quality data integration.
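
To make the Delta Live Tables item above concrete, here is a minimal sketch of a bronze-to-silver streaming pipeline in Python. The paths, table names, and expectation are hypothetical placeholders, not an actual Catalyst pipeline; the spark session is provided by the DLT runtime.

    import dlt
    from pyspark.sql import functions as F

    @dlt.table(comment="Bronze: raw events landed incrementally from cloud storage.")
    def bronze_events():
        # Auto Loader picks up new JSON files as they arrive; the path is a placeholder.
        return (
            spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load("/Volumes/example/raw/events/")
        )

    @dlt.table(comment="Silver: validated, deduplicated events.")
    @dlt.expect_or_drop("valid_event_id", "event_id IS NOT NULL")
    def silver_events():
        # Rows failing the expectation are dropped rather than failing the pipeline.
        return (
            dlt.read_stream("bronze_events")
            .dropDuplicates(["event_id"])
            .withColumn("ingested_at", F.current_timestamp())
        )

Each decorated function becomes a managed table, so extending the medallion flow to a gold-layer mart is just another function reading from the silver table.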
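For the Unity Catalog item, governance is typically expressed through the three-level namespace (catalog.schema.table) plus SQL grants. A hedged sketch with placeholder catalog, schema, and group names, run from a Databricks notebook where spark is available:

    # All names below are hypothetical; these statements require owner/admin privileges.
    spark.sql("CREATE CATALOG IF NOT EXISTS clinical")
    spark.sql("CREATE SCHEMA IF NOT EXISTS clinical.silver")
    # Give a hypothetical analyst group read-only access to the silver layer.
    spark.sql("GRANT USE CATALOG ON CATALOG clinical TO `data_analysts`")
    spark.sql("GRANT USE SCHEMA, SELECT ON SCHEMA clinical.silver TO `data_analysts`")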
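And for the orchestration item, a DAG can live in Airflow or Databricks Workflows; below is a hypothetical Airflow DAG (assuming Airflow 2.4+ with the Databricks provider installed) that triggers an existing Databricks job nightly. The DAG id, connection id, and job id are placeholders.

    from datetime import datetime

    from airflow import DAG
    from airflow.providers.databricks.operators.databricks import DatabricksRunNowOperator

    with DAG(
        dag_id="nightly_marts_refresh",  # placeholder name
        start_date=datetime(2024, 1, 1),
        schedule="0 2 * * *",  # nightly at 02:00
        catchup=False,
    ) as dag:
        DatabricksRunNowOperator(
            task_id="refresh_marts",
            databricks_conn_id="databricks_default",
            job_id=12345,  # placeholder Databricks job id
        )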
Education: B.S. or M.S. in Computer Science, Engineering, Economics, Mathematics, or a related field, or equivalent relevant experience
Experience:
- 3+ years of Data Engineering experience, including webhooks, APIs, ELT/ETL, data lakehouse architecture, and event-driven architectures.
- 3+ years of Data Architecture experience, including data modeling for semantic layers, normal forms, and OBT (One Big Table).
- 3+ years of experience with cloud computing technologies (Azure, AWS, GCP).
- 3+ years of experience with the Databricks Data Intelligence Platform.
- Working knowledge of UX design methods as they relate to analytical and AI platforms.
- Prior experience with project management tools such as JIRA.
Required Certifications: N/A
Required Skills:
- Proficient in Python or PySpark
- Proficient in SQL/Spark SQL
- Solid understanding of cloud computing environments (Azure, AWS, GCP)
- Knowledge of Big Data technologies (Spark)
- Hands-on experience with Delta Live Tables and Databricks Workflows
- Solid understanding of Data Lakehouse design
- Solid understanding of code modularization strategies for Jupyter notebook-style coding (see the sketch after this list)
- Adept with data and file formats such as Delta, Parquet, YAML, XML, JSON, and HTML
- Proficient in the administration of the Databricks platform
- Strong organizational, problem-solving, and analytical skills
- Ability to manage priorities and workflow.
- Proven ability to handle multiple projects and meet deadlines.
- Strong interpersonal skills.
- Ability to deal effectively with a diversity of individuals at all organizational levels.
- Commitment to excellence and high standards.
- Creative, flexible, and innovative team player.
- Ability to work independently and as a member of various teams and committees.
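
On the notebook modularization skill above: one common strategy is to keep transformation logic in plain Python modules that notebooks import, so the logic can be unit-tested outside any notebook. A minimal sketch, with hypothetical module and function names:

    # transformations/events.py -- an importable module instead of notebook cells
    from pyspark.sql import DataFrame, functions as F

    def dedupe_events(df: DataFrame, key: str = "event_id") -> DataFrame:
        """Drop duplicate events and stamp the processing time."""
        return df.dropDuplicates([key]).withColumn("processed_at", F.current_timestamp())

    # A notebook cell then reduces to:
    #     from transformations.events import dedupe_events
    #     silver_df = dedupe_events(bronze_df)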
Nice to Have:
- Demonstrated experience working with MLflow, creating feature stores, or developing with vector or graph databases (see the sketch after this list).
- Familiarity with SDTM, FHIR, HL7 & SMART.
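
On the MLflow item above, basic experiment tracking looks like the following sketch; the experiment path, parameter, and metric value are illustrative only, not real results:

    import mlflow

    mlflow.set_experiment("/Shared/example-experiment")  # placeholder path
    with mlflow.start_run():
        mlflow.log_param("model_type", "baseline")
        mlflow.log_metric("rmse", 0.42)  # illustrative value, not a measured result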
Working Hours
- Every day: 1:30 PM - 9:00 PM IST
OR
- Monday, Wednesday, Friday: 2:30 PM - 10:30 PM IST
- Tuesday, Thursday: 9:00 AM - 5:00 PM IST
Note: Working hours may vary based on individual seniority, business demand, and ability to work independently. This will be evaluated on a case-by-case basis.