
Overview
About Clear
Our decade-long pursuit of simplicity has urged us to make things clear, so that it's easier done than said.
Clear today is India's leading fintech SaaS platform, serving 3K+ enterprises, 6L+ SMEs, and 5M+ individuals with our ITR, GST, e-Invoicing products, and more. While the journey has not been easy, it has been transformative.
Founded in 2011, the decade-long journey of ClearTax defines growth. Starting with just 3 tech products related to tax and filing, we now build mobile and web-based SaaS products for invoices, taxes, payments, and credit, and augment them with advanced analytics and artificial intelligence. We are a Series C funded startup with a strong team of 1000+ members, and as we continue to evolve into a world of new financial solutions, we're looking for individuals with fresh perspectives to join our team.
Job brief
We are seeking a talented Data/Software Engineer II with expertise in big data processing and ETL pipeline management. The ideal candidate also has a solid background in software engineering, including building scalable, performant web systems with a clear focus on reusable modules.
Key Responsibilities:
- Design, develop, and maintain ETL pipelines to process large-scale datasets efficiently and reliably.
- Build and optimize Spark-based data pipelines to perform transformations and aggregate data for analytics and machine learning applications.
- Implement AWS Glue jobs to support data ingestion, transformation, and integration across various data sources.
- Leverage Apache Iceberg for efficient data storage, management, and querying, with a focus on performance and scalability.
- Utilize Airflow to orchestrate complex workflows and ensure the timely and efficient execution of data processing jobs.
- Implement Change Data Capture (CDC) processes to capture real-time changes from source systems and integrate them into downstream data systems.
- Build scalable and efficient ETL solutions that maintain high data quality and data governance standards.
- Develop, test, and deploy web services for data access APIs, integrating data pipelines with other applications.
- Translate ambiguous business problems into well-defined technical problems, and independently own design, estimation, planning, execution, and delivery of the solution.
- Use MySQL, MongoDB, and other database technologies to store and retrieve data as needed for ETL processes and web services.
Required Skills and Qualifications:
- 2-4 years of experience as a data/software engineer or in a related role, preferably in a fast-paced and data-intensive environment.
- Strong experience with Spark for batch and real-time data processing, including writing and optimizing Spark jobs.
- Strong problem-solving skills and a proactive approach to tackling complex data challenges.
- Knowledge of Apache Kafka or similar streaming technologies for real-time data processing.
- Understanding of distributed systems and cloud-based architectures.
- Strong expertise in coding, data structures, algorithms, low-level class and database design, high-level system design, and architecting for high scale using distributed systems.
- Excellent communication and collaboration skills to work effectively within cross-functional teams.
- Experience with CDC techniques and tools to capture data changes in real-time.
- Experience with SQL and OLAP data stores.