
Overview
We are looking for a highly experienced Senior PySpark Developer to join our team. The ideal candidate will lead the development of big data solutions, leveraging Apache Spark with Python (PySpark) to process and analyze large-scale datasets. This role requires strong technical expertise, hands-on experience with distributed systems, and the ability to mentor junior developers.
Key Responsibilities
Big Data Development
Design, develop, and optimize big data solutions using PySpark.
Implement data pipelines and ETL processes to ingest, process, and transform large datasets.
Must have excellent knowledge of Apache Spark and strong Python programming experience.
Work with data engineers and architects to design scalable data processing workflows.
Performance Optimization
Optimize PySpark jobs for performance and scalability in distributed environments.
Troubleshoot and resolve issues related to performance, memory usage, and job execution.
Should have experience integrating PySpark with upstream and downstream applications through batch or real-time interfaces.
Should have experience fine-tuning processes and troubleshooting performance issues.
Data Integration and Analytics
Integrate structured and unstructured data from various sources into data lakes or warehouses.
Deep experience developing data processing tasks in PySpark, such as reading data from external sources, merging data, performing data enrichment, and loading results into target data destinations.
Collaborate with data scientists and analysts to implement machine learning pipelines and data analytics solutions.
Should be able to understand and interpret existing code in an ETL tool and convert it to equivalent PySpark code with improved performance.
Collaboration and Leadership
Collaborate with cross-functional teams to gather and translate business requirements into technical solutions.
Consult with the technical business analyst (Tech BA) to understand the business needs of the project.
Mentor and guide junior developers, providing code reviews and technical training.
Work closely with stakeholders to ensure solutions align with business goals.
Documentation and Standards
Maintain comprehensive documentation of data workflows, code, and processes.
Should have demonstrated expertise in developing design documents such as high-level designs (HLD) and low-level designs (LLD).
Adhere to industry best practices and coding standards.
Requirements
Bachelor's degree in Computer Science, Data Engineering, or a related field (Master's preferred).
5+ years of experience in big data development, with a strong focus on PySpark and Apache Spark.
Proficiency in Python, with a deep understanding of Spark Core, Spark SQL, and Spark Streaming.
Experience with big data platforms such as Hadoop, Hive, and HDFS.
Hands-on experience with cloud platforms (e.g., AWS EMR, Azure Databricks, Google BigQuery).
Strong knowledge of SQL and data modeling concepts.
Familiarity with CI/CD pipelines and version control tools like Git.
Excellent problem-solving and analytical skills.
Strong communication and leadership abilities.
Preferred Qualifications
Experience with orchestration tools like Apache Airflow or AWS Step Functions.
Knowledge of data serialization formats (e.g., Parquet, Avro).
Familiarity with containerization tools such as Docker and Kubernetes.
Certifications in cloud technologies or big data frameworks.
About Virtusa
Teamwork, quality of life, professional and personal development: values that Virtusa is proud to embody. When you join us, you join a team of 27,000 people globally that cares about your growth, one that seeks to provide you with exciting projects, opportunities, and work with state-of-the-art technologies throughout your career with us.
Great minds, great potential: it all comes together at Virtusa. We value collaboration and the team environment of our company, and seek to provide great minds with a dynamic place to nurture new ideas and foster excellence.
Virtusa was founded on principles of equal opportunity for all, and so does not discriminate on the basis of race, religion, color, sex, gender identity, sexual orientation, age, non-disqualifying physical or mental disability, national origin, veteran status or any other basis covered by appropriate law. All employment is decided on the basis of qualifications, merit, and business need.