Overview
Technical Skills
Programming Languages
- Proficiency in Python, with a focus on data processing libraries (e.g., Pandas).
- Familiarity with SQL for querying and managing relational databases.
Data Processing and Big Data
- Experience with Pandas, including optimization techniques such as:
  - Using the chunksize parameter to process large datasets in memory-efficient batches (a short sketch follows this section).
- Basic understanding of PySpark for distributed data processing (also sketched below), including:
  - Working with DataFrames for scalable data manipulation.
  - Using Spark SQL for querying large datasets.
  - Optimizing jobs using partitioning and caching.
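To illustrate the Pandas chunking mentioned above, here is a minimal sketch; the file name, column names, and chunk size are hypothetical placeholders rather than details from this posting.

```python
import pandas as pd

# Read a large CSV in fixed-size batches instead of loading it all at once.
# "events.csv" and the column names are hypothetical placeholders.
totals = {}
for chunk in pd.read_csv("events.csv", chunksize=100_000):
    # Aggregate each batch, then fold the partial result into the running totals.
    partial = chunk.groupby("category")["amount"].sum()
    for category, amount in partial.items():
        totals[category] = totals.get(category, 0) + amount

result = pd.Series(totals).sort_values(ascending=False)
print(result.head())
```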
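To illustrate the PySpark items above (DataFrames, Spark SQL, partitioning, and caching), here is a minimal sketch; the dataset paths, column names, and view name are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders-demo").getOrCreate()

# DataFrame API: read a (hypothetical) Parquet dataset and filter it.
orders = spark.read.parquet("/data/orders")  # path is a placeholder
recent = orders.filter(F.col("order_date") >= "2024-01-01")

# Cache a DataFrame that is reused by several downstream actions.
recent.cache()

# Spark SQL: register a temporary view and query it with SQL.
recent.createOrReplaceTempView("recent_orders")
top_customers = spark.sql("""
    SELECT customer_id, SUM(amount) AS total_amount
    FROM recent_orders
    GROUP BY customer_id
    ORDER BY total_amount DESC
    LIMIT 10
""")
top_customers.show()

# Partition the output on a frequently filtered column when writing.
recent.write.mode("overwrite").partitionBy("country").parquet("/data/recent_orders")

spark.stop()
```

Caching here avoids recomputing the filtered DataFrame for each action, and partitioning the output lets later jobs prune files by the partition column.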
Data Pipelines
- Basic understanding of ETL/ELT concepts.
- Exposure to pipeline tools like Apache Airflow or Luigi (a minimal Airflow sketch follows this section).
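As a minimal illustration of the pipeline tooling mentioned above, here is a sketch of a daily ETL DAG, assuming Apache Airflow 2.4 or later; the DAG id, task names, and callables are hypothetical placeholders.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Placeholder task callables; real tasks would pull from a source,
# transform/validate the data, and load it into a warehouse.
def extract():
    print("extract raw data")

def transform():
    print("transform and validate")

def load():
    print("load into the warehouse")

with DAG(
    dag_id="daily_etl_example",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    load_task = PythonOperator(task_id="load", python_callable=load)

    # Declare the task order: extract, then transform, then load.
    extract_task >> transform_task >> load_task
```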
Databases and Optimization
- Familiarity with relational databases (e.g., MySQL, PostgreSQL).
- Techniques to improve database throughput (illustrated after this list), such as:
  - Batch inserts for efficient data loading.
  - Query optimization using indexing and partitioning.
  - Managing database connections effectively to minimize latency.
- Basic understanding of NoSQL databases (e.g., MongoDB).
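To illustrate the throughput techniques above, here is a minimal sketch of a batched insert plus an index, assuming PostgreSQL with the psycopg2 driver; the connection details, table, and column names are hypothetical.

```python
import psycopg2
from psycopg2.extras import execute_values

# Connection details, table, and columns are placeholders.
conn = psycopg2.connect(host="localhost", dbname="analytics",
                        user="etl_user", password="secret")

rows = [
    (1, "2024-01-01", 120.0),
    (2, "2024-01-01", 75.5),
    (3, "2024-01-02", 210.0),
]

with conn, conn.cursor() as cur:
    # Batch insert: many rows per statement instead of one INSERT per row.
    execute_values(
        cur,
        "INSERT INTO daily_sales (store_id, sale_date, amount) VALUES %s",
        rows,
        page_size=1000,
    )
    # An index on a frequently filtered column speeds up later queries.
    cur.execute(
        "CREATE INDEX IF NOT EXISTS idx_daily_sales_date ON daily_sales (sale_date)"
    )

conn.close()
```

Sending rows in pages rather than one at a time cuts network round trips, which is usually the dominant cost of row-by-row loading.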
Cloud Fundamentals
- Basic knowledge of cloud platforms like AWS, Azure, or Google Cloud (e.g., S3, Lambda, BigQuery); a short S3 example follows this section.
- Familiarity with distributed storage concepts.
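As one concrete (and hypothetical) cloud example, the sketch below reads a CSV object from S3 with boto3, aggregates it with Pandas, and writes a summary object back; the bucket and key names are placeholders.

```python
import io

import boto3
import pandas as pd

# Bucket and key names are hypothetical placeholders.
s3 = boto3.client("s3")
obj = s3.get_object(Bucket="example-data-lake", Key="raw/orders/2024-01-01.csv")

# Load the object body into a DataFrame and compute a per-customer summary.
df = pd.read_csv(io.BytesIO(obj["Body"].read()))
summary = df.groupby("customer_id", as_index=False)["amount"].sum()

# Write the summary back to a curated prefix in the same bucket.
buffer = io.StringIO()
summary.to_csv(buffer, index=False)
s3.put_object(
    Bucket="example-data-lake",
    Key="curated/orders_summary/2024-01-01.csv",
    Body=buffer.getvalue().encode("utf-8"),
)
```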
Version Control and CI/CD
- Knowledge of Git/GitHub for collaboration and version control.
Data Modeling
- Basic understanding of relational and dimensional data modeling.
Analytical and Problem-Solving Skills
- Strong analytical skills to optimize data pipelines and troubleshoot performance bottlenecks.
- Basic understanding of distributed computing principles to scale data workflows.
Soft Skills
- Eagerness to learn and adapt to new tools and technologies.
- Clear communication and collaboration skills for working with cross-functional teams.
- Strong attention to detail for ensuring data quality and accuracy.
Education and Experience
- Bachelor’s degree in Computer Science, Data Science, or a related field.
- Hands-on experience with academic or personal projects involving large datasets.
- Familiarity with PySpark optimizations (see the sketch after this list), such as:
  - Using repartition() and coalesce() for better resource utilization.
  - Understanding lazy evaluation and its role in Spark job execution.
- Awareness of visualization tools like Tableau or Power BI.
- Exposure to basic cloud certifications (e.g., AWS Certified Cloud Practitioner).
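To illustrate the repartition()/coalesce() and lazy-evaluation points above, here is a minimal PySpark sketch; the paths, column names, and partition counts are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("repartition-demo").getOrCreate()

events = spark.read.parquet("/data/events")  # path is a placeholder

# repartition() shuffles the data into more, evenly sized partitions;
# useful before a wide aggregation or join on a skewed dataset.
events = events.repartition(200, "event_date")

# Lazy evaluation: these transformations only build an execution plan.
daily = (
    events
    .filter(F.col("event_type") == "purchase")
    .groupBy("event_date")
    .agg(F.sum("amount").alias("total_amount"))
)

# Nothing has executed yet; the first action below triggers the whole plan.
print(daily.count())

# coalesce() merges partitions without a full shuffle, e.g. to avoid
# writing thousands of tiny output files.
daily.coalesce(8).write.mode("overwrite").parquet("/data/daily_purchases")

spark.stop()
```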
Job Type: Full-time
Pay: ₹30,000.00 - ₹40,000.00 per month
Schedule:
- Day shift
Education:
- Bachelor's (Required)
Experience:
- Total work experience: 3 years (Required)
Location:
- Gurugram, Haryana (Required)
Work Location: In person