Data Engineer (Web Scraper)- Intern | Antal Tech Jobs
5 Weeks ago

Data Engineer (Web Scraper)- Intern

Pune, Maharashtra, India
Information Technology
Other
Clootrack

Overview

About the Role

We're looking for a skilled Web Scraping Data Engineer (Intern) to design and implement robust data extraction systems. In this role, you'll develop scalable crawling architectures to collect high-quality data while ensuring compliance with ethical standards and data regulations.

Key Responsibilities

  • Design and maintain efficient web crawling systems using frameworks like Scrapy, Playwright, or Selenium
  • Implement data processing pipelines to clean, normalize, and structure extracted content
  • Optimize crawling strategies to improve efficiency while respecting website policies
  • Develop monitoring systems to identify and resolve scraping issues quickly
  • Deliver high-quality datasets for analysis and model training
  • Implement storage solutions for large-scale data management
  • Ensure compliance with data regulations and ethical scraping practices

Required Skills

  • Strong Python programming experience
  • Knowledge of SQL (good to have)
  • Hands-on experience with web scraping tools (BeautifulSoup, Scrapy, Selenium)
  • Proficiency with HTML, JavaScript, and the HTTP protocol
  • Experience with data processing libraries (pandas, PySpark)
  • Familiarity with Linux/UNIX environments
  • Knowledge of version control systems and code review practices
  • Strong problem-solving abilities and attention to detail
  • Excellent communication skills (written and verbal English)

Good to Have (Optional)

  • Familiarity with AI frameworks (Hugging Face, LangChain, OpenAI)
  • Familiarity with LLM training pipelines and data requirements
  • Experience with text data augmentation and synthetic data generation


Preferred Qualifications

  • Experience with large-scale distributed crawling systems
  • Knowledge of proxy management and anti-bot evasion techniques
  • Familiarity with a cloud platform (AWS, GCP, or Azure)
  • Experience with containerization (Docker, Kubernetes)


What We Offer

  • Opportunity to work on cutting-edge data collection projects
  • Collaborative environment with talented engineers
  • Competitive compensation package
  • Professional growth and development opportunities

