214318 - 1051540 INR - Annual
Pune, India
Information Technology
Full-Time
Texila Educare Healthcare and Technology Enterprise Pvt Ltd
Overview
Experience: 2 years
Key Responsibilities:
- Develop and Maintain Web Scraping Scripts: Build efficient, scalable, and robust web scraping tools using Python and relevant libraries (e.g., BeautifulSoup, Scrapy, Selenium).
- Data Extraction: Extract structured and unstructured data from websites and APIs, focusing on gathering high-quality and clean datasets.
- Data Processing and Storage: Process, clean, and store extracted data in databases (SQL/NoSQL) or data warehouses, ensuring it's ready for analysis and reporting.
- Website Parsing and HTML Manipulation: Parse complex HTML structures and interact with websites that require JavaScript rendering.
- Error Handling and Logging: Develop error handling and logging mechanisms to ensure scripts run reliably and provide useful diagnostics when failures occur.
- Automation and Scheduling: Automate scraping jobs to run on a regular basis using task schedulers (e.g., cron jobs) and ensure minimal downtime.
- Ensure Compliance: Implement scraping systems that comply with website Terms of Service, robots.txt directives, and applicable laws (e.g., GDPR and copyright law).
- Optimize Performance: Optimize scraping performance for speed and reliability. Handle rate limits, CAPTCHAs, and IP blocking mechanisms to ensure smooth operations.
- Documentation and Reporting: Maintain clear documentation of scraping processes, data flows, and any issues encountered. Provide status updates and reports to stakeholders.
- Collaboration: Work closely with data analysts, product teams, and engineers to ensure data quality and availability for decision-making processes.
Required Skills and Qualifications:
- Proficiency in Python: Strong experience with Python, especially in libraries like BeautifulSoup, Scrapy, Requests, Selenium, and Pandas.
- Web Scraping Frameworks: Experience with scraping tools such as Scrapy, Selenium, or Puppeteer.
- HTML, CSS, JavaScript: Deep understanding of web technologies, including HTML, CSS, and JavaScript, in order to navigate websites and handle dynamic content.
- Data Manipulation and Storage: Experience with SQL and NoSQL databases (e.g., MySQL, PostgreSQL, MongoDB) and data processing libraries (e.g., Pandas).
- APIs: Experience working with RESTful APIs to extract or push data.
- Data Formats: Knowledge of data formats such as JSON, XML, and CSV, and how to parse and handle them.
- Error Handling and Debugging: Strong skills in troubleshooting, debugging, and optimizing web scraping operations.
- Networking and HTTP Protocols: Familiarity with HTTP requests, headers, cookies, and web scraping proxies (e.g., rotating proxies, IP management, VPNs).
- Version Control: Experience using version control systems like Git.
- Problem Solving and Critical Thinking: Ability to handle complex scraping challenges like dynamic content, CAPTCHA, JavaScript rendering, etc.
Preferred Qualifications:
- Experience with Cloud Technologies: Familiarity with cloud platforms such as AWS, Google Cloud, or Azure for scalable scraping and storage solutions.
- Distributed Systems: Experience with managing distributed web scraping jobs using tools like Celery, RabbitMQ, or Kubernetes.
- Data Quality and Validation: Experience in data validation, cleaning, and transforming data for downstream processes.
- Knowledge of Machine Learning: Familiarity with applying machine learning techniques to parse and extract data from semi-structured or unstructured sources.
Job Type: Full-time
Pay: ₹214,318.07 - ₹1,051,539.21 per year
Schedule:
- Day shift
Experience:
- Total work: 2 years (preferred)
Work Location: In person