Job Title: Data Engineer – Edge & Industrial Data Pipelines (2–5 Years)
Location: Lucknow / On-Site at Client Locations / Hybrid
Company: Algo8.ai – AI for Industrial Excellence
About Algo8.ai
Algo8.ai is a leader in AI-powered digital transformation for industrial enterprises. Our solutions span manufacturing, energy, and process industries, integrating deep tech like AI, edge computing, and IoT to unlock massive value from industrial data.
Role Overview
We’re seeking a Data Engineer with experience building robust enterprise-scale data lakes, ingestion pipelines, and real-time streaming workflows using open-source tools. While direct experience with OT systems like SCADA or PLCs is a bonus, what matters most is your ability to design and deploy scalable data infrastructure—including for edge/on-prem environments—using modern open-source technologies.
Key Responsibilities
- Design and implement data ingestion pipelines from structured, semi-structured, and streaming data sources into a centralized data lake.
- Work with open-source tools such as Apache NiFi, Kafka, Airflow, Spark, Flink, and Delta Lake to build scalable data platforms.
- Deploy and manage containerized services using Docker, and optionally orchestrate them with Kubernetes in on-prem/edge environments.
- Implement metadata management, data governance, and quality frameworks across the pipeline.
- Build batch and streaming data processing pipelines (e.g., with Spark Structured Streaming; a minimal sketch follows this list).
- Design schemas and manage time-series or analytical databases for downstream AI/ML workflows.
- Collaborate with Algo8’s AI and domain consultants to shape the data layer that powers industrial AI applications.
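To give a flavor of the pipeline work described above, here is a minimal sketch of a Spark Structured Streaming job that reads sensor events from Kafka and writes windowed aggregates to a data lake path. The topic name, schema, bootstrap servers, and paths are hypothetical, and the spark-sql-kafka connector is assumed to be on the classpath; treat it as an illustration of the kind of code involved, not a prescribed implementation.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import avg, col, from_json, window
from pyspark.sql.types import DoubleType, StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("sensor-stream-demo").getOrCreate()

# Hypothetical event schema for illustration only.
schema = StructType([
    StructField("machine_id", StringType()),
    StructField("temperature", DoubleType()),
    StructField("event_time", TimestampType()),
])

# Read raw JSON events from a (hypothetical) Kafka topic.
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "localhost:9092")
       .option("subscribe", "sensor-readings")
       .load())

events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), schema).alias("e"))
          .select("e.*"))

# 5-minute tumbling-window average temperature per machine,
# tolerating events that arrive up to 10 minutes late.
agg = (events
       .withWatermark("event_time", "10 minutes")
       .groupBy(window(col("event_time"), "5 minutes"), col("machine_id"))
       .agg(avg("temperature").alias("avg_temp")))

# Land the aggregates as Parquet in the data lake (paths are placeholders).
query = (agg.writeStream
         .outputMode("append")
         .format("parquet")
         .option("path", "/data/lake/sensor_agg")
         .option("checkpointLocation", "/data/checkpoints/sensor_agg")
         .start())
query.awaitTermination()
```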
Required Skills & Qualifications
- 2–5 years of experience in data engineering or related roles.
- Strong experience building and maintaining enterprise-grade data lakes and data pipelines.
- Proficiency in Python, SQL, and distributed processing tools like Apache Spark.
- Hands-on experience with data ingestion frameworks (e.g., Apache NiFi, Kafka Connect).
- Familiarity with containerization (Docker) and deploying solutions in on-prem or edge environments.
- Understanding of data modeling, pipeline orchestration (e.g., Airflow), and system design.
- Experience working in hybrid cloud/on-prem environments preferred.
- Ability to write clean, well-documented, and production-ready code.
Bonus Skills (Nice to Have)
- Familiarity with industrial systems (MES, SCADA, PLC, LIMS, ERP) and protocols (OPC-UA, MQTT); a small ingestion sketch follows this list.
- Experience with time-series databases (e.g., InfluxDB, TimescaleDB).
- Exposure to manufacturing or process industry datasets.
- Understanding of streaming analytics or event-driven architectures using Flink/Kafka Streams.
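As a flavor of the bonus-skill territory, here is a minimal sketch that subscribes to an MQTT topic with paho-mqtt and writes each reading to InfluxDB with influxdb-client. The broker address, topic layout, payload shape, token, org, and bucket are all hypothetical placeholders.

```python
import json

import paho.mqtt.client as mqtt
from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

# Hypothetical InfluxDB connection details for illustration only.
influx = InfluxDBClient(url="http://localhost:8086", token="MY_TOKEN", org="plant-ops")
write_api = influx.write_api(write_options=SYNCHRONOUS)


def on_message(client, userdata, msg):
    # Expecting payloads like {"machine_id": "press-01", "temperature": 71.3}.
    payload = json.loads(msg.payload)
    point = (
        Point("temperature")
        .tag("machine", payload["machine_id"])
        .field("value", float(payload["temperature"]))
    )
    write_api.write(bucket="sensors", record=point)


# paho-mqtt 1.x constructor; 2.x additionally requires a CallbackAPIVersion argument.
client = mqtt.Client()
client.on_message = on_message
client.connect("localhost", 1883)
client.subscribe("plant/+/telemetry")
client.loop_forever()
```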
What We Offer
- Opportunity to shape the data backbone of cutting-edge Industrial AI systems.
- Work with cross-disciplinary teams combining AI, domain expertise, and edge computing.
- Real-world impact on sustainability, productivity, and efficiency in factories.
- Dynamic startup culture with fast growth, autonomy, and visibility.
How to Apply
Interested candidates can submit their resumes via https://forms.gle/RPVb2PFpYwoc3dpa9
Join us and be a part of the AI revolution! 🚀