Data Engineer for AI | Antal Tech Jobs
5 Weeks ago

Data Engineer for AI

Pune, Maharashtra, India
Information Technology
Full-Time
Cloudera

Overview

Business Area:
Professional Services
Seniority Level:
Mid-Senior level
Job Description:
At Cloudera, we empower people to transform complex data into clear and actionable insights. With as much data under management as the hyperscalers, we're the preferred data partner for the top companies in almost every industry. Powered by the relentless innovation of the open source community, Cloudera advances digital transformation for the world’s largest enterprises.
Role:
As a Customer Enablement Engineer specializing in Data Engineering for AI, you will design, develop, and deliver comprehensive curriculum content, including student guides, labs, quizzes, and certifications on data engineering and data preparation skills. This curriculum will enable Cloudera customers to effectively build AI systems on the Cloudera Hybrid platform.
Objective of this Role:
To ensure customers can prepare high-quality data that meets the requirements for efficiently building their ML/AI systems, including LLMs.
As the Data Engineer for AI, you will:
  • Develop a high-quality, impactful “data engineering for AI” course
  • Enable instructors to successfully deliver the course in classrooms to our customers
  • Deliver hands-on workshops to customers, in person or remotely, on select course topics
  • Record and publish course content as online modules in digital format
  • Work with internal and external SMEs and customers to regularly seek input for improvement
  • Assist Edu sales leaders in selling educational products by serving as a technical resource
  • Own your self-development and stay resourceful. Enrich your knowledge of data analytics and AI topics as a self-learner.
We’re excited about you if you have:
  • Five (5) or more years of data engineering experience with SQL, Python, Hive, Spark, Flink, Kafka, NiFi, and Airflow
  • Hands-on experience developing data ingest pipelines (batch and real-time) from various data sources into large analytics platforms, data warehouses, data lakes, and lakehouses
  • Experience with one or more LMS (learning management systems)
  • Experience or education in preparing data (both structured and unstructured) for ML/AI model development, including training and fine-tuning of LLMs
  • Experience with data governance, data lineage, and metadata best practices
  • Experience using data quality and data profiling tools and data catalogs
  • Experience publishing technology education content on digital media platforms such as Udemy, LinkedIn, YouTube, or your own website, as a curriculum developer or independent contributor
  • Experience working in public cloud environments from one of the hyperscalers (AWS, Google Cloud, or Microsoft Azure). A cloud certification is preferred
  • Experience working with containers and Kubernetes. A certification in Kubernetes is preferred
  • Experience in (or training on) the Cloudera platform (CDP, HDP, or CDH) and any underlying Apache projects
  • Experience or training in preparing data for ML/AI model development including LLMs
  • Experience or training on Iceberg, Trino, and vector databases such as Pinecone or Milvus
  • Experience using configuration management tools such as Git, Ansible, Puppet or Chef
  • Familiarity with scripting tools such as bash shell scripts, Python and/or Perl
Essential soft skills:
  • Ability to work closely with the curriculum content development team to define the operational requirements for technical training courses
  • Ability to build efficient, well-architected, easy-to-use hands-on lab environments
  • Ability to work as part of a remote, distributed team
It is a plus if you have:
  • Cloud certification on at least one hyperscaler: AWS, Azure, or GCP
  • Expertise in preprocessing unstructured data for generative AI, including tokenization and embedding generation
  • Proficiency with one or more vector databases (e.g., Pinecone, Milvus) for managing embeddings in semantic search and data retrieval.
  • Skills in handling large-scale datasets for LLMs, including sharding, distributed loading, and parallel data processing.
  • Knowledge of data lineage, versioning, and metadata tracking to ensure compliant, high-quality training data for generative AI.
What you can expect from us:
  • Generous PTO Policy
  • Support for work-life balance with Unplugged Days
  • Flexible WFH Policy
  • Mental & Physical Wellness programs
  • Phone and Internet Reimbursement program
  • Access to Continued Career Development
  • Comprehensive Benefits and Competitive Packages
  • Paid Volunteer Time
  • Employee Resource Groups
Cloudera is an Equal Opportunity / Affirmative Action Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, pregnancy, sexual orientation, gender identity, national origin, age, protected veteran status, or disability status.