Bangalore, Karnataka, India
Information Technology
Full-Time
FiftyFive Technologies
Overview
Role
We are looking for a Data Engineer with a minimum of 1 year of experience working with LLMs and a strong background in data pipelines, storage optimization, and AI-driven data processing. The ideal candidate will play a key role in managing, optimizing, and scaling data architectures to support LLM applications and AI initiatives.
Responsibilities
- Design and develop scalable, high-performance data pipelines for AI and LLM-powered applications.
- Work with structured and unstructured data, ensuring efficient preprocessing, transformation, and storage.
- Implement and optimize data retrieval and indexing strategies for LLM fine-tuning and inference.
- Manage vector databases (FAISS, Chroma, Pinecone, Weaviate, Astra DB) for retrieval-augmented generation (RAG) workflows (a minimal retrieval sketch follows this list).
- Build and maintain ETL/ELT workflows using tools like Airflow, Prefect, or Dagster (see the DAG sketch after this list).
- Ensure data quality, governance, and lineage to support AI-driven insights.
- Collaborate with ML engineers and researchers to improve LLM data pipelines and infrastructure.
- Work with cloud-based data storage solutions (AWS S3, GCP BigQuery, Azure Data Lake, etc.).
- Automate data monitoring, validation, and debugging to ensure seamless pipeline operations (a minimal validation check is sketched after this list).
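To give a flavor of the vector-database work above, here is a minimal sketch of the RAG retrieval step using Chroma's Python client. The collection name, sample documents, and query are illustrative, and Chroma's default embedding model is assumed.

```python
import chromadb

# In-memory client for the sketch; chromadb.PersistentClient(path=...) persists to disk.
client = chromadb.Client()
collection = client.create_collection(name="docs")  # collection name is illustrative

# Index a few sample documents; Chroma embeds them with its default model.
collection.add(
    ids=["doc-1", "doc-2"],
    documents=[
        "Airflow schedules and monitors batch data pipelines.",
        "FAISS performs fast approximate nearest-neighbor search.",
    ],
)

# The RAG retrieval step: fetch the chunks most relevant to the user's question.
results = collection.query(query_texts=["How do I schedule a pipeline?"], n_results=1)
print(results["documents"][0])
```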
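For the ETL/ELT orchestration bullet, a minimal Airflow 2.x TaskFlow sketch; the DAG id, schedule, and the placeholder extract/transform/load bodies are all assumptions for illustration.

```python
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def example_etl():  # DAG id is illustrative
    @task
    def extract() -> list:
        # Placeholder: pull raw records from a source system.
        return [{"id": 1, "text": " Raw Record "}]

    @task
    def transform(records: list) -> list:
        # Placeholder: normalize text before loading.
        return [{**r, "text": r["text"].strip().lower()} for r in records]

    @task
    def load(records: list) -> None:
        # Placeholder: write to the warehouse or object store.
        print(f"loaded {len(records)} records")

    load(transform(extract()))


example_etl()
```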
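And for the monitoring/validation bullet, an automated batch check might look something like this pandas-based sketch; the expected column names are hypothetical.

```python
import pandas as pd


def validate_batch(df: pd.DataFrame) -> pd.DataFrame:
    """Gate a batch before it enters the pipeline (column names are hypothetical)."""
    expected = {"id", "text", "source"}
    missing = expected - set(df.columns)
    if missing:
        # Fail fast on schema drift rather than corrupting downstream stores.
        raise ValueError(f"missing columns: {missing}")
    # Drop rows that would break downstream tokenization or embedding.
    cleaned = df.dropna(subset=["text"])
    return cleaned[cleaned["text"].str.len() > 0]
```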
Requirements
- 4+ years of overall experience in data engineering, AI infrastructure, or related fields.
- At least 1 year of hands-on experience working with LLMs and AI data workflows.
- Strong expertise in Python, SQL, and distributed data processing frameworks (Spark, Dask, Ray, or similar; see the Spark sketch after this list).
- Experience with vector databases and retrieval systems for AI-driven applications.
- Knowledge of data modeling, indexing, and storage optimization.
- Familiarity with ETL/ELT frameworks like Apache Airflow, Prefect, or Dagster.
- Experience handling large-scale datasets and optimizing data ingestion pipelines.
- Understanding of cloud-based data architectures (AWS, GCP, Azure).
- Basic knowledge of MLOps principles and integrating data workflows with AI models.
- Exposure to LLM fine-tuning, embeddings, and retrieval-augmented generation (RAG); a short embedding sketch appears after this list.
- Experience with LangChain, Hugging Face Transformers, or OpenAI APIs.
- Familiarity with feature stores (Feast, Tecton) and streaming platforms such as Kafka (see the consumer sketch after this list).
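To make the distributed-processing requirement concrete, a minimal PySpark preprocessing sketch; the S3 paths and the "text" column name are placeholders.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("llm-preprocess").getOrCreate()

# Input/output paths and the "text" column are placeholders.
df = spark.read.parquet("s3://example-bucket/raw_documents/")

# Typical LLM-corpus preprocessing: drop empty rows, normalize, deduplicate.
cleaned = (
    df.filter(F.col("text").isNotNull())
      .withColumn("text", F.lower(F.trim(F.col("text"))))
      .dropDuplicates(["text"])
)

cleaned.write.mode("overwrite").parquet("s3://example-bucket/clean_documents/")
```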
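For the embeddings/RAG exposure, a short sketch using the sentence-transformers library; the model choice and sample chunks are illustrative.

```python
from sentence_transformers import SentenceTransformer

# Model choice is illustrative; any sentence-embedding model follows this pattern.
model = SentenceTransformer("all-MiniLM-L6-v2")

chunks = [
    "Vector databases index embeddings for similarity search.",
    "RAG retrieves relevant chunks and feeds them to the LLM.",
]

# Unit-normalized vectors so dot product equals cosine similarity.
embeddings = model.encode(chunks, normalize_embeddings=True)
print(embeddings.shape)  # (2, 384) for this particular model
```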
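And for streaming familiarity, a minimal kafka-python consumer sketch; the topic name, broker address, and record fields are assumptions.

```python
import json

from kafka import KafkaConsumer

# Topic, broker address, and record schema are placeholders.
consumer = KafkaConsumer(
    "raw-events",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    auto_offset_reset="earliest",
)

for message in consumer:
    record = message.value
    if record.get("text"):  # placeholder validation before ingestion
        print(f"ingesting record {record.get('id')}")
```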