Information Technology
Full-Time
Splunk
Overview
As a Principal Software Engineer in the Artificial Intelligence group, you will play a crucial role in building and optimizing the core software infrastructure that powers AI-driven solutions. You will focus on architecting and deploying highly scalable, production-ready backend systems that support AI assistants, intelligent agents, and foundational AI services. Collaborating with machine learning engineers and cross-functional teams, you will drive best practices in software engineering, DevOps, Kubernetes-based deployments, and backend service development. Your expertise will be instrumental in accelerating AI innovation by ensuring robust, reliable, and efficient system operations.
Responsibilities
- Design and implement high-performance backend architectures that seamlessly integrate with AI-powered products. Focus on building modular, fault-tolerant, and efficient services that support large-scale AI workloads while ensuring low-latency interactions between data pipelines, inference engines, and enterprise applications.
- Develop robust model-serving APIs and containerized microservices that enable real-time AI inference and batch processing with high throughput and low latency.
- Implement end-to-end monitoring, logging, and alerting solutions to ensure AI systems operate reliably at scale.
- Improve scalability by designing distributed systems that efficiently handle AI workloads and inference pipelines.
- Own Kubernetes-based deployments by developing and maintaining Helm charts, Kubernetes operators, and cloud-native workflows to streamline AI model deployment.
- Automate infrastructure management using Infrastructure as Code (IaC) tools like Terraform or CloudFormation.
- Optimize CI/CD pipelines for AI applications, ensuring smooth model retraining, testing, and deployment cycles.
- Improve security and compliance by implementing best practices in access control, container security, and vulnerability management.
- Partner closely with AI/ML teams to ensure seamless model integration into production environments.
- Lead architecture discussions and provide strategic technical guidance on AI platform evolution.
- Mentor and guide engineers to enhance team skills in backend development, DevOps, and cloud technologies.
Qualifications
- Strong backend development experience in Python (preferred) or Java, with expertise in building RESTful APIs, microservices, and event-driven architectures.
- Deep understanding of Kubernetes and container orchestration, with experience in deploying AI/ML workloads at scale.
- Expertise in DevOps and CI/CD pipelines, including experience with Jenkins, GitHub Actions, ArgoCD, or similar tools.
- Cloud expertise (AWS/GCP/Azure), including hands-on experience with cloud-native services for AI workloads (e.g., S3, Lambda, EKS/GKE/AKS, DynamoDB, and RDS).
- Experience in performance tuning and system optimization for large-scale AI/ML workloads.
- Proven ability to collaborate with ML engineers, data scientists, data engineers, and product teams to deliver AI-powered solutions efficiently.
- Experience in technical leadership, driving architectural decisions, and mentoring engineers.
- Strong problem-solving skills, with the ability to balance trade-offs between scalability, maintainability, and performance.
- Prior experience working with AI/ML pipelines, model serving frameworks, or distributed AI workloads.
- Experience in AI observability, monitoring model drift, and optimizing inference latency.
- Understanding of cybersecurity, observability, or related domains to enhance AI-driven decision-making.