Job Description
We are seeking a visionary Senior AI Infrastructure Engineer to architect the backbone of our next-generation AI platform. As we prepare for 2026, you will build scalable, resilient, high-performance computing environments that power our cutting-edge machine learning models. If you are passionate about optimizing complex systems and shaping the future of AI, we want to hear from you.
Why Join Us?
- Work on pioneering projects that define the AI landscape.
- Competitive compensation and equity package.
- Flexible remote-first culture with state-of-the-art equipment.
Responsibilities
- Design and deploy highly scalable GPU clusters and distributed computing systems tailored for training large language models (LLMs).
- Optimize deep learning pipelines to maximize hardware utilization and reduce training time.
- Implement robust CI/CD pipelines for AI model deployment and monitoring.
- Collaborate with ML researchers to move models from research prototypes into efficient, production-grade infrastructure.
- Ensure data integrity, security, and compliance across our cloud environments (AWS, Azure, GCP).
- Conduct capacity planning and forecasting for future AI workloads.
Qualifications
- 5+ years of experience in DevOps, Site Reliability Engineering (SRE), or Infrastructure Engineering.
- Strong proficiency in Python, plus containerization and orchestration with Docker and Kubernetes.
- Deep understanding of distributed systems, networking, and cloud architecture.
- Experience with GPU virtualization and managing high-performance computing (HPC) clusters.
- Excellent problem-solving skills and the ability to work in a fast-paced, agile environment.
- BS or MS in Computer Science, Engineering, or a related technical field.