Job Description
Are you ready to architect the future of artificial intelligence? Quantum Horizon Labs is seeking a visionary Senior AI Infrastructure Engineer to lead our next-generation deployment strategies. As we pioneer breakthrough technologies for 2026 and beyond, you will build the robust, scalable backbone that supports our most ambitious machine learning initiatives.
We are looking for a technical leader who thrives in fast-paced environments and is passionate about optimizing the intersection of hardware and software. You will work alongside world-class researchers and engineers to ensure our AI models are not only capable but also efficient and reliable at scale.
Why join us?
- Competitive base salary and equity package.
- Flexible remote-first and hybrid work policies.
- Access to the latest AI hardware and cloud resources.
- Opportunity to define the technical roadmap for 2026 and beyond.
Responsibilities
- Design, implement, and maintain high-throughput, low-latency AI inference pipelines using Kubernetes, Docker, and serverless architectures.
- Optimize large-scale transformer models for edge deployment and cloud scalability, focusing on memory efficiency and compute utilization.
- Collaborate closely with Data Science and ML teams to translate complex model requirements into robust, production-grade engineering solutions.
- Implement advanced monitoring, logging, and observability tools to ensure system reliability and performance across distributed clusters.
- Drive technical strategy for our GPU cluster expansion, resource management, and cost optimization in the cloud.
- Establish and enforce best practices for code quality, security, and CI/CD pipelines within the AI infrastructure ecosystem.
Qualifications
- 7+ years of experience in software engineering with a strong focus on Machine Learning systems and distributed computing.
- Proficiency in Python, C++, and Rust, with a deep understanding of data structures and algorithmic efficiency.
- Extensive experience with cloud platforms (AWS, GCP, or Azure) and container orchestration technologies.
- Hands-on experience with MLOps tools (MLflow, Kubeflow, Airflow) and model versioning strategies.
- Bachelor’s degree in Computer Science, Electrical Engineering, or a related technical field (Master’s degree is a plus).
- Strong problem-solving skills with a demonstrated ability to troubleshoot complex infrastructure issues.