Job Description
Are you ready to engineer the future of intelligence? At Nebula Dynamics, we are not just building for today; we are architecting the systems that will define the year 2026 and beyond. We are seeking a visionary Lead AI Infrastructure Engineer to own our core computing infrastructure and ensure our platforms scale seamlessly into the next decade of AI evolution.
In this role, you will bridge the gap between cutting-edge machine learning research and robust, production-grade infrastructure. You will be responsible for designing the infrastructure backbone for our AI models, optimizing inference costs, and ensuring our systems are resilient, secure, and ready for the demands of 2026. If you want to work on problems that matter and build the future of technology, we want to hear from you.
Responsibilities
- Architect Scalable Systems: Design and implement high-performance distributed systems capable of handling petabyte-scale data and millions of concurrent requests.
- Optimize AI Workloads: Lead initiatives to reduce latency and compute costs for large language models and neural networks, ensuring peak efficiency.
- Cloud & DevOps Integration: Manage complex cloud environments (AWS, GCP, or Azure) using Kubernetes and Terraform to automate deployment and scaling.
- Future-Proofing Infrastructure: Evaluate emerging hardware (e.g., NPUs, quantum-ready storage) to integrate next-gen technologies into our stack.
- Collaborate with R&D: Work closely with data scientists and researchers to translate model requirements into engineering reality.
- Security & Compliance: Enforce rigorous security protocols to protect proprietary data and ensure compliance with global standards.
Qualifications
- Experience: 7+ years in backend engineering, including at least 3 years in AI/ML infrastructure or systems architecture.
- Technical Stack: Proficiency in Python and C++, plus deep knowledge of ML frameworks (PyTorch, TensorFlow, JAX).
- Infrastructure: Extensive experience with Kubernetes, Docker, and cloud platforms (AWS, GCP, or Azure).
- Performance: Strong understanding of distributed computing, message queues, and database optimization.
- Leadership: Proven track record of leading engineering teams and mentoring junior developers.
- Future-Ready Mindset: A passion for staying ahead of the curve with emerging tech trends and a strategic vision for 2026 and beyond.