What are the responsibilities and job description for the Principal Engineer – AI/ML Infrastructure & Generative Systems position at Stealth Startup?
We are a team out of MIT, incubated by UM6P Foundry, reinventing how organizations capture and leverage their institutional knowledge. Our platform transforms fragmented information into a trusted resource that powers faster decisions and long-term innovation. We are now hiring an experienced engineer to lead the build-out of our AI/ML infrastructure and generative systems. This is a hands-on role at the cutting edge of LLM deployment, GPU optimization, and retrieval-augmented generation (RAG). You'll own core components of the platform and collaborate directly with the founding team to shape the technical roadmap.
In this role, you will:
- Design, build, and deploy retrieval-augmented generation (RAG) pipelines using LLMs and vector databases (see the illustrative sketch after this list).
- Develop secure backend APIs for data ingestion, indexing, and semantic search across enterprise systems (e.g., SharePoint, Teams, SQL).
- Manage GPU-based inference environments optimized for scalability, latency, and cost.
- Implement MLOps best practices for training, fine-tuning, evaluation, and deployment of generative AI models.
- Collaborate with founders on architecture and build-vs-buy decisions to accelerate the roadmap.
- Own the full lifecycle from prototype → MVP → production, ensuring security, compliance, and enterprise readiness.
- Support prototyping of lightweight front-end interfaces to showcase platform capabilities.
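For context on the RAG work above, here is a minimal, illustrative sketch of the retrieval step: embed a handful of documents, index them with FAISS, and assemble a grounded prompt. The embedding model, sample documents, and helper names are assumptions for illustration, not details of our platform.

```python
# Minimal RAG retrieval sketch (illustrative only): embed documents, index them
# in FAISS, retrieve the top-k chunks for a query, and assemble an LLM prompt.
# The model name and sample documents are assumptions, not project specifics.
import numpy as np
import faiss
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

documents = [
    "Q3 roadmap: prioritize enterprise SSO and audit logging.",
    "Onboarding guide: new hires request SharePoint access on day one.",
    "Incident postmortem: latency spike traced to cold GPU inference pods.",
]

# Build an in-memory vector index over the document embeddings.
doc_vectors = embedder.encode(documents, normalize_embeddings=True)
index = faiss.IndexFlatIP(doc_vectors.shape[1])  # inner product == cosine on normalized vectors
index.add(np.asarray(doc_vectors, dtype="float32"))

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k document chunks most similar to the query."""
    q = embedder.encode([query], normalize_embeddings=True)
    _, ids = index.search(np.asarray(q, dtype="float32"), k)
    return [documents[i] for i in ids[0]]

def build_prompt(query: str) -> str:
    """Ground the LLM answer in retrieved context (the RAG step)."""
    context = "\n".join(f"- {chunk}" for chunk in retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("Why did inference latency spike?"))
```

A production pipeline would swap the in-memory index for a managed vector database and add chunking, metadata filtering, and evaluation, which is exactly the scope of this role.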
**This role is on-site and based in Cambridge, MA.**
Required Qualifications
- 5 years of experience in ML infrastructure, backend engineering, or AI platform development.
- Experience deploying LLMs and generative AI models in production, with fluency across multiple frameworks such as PyTorch, TensorFlow, Hugging Face, and Azure OpenAI.
- Hands-on expertise in LLM post-training, alignment, fine-tuning, and deployment.
- Strong backend development skills in Python (FastAPI, Flask, or Django) and REST/GraphQL APIs (a minimal endpoint sketch follows this list).
- Hands-on experience with GPU inference and performance tuning.
- Familiarity with vector databases (Pinecone, Weaviate, Milvus, or FAISS) and semantic search.
- Comfort working in an early-stage startup environment and delivering under ambiguity.
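As a rough illustration of the backend skills above, the sketch below exposes a toy semantic-search endpoint with FastAPI. The request/response models and the placeholder scoring are assumptions; a real service would embed the query and hit a vector database.

```python
# Hypothetical FastAPI semantic-search endpoint; the in-memory "index" and
# word-overlap scoring are stand-ins for an embedding model plus vector DB.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class SearchRequest(BaseModel):
    query: str
    top_k: int = 5

class SearchHit(BaseModel):
    doc_id: str
    score: float

# Placeholder corpus standing in for indexed enterprise content.
_FAKE_INDEX = {"doc-1": "quarterly roadmap review", "doc-2": "new hire onboarding guide"}

@app.post("/search", response_model=list[SearchHit])
def semantic_search(req: SearchRequest) -> list[SearchHit]:
    """Toy scoring: real code would embed req.query and query a vector database."""
    hits = [
        SearchHit(
            doc_id=doc_id,
            score=float(len(set(req.query.split()) & set(text.split()))),
        )
        for doc_id, text in _FAKE_INDEX.items()
    ]
    return sorted(hits, key=lambda h: h.score, reverse=True)[: req.top_k]
```

Run it with `uvicorn` and POST a JSON body like `{"query": "onboarding guide", "top_k": 3}` to `/search`.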
Preferred Qualifications
- Master’s or PhD in Computer Science, ML, or related field.
- Experience fine-tuning and aligning LLMs (RLHF, LoRA, adapters, prompt tuning); see the LoRA sketch after this list.
- Experience with knowledge graphs, enterprise knowledge management, or large-scale search systems.
- Familiarity with LLM orchestration frameworks (LangChain, LlamaIndex) or the Model Context Protocol (MCP).
- Prior experience as a founding/early engineer at a startup.
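To make the fine-tuning item concrete, here is a hedged sketch of wrapping a small Hugging Face model with LoRA adapters via PEFT; the base model and hyperparameters are illustrative assumptions, not choices made by our platform.

```python
# Hedged LoRA setup sketch using Hugging Face Transformers + PEFT.
# The base model and hyperparameters below are assumptions for illustration.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "facebook/opt-350m"  # small model chosen only so the sketch is cheap to run
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Wrap the base model with low-rank adapters instead of updating all weights.
lora = LoraConfig(
    r=8,                # adapter rank
    lora_alpha=16,      # scaling factor
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only a small fraction of weights is trainable
```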
Why Join Us?
- Be part of an innovative startup at the intersection of AI and enterprise solutions.
- Work in a collaborative, fast-paced, rewarding and dynamic environment.
- Directly shape the future of the company and its products.
- Competitive salary, bonus, benefits package, and strong opportunities for leadership growth.
- Continuous learning and career growth opportunities.