What are the responsibilities and job description for the Senior AI/LLM Engineer position at Parrative AI?
About the role
Parrative AI is building AI capabilities for the wealth-management/fintech space with a strong emphasis on governance, auditability, and responsible model use. We’re hiring a Senior AI/LLM Engineer, with deep expertise in LoRA/QLoRA adapter training, evaluation, and production hardening, to lead the development of fine-tuning and deployment workflows for large language models.
This role is hands-on and senior: you’ll own end-to-end delivery from training through deployment, and you’ll set engineering standards for reliability, reproducibility, and measurement.
Responsibilities
- Lead LLM adaptation and fine-tuning using LoRA/QLoRA adapters, including experiment design, training configuration, and performance optimization.
- Develop and maintain reproducible training pipelines: dataset versioning, configuration management, experiment tracking, and artifact provenance.
- Build and operate evaluation frameworks for quality, safety, and consistency, including regression testing and release gates.
- Optimize inference performance (latency, throughput, cost) and improve production stability (timeouts, retries, structured output validation).
- Partner with product, engineering, and risk/compliance stakeholders to translate requirements into measurable model behaviors and acceptance criteria.
- Establish and improve documentation needed for internal governance (model change logs, evaluation summaries, training artifacts, release notes).
- Mentor engineers and contribute to technical direction, code quality standards, and incident readiness.
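Several of the responsibilities above hinge on reproducibility and artifact provenance. As a minimal illustrative sketch (not Parrative's actual stack; all names and fields here are hypothetical), one common pattern is to derive a stable fingerprint from a run's configuration so that checkpoints, evaluation reports, and change-log entries can all be traced back to the exact settings that produced them:

```python
import hashlib
import json
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class LoraRunConfig:
    # Hypothetical fields; a real config would mirror the PEFT
    # LoraConfig and trainer settings it drives.
    base_model: str
    rank: int
    alpha: int
    dropout: float
    dataset_version: str
    seed: int

def config_fingerprint(cfg: LoraRunConfig) -> str:
    """Stable short hash of the run configuration, suitable for tagging
    checkpoints, eval summaries, and release notes."""
    payload = json.dumps(asdict(cfg), sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()[:12]

cfg = LoraRunConfig(
    base_model="example-7b", rank=16, alpha=32,
    dropout=0.05, dataset_version="v3.1", seed=42,
)
print(config_fingerprint(cfg))
```

Because the hash is computed over a sorted, canonical serialization, any change to the base model, adapter hyperparameters, dataset version, or seed yields a new fingerprint, which makes silent configuration drift visible in audit trails.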
Required qualifications
- 6 years of professional experience in ML/AI engineering (or equivalent).
- Demonstrated senior-level experience shipping LLM-powered features into production.
- Strong hands-on expertise with LoRA/QLoRA:
  - Adapter configuration and training strategy
  - Quantization-aware constraints and tradeoffs
  - Dataset preparation for instruction tuning and task-specific adaptation
- Proficiency in Python and modern ML tooling (PyTorch, Hugging Face Transformers, PEFT/TRL, Accelerate/DeepSpeed or similar).
- Experience designing evaluations and quality gates (offline tests, regression suites, scoring frameworks).
- Familiarity with containerized development and deployment (Docker, Linux, CI/CD).
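The evaluation and quality-gate experience called for above typically boils down to a simple contract: a release passes only if no tracked metric regresses beyond a tolerance. A minimal sketch of such a gate, with hypothetical metric names (higher is assumed better for every metric):

```python
from typing import Dict

def release_gate(baseline: Dict[str, float],
                 candidate: Dict[str, float],
                 max_regression: float = 0.01) -> bool:
    """Pass only if every baseline metric is present in the candidate
    and regresses by no more than max_regression (absolute)."""
    for metric, base_score in baseline.items():
        cand_score = candidate.get(metric)
        if cand_score is None:
            return False  # a missing metric blocks the release
        if base_score - cand_score > max_regression:
            return False
    return True

baseline = {"task_accuracy": 0.91, "format_validity": 0.99}
print(release_gate(baseline, {"task_accuracy": 0.92, "format_validity": 0.99}))  # passes
print(release_gate(baseline, {"task_accuracy": 0.85, "format_validity": 0.99}))  # blocked
```

Treating a missing metric as a failure, rather than skipping it, is the conservative choice: a candidate cannot pass by simply not reporting a score.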
Preferred qualifications
- Experience in regulated or high-compliance environments (fintech, healthcare, insurance, government).
- Experience with private model serving and optimization techniques (quantized inference, batching, caching).
- Exposure to model risk concepts (governance, audit trails, reproducibility, change control).
- Experience with speech-to-text or conversational AI datasets and long-context behavior.
- Strong written communication and comfort producing clear technical documentation.
What success looks like
- Deliver LoRA/QLoRA fine-tuned adapters that improve task performance while maintaining conservative, reliable behavior.
- Implement evaluation and regression gates that prevent quality regressions and enable confident releases.
- Improve production observability: model metrics, latency, token usage, configuration traceability, and failure modes.
- Establish durable engineering practices for reproducible training, safe deployment, and documented change control.
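The production-stability and failure-mode goals above often come down to validating structured model output and retrying on bad replies instead of crashing. As an illustrative sketch only (the function names and required keys are hypothetical, not Parrative's interface):

```python
import json

def parse_structured_reply(raw: str, required_keys=("decision", "rationale")):
    """Validate a model reply as a JSON object with required keys.
    Returns the parsed dict, or None so the caller can retry or fall back."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict):
        return None
    if any(key not in obj for key in required_keys):
        return None
    return obj

def call_with_retries(generate, max_attempts=3):
    """Call a model-generation function until it yields valid structured
    output, up to max_attempts; raise once attempts are exhausted."""
    for _ in range(max_attempts):
        parsed = parse_structured_reply(generate())
        if parsed is not None:
            return parsed
    raise RuntimeError("no valid structured output after retries")
```

Bounding retries and surfacing exhaustion as an explicit error keeps failure modes observable, which feeds directly into the latency and failure-mode metrics mentioned above.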
Why Parrative AI
- Ownership over a critical technical domain with meaningful real-world constraints.
- Work on applied LLM systems where quality, governance, and reliability matter as much as capability.
- Small team, high trust, direct impact.