What are the responsibilities and job description for the LLM Application Development Engineer position at MASH Pro Tech?
AI Application Engineer - LLM Application Development
Santa Clara - 5 days Onsite
NVIDIA-Specific Stack
- NVIDIA NIM — deployment, inference, API integration, model lifecycle · Advanced · Must-have
- NeMo framework — model configuration, inference optimization · Proficient · Must-have
- NeMo Guardrails — rails configuration, content safety models, jailbreak detection, topical control · Advanced · Must-have
- NVIDIA Riva — ASR / TTS integration in application layer · Proficient · Must-have
- NVIDIA NIM models — llama-3.1-nemoguard-8b, llama-3.2-nv-rerankqa-1b-v2, llama-3.2-nv-embedqa-1b-v2, nemotron-3-super-120b-a12b familiarity · Proficient · Nice to have
Duties:
- Work on the intelligence layer for multiple programs, owning all model quality, RAG accuracy, prompt engineering, and AI safety across applications
- Socratic tutor persona, adaptive learning recommendation engine, multi-modal AI (text and voice via NVIDIA Riva), RAG evaluation framework, and feedback loop into retrieval
- 6-LLM call chain orchestration (NeMo Guardrails → intent classification → query rewriting → RAG → synthesis), Perplexity web search integration, NIM recommendation engine, and compatibility check logic
- Production-grade AI quality from launch — this is not a research or prototyping role; accuracy thresholds, latency requirements, and safety guardrails must pass InfoSec adversarial testing before Release 1
Required Skills
LLM Application Development
- LLM prompt engineering — system prompts, few-shot examples, chain-of-thought, instruction following · Expert · Must-have
- Multi-step LLM chain orchestration — LangChain, LlamaIndex, or custom orchestration · Expert · Must-have
- Multi-turn conversation design — context window management, conversation summarization, session memory · Advanced · Must-have
- Streaming LLM response handling — token-by-token streaming, partial response rendering · Advanced · Must-have
- Model selection and benchmarking — matching model size to task; balancing latency, cost, and accuracy · Advanced · Must-have
RAG Pipeline Design & Quality
- RAG pipeline design — chunking strategy, embedding model selection, retrieval configuration · Expert · Must-have
- Vector similarity search tuning — index parameters, similarity thresholds, retrieval depth · Advanced · Must-have
- Reranking — cross-encoder rerankers, relevance scoring · Advanced · Must-have
- RAG evaluation frameworks — RAGAS, TruLens, or equivalent; automated eval pipelines · Advanced · Must-have
- Hybrid search — combining dense vector retrieval with BM25 or keyword search · Proficient · Nice to have
AI Safety & Guardrails
- Prompt injection detection and mitigation · Advanced · Must-have
- Jailbreak testing and red-teaming LLM systems · Advanced · Must-have
- Content safety classifier integration · Advanced · Must-have
- Hallucination detection and mitigation strategies · Advanced · Must-have
- Topical control — enforcing scope boundaries on LLM responses · Advanced · Must-have
Evaluation & Production Quality
- Automated evaluation pipeline design — test set curation, metric selection, regression detection · Advanced · Must-have
- A/B evaluation methodology for prompt and model changes · Proficient · Must-have
- Latency profiling for LLM call chains — identifying bottlenecks across multi-step pipelines · Proficient · Must-have
- Feedback loop design — user signal collection, signal-to-retrieval-weight integration · Proficient · Must-have
- Production model monitoring — accuracy drift detection, quality degradation alerting · Proficient · Must-have
Development
- Python — ML/AI application development, async programming · Expert · Must-have
- API design for AI services — streaming endpoints, error handling, timeout management · Advanced · Must-have
- Embedding model operations — model selection, batch embedding, index updates · Advanced · Must-have
Nice to Have
- Adaptive learning systems or personalization engine experience
- Knowledge graph integration with RAG
- Multi-agent orchestration patterns
- ServiceNow API integration
- Prior experience building AI products on NVIDIA infrastructure
Experience
- 11 years of software engineering with at least 2 years focused on LLM application development in production — not research, not demos, not internal tools with 10 users
- Has shipped an LLM-powered feature or product to production where real users depend on the accuracy and the engineer owns the quality metrics
- Has owned an AI safety or guardrails implementation for a customer-facing product — not just added an off-the-shelf filter; designed and tested the safety layer
- Has built RAG evaluation pipelines and used them to make go/no-go release decisions — accuracy gating is part of the workflow
- Has profiled and optimized a multi-step LLM call chain for latency
- Has worked with NVIDIA NIM or NeMo in a real project (strongly
Salary: $80 - $85