What are the responsibilities and job description for the AI/LLM Engineer position at Tential Solutions?
Senior SDET – AI / LLM Quality Engineering (Shared Services)
About The Team
This role sits within the QA Center of Excellence, as part of a small, highly specialized AI Quality Engineering team consisting of two SDETs and one Data Engineer.
The team operates as a shared service across the organization, defining how Large Language Model (LLM)–powered systems are tested, evaluated, observed, and trusted before and after production release.
Rather than building customer-facing AI features, this team builds LLM-based testing and evaluation frameworks and partners with product, platform, and data teams to ensure generative AI solutions meet quality, reliability, and compliance standards.
Role Overview
We are seeking a Senior Software Development Engineer in Test (SDET) with a strong automation and systems-testing background to focus on LLM quality, validation, and evaluation.
In This Role, You Will
- Test LLM-powered applications used across the enterprise
- Build LLM-driven testing and evaluation workflows
- Define organization-wide standards for GenAI quality and reliability
Key Responsibilities
LLM Testing & Evaluation
- Design and implement test strategies for LLM-powered systems, including:
  - Prompt and response validation
  - Regression testing across model, prompt, and data changes
  - Evaluation of accuracy, consistency, hallucinations, and safety
- Build and maintain LLM-based evaluation frameworks using tools such as DeepEval, MLflow, Langflow, and LangChain (a minimal sketch appears after this list)
- Develop synthetic and real-world test datasets in partnership with the Data Engineer
- Define quality thresholds, scoring mechanisms, and pass/fail criteria for GenAI systems
- Build and maintain automated test frameworks for:
  - LLM APIs and services
  - Agentic and RAG workflows
  - Data and inference pipelines
- Integrate testing and evaluation into CI/CD pipelines, enforcing quality gates before production release
- Partner with engineering teams to improve testability and reliability of AI systems
- Perform root-cause analysis of failures related to model behavior, data quality, or orchestration logic
- Instrument LLM applications with Datadog LLM Observability to monitor:
  - Latency, token usage, errors, and cost
  - Quality regressions and performance anomalies
- Build dashboards and alerts focused on LLM quality, reliability, and drift
- Use production telemetry to continuously refine test coverage and evaluation strategies
- Act as a consultative partner to product, platform, and data teams adopting LLM technologies
- Provide guidance on:
  - Test strategies for generative AI
  - Prompt and workflow validation
  - Release readiness and risk assessment
- Contribute to organization-wide standards and best practices for explaining, testing, and monitoring AI systems
- Participate in design and architecture reviews from a quality-first perspective
- Advocate for automation-first testing, infrastructure as code, and continuous monitoring
- Drive adoption of Agile, DevOps, and CI/CD best practices within the AI quality space
- Conduct code reviews and promote secure, maintainable test frameworks
- Continuously improve internal tooling and frameworks used by the QA Center of Excellence
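To make the evaluation-framework and quality-gate responsibilities above more concrete, here is a minimal, hypothetical sketch using DeepEval's pytest-style API. The `ask_support_bot` function, the example question, the context string, and the thresholds are illustrative assumptions rather than an existing system, and DeepEval's built-in metrics rely on an LLM judge (for example, an OpenAI key configured in the environment) to produce scores.

```python
# Hypothetical sketch: a DeepEval check wrapped as a pytest test so it can run
# in CI/CD as a release quality gate. Names, data, and thresholds are illustrative.
from deepeval import assert_test
from deepeval.test_case import LLMTestCase
from deepeval.metrics import AnswerRelevancyMetric, HallucinationMetric


def ask_support_bot(question: str) -> str:
    # Placeholder for the real LLM-powered application under test.
    return "You can reset your password from the IT self-service portal."


def test_password_reset_answer():
    context = ["Employees reset passwords through the IT self-service portal."]
    test_case = LLMTestCase(
        input="How do I reset my password?",
        actual_output=ask_support_bot("How do I reset my password?"),
        context=context,  # ground-truth context the hallucination metric checks against
    )
    # assert_test fails the pytest run (and therefore the CI gate) if either
    # metric does not meet its threshold; both metrics call an LLM judge.
    assert_test(
        test_case,
        [AnswerRelevancyMetric(threshold=0.7), HallucinationMetric(threshold=0.5)],
    )
```

Run under pytest in the release pipeline, a suite like this is one way to express "quality gates before production release"; the same pattern extends to regression suites that re-run whenever a model, prompt, or dataset changes.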
Core SDET Experience
- 5+ years of experience in SDET, test automation, or quality engineering roles
- Strong Python development skills
- Experience testing backend systems, APIs, or distributed platforms
- Proven experience building and maintaining automation frameworks
- Comfort working with ambiguous, non-deterministic systems
- Hands-on experience testing or validating ML- or LLM-based systems
- Familiarity with LLM orchestration and evaluation tools such as:
  - Langflow, LangChain
  - DeepEval, MLflow
- Understanding of challenges unique to testing generative AI systems
- Experience with Datadog (especially LLM Observability)
- Exposure to Hugging Face, PyTorch, or TensorFlow (usage-level)
- Experience testing RAG pipelines, VectorDBs, or data-driven platforms
- Background working in platform, shared services, or Center of Excellence teams
- Experience collaborating closely with data engineering or ML platform teams
- Not a pure ML research or model training role
- Not a feature-focused backend engineering role
- Not manual QA
- You will define how AI quality is measured across the organization
- You will build LLM-powered testing systems, not just test scripts
- You will influence multiple teams and products, not just one codebase
- You will work at the intersection of AI, automation, and reliability