What are the responsibilities and job description for the AI/LLM Engineer position at Tential Solutions?
Senior SDET – AI / LLM Quality Engineering (Shared Services)
About The Team
This role sits within the QA Center of Excellence, as part of a small, highly specialized AI Quality Engineering team consisting of two SDETs and one Data Engineer.
The team operates as a shared service across the organization, defining how Large Language Model (LLM)–powered systems are tested, evaluated, observed, and trusted before and after production release.
Rather than building customer-facing AI features, this team builds LLM-based testing and evaluation frameworks and partners with product, platform, and data teams to ensure generative AI solutions meet quality, reliability, and compliance standards.
Role Overview
We are seeking a Senior Software Development Engineer in Test (SDET) with a strong automation and systems-testing background to focus on LLM quality, validation, and evaluation.
In This Role, You Will
- Test LLM-powered applications used across the enterprise
- Build LLM-driven testing and evaluation workflows
- Define organization-wide standards for GenAI quality and reliability
Key Responsibilities
LLM Testing & Evaluation
- Design and implement test strategies for LLM-powered systems, including:
  - Prompt and response validation
  - Regression testing across model, prompt, and data changes
  - Evaluation of accuracy, consistency, hallucinations, and safety
- Build and maintain LLM-based evaluation frameworks using tools such as DeepEval, MLflow, Langflow, and LangChain (a minimal sketch appears after this list)
- Develop synthetic and real-world test datasets in partnership with the Data Engineer
- Define quality thresholds, scoring mechanisms, and pass/fail criteria for GenAI systems
- Build and maintain automated test frameworks for:
  - LLM APIs and services
  - Agentic and RAG workflows
  - Data and inference pipelines
- Integrate testing and evaluation into CI/CD pipelines, enforcing quality gates before production release
- Partner with engineering teams to improve testability and reliability of AI systems
- Perform root-cause analysis of failures related to model behavior, data quality, or orchestration logic
- Instrument LLM applications with Datadog LLM Observability to monitor:
  - Latency, token usage, errors, and cost
  - Quality regressions and performance anomalies
- Build dashboards and alerts focused on LLM quality, reliability, and drift
- Use production telemetry to continuously refine test coverage and evaluation strategies
- Act as a consultative partner to product, platform, and data teams adopting LLM technologies
- Provide guidance on:
  - Test strategies for generative AI
  - Prompt and workflow validation
  - Release readiness and risk assessment
- Contribute to organization-wide standards and best practices for explaining, testing, and monitoring AI systems
- Participate in design and architecture reviews from a quality-first perspective
- Advocate for automation-first testing, infrastructure as code, and continuous monitoring
- Drive adoption of Agile, DevOps, and CI/CD best practices within the AI quality space
- Conduct code reviews and promote secure, maintainable test frameworks
- Continuously improve internal tooling and frameworks used by the QA Center of Excellence
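To make the evaluation-framework and quality-gate responsibilities above more concrete, here is a minimal, hypothetical sketch using DeepEval's pytest-style API. The `ask_support_bot` function, the example question, the context string, and the thresholds are illustrative assumptions rather than an existing system, and DeepEval's built-in metrics rely on an LLM judge (for example, an OpenAI key configured in the environment) to produce scores.

```python
# Hypothetical sketch: a DeepEval check wrapped as a pytest test so it can run
# in CI/CD as a release quality gate. Names, data, and thresholds are illustrative.
from deepeval import assert_test
from deepeval.test_case import LLMTestCase
from deepeval.metrics import AnswerRelevancyMetric, HallucinationMetric


def ask_support_bot(question: str) -> str:
    # Placeholder for the real LLM-powered application under test.
    return "You can reset your password from the IT self-service portal."


def test_password_reset_answer():
    context = ["Employees reset passwords through the IT self-service portal."]
    test_case = LLMTestCase(
        input="How do I reset my password?",
        actual_output=ask_support_bot("How do I reset my password?"),
        context=context,  # ground-truth context the hallucination metric checks against
    )
    # assert_test fails the pytest run (and therefore the CI gate) if either
    # metric does not meet its threshold; both metrics call an LLM judge.
    assert_test(
        test_case,
        [AnswerRelevancyMetric(threshold=0.7), HallucinationMetric(threshold=0.5)],
    )
```

Run under pytest in the release pipeline, a suite like this is one way to express "quality gates before production release"; the same pattern extends to regression suites that re-run whenever a model, prompt, or dataset changes.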
Core SDET Experience
- 5+ years of experience in SDET, test automation, or quality engineering roles
- Strong Python development skills
- Experience testing backend systems, APIs, or distributed platforms
- Proven experience building and maintaining automation frameworks
- Comfort working with ambiguous, non-deterministic systems
- Hands-on experience testing or validating ML- or LLM-based systems
- Familiarity with LLM orchestration and evaluation tools such as:
  - Langflow, LangChain
  - DeepEval, MLflow
- Understanding of challenges unique to testing generative AI systems
- Experience with Datadog (especially LLM Observability)
- Exposure to Hugging Face, PyTorch, or TensorFlow (usage-level)
- Experience testing RAG pipelines, VectorDBs, or data-driven platforms
- Background working in platform, shared services, or Center of Excellence teams
- Experience collaborating closely with data engineering or ML platform teams
- Not a pure ML research or model training role
- Not a feature-focused backend engineering role
- Not manual QA
- You will define how AI quality is measured across the organization
- You will build LLM-powered testing systems, not just test scripts
- You will influence multiple teams and products, not just one codebase
- You will work at the intersection of AI, automation, and reliability