What are the responsibilities and job description for the LLM Ops Engineer position at HirexHire?
ABOUT US
HirexHire (pronounced "hire by hire") is a Chicago-based recruiting and talent consultancy that integrates with companies short-term to provide long-term talent solutions. We take a seat in our client's everyday operations to understand their people goals, gaps, and challenges. We then develop and implement the processes and technologies to execute a sustainable and scalable talent plan.

We partner with companies expecting or experiencing high growth that need to hire at scale or fill a critical role rapidly. Our clients are not looking for quick-fix placements but are thoughtfully building a hiring strategy to scale their businesses.

OUR CLIENT
Headquarters: Chicago, IL
Industry: Legal/AI SaaS
Company Size: 1000
What They Do: Our client is a PE-backed SaaS company at the forefront of legal technology, specializing in productivity and risk management software for small to large law firms and their employees globally. They offer a connected ecosystem of solutions to drive innovation across many aspects of a firm.

THE ROLE
We are seeking an experienced LLM Engineer to join our client's newly established LLM Ops Team in their Denver office.
In this role, you will be responsible for managing the complex lifecycle of Large Language Models from development to deployment, monitoring, and continuous improvement.

This is a hybrid role based in the Denver, CO area.

WHAT YOU WILL DO

Model Development & Training
- Fine-tune pre-trained models for specific use cases
- Curate and prepare datasets for training
- Manage training infrastructure, resources, and computational environments
- Implement optimization techniques to improve model performance

Deployment & Serving
- Develop and manage APIs for model serving
- Scale infrastructure to handle varying demand loads
- Build and maintain the GenAI middleware/sidecar layer
- Integrate LLMs with existing systems and data sources

Monitoring & Evaluation
- Track performance metrics including latency and throughput
- Monitor quality metrics such as hallucination rates and accuracy
- Optimize costs associated with model inference and training
- Create and maintain dashboards for real-time performance insights

Testing & Quality Assurance
- Create and maintain golden datasets for benchmark testing
- Implement statistical validation methods for model outputs
- Set up similarity matching criteria for response evaluation
- Develop confidence score thresholds for production systems

Feedback Loops & Iteration
- Design and implement user feedback collection systems
- Establish continuous improvement processes
- Create A/B testing frameworks for model and feature evaluation
- Conduct trace analysis to identify areas for performance optimization

Safety & Compliance
- Implement content moderation systems
- Detect and mitigate bias in model outputs
- Ensure regulatory compliance in AI systems
- Develop output validation frameworks

Prompt Management
- Version and store prompts systematically
- Create and maintain prompt templates
- Set up playground environments for prompt testing
- Abstract prompts from application code for better maintainability

WHAT YOU WILL NEED
- Experience with LLM development, fine-tuning, and deployment
- Strong programming skills, particularly in Python
- Experience with Kubeflow, Apache Airflow, MLflow, or other LLM pipeline technology
- Experience with Azure OpenAI, AWS SageMaker, and/or Vertex AI
- Understanding of machine learning operations and MLOps principles
- Knowledge of infrastructure scaling and optimization
- Experience with AI monitoring tools and dashboard creation
- Familiarity with AI safety, bias detection, and compliance requirements
- Strong problem-solving abilities and analytical thinking
- Familiarity with ISO 27001 and SOC 2 certification

WHAT OUR CLIENT OFFERS YOU
- Supportive Company Culture
- Global, Dynamic, and Diverse Team
- Comprehensive Benefits Package (health insurance, retirement savings, generous PTO, and work-life balance)
- Career Growth and Development