What are the responsibilities and job description for the Machine Learning Engineer position at Scale.jobs?
About The Role
The role involves architecting and deploying production-scale machine learning systems that integrate Large Language Models (LLMs) with proprietary datasets. The focus is on moving beyond experimental notebooks to build robust, scalable infrastructure that supports real-time inference and automated model retraining.
The engineer will collaborate with cross-functional teams to solve challenges in Retrieval-Augmented Generation (RAG), model quantization, and latency optimization. This position is critical for bridging the gap between cutting-edge research and stable, value-driven product features in a fast-paced environment.
Key Responsibilities
- Develop and maintain end-to-end ML pipelines for training, evaluating, and deploying fine-tuned LLMs and embedding models.
- Optimize RAG architectures by implementing advanced indexing strategies, query expansion, and re-ranking techniques using vector databases like Weaviate or Pinecone.
- Build and manage MLOps infrastructure for model monitoring, experiment tracking, and automated CI/CD deployment using tools like MLflow, Kubeflow, or BentoML.
- Implement systematic evaluation frameworks to measure model performance, hallucination rates, and safety using LLM-as-a-judge and human-in-the-loop workflows.
- Design and scale backend microservices in Python (FastAPI/Flask) to serve model predictions with sub-second latency in a containerized Kubernetes environment.
- Collaborate on data engineering tasks to build high-quality synthetic datasets and cleaning pipelines for domain-specific model alignment (RLHF/DPO).
Qualifications
- 4–7 years of experience in machine learning engineering or software engineering with a heavy focus on data-driven products.
- Proven track record of deploying at least two ML models into a high-traffic production environment.
- Expert-level Python skills and deep familiarity with PyTorch, Hugging Face Transformers, and orchestration libraries like LangChain or LlamaIndex.
- Experience with cloud infrastructure (AWS or GCP) and container orchestration using Docker and Kubernetes.
- Strong understanding of vector similarity search, dimensionality reduction, and high-performance database design.
- B.S., M.S., or Ph.D. in Computer Science, Mathematics, or a related quantitative field.
- Bonus: Experience with parameter-efficient fine-tuning (PEFT), Triton Inference Server, or contributing to open-source AI frameworks.