What are the responsibilities and job description for the AI/ML Engineer position at Progressive Technology Federal Systems?
AI/ML Engineer – Local LLM & RAG Systems

PTFS is seeking an experienced AI/ML Engineer with strong expertise in deploying and managing locally hosted Large Language Models (LLMs) and building Retrieval-Augmented Generation (RAG) pipelines. The ideal candidate has hands-on experience with frameworks such as Ollama, LangChain, LlamaIndex, or vLLM, and is highly skilled in Python-based orchestration, vector search, and scalable data storage systems such as vector databases or Apache Solr. This role will be responsible for designing, optimizing, and maintaining our on-premise or air-gapped GenAI infrastructure, integrating new models, and keeping our architecture modular and future-proof.

LLM Deployment & Orchestration
- Deploy, run, and optimize locally hosted LLMs using frameworks such as Ollama, vLLM, GPT4All, or HuggingFace Transformers.
- Build and maintain model-serving pipelines with Python, including GPU optimization, quantization, batching, and model switching.
- Implement a flexible architecture allowing rapid integration of new open-source or proprietary models.
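The "model switching" and "flexible architecture" duties above can be illustrated with a minimal, dependency-free sketch: a registry that routes generation requests to whichever locally hosted backend is active. All class and function names here are hypothetical stand-ins, not part of Ollama, vLLM, or any listed framework; the lambdas stub out real inference servers.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

@dataclass
class ModelBackend:
    """Hypothetical wrapper: any local model is reduced to a prompt -> text callable."""
    name: str
    generate: Callable[[str], str]

class ModelRegistry:
    """Lets new models be registered and swapped in without touching callers."""

    def __init__(self) -> None:
        self._backends: Dict[str, ModelBackend] = {}
        self._active: Optional[str] = None

    def register(self, backend: ModelBackend) -> None:
        self._backends[backend.name] = backend

    def switch(self, name: str) -> None:
        if name not in self._backends:
            raise KeyError(f"unknown model: {name}")
        self._active = name

    def generate(self, prompt: str) -> str:
        if self._active is None:
            raise RuntimeError("no active model selected")
        return self._backends[self._active].generate(prompt)

# Stub backends standing in for real local inference servers.
registry = ModelRegistry()
registry.register(ModelBackend("llama-stub", lambda p: f"[llama] {p}"))
registry.register(ModelBackend("mistral-stub", lambda p: f"[mistral] {p}"))

registry.switch("llama-stub")
print(registry.generate("hello"))   # → [llama] hello
registry.switch("mistral-stub")
print(registry.generate("hello"))   # → [mistral] hello
```

In a real deployment the callables would wrap HTTP or in-process calls to the serving layer; the point is that integrating a new model is one `register` call rather than a code change at every call site.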
RAG Pipeline Development
- Architect end-to-end Retrieval-Augmented Generation (RAG) systems.
- Design and implement vector embedding, indexing, and retrieval layers, including chunking, metadata management, and routing logic.
- Integrate RAG flows using LangChain or LlamaIndex, ensuring low latency and high retrieval accuracy.

Data Storage and Retrieval
- Develop and maintain vector databases such as Pinecone, Weaviate, Chroma, Milvus, or FAISS; or architect a schema and search strategy for a Solr-based alternative using traditional indexing/search if vectors are not used.
- Manage ingestion pipelines, embedding generation, and update workflows for newly added data sources.

Application & API Development
- Build backend services and APIs that interact with LLMs, embedding pipelines, and retrieval layers.
- Integrate agents, tools, and orchestration flows using LangChain, OpenAI function-calling equivalents in local models, or custom Python toolchains.
- Deploy services using Docker, Kubernetes, or local orchestrators when needed.

System Performance, Optimization & Monitoring
- Optimize model performance, including quantization (GGUF, GPTQ, AWQ), tensor parallelization, and caching strategies.
- Monitor system resources for memory, GPU/CPU utilization, and throughput.
- Implement automated pipelines to update models, refresh embedding stores, and version datasets.

Collaboration & Architecture
- Work with cross-functional teams to align LLM capabilities with business needs.
- Provide guidance on GenAI trends, limitations, and best practices.
- Contribute to documentation and provide internal training when needed.

Required Skills & Experience

Technical Skills
- 3–7 years of experience in Machine Learning, MLOps, Backend Engineering, or AI Infrastructure.
- Expert-level proficiency in Python and relevant libraries (FastAPI, Pydantic, PyTorch, the HuggingFace ecosystem).
- Hands-on experience with LLM deployment via Ollama, vLLM, GPT4All, HuggingFace Transformers, or LM Studio.
- Strong experience with RAG frameworks: LangChain, LlamaIndex.
- Proficiency with vector databases (Pinecone, Chroma, Weaviate, FAISS, Milvus).
- Experience with Solr, Elasticsearch, or OpenSearch (schema design, analyzers, indexing).
- Experience developing embedding pipelines, chunking strategies, and metadata retrieval.
- Familiarity with containerization and orchestration (Docker; Kubernetes optional).
- Strong experience with model inference optimization: quantization, batching, GPU acceleration.

ML/AI Knowledge
- Understanding of foundational LLM mechanics: transformers, tokenization, context windows, prompt engineering.
- Experience with model fine-tuning, LoRA adapters, or supervised fine-tuning (a plus).
- Knowledge of GenAI architectural patterns: agents, routing, tool use, and document indexing strategies.

Preferred Qualifications
- Experience working in air-gapped or on-premise environments.
- Experience with CI/CD for ML systems.
- Familiarity with the Nvidia GPU stack (CUDA, cuBLAS, TensorRT) and DevOps tools (Terraform, Ansible, Helm charts).
- Exposure to hybrid search systems combining vector and keyword retrieval (BM25 + embeddings).
- Experience integrating LLMs into enterprise systems.

Education
- Bachelor's degree in Computer Science, Data Science, Engineering, Mathematics, or a related field.
- Master's or higher preferred; equivalent experience accepted.

Soft Skills
- Strong problem-solving ability and comfort working with ambiguous or evolving requirements.
- Excellent communication and the ability to translate technical concepts for non-technical teams.
- Self-driven, with a passion for exploring new GenAI technologies and keeping current with evolving LLM tools.

Summary
This role is ideal for someone who enjoys building practical, production-ready AI systems, particularly local LLMs, and wants to work at the cutting edge of the GenAI landscape: integrating models, designing robust retrieval systems, and ensuring future scalability.
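Several of the RAG duties listed above (chunking strategies, embedding generation, vector retrieval) can be sketched in a few lines of dependency-free Python. This is only an illustration of the pattern: the ord-sum "embedding" below is a deterministic toy stand-in for a real embedding model, and every function name here is hypothetical rather than taken from LangChain, LlamaIndex, or any vector database.

```python
import math
from typing import List, Tuple

def chunk_text(text: str, chunk_size: int = 40, overlap: int = 10) -> List[str]:
    """Fixed-size character chunking with overlap, one common RAG strategy."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(text: str, dim: int = 64) -> List[float]:
    """Toy bag-of-words 'embedding': hashes each token into a fixed-size vector.
    A real pipeline would call an embedding model here instead."""
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[sum(ord(ch) for ch in token) % dim] += 1.0
    return vec

def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity, the usual scoring function in vector retrieval."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: List[str], k: int = 2) -> List[Tuple[float, str]]:
    """Rank chunks by similarity to the query and return the top k."""
    qv = embed(query)
    scored = sorted(((cosine(qv, embed(c)), c) for c in chunks), reverse=True)
    return scored[:k]

chunks = [
    "Apache Solr handles keyword search",
    "vector databases store embeddings",
    "Docker packages services",
]
print(retrieve("keyword search with Solr", chunks, k=1)[0][1])
# → Apache Solr handles keyword search
```

A production system would swap the toy embedding for a real model, persist vectors in a store such as FAISS or Solr's dense-vector fields, and attach metadata to each chunk for the routing and filtering duties described above, but the chunk/embed/score/rank shape stays the same.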
Salary: $77,810 – $132,280