What are the responsibilities and job description for the On-Premises LLM & Vector Database Implementation Consultant position at Jobs via Dice?
Dice is the leading career destination for tech experts at every stage of their careers. Our client, Indotronix International Corp, is seeking the following. Apply via Dice today!
Job Description
We are seeking an experienced consultant to lead the design and deployment of a secure, on-premises Large Language Model (LLM) solution integrated with vector database and Retrieval-Augmented Generation (RAG) capabilities. The ideal candidate brings deep hands-on expertise across the full stack — from model deployment and inference optimization to enterprise security and knowledge transfer.
Core Experience
The consultant must have demonstrated experience deploying open-source LLMs, including models such as Meta Llama 3 and Mistral/Mixtral, within on-premises or private infrastructure environments. Strong Python proficiency is essential, particularly for LLM inference pipelines, prompt engineering, and system integration. The role also requires expertise in CPU-based inference strategies, model quantization techniques, and performance tuning to ensure efficient operation in resource-constrained environments.
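To illustrate the kind of quantization expertise described above, here is a minimal, self-contained sketch of symmetric int8 weight quantization, the core idea behind CPU-friendly quantized inference. All names are illustrative; production deployments would typically use established formats and runtimes (e.g. GGUF models served via llama.cpp) rather than hand-rolled code.

```python
# Hedged sketch: symmetric int8 quantization of a weight vector.
# A single scale maps floats into [-127, 127]; dequantization
# multiplies back, losing at most scale/2 per weight to rounding.

def quantize_int8(weights):
    """Quantize a list of float weights to int8 with one shared scale."""
    scale = max(abs(w) for w in weights) / 127.0 or 1.0  # avoid zero scale
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values."""
    return [x * scale for x in q]

weights = [0.12, -0.5, 0.33, 0.07]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
```

The trade-off this sketch demonstrates is the one the role cares about: int8 storage quarters memory bandwidth relative to float32, at the cost of a bounded rounding error per weight.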
Vector Databases & RAG
Candidates must have practical, production-level experience with open-source vector databases such as Qdrant, Chroma, Milvus, or pgvector. A strong track record of designing and implementing end-to-end RAG pipelines is required, along with expertise in embedding generation, management, and metadata filtering to support accurate and efficient semantic retrieval.
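The retrieval step described above can be sketched in a few lines: cosine similarity over stored embeddings, restricted by a metadata filter before ranking. This is a toy in-memory stand-in for what Qdrant, Chroma, Milvus, or pgvector do at scale; the document layout and filter semantics here are assumptions for illustration only.

```python
# Hedged sketch: semantic retrieval with metadata filtering, the core
# of a RAG pipeline's retrieval stage. A real vector database replaces
# the linear scan with an approximate nearest-neighbor index.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def search(store, query_vec, top_k=2, metadata_filter=None):
    """Rank documents by similarity, keeping only metadata matches."""
    candidates = [
        doc for doc in store
        if metadata_filter is None
        or all(doc["meta"].get(k) == v for k, v in metadata_filter.items())
    ]
    ranked = sorted(candidates,
                    key=lambda d: cosine(d["vec"], query_vec),
                    reverse=True)
    return ranked[:top_k]

store = [
    {"text": "VPN setup guide",  "vec": [0.9, 0.1], "meta": {"dept": "IT"}},
    {"text": "Leave policy",     "vec": [0.1, 0.9], "meta": {"dept": "HR"}},
    {"text": "Server hardening", "vec": [0.8, 0.2], "meta": {"dept": "IT"}},
]
hits = search(store, [1.0, 0.0], top_k=1, metadata_filter={"dept": "IT"})
```

Filtering before ranking, as shown, is what makes metadata filtering efficient: documents from the wrong department never enter the similarity computation.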