What are the responsibilities and job description for the Lead Data Architect position at Karsun Solutions?
Overview
Summary
Senior/Lead technical data architect to design, build, and operate enterprise data platforms that power GenAI and AI/ML use cases. This is a highly technical, hands-on role responsible for data platform architecture, end-to-end data engineering, ML/LLM pipeline design, production model onboarding, and delivery of scalable Databricks-centric solutions across cloud environments. Candidate must be AWS Certified Machine Learning – Specialty.
Responsibilities
What You'll Be Doing:
- Architect and implement enterprise data platforms (batch and streaming) optimized for ML, LLMs, and GenAI workloads.
- Lead design and hands-on implementation of Databricks workspaces, Unity Catalog, Delta Lake design patterns, cluster policies, and performance tuning.
- Build and own end-to-end data pipelines (ingest, transform, feature engineering, serving) using PySpark, Databricks Jobs, Spark SQL, Delta Lake, and orchestration tools.
- Design and operationalize model training, fine-tuning (LLM), evaluation, deployment, and monitoring pipelines (MLOps/RAG/CAG) integrating Databricks MLflow, CI/CD, and infra-as-code.
- Implement vectorless and vectorization/embedding pipelines, vector store integrations, and retrieval layers for RAG (FAISS, Pinecone, Weaviate, Milvus).
- Define data schemas, governance, lineage, access controls, and data product APIs; implement Unity Catalog or equivalent for centralized governance.
- Drive cost/performance optimization for storage, compute (spot/preemptible), and query patterns.
- Collaborate with engineers, data scientists, product owners, and security to translate business needs into production GenAI solutions.
- Mentor and lead engineering teams; conduct architecture reviews, code reviews, and run technical deep dives.
- Implement observability for data and ML pipelines (metrics, logging, data quality tests, alerting).
- Create reproducible experiment tracking, model registry, and rollout strategies (canary, shadow testing, rollback).
- Stay current on GenAI/LLM architectures and evaluate/introduce new tooling and frameworks.
Qualifications and Education
Required Qualifications:
- 8 years of hands-on experience in data engineering/platform architecture; 3 years in an architect or lead role.
- Proven, hands-on Databricks experience (designing workspaces, Delta Lake, performance tuning, productionizing Spark jobs).
- Deep Spark and PySpark expertise and experience with Databricks Runtime.
- Strong experience building ML/LLM pipelines and operationalizing models (training, fine-tuning, serving).
- Practical experience with vector embeddings, semantic search, and RAG architectures.
- Solid expertise in Python, common ML libraries (PyTorch, TensorFlow, Hugging Face Transformers), and MLflow.
- Cloud platform experience (AWS strongly preferred).
- Experience with containerization and orchestration while leveraging open source libraries for unstructured and structured data processing, serving/inference.
- Strong SQL skills; experience with distributed query/warehouse systems and Parquet/Avro/Delta formats.
- CI/CD and infra-as-code experience (Terraform, GitOps, Jenkins/GitHub Actions/GitLab CI).
- Data governance, security, and IAM experience; experience implementing row/column-level access controls and data lineage.
- Demonstrated ability to design for scalability, reliability, and cost efficiency.
- BA or BS degree in CS, Computer Engineering, Information Technology, or a related field.
Compensation
The proposed salary range for this role is $****** to $******* USD. The salary range provided is a good faith estimate representative of all experience levels. Karsun considers several factors when extending an offer, including, but not limited to, the role, function and associated responsibilities, a candidate’s work experience, location, education/training, and key skills.