What are the responsibilities and job description for the Sr. Data Scientist - Charlotte, NC - Onsite position at LEO DOES IT INC?
W2 only [No C2C]
Position: Sr. Data Scientist
Location: Charlotte, NC [Hybrid (4 days onsite and 1 day remote)]
Duration: 6-month contract
Job Description:
Data Engineering & Data Processing:
- Design and develop scalable ETL/ELT pipelines for ingesting, transforming, and processing structured and unstructured data.
- Build and optimize data pipelines using Databricks, Spark, SQL, and cloud-native AWS services.
- Implement data quality, validation, lineage, and monitoring processes.
- Support medallion/lakehouse architecture patterns, including bronze, silver, and gold data layers.
- Develop data pipelines to support AI/ML, GenAI, and RAG workloads, including document ingestion and embedding generation workflows.
Machine Learning & Modeling:
- Design and implement scalable ML models for classification, regression, clustering, forecasting, and recommendation systems.
- Apply advanced techniques including deep learning, ensemble learning, NLP, Generative AI, and LLM-based solutions where applicable.
- Conduct model evaluation, tuning, validation, and performance optimization using industry best practices.
- Develop and train models within Databricks ML and/or AWS SageMaker, leveraging distributed computing and scalable cloud infrastructure.
- Build reusable feature engineering and model training pipelines.
- Develop Retrieval-Augmented Generation (RAG) solutions integrating LLMs with enterprise knowledge sources and vector databases.
Cloud & MLOps:
- Deploy and manage ML and GenAI models using AWS SageMaker and Databricks, including endpoint configuration, monitoring, and retraining workflows.
- Utilize Databricks MLflow for experiment tracking, model registry, and deployment automation.
- Implement and support vector database solutions for semantic search and RAG architecture.
- Collaborate with DevOps and platform teams to implement CI/CD pipelines for ML, GenAI, and data workloads.
- Automate operational workflows and optimize cloud resource utilization, scalability, reliability, and security.
Deliverables:
- Production-ready ML and GenAI solutions with supporting technical documentation.
- Scalable ETL/ELT pipelines and curated datasets.
- End-to-end Databricks notebooks, jobs, and workflows.
- Feature engineering pipelines and reusable ML components.
- RAG pipelines integrated with vector databases and enterprise knowledge sources.
- Weekly status reports and participation in Agile sprint ceremonies.
Skills & Qualifications:
- 8 years of experience in Data Science, Machine Learning, and Data Engineering.
- Strong proficiency in Python, SQL, Spark, and ML libraries such as scikit-learn, TensorFlow, and PyTorch.
- Experience with Generative AI, LLM frameworks, prompt engineering, and RAG architecture.
- Hands-on experience with vector databases and semantic search technologies.
- Hands-on experience with Databricks, MLflow, Delta Lake, and AWS SageMaker.
- Experience designing scalable data pipelines and distributed data processing solutions.
- Strong understanding of data mining, feature engineering, and data modeling techniques.
- Experience with cloud-native AWS data services and orchestration frameworks.
- Excellent communication, collaboration, and leadership skills.