What are the responsibilities and job description for the Sr. Data Scientist position at IMX Data?
Responsibilities:
- Design a diagnostic and procedural index on multi-terabyte healthcare claims data to identify patterns and cost drivers using SQL, R, and healthcare data standards such as ICD, CPT, and HCPCS
- Model and deploy schemas with automated clustering in Snowflake using dbt (Data Build Tool), enforcing fine-grained access control and cost governance through Role-Based Access Control (RBAC) and warehouse monitoring
- Perform time series analysis to uncover seasonality, trends, and anomalies in medical and financial datasets using Jupyter Notebooks with ARIMA, SARIMAX, and Facebook Prophet models
- Engineer an end-to-end retrieval-augmented generation (RAG) workflow that retrieves semantically relevant document chunks from ChromaDB and Pinecone using tokenization, embedding-based vector similarity (cosine similarity), and the OpenAI API, incorporating both hybrid and tree-based search strategies
- Deploy custom large language models to production with scalable inference pipelines built on Docker, FastAPI, and Kubernetes
Requirements:
Master’s degree in Computer Science, Computer Information Systems, or a related field, plus 1 year of experience.
Skills:
SQL, R, Snowflake, Jupyter Notebooks, ARIMA, SARIMAX, Facebook Prophet, ChromaDB, Pinecone, tokenization, vector similarity (cosine similarity) using embeddings, Docker, FastAPI, and Kubernetes