What are the responsibilities and job description for the Data Scientist - INDIA position at Vytwo Technologies Inc.?
Role: Data Scientist - INDIA
Location: Hyderabad, INDIA
Eligibility: Consultants local to INDIA are eligible.
Category: Data Science – Structured Data / Text Data (NLP & GenAI)
We are seeking a highly skilled Data Scientist (3–7 years of experience) to join our team and work across two major data science domains:
- Structured Data (80–90%) – Predictive analytics, forecasting, cost estimation, likelihood modeling, and batch‑oriented machine learning pipelines.
- Text / Unstructured Data (NLP & GenAI) – Building low‑latency real‑time systems using deep learning, LLMs, prompt engineering, and agentic AI frameworks.
Key Responsibilities
Structured Data – Machine Learning & Analytics
- Build, deploy, and optimize ML models for predictive analytics, forecasting, classification, and regression.
- Perform large-scale feature engineering using PySpark and Big Data tools.
- Work on batch pipelines, model versioning, and experiment tracking.
- Develop cost estimation and risk/likelihood models using statistical and ML techniques.
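The responsibilities above center on batch feature pipelines for forecasting and cost/likelihood models. A minimal pure-Python sketch of the kind of lag and rolling-window feature engineering involved (column names and data are hypothetical; at scale the same pattern would be expressed with PySpark window functions over Delta tables):

```python
from statistics import mean

def add_rolling_features(rows, value_key="cost", window=3):
    """Add lag-1 and rolling-mean features to time-ordered records.

    Hypothetical illustration: `rows` is a small in-memory list; in a
    production pipeline this logic would run as distributed window
    aggregations in PySpark rather than a Python loop.
    """
    out = []
    for i, row in enumerate(rows):
        enriched = dict(row)
        # Previous observation (None for the first row).
        enriched["lag_1"] = rows[i - 1][value_key] if i > 0 else None
        # Trailing rolling mean over at most `window` observations.
        start = max(0, i - window + 1)
        enriched["rolling_mean"] = mean(r[value_key] for r in rows[start : i + 1])
        out.append(enriched)
    return out

records = [{"cost": c} for c in (10.0, 12.0, 14.0, 20.0)]
features = add_rolling_features(records)
```

These derived columns would then feed a regression or likelihood model, with the pipeline versioned and tracked per the responsibilities above.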
Text / Unstructured Data – NLP & GenAI
- Build NLP pipelines using deep learning frameworks such as PyTorch, TensorFlow, or similar.
- Develop real‑time, low‑latency inference systems for text classification, embeddings, semantic search, summarization, and retrieval.
- Create prompts, context graphs, and agentic workflows for LLM-based systems.
- Apply knowledge of prompt engineering, context engineering, and autonomous agent frameworks to production systems.
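Semantic search over embeddings, mentioned above, reduces to ranking documents by vector similarity. A toy sketch using hand-written 3-d vectors (a real system would generate embeddings with a sentence-embedding model and serve them from a vector database):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def semantic_search(query_vec, corpus, top_k=1):
    """Return the top_k document ids ranked by cosine similarity.

    `corpus` maps document id -> embedding; the ids and vectors here
    are illustrative stand-ins for model-generated embeddings.
    """
    ranked = sorted(corpus.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:top_k]]

corpus = {
    "invoice_faq": [0.9, 0.1, 0.0],
    "shipping_policy": [0.1, 0.9, 0.1],
    "refund_terms": [0.8, 0.2, 0.1],
}
best = semantic_search([1.0, 0.0, 0.0], corpus, top_k=2)
```

In a low-latency setting the ranking step would be delegated to an approximate-nearest-neighbor index rather than a full sort.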
Platform, Tooling & MLOps
- Work in Databricks for ETL, feature engineering, ML training, and orchestration.
- Use Azure services for model deployment, data pipelines, and infrastructure.
- Collaborate using Git-based workflows; leverage tools like GitHub Copilot, Claude Code, etc.
- Implement model monitoring, observability, drift detection, and performance tracking.
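One common building block for the drift detection mentioned above is the Population Stability Index (PSI) between the training-time and live score distributions. A self-contained sketch (the bin proportions and the ~0.2 threshold are illustrative rule-of-thumb values, not a prescribed standard):

```python
import math

def population_stability_index(expected, actual, eps=1e-6):
    """PSI between two binned distributions given as per-bin proportions.

    Each input should sum to ~1. By a common rule of thumb, PSI < 0.1
    suggests no significant shift and PSI > 0.2 suggests notable drift.
    """
    psi = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)  # guard against empty bins
        psi += (a - e) * math.log(a / e)
    return psi

baseline = [0.25, 0.25, 0.25, 0.25]  # training-time score distribution
stable   = [0.24, 0.26, 0.25, 0.25]  # recent scores, little movement
shifted  = [0.05, 0.15, 0.30, 0.50]  # recent scores, heavy drift
```

In practice such a check would run on a schedule against logged predictions and raise an alert when the index crosses the chosen threshold.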
✅ Core Skills
- Strong hands-on experience with Databricks (Delta Lake, MLflow, Job Orchestration).
- Excellent PySpark skills for large-scale distributed data processing.
- Proficiency in Azure cloud services (ADF, Azure ML, AKS, Databricks on Azure).
- Strong understanding of ML algorithms, statistical methods, and data analysis.
- Experience with deep learning frameworks:
  - PyTorch
  - TensorFlow
  - Transformers (HuggingFace)
- Experience with model monitoring and ML observability.
- Ability to write clean, optimized code and leverage AI code assistants.
- Prompt engineering (task prompts, chain of thought, tool calling, retrieval prompts).
- Context engineering (retrieval pipelines, RAG, memory management, context structuring).
- Knowledge of LLM-based agentic frameworks (LangChain, Semantic Kernel, CrewAI, AutoGen, etc.).
- Experience with vector databases and embedding models is a plus.
- Experience with containerization (Docker, Kubernetes, AKS).
- Experience deploying models to production (REST APIs, real-time endpoints).
- Knowledge of streaming technologies (Kafka, EventHub, Spark Streaming).
- Understanding of CI/CD for ML (Azure DevOps / GitHub Actions).
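The context-engineering skills listed above (retrieval pipelines, RAG, context structuring) ultimately come down to packing retrieved chunks into a prompt under a budget. A simplified sketch, assuming hypothetical chunk texts and a character budget in place of a real token counter:

```python
def build_rag_prompt(question, retrieved_chunks, max_chars=1000):
    """Assemble an LLM prompt from retrieved context chunks.

    Illustrative context structuring for RAG: chunks are numbered,
    packed until the budget is exhausted, and the model is instructed
    to ground its answer in them. A production system would budget in
    tokens and attach source metadata to each chunk.
    """
    context, used = [], 0
    for i, chunk in enumerate(retrieved_chunks, start=1):
        snippet = f"[{i}] {chunk}"
        if used + len(snippet) > max_chars:
            break  # stop once the context budget is spent
        context.append(snippet)
        used += len(snippet)
    return (
        "Answer using only the numbered context below; cite chunk numbers.\n\n"
        + "\n".join(context)
        + f"\n\nQuestion: {question}\nAnswer:"
    )

prompt = build_rag_prompt(
    "What is the refund window?",
    ["Refunds are accepted within 30 days.", "Shipping takes 5 business days."],
)
```

The assembled string would be sent to an LLM endpoint; agentic frameworks such as LangChain wrap this same pattern with retrievers and memory components.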
✅ What We're Looking For
- A problem solver who is comfortable working with both structured and unstructured data.
- Someone who enjoys using modern AI tools to accelerate development.
- A data scientist who writes clean, production-grade code.
- A collaborator who thrives in cross-functional teams and fast-paced environments.