Machine Learning Scientist — Agentic data pipelines

Nexus Venture Partners
Boston, MA Full Time
POSTED ON 5/11/2026
AVAILABLE BEFORE 6/7/2026
Location

Boston Office

Employment Type

Full time

Location Type

Hybrid

Department

Technology, Machine Learning

Compensation

  • Research Scientist I: $148K – $186K
  • Research Scientist II: $168K – $210K
  • Offers Equity

Job Summary

We are seeking a scientist to join our team at Iambic Therapeutics, working on data acquisition and curation for Enchant, our multimodal transformer model trained at scale on a wide variety of biomedical data. In this role, you will design and build agentic systems that acquire, clean, format, and quality-control the large-scale datasets that power Enchant training. You will work at the intersection of LLM-based automation and biomedical data engineering—developing AI agents that can navigate heterogeneous data sources, enforce quality standards, and operate reliably at scale.

This role is ideal for candidates who combine strong software engineering instincts with scientific understanding of biomedical data, and who are excited about using LLMs as tools to solve practical data problems.

Key Responsibilities

  • Design, build, and maintain agentic systems for automated data acquisition from public and proprietary biomedical data sources
  • Develop LLM-based pipelines for data cleaning, normalization, and formatting across diverse data modalities (e.g., molecular, genomic, clinical, literature)
  • Implement automated quality-control workflows that detect anomalies, flag inconsistencies, and enforce data standards
  • Evaluate and iterate on agent architectures, prompting strategies, and tool-use patterns to improve reliability and throughput
  • Collaborate with ML scientists on the Enchant team to understand data requirements and translate them into scalable acquisition and processing systems
  • Monitor and maintain data pipelines in production, diagnosing failures and improving robustness over time
  • Document data provenance, processing decisions, and quality metrics to support reproducibility and auditing

Qualifications

Required:

  • Master's or PhD in a computational STEM field, or equivalent industry experience
  • Strong Python engineering skills, including experience building and maintaining production-quality software
  • Hands-on experience with LLM APIs (e.g., Claude, GPT) and agentic patterns such as tool use, orchestration, and multi-step reasoning
  • Familiarity with biomedical or chemical data sources and formats (e.g., PDB, UniProt, ChEMBL, SDF/MOL, FASTA, or similar)
  • Comfort with data engineering fundamentals: ETL design, data validation, and working with structured and unstructured data at scale

Desired:

  • Experience with agent orchestration frameworks
  • Familiarity with cloud infrastructure and workflow orchestration (e.g., AWS, Docker, Kubernetes)
  • Knowledge of multimodal biomedical data spanning small molecules, proteins, assays, images, 'omics, and/or clinical records
  • Experience with large-scale dataset construction or curation for ML model training

Location

Remote (US or UK). On-site available in Bristol, UK and Boston, US.

ABOUT IAMBIC THERAPEUTICS

Iambic is a clinical-stage life-science and technology company developing novel medicines using its AI-driven discovery and development platform. Based in San Diego and founded in 2020, Iambic has assembled a world-class team that unites pioneering AI experts and experienced drug hunters. The Iambic platform has demonstrated delivery of new drug candidates to human clinical trials with unprecedented speed and across multiple target classes and mechanisms of action. Iambic is advancing a pipeline of potential best-in-class and first-in-class clinical assets, both internally and in partnership, to address urgent unmet patient need. Learn more about the Iambic team, platform, pipeline, and partnerships at iambic.ai.

MISSION & CORE VALUES

Our mission is to deliver better medicines through innovations in AI-based discovery technologies. The culture and work at Iambic Therapeutics are profoundly strengthened by the diversity of our people and our differences in background, culture, national origin, religion, sexual orientation, and life experiences. We are committed to building an inclusive environment where a diverse group of talented humans work together to discover therapeutics and create technologies.

PAY AND BENEFITS

We offer industry-leading pay, company-paid healthcare, flexible spending accounts, voluntary life insurance, 401(k) matching, and uncapped vacation to our team. We are in a brand-new, state-of-the-art facility in beautiful San Diego with an onsite gym, dining, and easy access to great places to live and play.

Compensation Range: $148K - $210K

Salary: $148,000 - $186,000
