What are the responsibilities and job description for the Data Associate, Machine Learning & Drug Discovery position at Jobs via Dice?
Dice is the leading career destination for tech experts at every stage of their careers. Our client, StaffRight Associates, LLC, is seeking the following. Apply via Dice today!
Preface: Computational Biophysics & Data Architecture
This engagement sits at the intersection of Computational Biophysics and Foundational Machine Learning, specifically targeting the high-fidelity data pipelines required for atomic-level drug discovery. StaffRight Associates is seeking an individual with a rigorous STEM academic background, holding at least a Bachelor’s degree in a quantitative science, who demonstrates first-principles mastery of data structures and systemic organization. The role requires a candidate capable of bridging theoretical scientific datasets with practical, high-performance computing applications, ensuring that the raw data fueling proprietary molecular dynamics and ML models is structured for architectural integrity and research discovery.
The Mission
The objective is to orchestrate the foundational data layer for a world-class machine learning ecosystem focused on pharmacological innovation. You will move beyond simple data management to serve as a critical link in the scientific lifecycle, transforming vast, heterogeneous datasets into structured, high-provenance assets. This role is a two-year, high-intensity mission designed for early-career innovators who wish to apply engineering rigor to the challenges of molecular simulation and therapeutic design within a high-performance Linux environment.
Core Technical Objectives
- Synthesize and curate large-scale scientific datasets to catalyze the development of predictive models for molecular behavior and drug-target interactions.
- Engineer robust data pipelines within a Linux-based architecture to ensure the seamless flow of information from raw scientific output to model-ready structures.
- Formalize data provenance protocols by meticulously cataloging metadata, ensuring that every data point is traceable, discoverable, and architecturally sound.
- Optimize SQL and Python-based workflows to refine the ingestion and organization of diverse data sources, maintaining high fidelity across massive computational scales.
- Orchestrate the systemic organization of experimental and simulated data to facilitate rapid discovery and iterative research cycles within the ML team.
Candidate Profile
- Architectural Philosophy: A commitment to data integrity and a "clean room" approach to dataset structuring, recognizing that the quality of ML output is directly tethered to the precision of the underlying data.
- Technical Versatility: Deep familiarity with the Linux ecosystem and a functional command of Python and SQL for the manipulation of high-volume datasets.
- Methodological Rigor: An obsession with accurate metadata and provenance, ensuring that complex scientific datasets remain organized and accessible over long-term research horizons.
- Educational Foundation: A Bachelor’s or graduate degree in a core STEM discipline (Physics, Chemistry, Computer Science, Mathematics, or Bioengineering) with a strong emphasis on quantitative analysis.
- Mathematical & Computational Fluency: Demonstrated ability to navigate complex datasets and a fundamental understanding of the computational requirements of high-stakes scientific research.