What are the responsibilities and job description for the Data Scientist position at The University of Texas at Arlington?
Job Summary
The Data Scientist designs, builds, and operates robust, secure data pipelines that power clinical-research products, analytics dashboards, and downstream data-science workloads. The role partners closely with clinicians, investigators, the Office of Information Technology, collaborators’ Business Intelligence teams, and external research collaborators to translate complex biomedical data into actionable insights.
Minimum Qualifications
- Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience).
- Seven (7) years of professional experience in data engineering, software development, or an equivalent mix of education and relevant experience in a similar role.
- Experience with Snowflake, Microsoft Azure Synapse, or other modern data-warehouse platforms.
- Exposure to machine-learning pipelines (e.g., using OpenAI or other LLM services).
- Experience building/maintaining cloud data platforms (such as GCP, OCI, Linode, AWS, Azure) and data-lake/warehouse solutions, as well as production workload management.
- Hands-on Linux system administration (containerization, networking, security).
Essential Duties and Responsibilities
- Architect end-to-end pipelines that ingest high-volume de-identified clinical, genomic, and phenotypic datasets from collaborators’ EHR systems (Epic Clarity/Caboodle) and cloud storage.
- Build and host production-grade web portals and REST APIs for secure researcher/clinician access, with role-based permissions and audit trails (see the API sketch after this list).
- Leverage OpenAI LLMs (or similar NLP services) to auto-extract Human Phenotype Ontology (HPO) terms from de-identified clinical documentation (see the extraction sketch after this list).
- Design high-throughput ETL workflows that parse heterogeneous datasets for ingestion into relational databases and cloud-native warehouses, feeding results into downstream analytics pipelines (see the ETL sketch after this list).
- Design and develop real-time capable analytical systems to integrate with and/or augment EHR systems.
- Perform systems administration for data-platform hosts, including system hardening, patch management, and firewall configuration.
- Implement monitoring stacks and custom health checks to maintain near-continuous system availability (see the health-check sketch after this list).
- Translate clinical research requirements into technical specifications, producing clear data-model diagrams, lineage documentation, and data-dictionary artifacts.
- Deliver data-product demos to investigators, effectively showcasing how pipeline outputs support precision medicine reporting.
- Champion standards for metadata management, schema versioning, and test-driven data engineering.
- Other duties as assigned.
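Illustrative Sketches
A minimal sketch of the role-based API access described in the web-portal duty, assuming FastAPI; the endpoint, role name, and in-memory token-to-role lookup are illustrative stand-ins for a real identity provider:

```python
from fastapi import Depends, FastAPI, Header, HTTPException

app = FastAPI()

# Toy user store; a production system would query the institution's identity provider.
USERS = {"token-abc": {"name": "dr_lee", "role": "clinician"}}

def require_role(role: str):
    """Dependency that rejects callers whose token lacks the required role."""
    def checker(authorization: str = Header(...)) -> dict:
        user = USERS.get(authorization)
        if user is None or user["role"] != role:
            raise HTTPException(status_code=403, detail="forbidden")
        return user
    return checker

@app.get("/cohorts/{cohort_id}")
def read_cohort(cohort_id: str, user: dict = Depends(require_role("clinician"))):
    # An audit-trail write (who accessed which cohort, and when) would go here.
    return {"cohort": cohort_id, "accessed_by": user["name"]}
```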
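A minimal sketch of the HPO-term extraction duty, assuming the OpenAI Python SDK; the model name, prompt wording, and output format are assumptions, not the posting's specification:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def extract_hpo_terms(note_text: str) -> str:
    """Ask the model to list HPO terms found in a de-identified note."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model; any chat-capable model would do
        messages=[
            {"role": "system",
             "content": "List Human Phenotype Ontology (HPO) terms found in the "
                        "clinical text as 'HP:NNNNNNN - label', one per line."},
            {"role": "user", "content": note_text},
        ],
    )
    return response.choices[0].message.content

print(extract_hpo_terms("Patient exhibits short stature and delayed speech."))
```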
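A minimal sketch of the heterogeneous-ETL duty, assuming pandas and SQLAlchemy; the file names, table name, and SQLite target are hypothetical stand-ins for a cloud warehouse connection:

```python
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("sqlite:///research.db")  # stand-in for Snowflake/Synapse

def ingest(path: str) -> pd.DataFrame:
    """Parse CSV or JSON inputs into a common tabular schema."""
    frame = pd.read_json(path) if path.endswith(".json") else pd.read_csv(path)
    frame.columns = [c.strip().lower() for c in frame.columns]  # normalize headers
    return frame

# Hypothetical source files; each parsed batch is appended to one observations table.
for source in ["labs.csv", "phenotypes.json"]:
    ingest(source).to_sql("observations", engine, if_exists="append", index=False)
```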
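A minimal sketch of a custom health check of the kind the monitoring duty describes, assuming the requests library; the probed URL and timeout are illustrative assumptions:

```python
import sys
import requests

def check(url: str, timeout: float = 5.0) -> bool:
    """Return True if the endpoint answers HTTP 200 within the timeout."""
    try:
        return requests.get(url, timeout=timeout).status_code == 200
    except requests.RequestException:
        return False

if __name__ == "__main__":
    healthy = check("https://example.org/health")  # assumed probe target
    print("OK" if healthy else "DOWN")
    sys.exit(0 if healthy else 1)  # nonzero exit signals the alerting layer
```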