What are the responsibilities and job description for the Data Engineer position at Virtusa?
Role Summary
The Senior Data Engineer will be responsible for architecting and owning high-integrity, GxP-validated data pipelines for drug discovery R&D and FDA submission. This role requires deep specialization in cloud-native AWS architectures to automate multimodal sensor, video, and audio ETL/ELT workflows for global clinical trials, ensuring full regulatory compliance (GxP, HIPAA) across the data lifecycle.
Key Responsibilities
Platform Architecture & Systems Design: Design and own the end-to-end data platform supporting concurrent clinical trials across therapeutic areas. Ingest multimodal clinical data (EEG, PPG, speech biomarkers, video, actigraphy) from multiple external vendors.
Cloud-Native Modernization: Architect and implement event-driven ingestion frameworks (using AWS Serverless, EventBridge, S3, DynamoDB) to replace legacy managed ETL, reducing per-pipeline operational cost and cutting data delivery latency from days to hours.
Data Quality & Compliance: Design, deploy, and own a config- and metadata-driven Quality Control (QC) framework that automatically validates data against vendor data dictionaries for schema, nulls, types, and ID format. Own Installation Qualification (IQ) execution across GxP production pipelines to ensure clinical data can be used in regulatory submissions.
Data Governance & Leadership: Define and enforce data contracts across vendor relationships, serving as the technical authority to standardize onboarding patterns and reduce new vendor integration time.
Observability & Reporting: Build operational dashboards (Power BI, CloudWatch) and automated alerting (SNS) to provide real-time stakeholder visibility into pipeline health, QC pass/fail rates, and data completeness.
ML & Research Infrastructure: Engineer data ingestion, lineage, and governance infrastructure for foundation model development on biosensor data, ensuring datasets are structured, traceable, and compliant for GPU-accelerated training workflows.