What are the responsibilities and job description for the Principal Data Architect position at Apex IT Services?
Principal Data Architect
Bethesda, MD (3 days in office)
Long term
Qualifications
- 20 years in data architecture/engineering, leading large-scale migrations (hundreds of millions to billions of records) from legacy/mainframe sources (e.g., DB2) to cloud/SaaS targets.
- Strong SQL and PostgreSQL expertise (staging/ETL persistence): schema design, partitioning, bulk loading, tuning, and operational rigor (HA/DR/backup, auditability, reconciliation).
- Delta/incremental load design (CDC/watermarking, micro-batching), with idempotent processing and replay/backfill strategies.
- Salesforce data loading at scale: object modeling, external IDs, load sequencing, Bulk API patterns, and governor/locking constraints.
- Data quality, de-duplication, and survivorship frameworks for customer/transaction domains; strong validation and reconciliation patterns.
- Cloud experience on AWS supporting large-scale data platforms/migration factories (e.g., S3, RDS/Aurora PostgreSQL, Glue, EMR/Spark, Lambda/Step Functions, IAM/KMS, CloudWatch).
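The delta/incremental load pattern named above (watermarking with idempotent processing) can be sketched as follows. This is a minimal illustration only, using SQLite in place of PostgreSQL; the table names (`source_accounts`, `staging_accounts`), columns, and sample dates are hypothetical, not part of the role description:

```python
import sqlite3

def extract_delta(src, watermark):
    """High-water-mark CDC: pull only rows changed since the last watermark."""
    return src.execute(
        "SELECT id, name, updated_at FROM source_accounts "
        "WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()

def upsert_batch(tgt, rows):
    """Idempotent load: replaying the same batch leaves the target unchanged."""
    tgt.executemany(
        "INSERT INTO staging_accounts (id, name, updated_at) VALUES (?, ?, ?) "
        "ON CONFLICT(id) DO UPDATE SET name = excluded.name, "
        "updated_at = excluded.updated_at",
        rows,
    )
    # Advance the watermark to the newest change we have persisted.
    return max((r[2] for r in rows), default=None)

# Demo: two incremental runs over a toy source table.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE source_accounts (id INTEGER, name TEXT, updated_at TEXT)")
src.executemany("INSERT INTO source_accounts VALUES (?, ?, ?)",
                [(1, "Acme", "2024-01-01"), (2, "Globex", "2024-01-02")])
tgt = sqlite3.connect(":memory:")
tgt.execute("CREATE TABLE staging_accounts "
            "(id INTEGER PRIMARY KEY, name TEXT, updated_at TEXT)")

watermark = upsert_batch(tgt, extract_delta(src, ""))  # first run loads both rows
# Replay/backfill from the same watermark is a no-op because the upsert is idempotent.
watermark = upsert_batch(tgt, extract_delta(src, watermark)) or watermark
```

The same shape carries over to PostgreSQL's `INSERT ... ON CONFLICT DO UPDATE`, with the watermark persisted per run so failed batches can be replayed safely.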
Large-Volume Source-to-Target Data Validation
- Define end-to-end reconciliation (DB2 -> PostgreSQL -> Salesforce): record counts and control totals (e.g., financial amounts, points balances), tolerances, and sign-off thresholds.
- Implement scalable integrity checks: partition-level hashing/checksums, stratified sampling, and distribution/edge-case validation.
- Operationalize quality and auditability: rule-based validations, automated quarantine/exception workflows, lineage (batch/run IDs, manifests, watermarks), and dashboarded reporting.
- Account for Salesforce constraints: external ID uniqueness/idempotency, lock/contention monitoring, and post-load relationship verification.
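The partition-level hashing/checksum check listed above can be sketched like this. The row layout (`id`, month, points), partitioning by month, and the canonical-string helper are illustrative assumptions; an order-independent XOR of per-row digests lets source and target be compared without sorting either side:

```python
import hashlib
from collections import defaultdict

def partition_checksums(rows, partition_of, canonical):
    """Order-independent checksum per partition: XOR of per-row SHA-256 prefixes."""
    sums = defaultdict(int)
    for row in rows:
        digest = hashlib.sha256(canonical(row).encode()).digest()
        sums[partition_of(row)] ^= int.from_bytes(digest[:8], "big")
    return dict(sums)

def mismatched_partitions(source_rows, target_rows, partition_of, canonical):
    """Compare source vs. target; return only partitions needing row-level drill-down."""
    src = partition_checksums(source_rows, partition_of, canonical)
    tgt = partition_checksums(target_rows, partition_of, canonical)
    return sorted(p for p in src.keys() | tgt.keys() if src.get(p) != tgt.get(p))

# Demo: (id, month, points) rows partitioned by month; one target row has drifted.
source = [(1, "2024-01", 100), (2, "2024-01", 250), (3, "2024-02", 75)]
target = [(1, "2024-01", 100), (2, "2024-01", 999), (3, "2024-02", 75)]
bad = mismatched_partitions(source, target,
                            partition_of=lambda r: r[1],
                            canonical=lambda r: f"{r[0]}|{r[2]}")
```

At migration scale the checksums would be computed in-database (or in Spark) per partition, so only the small set of mismatched partitions is pulled for row-level comparison.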