What are the responsibilities and job description for the Principal Data Architect - Bethesda, MD (Weekly 3 days onsite) position at APEX IT Services?
Hi,
We are immediately looking for a Principal Data Architect - Bethesda, MD (Weekly 3 days onsite) with one of our direct clients. If you are interested or know someone who is looking for projects, please respond with your resume.
Position: Principal Data Architect (Historical Data Migration)
Location: Bethesda, MD (Weekly 3 days onsite)
Job Description:
- Lead solution architecture and technical governance for a large-scale historical data migration (3B records) from DB2 to Salesforce using a PostgreSQL staging/ETL layer. Own end-to-end migration strategy, scalability/performance design, data quality and reconciliation, and phased (throttled) capability activation with incremental delta loads to enable reliable, auditable Production cutover.
Key Responsibilities
- Define target-state data architecture (DB2 -> PostgreSQL -> Salesforce): canonical models, transformation patterns, lineage, and reconciliation controls.
- Design and govern high-volume migration execution: partitioning, parallelization, tuning, runtime scalability, and environment alignment.
- Own phased activation and delta load strategy: sequencing/dependencies, CDC or watermarking, micro-batching, replay/backfill, and idempotent loading into Salesforce.
- Establish quality, resiliency, and governance: validation rules, error/quarantine handling, duplicate/survivorship rules, audit evidence, and ARB/Design reviews.
- Partner across Product, Salesforce, ETL/Integration, and Release teams; mentor engineers; and produce HLD/LLD documents, mappings, runbooks, and operational dashboards.
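The delta load strategy above (watermarking, replay/backfill, idempotent loading) can be sketched as follows. This is a minimal illustration using SQLite as a stand-in; the table and column names (src_events, tgt_events) are hypothetical, and a real pipeline would read from DB2 and load Salesforce through the ETL layer.

```python
import sqlite3

def extract_delta(conn, watermark):
    """Return rows changed since the last watermark, plus the new watermark."""
    rows = conn.execute(
        "SELECT id, payload, updated_at FROM src_events "
        "WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()
    new_watermark = rows[-1][2] if rows else watermark
    return rows, new_watermark

def load_idempotent(conn, rows):
    """Upsert by natural key so replaying the same batch is a no-op."""
    conn.executemany(
        "INSERT INTO tgt_events (id, payload) VALUES (?, ?) "
        "ON CONFLICT(id) DO UPDATE SET payload = excluded.payload",
        [(r[0], r[1]) for r in rows],
    )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src_events (id INTEGER PRIMARY KEY, payload TEXT, updated_at INTEGER)")
conn.execute("CREATE TABLE tgt_events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO src_events VALUES (?, ?, ?)",
                 [(1, "a", 10), (2, "b", 20), (3, "c", 30)])

rows, wm = extract_delta(conn, 15)   # only ids 2 and 3 are newer than watermark 15
load_idempotent(conn, rows)
load_idempotent(conn, rows)          # replay the same batch: still exactly 2 target rows
count = conn.execute("SELECT COUNT(*) FROM tgt_events").fetchone()[0]
print(wm, count)                     # -> 30 2
```

Because the load keys on a stable natural key, a failed run can simply be replayed from the last committed watermark without creating duplicates.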
Qualifications
- Years of experience in data architecture/engineering, leading large-scale migrations (hundreds of millions to billions of records) from legacy/mainframe sources (e.g., DB2) to cloud/SaaS targets.
- Strong SQL and PostgreSQL expertise (staging/ETL persistence): schema design, partitioning, bulk loading, tuning, and operational rigor (HA/DR/backup, auditability, reconciliation).
- Delta/incremental load design (CDC/watermarking, micro-batching), with idempotent processing and replay/backfill strategies.
- Salesforce data loading at scale: object modeling, external IDs, load sequencing, Bulk API patterns, and governor/locking constraints.
- Data quality, de-duplication, and survivorship frameworks for customer/transaction domains; strong validation and reconciliation patterns.
- Cloud experience on AWS supporting large-scale data platforms/migration factories (e.g., S3, RDS/Aurora PostgreSQL, Glue, EMR/Spark, Lambda/Step Functions, IAM/KMS, CloudWatch).
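The load-sequencing requirement above (loading parent objects before children that reference them via external IDs) amounts to a topological sort over object dependencies. A small sketch, where the Salesforce object names and the dependency map are illustrative examples only:

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Map of object -> set of parent objects that must be loaded first,
# because the child resolves lookups to them via external IDs.
deps = {
    "Account": set(),
    "Contact": {"Account"},
    "Opportunity": {"Account"},
    "OpportunityContactRole": {"Opportunity", "Contact"},
}

# static_order() yields a valid load sequence respecting all dependencies.
order = list(TopologicalSorter(deps).static_order())
print(order)
```

In practice each object in the resulting sequence would be loaded in Bulk API batches, with lock/contention monitoring between stages as the posting describes.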
Large-Volume Source-to-Target Data Validation (At Scale)
- Define end-to-end reconciliation (DB2 -> PostgreSQL -> Salesforce): count/control totals (e.g., financial/points), tolerances, and sign-off thresholds.
- Implement scalable integrity checks: partition-level hashing/checksums, stratified sampling, and distribution/edge-case validation.
- Operationalize quality and auditability: rule-based validations, automated quarantine/exception workflows, lineage (batch/run IDs, manifests, watermarks), and dashboarded reporting.
- Account for Salesforce constraints: external ID uniqueness/idempotency, lock/contention monitoring, and relationship verification post-load.
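The partition-level hashing/checksum approach above can be sketched in a few lines: hash canonicalized rows per key-range bucket on both sides, then compare digests so that only mismatched buckets need a row-by-row compare. The row format and bucketing-by-id scheme here are assumptions for illustration.

```python
import hashlib
from collections import defaultdict

def partition_checksums(rows, bucket_size=1000):
    """SHA-256 over canonicalized rows, grouped into id-range buckets."""
    buckets = defaultdict(hashlib.sha256)
    for row_id, payload in sorted(rows):          # stable order -> stable hash
        buckets[row_id // bucket_size].update(f"{row_id}|{payload}".encode())
    return {b: h.hexdigest() for b, h in buckets.items()}

source = [(i, f"rec-{i}") for i in range(2500)]
target = list(source)
target[1234] = (1234, "rec-CORRUPT")              # simulate one divergent row

src, tgt = partition_checksums(source), partition_checksums(target)
mismatched = [b for b in src if src[b] != tgt[b]]
print(mismatched)  # -> [1]  (only the bucket containing id 1234 differs)
```

At billions of rows this localizes reconciliation work: count/control totals gate the run overall, and checksums narrow any discrepancy to a small partition before sampling or full compares are invoked.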
Additional Details
- Nice-to-have Qualifications: Salesforce certifications (Data/Application Architect), MuleSoft experience, and migration factory governance.
- Deliverables (First 60-90 Days): Approved migration architecture/design, phased activation strategy, and operational readiness package.
- Success Measures: Repeatable automated migration runs, meeting throughput windows within Salesforce limits, and operational stability in production.