Demo

Head of Data Platform

A2Z
Denver, CO · Full Time
POSTED ON 5/5/2026
AVAILABLE BEFORE 10/31/2026
Why This Role Exists

We operate a multi-tenant automotive SaaS platform serving thousands of dealer groups across the United States. Our data layer — MySQL, Aurora, DynamoDB with DynamoDB Streams, S3, Glue Data Catalog — has grown to support a complex, high-throughput transactional platform. That layer works. Now we need to make it intelligent.

We are building a Dealer Intelligence Platform: a closed-loop system that observes raw signals from dealer operations, predicts outcomes, optimizes decisions under constraints, acts through approved channels, and learns from what happened. Pricing optimization, lead routing, inventory mix planning, service bay scheduling — each is a self-contained optimization function that consumes features, scores predictions, and closes the loop with action telemetry.

This role owns the entire data substrate that makes that loop possible — the lake, the feature store, the model registry, the action ledger, and the governance framework that keeps it all tenant-isolated and audit-grade. You are not inheriting a finished architecture; you are designing the one that turns a transactional platform into a decision engine.

Scope & Scale

  • Targeting 5,000 dealer tenants, each with isolated databases and per-tenant configuration
  • Billions in annual Gross Merchandise Value (GMV) flowing through platform transactions
  • 30 third-party integrations across DMS, CRM, lending, F&I, and marketplace providers — each pushing data in different formats (SOAP/XML, REST, JSON, email)
  • Data pipelines spanning 6 integration domains with multi-protocol vendor connectivity
  • DynamoDB Streams processing real-time change events across onboarding, inventory, and transaction tables
  • A product roadmap with 6 optimization functions — each requiring its own entity model, feature set, constraint definition, and feedback path

What You Will Own

  • The data platform end to end: Bronze (raw telemetry), Silver (canonical entities via Standardization Agent), Gold (KPIs and derived features) — plus the Feature Store and Action Ledger that make the optimization loop possible
  • Feature Store architecture — online (sub-50ms reads for real-time scoring) and offline (point-in-time joins for training). Every feature carries a contract: owner, freshness SLO, PII tag, and training/inference parity. Access is governed by Lake Formation
  • Action Ledger — every recommendation, approval, override, and outcome logged as a first-class object. This is the substrate that closes the loop: without it, models cannot retrain, we cannot attribute lift, and we cannot prove value to dealers
  • Model data pipeline — the feature materialization, training data assembly, and serving infrastructure that feeds SageMaker models and Bedrock agents across all optimization functions
  • Data migration strategy for the legacy platform — defining which tables move to DynamoDB, which consolidate into Aurora Serverless, and how dual-write validation works at every stage
  • Data quality and anomaly detection — automated monitoring for schema drift, null-rate spikes, stale pipelines, and integration data inconsistencies. If a bad feature reaches a model, that is your bug
  • Data governance: retention policies, PII handling, audit trail integrity, and multi-tenant AI governance (per-dealer data isolation for model training, cost attribution for Bedrock inference, FTC-grade action audit)
  • Streaming and event-driven data flows: DynamoDB Streams, EventBridge, CDC patterns, and real-time feature materialization
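To make the feature-contract expectation above concrete, here is one minimal shape it could take (a sketch only — the field names are illustrative, not an existing schema):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FeatureContract:
    """Minimal feature contract: every catalog entry carries these fields."""
    name: str                # e.g. "dealer_avg_days_to_close_30d"
    owner: str               # team accountable for the pipeline
    freshness_slo_min: int   # max minutes from source event to online availability
    pii: bool                # drives Lake Formation access policy and redaction
    online: bool             # served from the online store for real-time scoring
    offline: bool            # materialized for point-in-time training joins

    def parity_ok(self) -> bool:
        # Training/inference parity requires the feature in both stores
        return self.online and self.offline

contract = FeatureContract(
    name="dealer_avg_days_to_close_30d",
    owner="data-platform",
    freshness_slo_min=15,
    pii=False,
    online=True,
    offline=True,
)
```

A contract like this is what automated quality monitoring can enforce: a stale pipeline is a breached `freshness_slo_min`, and a feature missing from either store fails `parity_ok()`.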

Technical Environment

  • Data Stores: DynamoDB (single-table design, DynamoDB Streams), MySQL/Aurora, S3 data lake (Bronze/Silver/Gold layers, Iceberg), DAX, Glue Data Catalog
  • Feature Store: SageMaker Feature Store (online and offline), Iceberg on S3 for point-in-time training data, Lake Formation-governed feature access
  • Streaming & Events: DynamoDB Streams, EventBridge, CDC, real-time feature materialization pipelines
  • ML/AI Platform: SageMaker (training pipelines, model registry, Model Monitor), AWS Bedrock (Claude, Titan), LangChain/LangGraph, Bedrock Knowledge Bases
  • Pipelines: Apache NiFi (multi-provider normalization), AWS Glue (ETL, Data Catalog), Athena (ad-hoc analytics)
  • Infrastructure: CloudFormation, Lake Formation, OpenTelemetry, CircleCI
  • Analytics: Glue Data Catalog, Athena, QuickSight or equivalent BI layer

Hands-On Expectations

This is not a slides-and-meetings role. We expect roughly 30-40% of your time in code and SQL (writing critical-path feature pipelines, data migrations, schema designs, and reviewing PRs with rigor), 40-50% in design (data architecture docs, entity modeling sessions, feature catalog design, optimization-function data contracts), and 10-20% on cross-team alignment, mentoring, and operating mechanisms.

First 12 Months

  • Months 1-3: Immerse in the data landscape — audit every data store, pipeline, integration flow, and DynamoDB Stream. Map the current entity model against the v2 optimization-function requirements. Publish the first data architecture ADR. Identify the top data quality gaps and define the canonical entity schemas that Silver and Gold will standardize around
  • Months 4-6: Stand up the Feature Store (online and offline) and migrate the first round of features from Gold derivations. Build the Action Ledger schema and approval-gateway integration. Deliver the first data migration wave with dual-write validation. Ship automated data quality monitoring that catches drift before it reaches a model
  • Months 7-9: Pricing Optimizer data pipeline live in production — features flowing, predictions scoring, action telemetry closing the loop. Feature catalog operational with freshness SLOs and PII tagging. Per-dealer and global model training pipelines running. Lead Routing data contracts defined and in shadow mode
  • Months 10-12: Second optimization function (Lead Routing or Inventory Mix) in production. Drift detection and auto-rollback wired into Model Monitor. Outcome dashboards proving incremental lift to dealers. Data platform roadmap documented for the next 12 months. The team treats feature quality, action telemetry, and training/inference parity as non-negotiable engineering standards because of the patterns you set
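The point-in-time correctness the offline store must guarantee can be illustrated with a toy as-of join (pure Python with integer timestamps; a real implementation would lean on Iceberg snapshots or the offline store's own API):

```python
from bisect import bisect_right

def as_of_join(label_events, feature_history):
    """For each labeled event, pick the latest feature value observed
    at or before the event time — never a future value, which would
    leak label information into training data."""
    # feature_history: list of (timestamp, value), sorted by timestamp
    times = [t for t, _ in feature_history]
    rows = []
    for event_time, label in label_events:
        i = bisect_right(times, event_time) - 1
        value = feature_history[i][1] if i >= 0 else None
        rows.append((event_time, value, label))
    return rows

history = [(1, 0.2), (5, 0.7), (9, 0.4)]   # feature observations over time
events = [(4, 1), (9, 0), (0, 1)]          # (event_time, label) pairs
```

Note the third event: it predates any feature observation, so it gets `None` rather than a value from the future — exactly the discipline that keeps training and inference distributions in parity.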

Requirements

You Should Have

  • 7 years in data engineering or data architecture with at least 2 years in a platform-level architect or Head-of role
  • Deep experience with both relational (MySQL/PostgreSQL/Aurora) and NoSQL (DynamoDB, DynamoDB Streams) data modeling — and strong opinions on when each is appropriate
  • Hands-on experience building or operating feature stores — online serving, offline training, point-in-time correctness, feature freshness SLOs
  • Hands-on experience with AWS data and ML services: Glue, Athena, S3, DynamoDB, DynamoDB Streams, Aurora, Lake Formation, SageMaker
  • Current, practicing AI/Gen-AI practitioner — you have built or operated systems that prepare data for LLMs, embeddings, or ML models in production. This is not theoretical interest; it is hands-on recent experience
  • Experience with streaming and CDC patterns: DynamoDB Streams, Kinesis, EventBridge, or Kafka for real-time data propagation and feature materialization
  • Experience with data pipeline orchestration: NiFi, Airflow, Step Functions, or equivalent
  • Understanding of data migration patterns: dual-write, change data capture (CDC), reconciliation validation, zero-downtime cutover
  • Experience with multi-tenant data architectures — database-per-tenant, schema-per-tenant, Row-Level Security, and the judgment to know which trade-offs matter at 5,000 tenants
  • Strong data governance instincts: retention policies, PII handling, audit trails, cost attribution, and the discipline to enforce them before the first model ships
  • The ability to write a data architecture doc that engineers can implement without ambiguity, and the judgment to know when to write the SQL or pipeline code yourself instead
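The dual-write and reconciliation-validation patterns listed above can be sketched at their simplest as a diff of the same logical table read from both stores (the key and row shapes are illustrative):

```python
def reconcile(legacy_rows: dict, new_rows: dict) -> dict:
    """Compare the same logical table read from both stores during a
    dual-write migration window. Returns keys missing on either side
    or whose values diverge — the signal that gates cutover."""
    missing_in_new = sorted(legacy_rows.keys() - new_rows.keys())
    missing_in_legacy = sorted(new_rows.keys() - legacy_rows.keys())
    mismatched = sorted(
        k for k in legacy_rows.keys() & new_rows.keys()
        if legacy_rows[k] != new_rows[k]
    )
    return {
        "missing_in_new": missing_in_new,
        "missing_in_legacy": missing_in_legacy,
        "mismatched": mismatched,
        "clean": not (missing_in_new or missing_in_legacy or mismatched),
    }

legacy = {"deal-1": 100, "deal-2": 200, "deal-3": 300}
new = {"deal-1": 100, "deal-2": 250}
```

Zero-downtime cutover is gated on the report coming back `clean` over a sustained window, not on a single snapshot.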

Strongly Preferred: ML & Optimization Data Infrastructure

  • Hands-on experience with SageMaker — training pipelines, model registry, Model Monitor, and production model lifecycle (shadow → canary → drift watch → auto-rollback)
  • Hands-on experience with AWS Bedrock — Knowledge Bases, model invocation, guardrails, and production deployment of foundation models
  • Experience building embeddings pipelines — vectorizing structured and unstructured data for semantic search, recommendations, or retrieval-augmented generation
  • Experience designing retrieval systems — chunking strategies, metadata filtering, re-ranking, and evaluating retrieval quality
  • Experience building closed-loop data systems — action telemetry, outcome attribution, A/B holdout management, and lift measurement
  • Experience with data quality and anomaly detection at scale — automated monitoring for schema drift, null rates, freshness SLAs, and feature/training skew
  • Understanding of multi-tenant AI governance: PII redaction, tenant-scoped inference, per-dealer model routing, cost attribution, and audit logging

Nice to Have

  • Experience with automotive, fintech, or multi-tenant marketplace data, including compliance and retention requirements
  • Familiarity with data formats and protocols from third-party providers (CDK, DealerTrack, Tekion)
  • Experience with constrained optimization, assignment problems, or scheduling solvers at the data layer
  • Background with event-driven and streaming data architectures (EventBridge, DynamoDB Streams, Kafka, CDC streams)
  • Experience with vector databases (OpenSearch, Pinecone) for production AI workloads
  • Experience with Iceberg, Delta Lake, or other open table formats for lakehouse architectures

About A2Z Sync

A2Z Sync is a fast-paced and innovative automotive SaaS company seeking to make life better for our customers. We offer you a fun, casual, and collaborative culture, while fostering an environment where you work hard, see your results, and feel your impact. We are committed to our employees, and this starts with providing benefits that allow you to care for yourself and your family.

Mission

At A2Z Sync, we replace the friction of disconnected systems with the velocity of a single platform. We integrate digital insights with in-store operations to deliver transparent transactions that bring clarity to the car buyer and increased profitability to the dealer.

Our Values: We Are DRIVEN

  • Dealership Obsessed: We measure our success by the dealer's wins and the trust of their buyers, not just our own code
  • Relentless Ownership: No lone wolves, but no pass-backs either. We don't say "that's not my job."
  • Invent with Purpose: We don't chase "shiny" tech. We replace guesswork with intelligence, building the "data backbone" that turns raw information into a competitive advantage
  • Value Every Perspective: We are Better Together. We check egos at the door
  • Evolve or Evaporate: Change is our constant. We stay ahead by learning faster than the competition
  • Now Over Next: Perfection is the enemy of progress. We prefer action over endless analysis

Here's how we are doing it:

  • A2Z Sync offers comprehensive medical, dental, and vision benefits
  • Employer provided STD/LTD and life insurance
  • Matching 401k plan
  • Unlimited paid time off, including 10 paid holidays
  • Real ownership of a high-stakes AI surface — your roadmap, your architecture decisions, your metrics

Salary: $160,000 - $195,000
