What are the responsibilities and job description for the Data Architect (Google Cloud Platform) position at TechSpace Solutions Inc.?
Job Title: Data Architect (Google Cloud Platform)
Location: Dallas, TX / Charlotte, NC (Onsite)
Duration: 12 Months
Project/Program:
Identity & Access Management (IAM) Data Modernization
Migration of an on-premises SQL data warehouse to a modern enterprise Data Lake platform, enabling analytics and GenAI use cases.
The platform leverages PySpark-based processing, CI/CD pipelines, and containerized deployments on OpenShift (OCP), with Google Cloud Platform as the preferred cloud platform, to deliver scalable, secure, and high-performance data solutions.
About Program/Project:
The IAM Data Modernization program focuses on transforming legacy data platforms into a scalable and cloud-compatible architecture.
Key Highlights:
Integration Scope: 30 source systems with multiple downstream integrations
Capabilities: Metrics, reporting, advanced analytics, and GenAI use cases (NL querying, summarization, cross-domain insights)
Benefits:
- Scalable and resilient data platform
- High-performance semantic and analytics layer
- Single source of truth for enterprise-wide reporting and analytics
Role Summary:
We are looking for a Data Architect with strong expertise in OpenShift (OCP), PySpark, and CI/CD pipelines to design and govern scalable data platforms.
The role requires defining end-to-end data architecture, containerized deployment patterns, orchestration strategies (Airflow/Autosys), and platform standards, along with hands-on involvement in implementation.
Key Responsibilities:
Data Architecture & Platform Design:
- Define enterprise data architecture for IAM data lake and analytics platform
- Design scalable, modular, and containerized data pipeline architectures on OCP
- Establish data models, schema governance, and data lifecycle strategies
- Define best practices for data partitioning, performance optimization, and cost efficiency
OpenShift (OCP) & Platform Engineering:
- Architect and govern containerized data workloads on OpenShift (OCP)
- Define standards for deployment, scaling, and workload isolation
- Collaborate with DevOps teams for platform engineering and infrastructure alignment
Big Data & Processing (PySpark Focus):
- Define architecture for PySpark-based batch and near real-time processing pipelines
- Provide guidance on distributed processing design, optimization, and performance tuning
- Establish reusable frameworks for ETL/ELT processing
Data Ingestion & Orchestration:
- Architect data ingestion frameworks (batch, streaming, CDC)
- Define orchestration strategies using Airflow / Autosys
- Implement standards for retry, backfills, dependency management, and error handling
DevOps / CI-CD:
- Define and oversee CI/CD strategy for data and platform deployments
- Enable automation of build, test, and deployment processes
- Ensure integration of CI/CD pipelines with OCP-based environments
Cloud & Data Platforms:
- Provide architecture guidance for Google Cloud Platform-based data platforms (preferred, not mandatory)
- Define integration patterns for cloud-native and on-premises hybrid environments
- Guide teams on cloud migration strategies and modern data platform adoption
Data Governance, Quality & Observability:
- Define frameworks for data quality, validation, and lineage
- Define frameworks for metadata management and cataloging
- Establish monitoring, logging, alerting, and SLOs for platform reliability
- Ensure compliance with data security and audit requirements
Stakeholder Collaboration:
- Work closely with client architects, IAM teams, and business stakeholders
- Translate business requirements into scalable technical architecture
- Provide architectural guidance and mentorship to engineering teams
Core Skills (Must Have)
- OpenShift (OCP) / Kubernetes-based platforms
- PySpark / Spark ecosystem
- CI/CD implementation for data platforms
- Airflow / Autosys orchestration tools
Solid understanding of:
- Data lake architectures (layered models)
- ETL/ELT design patterns
- Distributed data processing concepts
Data Engineering & Storage:
- Data formats: Parquet, ORC, Avro
- Partitioning and performance tuning
- Large-scale data modeling for analytics
Cloud (Preferred, Not Mandatory)
- Experience with Google Cloud Platform (GCP) is preferred
- Exposure to services such as BigQuery, Dataproc, Dataflow, and GCS is a plus
Observability & Reliability
- Monitoring, logging, alerting frameworks
- Dashboards, SLOs, and operational runbooks
Good to Have
- Experience with IAM domain / cybersecurity data
- Understanding of data security and access control frameworks
- Exposure to GenAI-enabled data platforms
- Experience in Agile delivery and team leadership
Experience:
- 10-14 years in Data Architecture / Data Engineering
- Strong experience in OCP, PySpark, CI/CD, and orchestration frameworks
- Prior experience in data modernization / migration programs
- Education: Bachelor's/Master's in Computer Science, Information Systems, or equivalent
Certifications:
- OpenShift / Kubernetes certifications
- Google Cloud Platform certifications