What are the responsibilities and job description for the Data Architect – OCP (OpenShift) / IAM Data Modernization position at Carman Solutions Group?
Job Title: Data Architect – OCP (OpenShift) / IAM Data Modernization
Location: Dallas, TX or Charlotte, NC (100% Onsite)
Duration: 6-12 Months
Objective
Migration of an on-premises SQL data warehouse to a modern enterprise Data Lake platform, enabling analytics and GenAI use cases.
The platform uses PySpark-based processing, CI/CD pipelines, and containerised deployments on OpenShift (OCP), with GCP as a preferred cloud platform, to deliver scalable, secure, and high-performance data solutions.
About Program/Project
The IAM Data Modernization program focuses on transforming legacy data platforms into a scalable and cloud-compatible architecture.
Key Highlights
Integration Scope: 30 source systems with multiple downstream integrations.
Capabilities: Metrics, reporting, advanced analytics, and GenAI use cases (NL querying, summarisation, cross-domain insights).
Benefits
Scalable and resilient data platform.
High-performance semantic and analytics layer.
Single source of truth for enterprise-wide reporting and analytics.
Role Summary
We are looking for a Data Architect with strong expertise in OpenShift (OCP), PySpark, and CI/CD pipelines to design and govern scalable data platforms.
The role requires defining end-to-end data architecture, containerised deployment patterns, orchestration strategies (Airflow/Autosys), and platform standards, along with hands-on involvement in implementation.
Key Responsibilities
Data Architecture & Platform Design
Define enterprise data architecture for IAM data lake and analytics platform.
Design scalable, modular, and containerised data pipeline architectures on OCP.
Establish data models, schema governance, and data lifecycle strategies.
Define best practices for data partitioning, performance optimisation, and cost efficiency.
OpenShift (OCP) & Platform Engineering
Architect and govern containerised data workloads on OpenShift (OCP).
Define standards for deployment, scaling, and workload isolation.
Collaborate with DevOps teams for platform engineering and infrastructure alignment.
Big Data & Processing (PySpark Focus)
Define architecture for PySpark-based batch and near real-time processing pipelines.
Provide guidance on distributed processing design, optimisation, and performance tuning.
Establish reusable frameworks for ETL/ELT processing.
Data Ingestion & Orchestration
Architect data ingestion frameworks (batch, streaming, CDC).
Define orchestration strategies using Airflow/Autosys.
Implement standards for retry, backfills, dependency management, and error handling.
DevOps/CI-CD
Define and oversee CI/CD strategy for data and platform deployments.
Enable automation of build, test, and deployment processes.
Ensure integration of CI/CD pipelines with OCP-based environments.
Cloud & Data Platforms (Preferred)
Provide architecture guidance for GCP-based data platforms (preferred, not mandatory).
Define integration patterns for hybrid cloud-native and on-premises environments.
Guide teams on cloud migration strategies and modern data platform adoption.
Data Governance, Quality & Observability
Define standards for data quality, validation, and lineage.
Establish metadata management and cataloguing practices.
Establish monitoring, logging, alerting, and SLOs for platform reliability.
Ensure compliance with data security and audit requirements.
Stakeholder Collaboration
Work closely with client architects, IAM teams, and business stakeholders.
Translate business requirements into scalable technical architecture.
Provide architectural guidance and mentorship to engineering teams.
Required Skills - Core Skills (Must Have)
OpenShift (OCP)/Kubernetes-based platforms.
PySpark/Spark ecosystem.
CI/CD implementation for data platforms.
Airflow/Autosys orchestration tools.
Solid Understanding Of
Data lake architectures (layered models).
ETL/ELT design patterns.
Distributed data processing concepts.
Data Engineering & Storage
Data formats: Parquet, ORC, Avro.
Partitioning and performance tuning.
Large-scale data modelling for analytics.
Cloud (Preferred, Not Mandatory)
Experience with Google Cloud Platform (GCP) (preferred).
Exposure to services such as BigQuery, Dataproc, Dataflow, and GCS is a plus.
Observability & Reliability
Monitoring, logging, and alerting frameworks.
Dashboards, SLOs, and operational runbooks.
Good To Have
Experience with IAM domain/cybersecurity data.
Understanding of data security and access control frameworks.
Exposure to GenAI-enabled data platforms.
Experience in Agile delivery and team leadership.
Experience
10-14 years in Data Architecture/Data Engineering.
Strong experience in OCP, PySpark, CI/CD, and orchestration frameworks.
Prior experience in data modernization/migration programs.
Education
Bachelor's/Master's in Computer Science, Information Systems, or equivalent.
Certifications (Preferred)
OpenShift/Kubernetes certifications.
GCP certifications (preferred, not mandatory).
Regards,
Himanshu Rawat
himanshu@carmansg.com