What are the responsibilities and job description for the Data Engineer position at Rapsys Technologies?
Data Engineer: Apache Airflow, Apache NiFi

Technical Stack:
• Orchestration: Apache Airflow, Apache NiFi.
• Programming: Java (Core), Python (for Airflow), Unix Shell Scripting.
• Big Data/Storage: Apache Spark, MinIO, AWS S3.
• Security: SSL/TLS, Certificate Management, IAM, Java Keystores.
• OS: Linux/Unix (RHEL/Ubuntu).
Key Responsibilities:
• Pipeline Orchestration: Design and develop complex, reusable DAGs in Apache Airflow to automate data workflows, scheduling, and monitoring across the enterprise.
• Data Ingestion & Flow: Create and optimize high-volume data streams using Apache NiFi, leveraging custom processors and controller services for diverse data sources and sinks.
• Custom Development: Utilize Java to develop custom NiFi processors or troubleshoot the core NiFi framework, ensuring the platform meets specific architectural requirements.
• Object Storage Management: Implement and manage data storage solutions using MinIO and AWS S3, ensuring high availability and efficient data retrieval patterns.
• Large-Scale Processing: Develop and maintain Apache Spark jobs for heavy-duty data transformations, integrating them into NiFi and Airflow orchestration layers.
• Security & Certificate Management: Secure NiFi clusters and data flows using TLS/mTLS, managing Java KeyStores (JKS), TrustStores, and SSL certificate lifecycles to ensure data-in-motion security.
• System Administration: Perform environment setup, performance tuning, and troubleshooting within Unix/Linux environments, including shell scripting for task automation.
• Cloud Integration: Deploy and manage data infrastructure components on AWS, utilizing IAM for access control and integrating cloud-native services with hybrid data pipelines.
• Monitoring & Optimization: Establish robust logging and alerting for data pipelines to proactively identify bottlenecks, ensuring 99.9% reliability of data delivery.
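To make the orchestration responsibility concrete: Airflow models a pipeline as a DAG of tasks and runs each task only after its upstream dependencies finish. A minimal plain-Python sketch of that idea (not Airflow code; the `run_dag` helper and task names are invented for illustration) using the standard library's topological sorter:

```python
# Toy illustration of DAG-style orchestration, the model Apache Airflow
# uses for pipelines. Plain Python, not Airflow: run_dag and the task
# names here are invented for this sketch.
from graphlib import TopologicalSorter

def run_dag(dependencies, tasks):
    """Run task callables in an order that respects their dependencies."""
    executed = []
    for name in TopologicalSorter(dependencies).static_order():
        tasks[name]()          # in Airflow, an operator would execute here
        executed.append(name)
    return executed

# extract -> transform -> load: transform waits on extract, load on transform
deps = {"transform": {"extract"}, "load": {"transform"}}
tasks = {n: (lambda: None) for n in ("extract", "transform", "load")}
order = run_dag(deps, tasks)   # ["extract", "transform", "load"]
```

In real Airflow the same dependency graph would be declared with operators and `>>` chaining inside a `DAG` context, and the scheduler, rather than a loop, decides when each task runs.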
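For the data-in-motion security duties, the shape of a hardened TLS configuration can be sketched with Python's standard `ssl` module. NiFi itself manages certificates through Java KeyStores (JKS); those are typically exported to PEM (e.g. via `keytool`/`openssl`) before a Python client can use them. The file names in the commented mTLS line are placeholders, not real paths:

```python
# Sketch: a TLS client context with legacy protocols refused.
# This is stdlib Python, not NiFi configuration; NiFi's own TLS is
# driven by JKS keystores/truststores on the Java side.
import ssl

ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
ctx.minimum_version = ssl.TLSVersion.TLSv1_2   # refuse SSLv3/TLS 1.0/1.1
# For mTLS the client also presents its own certificate (paths are
# placeholders; a JKS keystore would first be converted to PEM):
# ctx.load_cert_chain(certfile="client.pem", keyfile="client-key.pem")
```

`create_default_context` already enables hostname checking and certificate verification, which is why it is preferred over constructing `SSLContext` by hand.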
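The 99.9% reliability target above implies an alerting rule: compare delivered records against attempted records and fire when the ratio drops below the service-level objective. A hypothetical helper (the function name and threshold handling are assumptions for illustration):

```python
# Hypothetical SLO check for the 99.9% delivery-reliability target.
def breaches_slo(delivered: int, attempted: int, target: float = 0.999) -> bool:
    """True when observed delivery reliability falls below the target."""
    if attempted == 0:
        return False   # no traffic in the window: nothing to alert on
    return delivered / attempted < target

breaches_slo(9990, 10000)   # exactly 99.9%: within target, no alert
breaches_slo(9989, 10000)   # below target: would fire an alert
```

In practice this check would run over a sliding window fed by pipeline logs, with the alert routed through whatever monitoring stack the team operates.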
Skills: Python, Apache Spark, Databricks, Core Java, Unix/Linux basics and commands
Experience Required: 6-8 years
Salary: $55 - $60