What are the responsibilities and job description for the Data Engineer (W2 Contract) position at HPTech Inc.?
We are seeking a strong, hands-on Data Engineer to join a fast-moving Cybersecurity organization focused on threat detection, correlation, and automated remediation. This role is heavily data-engineering focused (approximately 80% Data Engineering / 20% ML exposure) and requires deep fundamentals, not surface-level experience.
This team works with large-scale, high-volume data pipelines that support near-real-time security analytics and GenAI-driven tools used by Cyber Operations teams and executive leadership.
Key Responsibilities
- Design, build, and maintain scalable data pipelines handling large volumes of structured and semi-structured data
- Develop and optimize pipelines using PySpark and Databricks (a minimal sketch follows this list)
- Implement data ingestion, transformation, and automation workflows in AWS
- Work with real-time and near-real-time data sources, including Kafka and APIs
- Design pipelines supporting high-volume processing (beyond simple micro-batching)
- Apply best practices around:
  - Data quality
  - Performance optimization
  - Pipeline reliability and scalability
- Collaborate with cybersecurity, data science, and platform teams to support:
  - Threat detection use cases
  - Log analysis and security telemetry
  - GenAI-powered data products
- Participate in technical and behavioral interviews, including hands-on discussions and screen-sharing exercises
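To give a concrete sense of the day-to-day work, here is a minimal sketch of the kind of batch pipeline described above: ingest semi-structured security logs from S3, apply basic quality checks, and write a curated Delta table. The bucket paths, schema fields, and table layout are hypothetical placeholders, not a description of the team's actual stack.

```python
# Minimal ETL sketch (hypothetical paths and fields): ingest semi-structured
# JSON security telemetry from S3, enforce basic quality rules, write Delta.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("security-log-ingest").getOrCreate()

# Ingest: semi-structured JSON logs landed in S3 (hypothetical bucket)
raw = spark.read.json("s3://example-security-landing/firewall-logs/")

# Transform: normalize timestamps, enforce quality rules, deduplicate
curated = (
    raw
    .withColumn("event_ts", F.to_timestamp("event_time"))
    .withColumn("event_date", F.to_date("event_ts"))
    .filter(F.col("event_ts").isNotNull() & F.col("source_ip").isNotNull())
    .dropDuplicates(["event_id"])
)

# Load: partitioned Delta table feeding downstream threat-detection queries
(
    curated.write
    .format("delta")
    .mode("append")
    .partitionBy("event_date")
    .save("s3://example-security-curated/firewall-logs/")
)
```

In interviews, candidates should expect to walk through each stage of a pipeline like this and explain the quality, deduplication, and partitioning choices.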
Required Qualifications
Data Engineering Fundamentals (Must Have)
- 8 years of professional data engineering experience (10 years preferred)
- Strong Python skills for data engineering (pandas, DataFrames; not application development)
- Solid hands-on experience with PySpark / Apache Spark
- Proven experience building pipelines in Databricks
- Strong AWS experience, including:
  - S3
  - Core AWS services used in data pipelines
- Experience designing and automating end-to-end data pipelines
- Strong understanding of data modeling and Slowly Changing Dimensions (SCD); a minimal SCD Type 2 sketch follows this list
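Because SCD comes up explicitly, here is a minimal SCD Type 2 sketch using the Delta Lake MERGE API on Databricks. The table paths and column names (customer_id, address, effective_date, end_date, is_current) are hypothetical, and a production job would first filter the staged updates down to genuinely new or changed keys.

```python
# Minimal SCD Type 2 sketch (hypothetical table paths and columns): expire the
# current dimension row when a tracked attribute changes, then append the new
# version. A production job would first filter updates to new/changed keys.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("scd2-demo").getOrCreate()

dim = DeltaTable.forPath(spark, "s3://example-warehouse/dim_customer/")
updates = spark.read.parquet("s3://example-staging/customer_changes/")

# Step 1: close out current rows whose tracked attribute (address) changed
(
    dim.alias("t")
    .merge(updates.alias("u"), "t.customer_id = u.customer_id AND t.is_current = true")
    .whenMatchedUpdate(
        condition="t.address <> u.address",
        set={"is_current": "false", "end_date": "current_date()"},
    )
    .execute()
)

# Step 2: append incoming versions as the new current rows
new_rows = (
    updates
    .withColumn("effective_date", F.current_date())
    .withColumn("end_date", F.lit(None).cast("date"))
    .withColumn("is_current", F.lit(True))
)
new_rows.write.format("delta").mode("append").save("s3://example-warehouse/dim_customer/")
```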
Streaming & Integration
- Experience working with streaming or near-real-time data
- Kafka (must understand how to consume data, even if not used daily)
- APIs
- Micro-batching with high-volume use cases
- Ability to articulate trade-offs between batch, micro-batch, and streaming architectures (a streaming sketch follows this list)
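The streaming expectations above map naturally onto Spark Structured Streaming. Here is a minimal sketch of consuming security events from Kafka; the broker address, topic, and paths are hypothetical, and the commented trigger lines show the batch / micro-batch / near-real-time trade-off candidates are expected to articulate.

```python
# Minimal Kafka consumption sketch with Spark Structured Streaming (hypothetical
# broker, topic, and paths). The commented trigger lines are the trade-off knob.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("kafka-consume-demo").getOrCreate()

events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker.example.internal:9092")
    .option("subscribe", "security-telemetry")
    .option("startingOffsets", "latest")
    .load()
    # Kafka delivers key/value as binary; cast the payload for downstream parsing
    .select(F.col("value").cast("string").alias("raw_event"))
)

query = (
    events.writeStream
    .format("delta")
    .option("checkpointLocation", "s3://example-checkpoints/security-telemetry/")
    # .trigger(availableNow=True)            -> scheduled batch: cheapest, highest latency
    # .trigger(processingTime="30 seconds")  -> micro-batch: balances cost and latency
    # default (no trigger)                   -> micro-batches as fast as possible: near-real-time
    .start("s3://example-curated/security-telemetry/")
)
query.awaitTermination()
```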
Engineering Depth
- Able to clearly explain core skills and hands-on contributions
- Comfortable demonstrating fundamentals during screen-share sessions
- Strong problem-solving and debugging skills
Nice to Have
- Exposure to ML / MLOps (not the primary focus)
- Java experience
- Experience working in security, analytics, or logheavy environments
- Valid technical certifications (must be able to clearly explain and demonstrate the skills behind them)