What are the responsibilities and job description for the PySpark Developer position at Jobs via Dice?
Dice is the leading career destination for tech experts at every stage of their careers. Our client, HTD Resources, LLC, is seeking the following. Apply via Dice today!
Role: PySpark Developer
Mode: Onsite
Duration: Full-time
Visa Types: // EAD
Relocation: Yes
Locations: Pittsburgh, PA; Dallas, TX; Cleveland, OH
Job Description:
We are seeking an experienced PySpark Developer to design, develop, and support scalable data pipelines and big data solutions in an enterprise environment. The ideal candidate will have strong hands-on experience with Apache Spark and PySpark, along with expertise in building efficient data processing systems that handle large-scale datasets.
Key Responsibilities:
- Design, develop, and maintain scalable data pipelines using PySpark and Spark SQL
- Perform data ingestion, transformation, and processing from multiple data sources
- Develop and optimize Spark DataFrame and SQL-based transformations for performance and efficiency
- Work with Hadoop ecosystem components including Hive, HDFS, and Impala
- Ensure data quality, data validation, and reconciliation between source and target systems
- Monitor, troubleshoot, and resolve production issues and performance bottlenecks
- Collaborate with cross-functional teams including data engineers, analysts, and business stakeholders
- Support data platform reliability, scalability, and performance tuning
Required Skills and Qualifications:
- Strong hands-on experience in PySpark
- Solid experience with Apache Spark and Spark SQL
- Strong programming skills in Python
- Experience working with big data processing frameworks and distributed systems
- Good understanding of data pipeline development and optimization techniques
- Experience with Hadoop ecosystem tools such as Hive, HDFS, Impala
- Experience with cloud platforms (AWS / Azure / Google Cloud Platform)
- Knowledge of data warehousing concepts and ETL processes
- Familiarity with performance tuning and debugging in Spark environments
- Strong problem-solving and analytical skills
- Strong communication and collaboration skills
- Ability to work in a fast-paced, production support environment
- Attention to detail and strong troubleshooting abilities