What are the responsibilities and job description for the SQL Developer with PySpark and Python position at Princeton IT Services, Inc.?
Job Title: SQL Developer with PySpark and Python
Location: Pittsburgh, PA (Hybrid)
Job Type: Contract
Position Summary
We are seeking a highly skilled SQL Developer with strong expertise in PySpark and Python to support data engineering and analytics initiatives. The ideal candidate will develop and optimize complex SQL queries, build scalable data pipelines, and support the transformation and processing of large datasets in a distributed environment.
Key Responsibilities
- Design, develop, and maintain complex SQL queries and stored procedures to extract and transform data.
- Develop and optimize data pipelines using PySpark in distributed processing environments (e.g., Databricks, EMR, or Spark clusters); a brief illustrative sketch follows this list.
- Write modular, reusable Python scripts to support ETL workflows, data validation, and automation.
- Collaborate with data architects, analysts, and business stakeholders to understand data requirements and deliver high-quality solutions.
- Monitor and tune performance of queries and pipelines for efficiency and scalability.
- Ensure data quality and integrity across multiple data sources and systems.
- Participate in code reviews, documentation, and deployment processes.
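To give a sense of the kind of work these responsibilities describe, below is a minimal sketch of a PySpark ETL step that combines a Spark SQL transformation with DataFrame-level cleanup. It is illustrative only; the storage paths, the `raw_orders` view, and the column names are hypothetical placeholders, not details of this role's actual environment.

```python
# Illustrative sketch only: paths, view name (raw_orders), and columns are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders_etl").getOrCreate()

# Extract: read raw data from a placeholder location.
raw = spark.read.parquet("s3://example-bucket/raw/orders/")

# Transform: a Spark SQL step for typing/filtering, then DataFrame-level deduplication.
raw.createOrReplaceTempView("raw_orders")
cleaned = spark.sql("""
    SELECT order_id,
           customer_id,
           CAST(order_ts AS timestamp) AS order_ts,
           amount
    FROM raw_orders
    WHERE order_id IS NOT NULL
""").dropDuplicates(["order_id"])

# Load: write a partitioned, query-friendly output.
(cleaned
 .withColumn("order_date", F.to_date("order_ts"))
 .write
 .mode("overwrite")
 .partitionBy("order_date")
 .parquet("s3://example-bucket/curated/orders/"))
```

In practice, a pipeline like this would be parameterized, scheduled (for example with an orchestrator such as Apache Airflow, listed as a plus below), and paired with data-quality checks.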
Required Qualifications
- 5 years of experience in SQL development, including performance tuning and data modeling.
- 3 years of hands-on experience with PySpark for big data processing.
- Strong experience with Python, especially for data transformation and automation tasks.
- Experience working in cloud or distributed data platforms (e.g., Azure, AWS, or GCP environments).
- Familiarity with version control (e.g., Git) and CI/CD practices for data pipelines.
- Excellent problem-solving, communication, and collaboration skills.
Preferred Qualifications
- Experience with Databricks or other Spark-based data platforms.
- Knowledge of data lake, data warehouse, and real-time processing architectures.
- Experience with Apache Airflow, Snowflake, or similar technologies is a plus.
- Prior experience working in financial services, healthcare, or manufacturing domains is desirable.