What are the responsibilities and job description for the Data Engineer with Python and Scala position at Princeton IT Services?
Job Details
Job Title: Data Engineer with Python and Scala
Location: Anywhere in the USA (Remote)
Employment Type: Contract
Job Overview:
We are seeking an experienced Data Engineer to join our growing data team. The ideal candidate will have strong hands-on experience with Python, Scala, and database technologies, and will be responsible for building and optimizing scalable data pipelines and data models that power analytics and business insights.
Responsibilities:
- Design, build, and maintain scalable data pipelines and ETL processes.
- Develop and optimize data ingestion, transformation, and loading workflows from multiple sources.
- Collaborate with data analysts, data scientists, and other engineers to support data requirements.
- Ensure data quality, reliability, and performance across all systems.
- Work with large-scale datasets and ensure efficient data processing and storage.
- Implement best practices for data engineering, including code versioning, testing, and CI/CD.
- Contribute to the design and development of data models and metadata management.
- Troubleshoot and optimize queries and data processing jobs.
Required Skills and Qualifications:
- Bachelor's or Master's degree in Computer Science, Engineering, Information Systems, or a related field.
- 8 years of experience as a Data Engineer or in a similar role.
- Strong programming skills in Python and Scala.
- Experience with relational and NoSQL databases (e.g., PostgreSQL, MySQL, MongoDB, Cassandra).
- Strong understanding of ETL concepts, data modeling, and data architecture.
- Hands-on experience with big data technologies such as Spark, Hadoop, or Kafka is a plus.
- Proficiency in SQL and query optimization.
- Experience with cloud platforms (AWS, Azure, or Google Cloud Platform) for data storage and pipeline orchestration.
- Strong problem-solving, debugging, and communication skills.
Preferred Skills:
- Experience with data orchestration tools (Airflow, Luigi, Prefect, etc.).
- Knowledge of streaming data and real-time processing.
- Familiarity with CI/CD pipelines and containerization (Docker, Kubernetes).
- Exposure to data governance and security best practices.