What are the responsibilities and job description for the Web Scraping Engineer (Mid-Level) position at Systems Plus, Inc.?
Daily Responsibilities:
This position supports the development and maintenance of the agency’s web scraping infrastructure. It is responsible for extracting data from various websites and APIs, ensuring data quality and accuracy, and optimizing the scraping process for efficiency. Duties include:
- Develop and maintain web scraping scripts and tools to extract data from websites and APIs.
- Collaborate with cross-functional teams to understand data requirements and implement scraping solutions accordingly.
- Monitor and troubleshoot scraping processes to ensure data quality and accuracy.
- Optimize scraping scripts for performance and efficiency, considering factors such as speed, scalability, and resource utilization.
- Stay up-to-date with the latest web scraping techniques, tools, and best practices.
- Conduct data analysis and validation to ensure the integrity of scraped data.
- Collaborate with data engineering and data science teams to integrate scraped data into our data pipelines and systems.
- Document and communicate technical solutions, processes, and best practices to team members.
Required Experience:
- Three (3) years of professional experience in web scraping or a similar role.
- Programming experience in Python and Java, and experience with web scraping libraries such as BeautifulSoup, Scrapy, or Selenium.
- Experience with AI/machine learning techniques for data extraction and classification.
- Understanding of HTML, CSS, and JavaScript to navigate and interact with websites.
- Experience working with APIs and handling different data formats (JSON, XML, etc.).
- Experience with Google Cloud Platform (GCP).
- Familiarity with database systems and SQL for data storage and retrieval.
- Familiarity with data cleaning and preprocessing techniques to ensure data quality.
- Strong problem-solving skills and ability to troubleshoot and debug scraping issues.
- Excellent communication and collaboration skills to work effectively in a team environment.
- Attention to detail and ability to handle large volumes of data efficiently.
Preferred Experience:
- Experience with cloud platforms for scalable web scraping infrastructure.
- Familiarity with data visualization tools and techniques.
- Understanding of legal and ethical considerations related to web scraping.
Required Degree:
- BS/BA degree in Computer Science, Information Sciences, or a related IT discipline.
- OR Allowable Substitution: An additional ten (10) years of related professional experience may be substituted for a BS/BA degree.
Required Clearance: Must have a current Moderate Risk Public Trust. Secret clearance preferred.