What are the responsibilities and job description for the Senior Engineer, AI Data position at XCUTIVES INC.?
Job Description:
The Agentic AI Data Engineer is a hands-on role focused on building and maintaining the data pipelines and infrastructure that fuel AI agent systems. Within the AI & Data group (Americas), you will be the builder who turns data architecture plans into reality, ensuring that AI models and agents have continuous access to high-quality, timely data. This client-facing consulting role involves hybrid work, including time at the client site as needed for deployment. You’ll work across a wide array of business functions within Retail. By combining expertise in data ingestion, transformation, and integration with knowledge of AI data needs, you will play a critical part in enabling AI agents to perform reliably and accurately in production.
What Skills Are Expected:
Programming & Scripting: Strong programming skills, especially in Python, and experience with other languages such as SQL.
Data Pipeline Development: Practical experience building data pipelines end-to-end.
Database and SQL Skills: Proficiency in writing and optimizing SQL queries.
Big Data & Distributed Processing: Experience with big data technologies like Apache Spark.
Streaming Data Experience: Familiarity with streaming frameworks and tools like Kafka.
API Integration and Web Services: Ability to interact with web APIs for data ingestion or extraction.
Data Formats and Parsing: Strong understanding of data formats and ability to parse JSON, XML, or custom text formats.
DevOps for Data Pipelines: Basic DevOps skills, including using Git for version control and CI/CD pipelines.
Problem Solving & Debugging: Strong ability to troubleshoot data issues.
Data Quality Focus: Attentiveness to data quality and skills in implementing checks and validating outputs.
Collaboration & Communication: Good communication skills to work with the team and clients.
Time Management & Flexibility: Ability to handle multiple tasks and prioritize effectively.
Domain Data Understanding: Aptitude to learn domain context from data.
Security & Privacy: Understanding of how to handle sensitive data securely in pipelines.
Continuous Learning: Willingness to learn new tools or frameworks as needed.
Key Technology Capabilities:
ETL / Data Integration Tools: Experience with tools such as Apache Airflow, Informatica PowerCenter, or cloud-based ones like Azure Data Factory.
Big Data Processing: Proficiency in Apache Spark and knowledge of Hadoop HDFS.
SQL & Databases: Strong practical SQL skills and familiarity with relational database systems.
NoSQL and Other Data Stores: Knowledge of specific systems like MongoDB or Cassandra.
Stream Processing: Hands-on usage of Apache Kafka and understanding of consumer group mechanics.
Cloud Storage & Compute: Familiarity with cloud storage services like Amazon S3 and cloud compute for ETL.
APIs & Web Services: Experience building or using connectors to RESTful APIs.
File Formats & Data Serialization: Understanding of various file formats and ability to convert between them.
Operating Systems & Scripting: Comfortable with Linux shell and basic shell scripting.
Version Control & CI/CD: Using Git for source control and setting up CI pipelines for data projects.
Monitoring & Logging Tools: Utilizing monitoring tools for data workflows.
Data Visualization/Verification: Basic familiarity with tools like Excel or Jupyter notebooks for data sanity checks.
Security & Networking: Understanding network configurations for data transfer.
Testing Frameworks: Familiarity with PyTest or unittest for writing tests for data transformations.
Collaboration Tools: Experience with tools like JIRA and documentation tools.
AI/ML Familiarity: Bonus if you understand some AI/ML fundamentals.