What are the responsibilities and job description for the AI/ML Platform Engineer position at Surge IT?
We are seeking an experienced AI/ML Platform Engineer with a strong background in building, deploying, and operationalizing AI/ML solutions. The ideal candidate will have deep expertise in both AWS and Databricks environments, along with hands-on experience designing scalable machine learning workflows, pipelines, and model management systems.
This role requires a solid understanding of modern data and AI engineering practices, cloud infrastructure, and the ability to deliver production-grade AI/ML platforms.
Key Responsibilities
- Design, build, and maintain scalable AI/ML platforms and pipelines for production environments.
- Develop and operationalize ML workflows, including data ingestion, transformation, training, and deployment.
- Collaborate with data scientists and engineers to enable efficient experimentation and model lifecycle management.
- Work with AWS (Lambda, SQS, EC2, EBS, S3) and Databricks to optimize performance and reliability of AI systems.
- Implement infrastructure-as-code solutions using tools like Terraform and manage containerized workloads using Kubernetes.
- Develop, test, and maintain code in Python (including PySpark) and other languages such as R, JavaScript, and PowerShell.
- Leverage generative AI tools and frameworks, including LangChain, for building advanced AI applications.
- Apply prompt engineering techniques for fine-tuning and improving generative AI models.
- Monitor and troubleshoot system performance using tools such as AWS X-Ray and Azure monitoring suites.
Qualifications
- 10 years of overall IT experience, with at least 5 years focused on AI/ML engineering and platform development.
- Proven experience in AWS and Databricks ecosystems.
- Strong proficiency in Python, PySpark, and related ML frameworks.
- Hands-on experience with data engineering, model management, and MLOps workflows.
- Strong understanding of cloud infrastructure, automation, and container orchestration.
- Demonstrated experience in AI/ML coding, prompt writing, and generative AI development.
- Experience building scalable ML data platforms and cloud-native architectures.
- Familiarity with LangChain and modern LLM-based application development.
- Knowledge of Terraform, Kubernetes, AWS X-Ray, and Azure Databricks.
- Experience with machine learning model deployment, monitoring, and optimization.