What are the responsibilities and job description for the AI/ML Platform Engineer position at Surge IT?
- /No C2C option/
Core Requirements
- 10 years of IT/engineering experience
- 3 years of hands-on AI/ML development experience
- 4 years working directly with AWS services (Lambda, EC2, S3, DynamoDB, IoT Core, API Gateway, Fargate/ECS)
- Proven experience deploying ML systems into production environments
- Strong coding skills and ability to build systems end-to-end
- Deep Learning frameworks: TensorFlow, PyTorch, Keras
- LLMs, prompt engineering, NLP pipelines
- Python and Java as primary languages; strong engineering fundamentals
- FastAPI and microservices for ML inference
- Infrastructure-as-Code (Terraform)
- Kubernetes and Docker for scalable ML workloads
- Distributed/cloud systems design with AWS
- Edge-to-cloud system integration experience
- Hands-on build experience (not just design/architecture)
Key Responsibilities
- Build and deploy production-grade ML/AI pipelines and services
- Develop LLM-powered and NLP-driven applications
- Write, optimize, and maintain high-quality Python-based ML code
- Implement scalable infrastructure using Terraform, AWS, and Kubernetes
- Build FastAPI-based inference services and cloud APIs
- Collaborate with cross-functional engineering teams to deliver high-impact systems
- Troubleshoot, optimize, and own systems end-to-end as a hands-on engineer
Additional Requirements
- Experience with distributed systems and microservices
- Strong understanding of the ML model lifecycle, deployment patterns, and operational monitoring