What are the responsibilities and job description for the SageMaker Platform Administrator position at Ampstek in Atlanta, GA?
Title: SageMaker Platform Administrator
Location: Atlanta, GA
Job Type: 12 Months
We are seeking an experienced SageMaker Platform Administrator to manage, maintain, and optimize the Amazon SageMaker environment for data science and machine learning teams. The role involves ensuring platform stability, managing user access and governance, optimizing costs, supporting ML workflow deployments, and collaborating with data scientists, ML engineers, and cloud operations teams to drive efficiency and compliance.
________________________________________
Key Responsibilities
• Administer and maintain the Amazon SageMaker platform, including setup, configuration, upgrades, and monitoring.
• Manage user access controls, roles, and permissions following security and compliance policies.
• Oversee SageMaker Studio, Notebooks, Endpoints, Pipelines, and Model Registry.
• Monitor platform health, resource utilization, and performance, and optimize costs across compute and storage resources.
• Implement and maintain automation, monitoring, and alerting for SageMaker workloads.
• Support data scientists and ML engineers in deploying and managing ML models at scale.
• Troubleshoot platform and environment-related issues, ensuring minimal downtime.
• Collaborate with cloud engineering teams to integrate SageMaker with other AWS services (S3, Lambda, API Gateway, EKS, CloudWatch, etc.).
• Establish governance, compliance, and auditing practices for ML operations.
• Document standard operating procedures, best practices, and guidelines for platform usage.
________________________________________
Required Skills & Qualifications
• Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
• At least 4 years of experience in AWS cloud administration, including at least 2 years managing Amazon SageMaker.
• Strong understanding of AWS IAM, VPC, CloudWatch, CloudTrail, CloudFormation/Terraform.
• Hands-on experience with SageMaker Studio, Notebooks, Endpoints, Model Registry, and Pipelines.
• Experience with MLOps practices, CI/CD for ML models, and automation of ML workflows.
• Strong troubleshooting skills with AWS networking, containerization (ECS/EKS/Docker), and integration with external data sources.
• Good knowledge of security, compliance, and cost optimization in AWS.
• Familiarity with Python, Boto3, or scripting for automation.
• Excellent communication, documentation, and collaboration skills.
________________________________________
Nice to Have (Preferred Skills)
• AWS Certified Solutions Architect, SysOps Administrator, or Machine Learning Specialty certification.
• Knowledge of Databricks, Kubeflow, MLflow, or other ML platforms.
• Experience with CI/CD pipelines (CodePipeline, Jenkins, GitHub Actions, etc.).
• Exposure to data engineering tools such as AWS Glue, Amazon EMR, or Amazon Redshift.