What are the responsibilities and job description for the MLOps Platform Engineer (SageMaker) position at AVA Consulting?
AVA Consulting is seeking an MLOps Platform Engineer (SageMaker)
Location: Plano, TX
U.S. Citizens and those authorized to work in the U.S. are encouraged to apply. We are unable to sponsor at this time.
Company Background: Our client, a major employer in the area, is looking for an MLOps Platform Engineer (SageMaker) to join its North American operations team.
Job Description:
- The client's Enterprise Platforms team is looking for a Senior ML Platform Engineer to design, build, and operationalize an enterprise ML platform on AWS SageMaker Unified Studio.
- You will migrate the organization from a fragmented ML toolchain to a unified, governed platform on AWS Landing Zone 2, covering the full ML lifecycle from data discovery through model deployment and monitoring.
Responsibilities:
- Set up SageMaker Unified Studio platform: domain configuration, project provisioning, persona-based roles, and multi-environment (Dev, Prod-UAT, Prod) promotion workflows
- Build MLOps pipelines using SageMaker Pipelines: data extraction from Snowflake, preprocessing, training, evaluation, and model registration
- Manage SageMaker Model Registry: cross-account model promotion, versioning, immutability, and lineage tracking
- Configure MLflow experiment tracking: auto-logging of parameters, metrics, and artifacts
- Set up identity and access management: Okta SSO, SailPoint entitlements, persona-based execution roles, service roles for pipelines
- Build model serving: real-time SageMaker endpoints and batch prediction workflows
- Set up model monitoring: data drift, model drift, performance degradation detection
- Configure data catalog: searchable datasets, access-level visibility, access-request workflows, lineage
- Own platform operations: observability (CloudWatch, Datadog), logging, custom images, instance availability
Requirements:
Must Have Skills:
- 10-15 years of software engineering experience focused on cloud infrastructure or ML platform operations
- 5 years hands-on with AWS, including deep expertise in Amazon SageMaker (Studio, Pipelines, Model Registry, Endpoints, Feature Store)
- 3 years building and operating production MLOps pipelines: training, versioning, deployment, monitoring, rollback
- Experience with SageMaker Unified Studio or Studio Classic: domain/project setup, blueprints, multi-tenant configuration
- Infrastructure-as-Code with Terraform, CDK, or CloudFormation
- IAM design for ML platforms: execution roles, service roles, cross-account access, Lake Formation, SSO/SAML
- MLflow or equivalent experiment tracking
- SageMaker Pipelines or similar workflow orchestration (Airflow, Step Functions)
- Model serving: real-time endpoints, batch transform, auto-scaling, endpoint monitoring
- Snowflake as a data source for ML pipelines
- Kubernetes (EKS) and container orchestration
- Networking and security: VPC, security groups, private endpoints, cross-account connectivity
Preferred Skills:
- SageMaker Unified Studio: domain provisioning, custom blueprints, project standardization
- SageMaker Feature Store for online/offline feature management
- SageMaker Model Monitor: data quality checks, bias detection, drift detection
- AWS Machine Learning Specialty certification
NOTE: Interested candidates can apply by sending their updated resume and contact details.
Ron Tolson
AVA Consulting