Staff MLOps Engineer – ML Platform

BrightAI
Palo Alto, CA | Full Time
POSTED ON 5/26/2026
AVAILABLE BEFORE 7/18/2026

Bright.AI is a high-growth Physical AI company transforming how infrastructure businesses interact with the physical world through intelligent automation. Our AI platform processes visual, spatial, and temporal data from billions of real‑world events—captured across edge devices, mobile sensors, and cloud infrastructure—to enable intelligent decision‑making at scale.

We are now hiring a Staff MLOps Engineer to lead the build‑out of our cloud‑native ML developer platform and production pipelines. This role is pivotal to building an integrated ML/AI development platform on AWS, with programmatic data analysis and algorithm development capabilities, so that teams can move quickly from notebook to secure, reliable, and cost‑efficient production services.

You’ll work at the intersection of ML engineering, cloud infrastructure, and developer experience, designing scalable data/model workflows, CI/CD for ML, observability, and governance that turn ideas into durable, monitored ML services.


Key Responsibilities

  • Design, build, and operate our ML/AI development platform on AWS—including Amazon SageMaker AI (Studio/Notebooks, Training/Processing/Batch Transform, Real‑Time & Async Inference, Pipelines, Feature Store) and supporting services.
  • Establish golden‑path project templates, base Docker images, and internal Python libraries to standardize experiments, data processing, training, and deployment workflows.
  • Implement Infrastructure‑as‑Code (e.g., Terraform) and workflow orchestration (Step Functions, Airflow); optionally support EKS for training/inference.
  • Build automated data pipelines with S3, Glue, EMR/Spark (PySpark), Athena/Redshift; add data quality (Great Expectations/Deequ) and lineage.
  • Stand up experiment tracking and a model registry (SageMaker Experiments & Model Registry or MLflow); enforce versioning for data, code, and models.
  • Implement CI/CD for ML (CodeBuild/CodePipeline or GitHub Actions): unit/integration tests, data contracts, model tests, canary/shadow deployments, and safe rollback.
  • Ship real‑time endpoints (SageMaker endpoints, or FastAPI on Lambda/ECS/EKS) and batch jobs; set SLOs and autoscaling, and optimize for cost/performance.
  • Build monitoring & observability for production models and services (drift, performance, bias with SageMaker Model Monitor; service telemetry with CloudWatch/Prometheus/Grafana).
  • Enforce security & governance: least‑privilege IAM, VPC isolation/PrivateLink, encryption, secret management.
  • Partner with backend engineers to productionize notebooks and prototypes.
  • Help integrate GenAI/Bedrock services where appropriate; support RAG pipelines with vector stores (OpenSearch) and evaluation harnesses.

Educational Background

  • B.S. or M.S. in Computer Science, Electrical/Computer Engineering, or related field; advanced degree a plus.
  • Strong foundation in machine learning systems, distributed computing, and data engineering; applied experience building production-grade ML platforms.

Required Skills & Expertise

  • 8 years in software/ML engineering, including 4 years in MLOps or a similar role.
  • Strong programming skills (proficient in Python); fluent with Docker and either Terraform or AWS CDK.
  • Hands-on with AWS: SageMaker, S3, IAM, CloudWatch, ECR, and ECS/EKS/Lambda.
  • Built and operated CI/CD for ML (tests for code/data/models; automated deploys) and shipped real‑time & batch ML workloads to production.
  • Experience with experiment tracking & model registry (e.g., SageMaker Experiments/Model Registry or MLflow) and data versioning.
  • Implemented monitoring & quality (SageMaker Model Monitor, EvidentlyAI, Great Expectations/Deequ) and created on‑call/runbooks for model & service incidents.
  • Solid grasp of security & compliance in cloud ML (IAM policy design, VPC/private networking, KMS encryption, secrets management, audit logging).

Bonus Qualifications

  • Distributed training at scale (SageMaker Training, PyTorch DDP, Hugging Face on SageMaker).
  • Data engineering at scale (e.g., Spark/EMR, Glue, Redshift).
  • Observability stacks (e.g., Grafana), performance tuning, and capacity planning for ML services.
  • LLMOps/RAG (Bedrock, vector databases, evals) as optional capabilities.
  • Prior startup experience building ML platforms and products from the ground up.

Salary.com Estimation for Staff MLOps Engineer – ML Platform in Palo Alto, CA
$118,854 to $153,085

