What are the responsibilities and job description for the Research Engineer - Data Infrastructure position at Seer?

Senior / Staff Data Infrastructure Machine Learning Engineer

We are building advanced intelligent systems designed to operate in complex real-world environments. Our team develops the full stack — from high-performance hardware and distributed systems infrastructure to large-scale machine learning platforms and multimodal foundation models.

Backed by significant funding and operating at the intersection of AI, infrastructure, and large-scale systems engineering, we are investing heavily in research, infrastructure, and production-scale deployment to build next-generation intelligent systems.

We are hiring Senior and Staff-level Data Infrastructure Machine Learning Engineers to scale the systems powering our ML training data platform — from ingestion and storage to indexing, retrieval, observability, and throughput optimization across massive multimodal datasets.

What You’ll Do

Build and Scale High-Throughput Data Infrastructure

Architect, build, and operate distributed data infrastructure capable of processing and managing billions of video and multimodal data samples
Design systems with strong guarantees around reliability, latency, scalability, and cost efficiency
Optimize cloud object storage, metadata systems, databases, and large-scale distributed storage architectures

Develop Large-Scale Indexing and Retrieval Systems

Build efficient indexing and retrieval systems to support rapid dataset querying, filtering, and iteration
Improve data access patterns and retrieval performance for research and production ML workflows
Design scalable metadata and search infrastructure for multimodal datasets

Improve Observability and Reliability

Develop monitoring, alerting, failure recovery, and performance optimization frameworks for large-scale data pipelines
Build tooling to identify bottlenecks and improve operational visibility across distributed systems
Optimize workload balancing and throughput across distributed compute and storage infrastructure

Manage Data Lifecycle and Reproducibility

Build systems for artifact management, dataset versioning, lineage tracking, and reproducibility across training workflows
Ensure traceability and consistency across evolving datasets and training runs
Develop lightweight internal tooling enabling engineers and researchers to explore and analyze data at scale

Support ML and Vision-Language Workloads

Integrate and scale vision-language model (VLM) inference within distributed data pipelines
Support automated enrichment, filtering, metadata generation, and preprocessing workflows
Collaborate closely with ML systems and research teams to improve data quality and training velocity

What We’re Looking For

5 years of experience in data infrastructure, distributed systems, ML infrastructure, or related fields
Strong experience building and operating large-scale distributed data pipelines
Deep understanding of distributed systems architecture, databases and metadata systems, indexing and retrieval strategies, and cloud storage architectures
Experience optimizing throughput, workload balancing, and cost-performance tradeoffs in cloud environments
Hands-on experience with distributed processing frameworks such as Ray or Spark
Strong observability, monitoring, and production reliability experience
Strong software engineering fundamentals with the ability to own systems end-to-end

Level Expectations

Senior engineers are expected to execute complex systems work with strong technical depth and increasing ownership
Staff-level engineers are expected to define architectural direction, drive technical strategy, and independently lead major infrastructure initiatives

Preferred Experience

Experience managing large multimodal datasets
Familiarity with ML training workflows and data lifecycle management
Experience running large-scale ML inference workloads in distributed or cloud environments
Familiarity with vision-language models (VLMs)
Experience working with real-world sensor data such as video, telemetry, or time-series streams
Familiarity with data warehouse technologies such as Snowflake, BigQuery, or Redshift
Experience with data versioning and lineage systems such as DVC, Delta Lake, or similar tooling

Why This Role Matters

Build the foundational data infrastructure that directly impacts model quality and system performance
Collaborate closely with ML systems and research teams on problems with immediate and measurable impact
Operate with high ownership in a small, highly technical environment
Help scale intelligent systems operating in real-world environments

About the Company

We are a research-driven AI company focused on building scalable intelligent systems capable of robust operation in dynamic environments. By combining advances in machine learning, distributed systems, and infrastructure engineering, we aim to push the frontier of large-scale AI systems.

We are committed to building an inclusive and diverse workplace and encourage applicants from all backgrounds to apply.

Salary: $250,000 - $400,000
