What are the responsibilities and job description for the Senior Software Engineer, Compute Platform position at Oscar?
A fast-growing, venture-backed startup is building a next-generation AI compute platform focused on decentralized, high-performance infrastructure. The company is rethinking how organizations access and scale compute by integrating global data centers into a unified, serverless platform.
Their mission is to democratize access to AI compute and provide an end-to-end lifecycle solution, from raw data to deployed models, through a combination of platform infrastructure and forward-deployed engineering.
With a global footprint and early traction, the team is tackling challenges across multi-cloud orchestration, GPU scheduling, and enterprise-grade infrastructure, with a strong focus on security and compliance.
This is a high-impact infrastructure role focused on designing and scaling the distributed systems that power large-scale AI/ML workloads.
You'll work across:
- Core platform architecture
- Multi-cloud compute orchestration
- Managed services development
- Customer-facing deployments
This role requires a strong mix of systems engineering and product thinking, with exposure to both backend infrastructure and end-user experience.
What You'll Work On
Compute Platform & Multi-Cloud Architecture
- Design abstraction layers across cloud providers (AWS, GCP, Azure, bare-metal)
- Build systems that unify compute, storage, and networking across environments
- Expand global compute capacity by integrating with cloud and data center providers
- Architect reusable, composable infrastructure components
Managed Services & Platform Development
- Own services end-to-end (design → deployment → monitoring)
- Build orchestration systems for GPU workloads and container scheduling
- Develop APIs and control planes for provisioning, scaling, and lifecycle management
- Drive improvements in performance, reliability, and cost efficiency
Infrastructure & Platform Services
- Build systems for billing, usage tracking, and cost attribution
- Develop observability tooling (metrics, logging, tracing)
- Establish engineering standards and best practices
- Mentor engineers and contribute to system design decisions
Core Requirements
- 4 years of experience building distributed systems, backend infrastructure, or cloud platforms
- Strong experience with AWS, GCP, or Azure
- Deep understanding of:
  - Compute (VMs, instances)
  - Storage (object, block, file systems)
  - Networking (VPCs, load balancers, security groups)
- Experience with Kubernetes and container orchestration
- Strong programming skills (Golang preferred; Python/Rust a plus)
- Experience building APIs, control planes, or platform services
- Familiarity with databases (Postgres, Redis, etc.) and messaging systems (Kafka, RabbitMQ)
- GPU orchestration or AI/ML infrastructure experience
- HPC or cluster management (Kubernetes, Slurm)
- Data engineering or large-scale ETL systems
- Systems-level programming (low-level infra, operators, daemons)
- ML platform engineering (training/inference pipelines)
- Experience deploying into enterprise or on-prem environments
Oscar Associates Limited (US) is acting as an Employment Agency in relation to this vacancy.
Salary: $300,000 - $400,000