What are the responsibilities and job description for the Senior Software Engineer, Compute Platform position at Jobs via Dice?
Dice is the leading career destination for tech experts at every stage of their careers. Our client, Oscar Technology, is seeking the following. Apply via Dice today!
About The Company
A fast-growing, venture-backed startup is building a next-generation AI compute platform focused on decentralized, high-performance infrastructure. The company is rethinking how organizations access and scale compute by integrating global data centers into a unified, serverless platform.
Their mission is to democratize access to AI compute and provide an end-to-end lifecycle solution, from raw data to deployed models, through a combination of platform infrastructure and forward-deployed engineering.
With a global footprint and early traction, the team is tackling challenges across multi-cloud orchestration, GPU scheduling, and enterprise-grade infrastructure, with a strong focus on security and compliance.
The Role
This is a high-impact infrastructure role focused on designing and scaling the distributed systems that power AI/ML workloads.
You'll work across:
- Core platform architecture
- Multi-cloud compute orchestration
- Managed services development
- Customer-facing deployments
What You'll Work On
Compute Platform & Multi-Cloud Architecture
- Design abstraction layers across cloud providers (AWS, Google Cloud Platform, Azure, bare-metal)
- Build systems that unify compute, storage, and networking across environments
- Expand global compute capacity by integrating with cloud and data center providers
- Architect reusable, composable infrastructure components
- Own services end-to-end (design, deployment, monitoring)
- Build orchestration systems for GPU workloads and container scheduling
- Develop APIs and control planes for provisioning, scaling, and lifecycle management
- Drive improvements in performance, reliability, and cost efficiency
- Build systems for billing, usage tracking, and cost attribution
- Develop observability tooling (metrics, logging, tracing)
- Establish engineering standards and best practices
- Mentor engineers and contribute to system design decisions
Core Requirements
- 4 years building distributed systems, backend infrastructure, or cloud platforms
- Strong experience with AWS, Google Cloud Platform, or Azure
- Deep understanding of:
  - Compute (VMs, instances)
  - Storage (object, block, file systems)
  - Networking (VPCs, load balancers, security groups)
- Experience with Kubernetes and container orchestration
- Strong programming skills (Golang preferred; Python/Rust a plus)
- Experience building APIs, control planes, or platform services
- Familiarity with databases (Postgres, Redis, etc.) and messaging systems (Kafka, RabbitMQ)
- GPU orchestration or AI/ML infrastructure experience
- HPC or cluster management (Kubernetes, Slurm)
- Data engineering or large-scale ETL systems
- Systems-level programming (low-level infra, operators, daemons)
- ML platform engineering (training/inference pipelines)
- Experience deploying into enterprise or on-prem environments