What are the responsibilities and job description for the Machine Learning Engineer position at Oho Group?

Senior/Lead Inference Systems Engineer (ML/AI)

About the Role

We're looking for a Senior or Lead Inference Systems Engineer to help build the next generation of large-scale AI inference infrastructure. Working alongside world-class hardware and software engineers, you'll play a key role in developing high-performance inference serving systems and cluster scheduling technologies that maximise the efficiency of modern foundation models.

This is an exciting opportunity to work at the cutting edge of distributed AI systems, helping shape how large language models are deployed, scaled, and optimised across heterogeneous compute environments.

Key Responsibilities

You will help design, build, and optimise large-scale inference serving platforms capable of delivering industry-leading throughput, latency, and efficiency.
You will develop and refine multi-node inference strategies that maximise performance across distributed compute clusters.
You will work on advanced optimisation techniques including tensor parallelism, pipeline parallelism, expert parallelism, continuous batching, and KV cache management.
You will collaborate with hardware and systems teams to optimise workloads across compute, networking, and storage infrastructure.
You will drive performance improvements across leading inference frameworks such as vLLM, SGLang, and PyTorch.
You will contribute to the design and implementation of cluster scheduling systems that intelligently allocate resources and maximise utilisation at scale.
You will engage with the open-source community, contributing optimisations upstream and helping influence the direction of widely adopted AI infrastructure projects.
You will help establish best practices around benchmarking, testing, debugging, and performance analysis to ensure a highly reliable production-grade platform.

Required Qualifications

You'll need strong software engineering experience with Python, C++, and PyTorch.
You should have experience developing or contributing to modern LLM inference serving frameworks such as vLLM, SGLang, or equivalent technologies.
You must possess a deep understanding of large language model inference, including attention mechanisms, batching strategies, KV cache management, and serving optimisation techniques.
You'll need hands-on experience deploying, operating, or optimising large-scale distributed workloads across multi-node compute environments.
Experience with performance profiling, benchmarking, debugging, and system-level optimisation is essential.
You should be comfortable working in fast-paced engineering environments and collaborating across multiple technical disciplines.

Preferred Qualifications

Experience with distributed scheduling systems, cluster orchestration, resource management, or workload optimisation technologies.
Exposure to networking, storage systems, distributed caching, or infrastructure platforms supporting large-scale AI deployments.
Experience working with technologies such as Orca, LMCache, or similar distributed inference optimisation frameworks.
Knowledge of GPU programming and performance optimisation using CUDA, Triton, ROCm, or related technologies.
Understanding of AI accelerator architectures and large-scale heterogeneous computing environments.
Experience contributing to open-source AI infrastructure projects.

Education

You should be educated to Master's or PhD level in Computer Science, Computer Engineering, Electrical Engineering, or a related technical discipline, or possess equivalent industry experience.

Salary: $250,000 - $350,000
