Machine Learning Engineer- Inference Optimization | Experienced Hire

Susquehanna International Group, LLP
Philadelphia, PA Other
POSTED ON 6/2/2026
AVAILABLE BEFORE 6/12/2026

Overview

We are looking for a Machine Learning Engineer focused on low-latency inference optimization to help build, tune, and productionize high-performance model serving systems. This role sits at the intersection of machine learning, systems engineering, and GPU performance. You will work on inference workloads where latency, throughput, reliability, and hardware efficiency all matter, and where a deep understanding of modern inference runtimes can meaningfully improve production outcomes.

You will work closely with quantitative researchers and engineers to understand model structure, identify inference bottlenecks, and turn research ideas into efficient production systems. The work focuses on transformer-style architectures and structured inference workloads, though it may also involve other types of models. You will evaluate and tune frameworks and related serving or compilation systems, while also reasoning about GPU execution, memory layout, batching strategies, precision tradeoffs, and end-to-end latency.

What you'll do

  • Design, build, and optimize low-latency inference systems for production machine learning workloads.
  • Profile model inference pipelines across model execution, runtime configuration, batching, memory movement, serialization, networking, and I/O.
  • Evaluate, integrate, and tune inference runtime systems.
  • Improve latency, throughput, and GPU utilization for production inference workloads.
  • Build and support benchmarking and profiling tools to compare model variants, hardware targets, runtime configurations, and deployment strategies.
  • Debug performance issues involving GPU memory, compute saturation, kernel behavior, CPU/GPU coordination, data movement, and serving-layer overhead.
  • Help shape model and system design choices so that research models are efficient to deploy under real latency constraints.
  • Where necessary, collaborate with lower-level systems or GPU specialists on custom operators, kernel-level optimization, or hardware-specific performance work.

What we’re looking for

  • Experience deploying, optimizing, or operating machine learning inference workloads in production or production-like environments.
  • Programming experience in a high-level language such as Python, Java, or C#, and in at least one systems language such as C, C++, Rust, or Go.
  • Solid understanding of modern ML frameworks such as PyTorch, including model execution, export, tracing, compilation, and performance profiling.
  • Ability to reason about latency, throughput, batching, memory use, GPU utilization, and reliability under real workloads.
  • Strong practical judgment around tradeoffs between model quality, latency, throughput, implementation complexity, and maintainability.

Preferred qualifications

  • Experience optimizing inference for latency-sensitive or high-throughput applications.
  • Experience with model optimization techniques such as quantization, pruning, distillation, operator fusion, graph lowering, custom operators, or model compilation.
  • Exposure to CUDA, Triton language, ROCm, PTX, CuTe, CUTLASS, FlashInfer, or similar low-level GPU programming tools.
  • Experience running inference workloads on Kubernetes or GPU clusters, including scheduling, autoscaling, observability, and resource management.
  • Background in mathematics, physics, computer science, engineering, statistics, quantitative finance, or another technical field.
  • Demonstrated ability to improve real-world inference performance beyond a baseline framework implementation.

If you're a recruiting agency and want to partner with us, please reach out to recruiting@sig.com. Any resume or referral submitted in the absence of a signed agreement will not be eligible for an agency fee.

Hourly Wage Estimation for Machine Learning Engineer- Inference Optimization | Experienced Hire in Philadelphia, PA
$58.00 to $75.00
