What are the responsibilities and job description for the Solutions Engineer position at GMI Cloud?

About the Role

We’re looking for a Forward Deployed Engineer (FDE) to work directly with customers and partners to design, deploy, and validate dedicated inference endpoint and Model-as-a-Service (MaaS) products on GMI’s global infrastructure.

This is a high-impact, hybrid engineering role that sits at the intersection of platform engineering, applied ML, and customer success. You’ll be embedded with customers during early-stage deployments—turning research ideas, datasets, and business requirements into working, performant systems on real GPU clusters.

If you enjoy being close to users, debugging real systems, and shipping results fast (not just writing docs), this role is for you.

What You’ll Do

Own customer POCs end-to-end

Deploy and optimize LLM and multi-modal inference workflows on GMI clusters
Translate customer requirements into concrete system designs and experiments

Forward-deploy with customers

Work hands-on with research teams, startups, and enterprise customers
Debug performance, stability, and correctness issues in real environments

Inference deployment

Stand up and tune inference stacks (e.g. vLLM / SGLang / Ray Serve–style architectures)
Optimize latency, throughput, GPU utilization, and cost efficiency

Model-as-a-Service enablement

Help customers test, evaluate, and adopt the latest frontier LLMs and multi-modal models through GMI's unified API
Guide model selection, API integration, and migration across providers; shorten the "idea → production" cycle
Validate correctness, compatibility, and performance across the MaaS model catalog

Performance & reliability

Diagnose GPU, networking, and distributed system bottlenecks
Run benchmarks, profiling, and stress tests on multi-GPU / multi-node setups

Feedback loop to product

Feed real-world customer learnings back into GMI's platform, SDKs, and APIs
Help shape reference architectures, cookbooks, and best practices

What We’re Looking For

Core Requirements

Proficiency in at least one programming language (Python and Golang preferred)
Solid understanding of software systems and distributed systems
Hands-on experience with ML inference or serving systems
Comfort working directly with customers and ambiguous requirements
Ability to debug end-to-end systems (code, infra, networking, performance)

Nice to Have

Experience with:
LLM inference frameworks (vLLM, SGLang, Ray Serve, Triton, etc.)
Global, distributed systems
Developing and maintaining production services on Kubernetes
GPU performance profiling, optimization, and inference benchmarking
Prior experience as:
Forward Deployed Engineer
Solutions Engineer
ML Platform Engineer
Applied Research Engineer

What Makes This Role Special

You’re close to real users and real GPUs—not abstract roadmaps
You’ll work on cutting-edge inference and frontier models, not toy demos
You’ll influence product direction through direct customer feedback
Fast iteration, high ownership, and visible impact

Who Thrives Here

Engineers who like shipping over theorizing
People who enjoy being the “last mile” problem solver
Builders who want exposure to both deep systems and applied ML
Those excited by early-stage POCs that turn into real production systems
