What are the responsibilities and job description for the Model Optimization Lead Engineer position at Lenovo Careers?
General Information
- United States of America - Illinois - Chicago
Why Work at Lenovo
Lenovo is a US$69 billion revenue global technology powerhouse, ranked #196 in the Fortune Global 500, and serving millions of customers every day in 180 markets. Focused on a bold vision to deliver Smarter Technology for All, Lenovo has built on its success as the world’s largest PC company with a full-stack portfolio of AI-enabled, AI-ready, and AI-optimized devices (PCs, workstations, smartphones, tablets), infrastructure (server, storage, edge, high performance computing and software defined infrastructure), software, solutions, and services. Lenovo’s continued investment in world-changing innovation is building a more equitable, trustworthy, and smarter future for everyone, everywhere. Lenovo is listed on the Hong Kong stock exchange under Lenovo Group Limited (HKSE: 992) (ADR: LNVGY).
Description and Requirements
About Our Team
On-device AI is the future—enabling real-time, private, and always-available intelligence. You'll push the boundaries of what's possible on mobile hardware, delivering AI experiences that run locally with low latency and all-day battery life. Your optimizations directly enable breakthrough product features.
Lenovo is hiring a Model Optimization Lead Engineer to lead the optimization and deployment of large models for edge devices. You will master technologies such as quantization frameworks (TensorRT, ONNX Runtime), edge AI runtimes (ExecuTorch, llama.cpp), NPU SDKs (Qualcomm QNN, Apple Core ML), model compression libraries, and profiling tools (NVIDIA Nsight, Snapdragon Profiler).
Location: Chicago, IL
Hybrid (3 days on-site, 2 days remote)
What You'll Do
- Lead optimization and deployment of large models (LLMs, VLMs, diffusion) for edge devices using quantization (INT4/INT8), pruning, knowledge distillation, and LoRA.
- Partner with silicon teams to optimize model execution on heterogeneous hardware: NPUs (Qualcomm Hexagon, Google Edge TPU), GPUs, and CPUs.
- Implement and benchmark deployment frameworks: TensorRT-LLM, ONNX Runtime, ExecuTorch, llama.cpp, MLC-LLM.
- Drive hardware-software co-design, influencing sensor and silicon roadmaps to enable efficient AI inference.
- Build ML ops infrastructure: model serving, A/B testing, performance monitoring, continuous optimization.
- Lead a team of optimization engineers and collaborate with ML researchers, hardware teams, and product managers.
- Stay at the forefront of on-device AI: sub-10B parameter models, mixed precision, sparse attention, federated learning.
Basic Qualifications
- 7 years in ML engineering or systems, with 3 years focused on model optimization and deployment.
- Bachelor's Degree in Engineering or Computer Science.
- Deep expertise in model compression: quantization (QAT, PTQ), pruning, distillation, low-rank adaptation.
- Hands-on experience with mobile/edge AI frameworks (TensorRT, ONNX, TFLite, Core ML).
Preferred Qualifications
- Understanding of hardware architectures: NPU/GPU/CPU characteristics, SIMD operations, memory hierarchies.
- Proficiency in C++/Python and performance optimization (CUDA, OpenCL, or NPU programming).
- Track record of shipping ML models to production on resource-constrained devices.
The budgeted base salary range for this position is $180K-$220K. Individuals may also be considered for bonus and/or commission.
Lenovo's various benefits can be found at www.lenovobenefits.com