
Perception Algorithm Engineer

Black Sesame Technologies Inc
San Jose, CA Full Time
POSTED ON 5/16/2026
AVAILABLE BEFORE 6/14/2026

Autonomous Driving Multimodal Model Algorithm Engineer

VLM / VLA / World Model

Black Sesame Technologies is building high-performance AI algorithms and self-developed chips for intelligent driving and beyond. As an Autonomous Driving Multimodal Model Algorithm Engineer, you will work on next-generation multimodal AI models for autonomous driving, including Vision-Language Models, Vision-Language-Action Models, and World Models.

You will collaborate with perception, prediction, planning, data, simulation, and deployment teams to integrate multimodal models with existing BEV perception, two-stage E2E, and one-stage E2E autonomous driving systems.

We are looking for candidates with hands-on experience in one or more of the following areas: Vision-Language Models, Vision-Language-Action Models, World Models.

Responsibilities

Multimodal Model Development for Autonomous Driving

  • Work on one or more multimodal modeling directions for autonomous driving, including VLM-based scene understanding, VLA-style planning-oriented modeling, and World Model-based future prediction.
  • Develop and optimize models that reason over multi-camera images, BEV features, map elements, object/lane instances, occupancy, trajectories, ego-motion, and driving context.
  • Explore model architectures that connect perception, prediction, planning, and decision-making in two-stage and one-stage E2E autonomous driving systems.
  • Collaborate with BEV perception and planning teams to improve representation quality, temporal consistency, long-tail robustness, and planning relevance.

Vision-Language and Vision-Language-Action Modeling

  • Develop VLM-based methods for driving scene understanding, open-vocabulary perception, risk reasoning, corner-case analysis, and interpretable autonomy.
  • Adapt and extend open-source multimodal architectures such as LLaVA, Qwen-VL, InternVL, MiniCPM-V, OpenVLA, or similar models for autonomous driving scenarios.
  • Research VLA-style models that map multimodal driving context, navigation intent, and high-level instructions to trajectories, actions, or planning representations.
  • Align visual, BEV, map, object, lane, occupancy, trajectory, and language representations for driving-specific tasks.
  • Build supervised fine-tuning, instruction-tuning, and efficient adaptation pipelines for driving-relevant multimodal tasks.

World Model and Future Prediction

  • Build world-model-based approaches for future BEV, occupancy, object motion, lane evolution, traffic interaction, and ego-conditioned scene rollout.
  • Explore generative and predictive modeling methods such as diffusion models, autoregressive transformers, latent dynamics models, video prediction, and BEV prediction.
  • Use learned world models for scenario generation, counterfactual reasoning, long-tail case mining, planning evaluation, and closed-loop analysis.
  • Work with simulation and data teams to improve safety-critical scenario discovery and model-based evaluation.

Efficient Adaptation and Deployment

  • Apply efficient fine-tuning and adaptation methods such as LoRA, QLoRA, adapters, prompt tuning, prefix tuning, or other PEFT techniques.
  • Develop multimodal feature alignment modules, including projection heads, query adapters, cross-attention modules, tokenization strategies, and representation converters.
  • Optimize model architecture, latency, memory footprint, and compute cost for automotive deployment.
  • Apply distillation, quantization, pruning, sparse computation, and efficient attention methods where appropriate.
  • Collaborate with chip, compiler, runtime, and deployment teams to adapt multimodal models to in-house automotive AI hardware.

Research, Evaluation, and Iteration

  • Track the latest research in VLM, VLA, World Models, BEV perception, E2E driving, robotics foundation models, generative simulation, and multimodal learning.
  • Design evaluation metrics for reasoning quality, grounding accuracy, temporal consistency, prediction quality, planning relevance, and safety-critical scenarios.
  • Perform systematic failure analysis and drive data/model iteration based on real-world autonomous driving cases.
  • Contribute to patents, technical reports, internal research platforms, and conference or journal publications.

Qualifications

  • MS or PhD in Computer Science, Electrical Engineering, Robotics, Artificial Intelligence, or a related field.
  • Strong background in deep learning, computer vision, multimodal learning, robotics, or autonomous driving.
  • Hands-on experience in one or more of the following areas:
      • Vision-Language Models, multimodal large models, or open-source VLM adaptation
      • Vision-Language-Action models, robotics foundation models, or action-conditioned modeling
      • World models, generative prediction, latent dynamics modeling, or future scene simulation
      • BEV perception, multi-view 3D perception, or end-to-end autonomous driving
      • Motion prediction, planning, trajectory generation, or closed-loop evaluation
  • Practical experience with open-source multimodal architectures such as LLaVA, Qwen-VL, InternVL, MiniCPM-V, OpenVLA, BLIP-style models, Flamingo-style models, or similar systems.
  • Solid understanding of multimodal feature alignment, including vision-language alignment, cross-modal attention, visual tokenization, projection layers, query-based fusion, or embedding-space alignment.
  • Experience with efficient fine-tuning or adaptation methods, such as LoRA, QLoRA, adapters, prompt tuning, prefix tuning, supervised fine-tuning, or instruction tuning.
  • Proficient in PyTorch and capable of modifying, training, debugging, and evaluating deep learning models.
  • Familiar with transformer architectures, attention mechanisms, temporal modeling, and large-scale training.
  • Experience with multimodal data, such as camera, radar, LiDAR, IMU, map, trajectory, language, or structured driving data.
  • Strong engineering ability in Python; C++/CUDA/TensorRT experience is a plus.
  • Comfortable with Git, Docker, Linux, distributed training, and collaborative development workflows.
  • Strong communication skills and ability to work across perception, planning, data, simulation, and deployment teams.

Preferred Qualifications

  • Experience adapting or fine-tuning VLM/VLA models such as LLaVA, Qwen-VL, InternVL, MiniCPM-V, OpenVLA, or similar architectures.
  • Experience with Hugging Face Transformers, PEFT, DeepSpeed, FSDP, vLLM, SGLang, TensorRT-LLM, or similar training/inference frameworks.
  • Experience building multimodal instruction datasets, driving-scene QA datasets, grounding datasets, scene-reasoning datasets, or planner-oriented supervision signals.
  • Experience aligning multimodal model representations with BEV features, object queries, lane instances, occupancy grids, map vectors, trajectories, or planner inputs.
  • Experience with autonomous driving architectures such as BEVFormer, DETR/DINO, MapTR/MapQR, occupancy networks, diffusion planners, trajectory transformers, or similar models.
  • Experience with world models, generative models, video prediction, future BEV prediction, occupancy forecasting, learned simulation, or closed-loop evaluation.
  • Experience with efficient adaptation of large models, including LoRA/QLoRA, distillation, quantization, pruning, sparse attention, or lightweight adapter design.
  • Experience deploying deep learning models on automotive SoCs, ASICs, GPUs, or edge AI accelerators.
  • Publications at CVPR, ICCV, ECCV, NeurIPS, ICLR, ICML, CoRL, ICRA, IROS, RSS, or related autonomous driving and robotics venues, or comparably strong project experience.
  • Strong ability to convert research ideas into robust production systems.
  • Experience with AI agent tools and basic harness engineering, including building evaluation scripts, task runners, automated workflows, tool-use pipelines, and reproducible testing environments for model or agent development.

Salary.com Estimation for Perception Algorithm Engineer in San Jose, CA
$113,958 to $144,906




