What are the responsibilities and job description for the Staff Perception Engineer (US Based) position at Andromeda Robotics?
Location: San Francisco, CA
Position: Full-time, 5 days in the office

About Us:
Andromeda Robotics is an ambitious social robotics company with offices in Melbourne and San Francisco, dedicated to creating robots that seamlessly and intelligently interact with the human world. Our first robot, Abi, is a testament to this vision: a custom-built platform designed from the ground up to be a helpful aid and intuitive partner in aged care homes. We are a passionate, collaborative team of engineers who solve some of the most challenging problems in AI and robotics.

Our Values:
- Deeply empathetic: Kindness and compassion are at the heart of everything we do.
- Purposely playful: Play sharpens focus. It keeps us curious, fast, and obsessed with the craft.
- Relentlessly striving: With relentless ambition, a bias for action, and constant curiosity, we don't settle.
- Strong when it counts: Tenacious under pressure, we expect problems and stay in motion to adapt and progress.
- United in action: Different minds. Shared mission. No passengers.

The Role:
This role balances architectural leadership with strategic, hands-on work. You will lead the design and implementation of Abi's perception stack, translating raw sensor data into the semantic understanding that supports autonomous navigation and conversational AI. This dual challenge demands a systems thinker who can architect scalable perception pipelines and also implement foundational components to prove the architecture's effectiveness. You will bridge multiple domains (sensor fusion, computer vision, audio processing, and ML deployment) while optimising for real-time performance on our embedded Jetson AGX Orin platform.

The Team:
You'll be the founding perception hire, collaborating deeply with the autonomy, conversational AI, gestures, controls, audio engineering, and ML teams.
You'll work with product owners to translate user needs into technical requirements, and with the broader engineering team to ensure your perception outputs enable downstream systems to flourish.

Key Responsibilities:
- Architect the Perception Stack: Design and own the full system from raw sensors through semantic understanding, including interface contracts, compute and latency budgeting, and robust architectural decisions.
- Lead Cross-Functional Design: Facilitate design workshops with the autonomy, conversational AI, audio engineering, ML, and hardware teams to align resources and interface definitions.
- Implement Strategically: Develop core perception capabilities such as face recognition, speaker diarisation, person detection and tracking, and sensor fusion to validate and instantiate the architecture.
- Own Production Systems: Build with production-readiness in mind, including graceful degradation, monitoring, debugging tools, and deployment pipelines.
- Make Technical Decisions: Drive build-vs-buy choices, algorithm selection, and embedded deployment optimisations, focusing on performance and resource constraints.
- Scale the Team: Support recruitment and mentoring as the perception team grows.

What You'll Build:
- Scene Awareness Pipeline: Architect the unified perception system that serves navigation, gesture, and conversational needs.
- Person Detection & Tracking: Implement robust multi-person tracking using multiple cameras, LiDAR point clouds, and IMU data for both navigation obstacle avoidance and social scene understanding.
- Audio-Visual Speaker Localisation: Fuse microphone arrays, face detection, gaze tracking, and skeleton pose to determine who is speaking and where Abi should direct her attention.
- Sensor Fusion Architecture: Design the common data structures and synchronisation primitives that enable downstream systems to consume perception outputs efficiently, optimised to run on the Jetson AGX Orin within strict latency budgets.

Your work will enable other engineering teams to integrate their subsystems:
- Audio engineering teams to conduct audio signal processing
- Machine learning teams to develop specialist models, such as emotional sentiment prediction and face recognition models that enable personalised memory recall
- Hardware teams to readily change sensors as we upgrade Abi

Your First 90 Days:
Month 1: Architect the Perception Stack. Lead design workshops with the autonomy, conversational AI, audio engineering, and ML teams to define the complete perception architecture. Establish interface contracts between perception and downstream consumers. Allocate compute and latency budgets across the Jetson AGX Orin platform. Define data schemas, synchronisation primitives, and failure mode handling. Deliver the architectural blueprint that guides all perception work.

Month 2: Validate with Face Recognition & Speaker Diarisation. Implement face recognition and speaker diarisation systems to prove out the architecture. Use these as forcing functions to stress-test your interface definitions, budget allocations, and integration patterns. Iterate on the architecture based on real deployment constraints.

Month 3: Establish Production Infrastructure. Build monitoring, testing frameworks, and deployment pipelines. Codify the architectural patterns and interface contracts.
Prepare the foundation for the team to scale perception capabilities independently.

Requirements

Ideally, You Have:
- Bachelor's or Master's degree in Computer Science, Robotics, Electrical Engineering, or a related field
- 5-7 years of experience building and shipping perception systems for robotics or autonomous vehicles
- Deep expertise in computer vision (object detection, tracking, 3D reconstruction, camera calibration)
- Strong fundamentals in sensor fusion (Kalman filtering, probabilistic estimation, multi-modal integration)
- Real-time embedded systems experience (CUDA, TensorRT, ROS 2)
- Proven architectural skills with hands-on coding ability in C and Python
- A pragmatic mindset balancing off-the-shelf and custom solutions

Bonus Points If You Have:
- Audio processing background, including beamforming and source localisation
- Experience with face recognition systems and liveness detection
- Expertise with depth sensors such as stereo vision, structured light, or LiDAR
- Skills in human pose estimation and social gaze prediction
- Experience optimising ML models for edge deployment
- A PhD in a relevant field (computer vision, robotics, signal processing)

Interview Process:
- 1st Round Interview: Recruiter phone screen to discuss background and fit
- Take-home assessment: Architect the system and complete coding tasks using data sets that fuse audio and visual inputs
- 2nd Round Interview: Technical interview focusing on perception engineering concepts, system design, and implementation challenges
- Final Round Interview: Cultural interview to ensure alignment with Andromeda Robotics' values and team fit

Benefits:
The salary for this position may vary depending on factors such as job-related knowledge, skills, and experience. The total compensation package may also include additional benefits or components based on the specific role. Details will be provided if an employment offer is made.

If you're excited about this role but don't meet every requirement, that's okay: we encourage you to apply.
At Andromeda Robotics, we celebrate diversity and are committed to creating an inclusive environment for all employees. Let's build the future together.

We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, colour, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.

Please note: At this time, we are generally not offering visa sponsorship for this role.