What are the responsibilities and job description for the Senior Site Reliability Engineer position at Apptronik?
Apptronik is a human-centered robotics company developing AI-powered robots to support humanity in every facet of life. Our flagship humanoid robot, Apollo, is built to collaborate thoughtfully with people, starting with critical industries such as manufacturing and logistics, with future applications in healthcare, the home, and beyond.
We operate at the cutting edge of embodied AI, applying our expertise across the full robotics stack to solve some of society's most important problems. You will join a team dedicated to bringing Apollo to market at scale, tackling complex challenges such as safety, commercialization, and mass production to change the world for the better.
JOB SUMMARY
We are seeking an experienced Site Reliability Engineer to own and maintain the deployment of our cloud-based infrastructure to customer sites. In this role, you will work closely with our Applications Engineers, IT, and software teams to ensure the smooth deployment of our solution, which collects training data and deploys models to real-time robotic systems. Providing a reliable framework will accelerate progress in integrating Google DeepMind's Gemini Robotics model with humanoid robot hardware.
ESSENTIAL DUTIES AND RESPONSIBILITIES or KEY ACCOUNTABILITIES
- Partnering with customers and Applications Engineers to remove roadblocks to deployment success
- Writing and fixing Infrastructure as Code (Terraform, Helm, Ansible)
- Developing and maintaining code in Python / TypeScript
- Troubleshooting networking, performance, and security challenges
- Collaborating with engineering and product teams to shape improvements
- Responding to outages, participating in on-call rotations and traveling occasionally to customer sites to support deployment and integration
KNOWLEDGE, SKILLS AND ABILITIES
- Strong communication skills, customer empathy, and flexibility to adapt
- Hands-on Linux systems engineering and networking experience
- Proficiency with Infrastructure as Code tools (Terraform, Helm, Ansible)
- Development experience with Kubernetes / Python / TypeScript / C
- Experience creating intuitive, high-utility dashboards and visualizations (e.g., Grafana)
- Experience integrating real-time monitoring and alerting (e.g., PagerDuty)
- Takes initiative and seeks ownership of the end-to-end infrastructure solution
- Willingness to travel as needed for client support
EDUCATION AND EXPERIENCE
- Bachelor's or Master's degree in Computer Science, Engineering, or a related technical field
- Minimum of 5 years of professional, full-time experience building and maintaining reliable, scalable systems
PHYSICAL REQUIREMENTS
- Prolonged periods of sitting at a desk and working on a computer
- Ability to lift up to 15 pounds at times
- Vision to read printed materials and a computer screen
- Hearing and speech to communicate
This is a direct hire. Please, no outside agency solicitations.