What are the responsibilities and job description for the Site Reliability Engineer (US Remote) position at Locus Robotics?
Company Description
Locus Robotics is a leader in the rapidly growing eCommerce order fulfillment optimization space. Our solution helps warehouse owners attain a 2-3X efficiency improvement over cart-picking operations by empowering pickers to work collaboratively with our robots. All this is accomplished while integrating with the operator’s Warehouse Management System and utilizing and optimizing existing facility infrastructure.
We are seeking a Site Reliability Engineer (SRE) to empower our customers with a rich feature set, high availability, and stellar performance to support warehouse picking operations as part of our Robots as a Service (RaaS) Warehouse Execution Solution. As we expand customer deployments, we’re seeking an experienced SRE to manage deployments and provide observability insights in real time. We are looking for someone with an eye for, and an interest in, managing the growth of our fully integrated system, which seamlessly orchestrates and manages all warehouse product movement needs.
Job Description
- Provide primary operational support and engineering for multiple large-scale Robots as a Service (RaaS) deployments
- Improve reliability, quality, and time-to-market of our Robots as a Service (RaaS)
- Design, develop and maintain cloud-based and on-premises tools and infrastructure for warehouse execution system deployment, operation, and monitoring.
- Contribute to the design of a field-configurable and maintainable warehouse execution system.
- Run the production environments by enabling observability, monitoring availability and taking a holistic view of system health
- Build software and systems to manage platform infrastructure and applications
- Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating for continual improvement
- Gather and analyze metrics from operating systems as well as applications to assist in performance tuning and fault finding
- Partner with development teams to improve services through rigorous testing and release procedures
- Participate in system design consulting, platform management, and capacity planning
- Create sustainable systems and services through automation and uplifts
- Balance feature development speed and reliability with well-defined service-level objectives
- Apply integrated security practices and tooling to software build and deployment pipelines, adding practices and functions such as scanning, threat intelligence, policy enforcement, static analysis, and compliance validation to the software development lifecycle
- Work cross-functionally within the engineering group on a variety of projects related to infrastructure automation.
Qualifications
- Bachelor’s degree (or equivalent) in computer science or related discipline
- Knowledge of best practices in development of infrastructure-as-code. Terraform experience is a plus.
- 8 years of development experience with a scripting language (Python, Go, JavaScript, Bash, PowerShell).
- Strong experience with leveraging cloud services to develop infrastructure.
- Familiarity with tools for the Windows and Linux ecosystem, including packaging and deployment.
- Experience with fault-tolerant serialized structured data, queue-based messaging patterns and high-performance Remote Procedure Call (RPC) frameworks.
- Experience with network and platform security
- Experience applying security and vulnerability scanning
- Experience with configuration management and provisioning tools.
- Data modeling and data structure design skills.
- Proactive approach to identifying problems, performance bottlenecks, and areas for improvement
Additional Information
Locus Robotics is an Equal Opportunity Employer