What are the responsibilities and job description for the Senior Systems Engineer position at HailTrace?
HailTrace’s mission is to simplify weather by fostering innovation, passion, and excellence. We embrace taking risks to improve our technology and products for the industries we serve. Our vision is to be a trusted, independent source for forensic weather information, positively impacting everyone we interact with. At HailTrace, our values of Curiosity, Excellence, Risk, and Respect drive us to continuously improve and make a difference.
Role Overview
The Senior Systems Engineer at HailTrace will help define and operate the core infrastructure foundations that keep our platforms secure, reliable, and scalable. This role is responsible for the design, organization, and operational health of our cloud systems, identity and access controls, networking, compute environments, storage, disaster recovery posture, and infrastructure standards.
This person will serve as a senior technical owner of the systems layer of the business. Rather than focusing primarily on application delivery pipelines, this role emphasizes building and maintaining dependable infrastructure that our engineering teams and products can safely run on. You will work closely with engineering leadership and software teams to ensure HailTrace’s systems architecture supports current operations and future growth.
Key Responsibilities
- Infrastructure Architecture: Design and evolve HailTrace’s core cloud and infrastructure architecture with a focus on reliability, security, maintainability, and cost-awareness.
- Cloud Systems Management: Own and improve the health of our cloud environments, including compute, networking, storage, secrets management, identity, and access patterns.
- Reliability and Availability: Ensure systems are designed for high availability and resilience, with strong operational practices around redundancy, backup, recovery, and failure planning.
- Identity and Access Management: Establish and maintain secure, well-governed access controls across infrastructure, services, and operational tooling.
- Security Hardening: Drive infrastructure security best practices, including system hardening, secrets handling, vulnerability reduction, and secure configuration standards.
- Monitoring and Incident Readiness: Improve infrastructure visibility through monitoring, logging, alerting, and operational runbooks so that issues can be detected and addressed quickly.
- Operational Standards: Define and maintain standards for infrastructure configuration, environment consistency, change management, and system lifecycle management.
- Disaster Recovery and Business Continuity: Develop and maintain practical backup, restoration, and recovery approaches for critical systems and services.
- Cross-Functional Partnership: Partner with software engineers and technical leadership to ensure infrastructure decisions support application needs without sacrificing stability or security.
- Mentorship and Stewardship: Provide senior guidance on infrastructure and systems practices, helping raise the bar for operational discipline and technical decision-making across the organization.
Qualifications
- Experience: 5 years of experience in systems engineering, cloud infrastructure, platform operations, or a related infrastructure-focused role.
- Infrastructure Depth: Strong experience with cloud platforms such as GCP, AWS, or Azure, especially in the design and operation of production infrastructure.
- Systems Expertise: Strong understanding of networking, Linux systems, identity and access management, storage, backup and recovery, and infrastructure reliability.
- Automation Mindset: Comfortable using automation and infrastructure-as-code tools to improve consistency and reduce manual operational risk, while keeping the emphasis on sound systems design.
- Observability: Experience implementing and improving logging, monitoring, and alerting for infrastructure and platform health.
- Security Awareness: Strong understanding of infrastructure security best practices and operational safeguards.
- Problem Solving: Able to think clearly about systems tradeoffs, failure modes, scaling concerns, and long-term maintainability.
- Leadership: Demonstrated ability to take ownership of infrastructure domains, influence standards, and help guide technical direction.
- Communication: Able to communicate clearly with both engineers and non-technical stakeholders about infrastructure risks, priorities, and decisions.
Why Join HailTrace
- Be a key player in shaping the reliability and long-term health of the infrastructure that powers HailTrace.
- Help build secure, dependable systems that support meaningful products serving the restoration, insurance, and disaster recovery industries.
- Work with a team that values curiosity, excellence, thoughtful risk-taking, and respect.
- Competitive salary and benefits package.