What are the responsibilities and job description for the Production Engineer - Entry Level to Mid-Level position at Hunter by HiringAgents.ai?
Job title: Production Engineer - Entry Level to Mid-Level
Client: Hunter Scouts
Location: Bellevue, Washington, United States - Hybrid (Remote considered for candidates outside 30 miles of an office)
Contract type: Full-time
Contract duration: Permanent
Salary: $109,000 – $145,000 base salary per year
About The Role
Hunter Scouts is seeking a Production Engineer to join an exciting, innovative AI scaleup working on breakthrough cloud-based solutions for accelerated computing. This role is focused on maintaining and improving the reliability and stability of the cloud infrastructure and offers the opportunity to advance in the dynamic field of AI hyperscaling. If you’re looking for a challenging yet rewarding role where you can collaborate with industry leaders and work on cutting-edge technology, this is your chance.
Responsibilities
- Respond to service disruptions by assisting senior engineers in resolution efforts
- Document, analyze, and contribute to post-incident reviews and playbooks
- Monitor platform performance and detect system health issues using tools like Prometheus and Grafana
- Drive automation and process improvements to streamline operational efficiencies and system recovery
- Collaborate on reliability and disaster recovery advancements with cross-functional engineering teams
- Facilitate the creation of, and adherence to, operational KPIs and SLAs
- Troubleshoot system issues and optimize workflows for long-term stability
Requirements
- Must be located within commuting distance to Bellevue, WA (remote may be considered for candidates located more than 30 miles from a CoreWeave office)
- Must be authorized to work in the U.S. without current or future employer sponsorship and meet export control requirements to access restricted information
- 4 years of experience in cloud operations, site reliability engineering, or similar technical roles
- Proficiency in cloud platforms such as Kubernetes and AWS or GCP
- Familiarity with monitoring and alerting tools like Prometheus and Grafana
- Hands-on experience with scripting/automation tools (Python, Bash, Terraform, and/or Ansible)
- Exposure to Kubernetes and containerization technologies
- Knowledge of change management processes and post-incident analysis
- Experience with self-healing, automated infrastructure
- Passion for growth and learning in incident management and reliability engineering
Represented by Hunter Scouts, this is a direct-hire opportunity offering growth, competitive compensation, and exciting work in cloud-based AI technology. Submit your application now to join this innovative team.