Senior Operations / Reliability Engineer at Team Red Dog
Team Red Dog is hiring a Senior Operations / Reliability Engineer for our client, a global technology leader and enterprise software innovator. This role is ideal for engineers with strong experience in production operations, telemetry monitoring, incident response, software validation, and live service reliability who enjoy working in fast-moving engineering environments supporting both hardware and software systems. You’ll play a critical role in monitoring prototype device health, troubleshooting operational issues, validating releases, analyzing telemetry and logs, and improving overall system stability for a confidential next-generation device initiative. This opportunity offers hands-on exposure to live operations, release readiness, device support, cloud-connected services, and real-world reliability engineering challenges within a highly collaborative engineering organization.
Top Required Skills (Must Haves):
• Strong experience in software engineering, DevOps, SRE, or production operations environments, including monitoring live systems, troubleshooting operational issues, and supporting release stability initiatives.
• Hands-on experience with telemetry analysis, dashboards, alerts, logs, and operational monitoring tools to diagnose service, application, or device health issues in production or pre-production environments.
• Experience supporting software releases, deployment validation, incident response, and operational readiness across cloud, hybrid, or device-connected environments.
• Ability to independently investigate technical issues, summarize findings, communicate operational risks clearly, and collaborate cross-functionally with engineering, QA, PM, and infrastructure teams.
Opportunity Overview:
This role offers the opportunity to work on a highly confidential device initiative focused on operational reliability, telemetry monitoring, release validation, and prototype device stability. You’ll support live operations across software, services, and hardware environments while partnering closely with experienced engineers, infrastructure teams, and product stakeholders. The position combines hands-on troubleshooting, operational analysis, production monitoring, and release support in an engineering-focused environment where your work directly impacts product readiness and system reliability. Candidates who enjoy solving real-world operational problems, analyzing system behavior, and improving service health in fast-moving environments will find this role especially rewarding.
How you will make an impact:
• Monitor telemetry dashboards, alerts, logs, and operational metrics to assess service and device health.
• Investigate live operational issues, diagnose failures, and support troubleshooting efforts across services, applications, and prototype devices.
• Support software releases by validating deployments, monitoring system stability, and identifying post-release issues.
• Analyze telemetry trends and recurring failure patterns to improve operational visibility and reliability.
• Assist with incident response activities, including gathering logs, summarizing impact, and tracking mitigation efforts.
• Perform on-site troubleshooting and validation for prototype devices and test environments.
• Collaborate closely with software engineers, QA teams, PMs, and infrastructure partners to support operational readiness.
• Document incidents, release observations, operational procedures, known issues, and troubleshooting workflows.
• Recommend improvements to monitoring coverage, alert quality, release validation, and operational processes.
• Help improve service reliability, reduce operational toil, and enhance overall system stability.
The expertise you bring:
• Bachelor’s degree in Computer Science, Software Engineering, Computer Engineering, or a related technical field, or equivalent practical experience.
• 5–7 years of experience in software engineering, DevOps, SRE, production operations, infrastructure, or related operational engineering roles.
• Experience monitoring live services, applications, cloud infrastructure, or connected device environments.
• Strong troubleshooting skills using telemetry, logs, dashboards, metrics, and operational monitoring tools.
• Experience supporting production deployments, release validation, CI/CD workflows, or operational readiness activities.
• Ability to diagnose technical issues, summarize findings clearly, and communicate effectively with technical and non-technical stakeholders.
• Experience documenting incidents, operational procedures, troubleshooting workflows, and known issues.
• Familiarity with cloud services, hybrid infrastructure, release management, and incident response practices.
• Ability to work independently while managing multiple operational priorities in fast-paced engineering environments.
• Experience supporting prototype hardware, mobile operating systems, or device-connected systems is highly valued.
What makes a candidate highly successful in this role:
Successful candidates combine strong operational troubleshooting skills with the ability to remain calm and methodical during live issue investigations and release events. They are comfortable analyzing telemetry, logs, alerts, and deployment signals to identify meaningful operational risks while clearly communicating findings to engineering and product teams. Candidates who thrive in this role are highly self-directed, detail-oriented, and collaborative, with the ability to balance hands-on troubleshooting, documentation, release validation, and cross-functional communication. Experience supporting Android-based systems, prototype hardware environments, or device-focused operational workflows will help candidates stand out.
Why Work with Team Red Dog?
At Team Red Dog, people are at the heart of everything we do. Our commitment to personalized service and our deep experience in matching talented professionals with meaningful roles at some of the world’s most inspiring companies is what sets us apart. We take the time to understand your unique skills, strengths, and passions—because we believe your career should reflect who you are.
Whether you're looking to grow, pivot, or simply find a place where your work truly matters, we offer opportunities that empower you to make a positive impact. With excellent benefits, a supportive team, and a role where you can thrive while doing what you love, we’re here to help you take the next step with confidence. Join us—and discover what it means to be genuinely valued in your career.
Generous benefits package for qualified employees includes:
• Health insurance (medical, dental, vision, and life)
• Employer-matched 401(k) plan
• Paid time off
• Paid holidays
Estimated Start Date:
Immediately
Location:
Redmond, WA (100% onsite)
Job #: 2515
Job Type and Estimated Duration:
W2 contract opportunity through 6/20/2027, with additional extension possible (subject to performance, budget, and client discretion).
Rate:
Up to $7,400/month
Team Red Dog is committed to providing equal opportunities to everyone, regardless of race, ethnicity, gender, age, religion, sexual orientation, disability, or any other characteristic. If you need accommodation during the recruitment process, reach out to hr@teamreddog.com, and we will work to ensure an accessible experience. We strictly adhere to federal, state, and local laws to maintain a workplace free from discrimination and harassment.
We offer competitive compensation aligned with U.S. industry standards, and our final offer will reflect the candidate’s location, job-specific skills, experience, and knowledge.
• All applicants must be authorized to work in the U.S. without the need for sponsorship.
• Team Red Dog is an E-Verify employer.
• Employment is contingent upon the successful completion of a reference and background check.
• No solicitations from C2C or recruiting firms, please.