What are the responsibilities and job description for the Production Support Engineer position at The Planet Group?
The Planet Group is looking for a Staff Operational Support Engineer (L2) to join our well-known Fortune 500 client on a 6-month contract, working out of the Atlanta, GA office.
Staff Operational Support Engineer (L2) | Qualifications:
- 5 years of relevant experience in operational, support, or similar customer-facing roles.
- Strong experience supporting production video streaming platforms, OTT services, and live systems.
- Solid troubleshooting skills across distributed systems (APIs, microservices, cloud infrastructure)
- Familiarity with HLS, DASH, CMAF, WebRTC, DRM and CDN architectures
- Experience working with monitoring, alerting, and logs to diagnose live incidents (Grafana, Kibana/ELK, Prometheus, Loki)
- Ability to correlate backend streaming metrics, player telemetry, and CDN signals to diagnose live customer issues end-to-end.
- Comfort performing controlled changes in production environments
- Working knowledge of incident management and on-call operations
Pre-Event Planning & Operational Readiness
- Participate in pre-event readiness planning for critical customer events
- Validate system readiness through:
  - Runbook checks
  - Monitoring coverage validation
  - Risk identification and mitigation planning
- Define and rehearse incident response strategies for high-risk scenarios
- Collaborate with customers and internal teams to ensure smooth event execution
On-Call & 24/7 Operations
- Participate in a 24/7 on-call rotation, including nights, weekends, and holidays, as part of a global support model
- Ensure smooth handovers between shifts and regions
- Respond to critical alerts within defined SLAs for stream health, player errors, and delivery infrastructure
Root Cause & Continuous Improvement
- Perform or contribute to root cause analysis (RCA) for production incidents
- Document findings, corrective actions, and preventive measures
- Identify recurring issues and work with Engineering and Product teams to eliminate them permanently
- Contribute to and improve runbooks, operational playbooks, and knowledge bases for all OptiView products (player, ads, live, and real-time streaming)
Collaboration & Engineering Feedback Loop
- Work closely with Engineering teams to escalate defects, validate fixes, and support production deployments
- Provide feedback on system observability, tooling gaps, and operational risks
- Act as the operational voice during post-incident reviews
Salary: $70 - $76