What are the responsibilities and job description for the Observability Engineer position at Elite IT Solutions Inc?
Job Title: Observability Engineer
Location: Westlake Village, CA (Hybrid)
Job Description
We’re looking for an experienced, forward-thinking Observability Engineer to expand our Observability team in Core Services Engineering and strengthen our observability capabilities across Pennymac environments. In this role, you will be responsible for designing, implementing, and maintaining our observability platform, with a strong focus on New Relic. You will leverage your expertise in Infrastructure as Code (IaC) to automate and manage our monitoring and alerting infrastructure, ensuring our systems are reliable, performant, and transparent.
You will work closely with the Core Services, DevOps, Development, and Operations teams to foster a culture of proactive monitoring and data-driven decision-making. If you’re passionate about automation, cloud-native patterns, and making systems run smarter and safer, we want to hear from you.
Duties / Responsibilities
The Sr Observability Engineer will:
Design & Implement Observability Solutions
- Architect, build, and scale comprehensive monitoring solutions using the New Relic platform, including APM, Infrastructure, Logs, Synthetics, custom instrumentation, and NRQL queries.
Automate with IaC
- Develop, manage, and maintain observability configurations (alerts, dashboards, and synthetic checks) using Infrastructure as Code (IaC) tools such as Terraform/OpenTofu.
Develop Dashboards & Alerts
- Create and refine insightful dashboards and actionable alerting policies in New Relic to provide real-time visibility into infrastructure and application health.
Promote Best Practices
- Act as a subject matter expert on observability, guiding teams on best practices for logging, metrics, and tracing to improve system reliability and reduce mean time to resolution (MTTR).
Troubleshoot & Optimize
- Analyze performance data and telemetry to identify bottlenecks, troubleshoot production issues, and drive performance optimization efforts across the stack.
Required Skills & Experience
- Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience).
- 7 years of experience in a Cloud Engineering role (Observability, DevOps, SRE, etc.).
- Proven New Relic expertise: 3 years of hands-on experience with the New Relic platform (Dashboards, NRQL, APM, alerting).
- Infrastructure as Code (IaC): 3 years with Terraform/OpenTofu (preferred), AWS CDK, CloudFormation, Chef, or Ansible.
- Cloud experience: Strong hands-on work with AWS (preferred), GCP, or Azure.
- Scripting: Python, Go, or Bash for automation.
- Systems knowledge: Cloud architecture, networking, Windows/Linux servers, microservices (SaaS), containerization (Docker, Kubernetes), CI/CD.
- Security & Compliance: Deep understanding of security best practices and of working in regulated environments.
- Excellent problem-solving, troubleshooting, communication, and collaboration skills.