What are the responsibilities and job description for the Sr. Observability Engineer position at MphasiS Corporation USA?
Job Title: Sr. Observability Engineer
Location: Charlotte, NC; Irving, TX; Iselin, NJ
Employment Type: Full-Time
About this Role
An observability engineer designs, implements, and maintains systems to monitor, analyze, and report on the health and performance of software applications and infrastructure, ensuring high availability, performance, and security. They play a crucial role in understanding complex IT systems and proactively addressing potential issues.
In this Role, You Will:
- Designing and Implementing Observability Pipelines: Observability engineers create robust pipelines to collect, aggregate, and analyze data from various sources.
- Monitoring and Alerting: They establish monitoring systems and alerts to detect anomalies and performance issues in real time.
- Metric & Instrumentation Standards: They define common metric standards for every stage of the application lifecycle, along with instrumentation and scripting standards aligned with OpenTelemetry (OTel).
- Data Analysis and Visualization: They analyze telemetry data (logs, metrics, traces) to gain insights into system behavior and identify trends.
- Incident Response: They investigate and troubleshoot incidents, using observability data to understand the root cause and implement solutions.
- Collaboration and Communication: They collaborate with development, SRE, and other teams to ensure observability practices are integrated into workflows and to share insights.
- Staying Up-to-Date: They stay current with the latest trends in observability, logging, monitoring, and cloud technologies.
- Documentation and Knowledge Sharing: They create comprehensive documentation for observability systems and processes and share knowledge with other teams.
Skills and Knowledge:
- Strong understanding of distributed systems: They need to understand the complexities of modern architectures, including microservices, cloud-native environments, and hybrid infrastructure.
- Proficiency in observability tools: They are familiar with tools for logging, metrics, and tracing, such as ELK Stack, Prometheus, Grafana, and distributed tracing systems.
- Data analysis and visualization skills: They can analyze telemetry data to identify trends and patterns and create visualizations to communicate insights.
- Scripting and automation: They can automate tasks and create scripts to manage observability infrastructure.
- Problem-solving skills: They can diagnose and troubleshoot system issues using observability data.
- Communication skills: They can effectively communicate technical information to both technical and non-technical audiences.
- Experience with cloud platforms: They have experience with cloud platforms like AWS, Azure, and Google Cloud Platform.
- Understanding of IT service management practices: They understand IT service management practices like change management, release management, incident management, and problem management.
Required Qualifications:
- Demonstrated experience in observability: monitoring, analyzing, and reporting on the health and performance of software applications and infrastructure.
Desired Qualifications:
- 8 years of experience in observability, monitoring, and reliability engineering across large-scale enterprise or cloud-native environments.
- Strong expertise in observability tools and platforms such as Prometheus, Grafana, ELK/OpenSearch, Splunk, Dynatrace, AppDynamics, or equivalent.
- Hands-on experience designing and implementing observability pipelines for logs, metrics, and traces in distributed systems.
- Deep understanding of OpenTelemetry (OTel), including instrumentation standards, collectors, exporters, and vendor-neutral telemetry architectures.
- Strong analytical and troubleshooting skills, using telemetry data for incident investigation, root-cause analysis, and performance optimization.
- Proficiency in scripting and automation (Python, Go, Bash/PowerShell), with strong collaboration skills to work across Dev, SRE, and Platform teams.
Work Environment & Benefits:
- Hybrid Work Model: Combination of on-site and remote work, depending on business needs.
- Collaborative Culture: Work closely with cross-functional teams, vendors, and senior leadership.
- Professional Development: Access to training programs, certifications, and career advancement opportunities.
- Global Impact: Support a mission-critical network infrastructure serving millions of customers worldwide.
Salary: $80,000 - $100,000