What are the responsibilities and job description for the Hybrid || SRE with Finance || Austin TX position at Techridge, Inc.?
Hello
Position: SRE
Location: Austin, TX
Employment Type: Contract
- We are looking for a Site Reliability Engineer (SRE) with a strong application engineering background to improve application reliability, observability, and incident resolution across a complex enterprise landscape.
- This role will focus on understanding application behavior, diagnosing performance issues, and reducing Mean Time to Resolution (MTTR), rather than solely managing infrastructure or CI/CD pipelines.
Key Responsibilities
Application Reliability & Issue Resolution
- Analyze and troubleshoot application failures, latency issues, and degraded performance across distributed systems
- Perform deep-dive root cause analysis (RCA) to identify underlying application-level issues
- Work with engineering teams to quickly isolate failing components and dependencies
- Reduce MTTR through improved diagnostics and runbooks
Application Observability & Diagnostics
- Assess the current application landscape and identify gaps in logging, tracing, and monitoring
- Implement and enhance application-level observability (logs, metrics, traces)
- Enable faster issue identification by improving service visibility and dependency mapping
- Define and standardize health checks and alerting strategies for applications
System Understanding & Mapping
- Develop a clear understanding of application architecture, data flows, and service dependencies
- Build and maintain application topology and dependency maps
- Identify single points of failure and performance bottlenecks
Performance Engineering
- Analyze application performance and recommend improvements for scalability and responsiveness
- Identify issues related to threading, memory, database interactions, and API latency
- Work with developers to optimize code paths, queries, and service interactions
Incident Management & Process Improvement
- Lead or support incident triage and war-room calls
- Improve incident response processes and escalation paths
- Create and maintain runbooks, playbooks, and troubleshooting guides
- Identify recurring issues and drive permanent fixes rather than temporary patches
Collaboration & Engineering Enablement
- Partner with application development teams to embed reliability best practices
- Provide guidance on error handling, resiliency patterns, and fault tolerance
- Enable teams with tools and practices for self-service diagnostics
Required Skills & Experience
- 5-10 years of experience in application engineering, production support, or SRE roles
- Strong experience in application troubleshooting and debugging (Java/.NET/Node.js preferred)
- Solid understanding of distributed systems and microservices architectures
- Experience with application logs, debugging tools, and performance profiling
- Familiarity with observability tools (Splunk, Dynatrace, AppDynamics, Datadog, etc.)
- Strong understanding of API behavior, database interactions, and system integrations
- Experience working in production support / incident management environments
Preferred Skills
- Experience implementing distributed tracing (OpenTelemetry, Jaeger, Zipkin)
- Knowledge of cloud environments (AWS/Azure/Google Cloud Platform)
- Exposure to resiliency patterns (circuit breakers, retries, fallbacks)
- Experience with performance tuning and load analysis
Thanks and regards
Sonali Silswal
Email: Sonali.silswal
Contact:
Address: 2591 Dallas Pkwy, Ste 300, Frisco, TX 75034
Website:
Salary: $30 - $70