What are the responsibilities and job description for the Senior Consultant - SRE Architect position at The Value Maximizer?
Position: Senior Consultant - SRE Architect (Observability & Transaction Reliability)
Location: Austin, TX
Type: Full-Time
About the company
Incedo is a global AI and data transformation specialist empowering companies to realize sustainable business impact from their digital investments by delivering ROI from AI@Scale.
As a long-term partner for strategy to execution, we operate at the intersection of business and technology. Our integrated services and platforms are built on the foundation of AI & Data, digital engineering, and operations transformation, bringing deep domain expertise and full stack capabilities together.
With over 4,000 people in the US, Canada, Latin America and India and a large, diverse portfolio of Fortune 500 enterprises and fast-growing clients worldwide, we work across banking & payments, wealth management, telecom, hi-tech and life sciences.
Job Overview
We are seeking a highly experienced Senior Consultant / SRE Architect to lead the strategy, design, and implementation of enterprise-wide observability and reliability frameworks supporting business-critical transaction flows across distributed systems.
In this role, you will act as a thought leader and architect, driving end-to-end transaction visibility, resilience, and performance optimization across microservices, APIs, databases, and third-party integrations. You will partner with engineering, architecture, and business stakeholders to define standards, influence technical direction, and implement scalable observability solutions.
This is a high-impact role focused on transforming SRE maturity, improving advisor experience, and enabling proactive, data-driven operations through modern observability practices. The ideal candidate is passionate about SRE, observability, and system design, with a proven ability to drive large-scale transformation initiatives.
Required Qualifications
- 10 years of experience in SRE, Observability, or related roles, with a strong focus on architecture and strategy
- Deep hands-on expertise with observability platforms such as Dynatrace, ELK, Datadog, Splunk, OpenTelemetry, Jaeger
- Proven experience designing observability solutions in cloud environments (AWS, Azure, GCP)
- Strong understanding of microservices architecture, APIs, and distributed systems
- Proficiency in programming/scripting (e.g., Python, Go, Java) for automation and integration
- Demonstrated ability to lead cross-functional initiatives and influence technical direction
Preferred Qualifications
- Dynatrace Associate or Professional Certification
- Experience implementing OpenTelemetry standards at scale
- Strong background in chaos engineering and resiliency testing
- Familiarity with AIOps platforms and intelligent automation solutions
- Consulting experience or prior role as an architect / technical advisor
Responsibilities
- Define and lead the enterprise observability strategy for end-to-end transaction traceability across distributed systems
- Architect scalable solutions leveraging tools such as Dynatrace, OpenTelemetry, ELK, Grafana, Datadog, Splunk, Jaeger
- Establish standardized frameworks for logging, metrics, tracing, and telemetry collection
- Design and implement dependency mapping and service topology visualization across complex ecosystems
- Provide architectural guidance for monitoring latency, throughput, and error rates across critical transaction paths
- Lead root cause analysis using distributed tracing and telemetry data to resolve systemic performance issues
- Partner with application and database teams to optimize system performance and scalability
- Drive adoption of performance engineering best practices across teams
- Define and implement resiliency strategies for business-critical transaction flows
- Architect fault-tolerant systems, including failover, redundancy, and self-healing mechanisms
- Lead and design chaos engineering initiatives to validate system resilience
- Establish and govern Service Level Objectives (SLOs) and Service Level Indicators (SLIs) aligned to business outcomes
- Act as a trusted advisor to engineering teams, architects, and leadership on observability and SRE best practices
- Define and enforce standards, policies, and governance models for monitoring and tracing
- Lead cross-functional initiatives to drive adoption of observability frameworks
- Mentor engineers and SRE teams, fostering a culture of continuous improvement and operational excellence
- Drive measurable improvements, including:
  - 30% reduction in MTTD and MTTR within the first year
  - ≥70% root cause identification within 1 hour
  - ≥90% proactive issue detection via monitoring systems
- Develop executive-level reporting on system health, reliability trends, and performance metrics
- Build reusable frameworks, accelerators, and playbooks for incident management and observability adoption
- Establish comprehensive documentation for transaction flows, system dependencies, and observability architectures
- Develop and standardize incident response playbooks and runbooks
- Lead training and enablement initiatives to scale observability expertise across teams