What are the responsibilities and job description for the Site Reliability Engineering (SRE) Lead position at Gotham Technology Group?

SRE Lead

Hybrid (2-3 days onsite), contract-to-hire role

Our direct client is seeking an SRE Lead to help define and scale observability and reliability capabilities across an enterprise environment.

This is a hands-on leadership role where you will shape observability strategy, build scalable monitoring solutions, and drive adoption across infrastructure, application, and platform teams.

Some key highlights:

Opportunity to drive a critical reliability and observability function across the organization
High-visibility role with influence across engineering and architecture teams
Strong investment in cloud, platform engineering, and modernization initiatives
Competitive compensation and benefits

What You’ll Do

Lead the design and implementation of observability and monitoring solutions across cloud, on-prem, and hybrid environments
Define and drive standards and best practices for reliability, monitoring, and telemetry
Build and scale telemetry pipelines for metrics, logs, and traces
Implement modern observability frameworks
Partner with infrastructure, application, security, and data teams to embed observability into system design
Establish governance around the telemetry lifecycle, including data retention, granularity, and cost optimization
Evaluate, implement, and optimize tools such as Prometheus, Grafana, ELK, and Azure Monitor
Act as a technical leader, influencing architecture decisions and driving adoption across engineering teams

Qualifications

Experience in SRE, observability, infrastructure engineering, and/or DevOps environments
Strong hands-on experience with observability and monitoring tools such as Prometheus, Grafana, ELK stack, and Azure Monitor
Experience with OpenTelemetry, eBPF, and modern telemetry standards
Proven experience building or improving observability platforms in enterprise environments
Strong understanding of cloud platforms (Azure preferred), networking, and distributed systems
Experience with infrastructure-as-code tools such as Terraform or Ansible and CI/CD pipelines
Exposure to Kubernetes or other containerized environments
Strong architectural and problem-solving skills with the ability to design scalable solutions
Excellent communication skills and ability to work across multiple teams and stakeholders

Nice to Have

Experience with tools such as SolarWinds, OpsRamp, or ExtraHop
Experience in large-scale or regulated environments
Prior experience leading initiatives or mentoring engineers
