Platform Operations and Site Reliability Lead

eTelligent Group
Lanham, MD Full Time
POSTED ON 6/3/2026
AVAILABLE BEFORE 7/1/2026
Company Overview:

Over the past 15 years, eTel has delivered essential solutions for the federal government by securing and managing data, providing scalable identity access, modernizing legacy systems, and building high-performance platforms. By integrating new technologies and ensuring reliable operations, we help agencies stay prepared for future challenges. As a premier technology solutions and services company to the US federal government, eTel possesses longstanding relationships across the federal civilian marketplace. Other customers include the broader Treasury Department, Commerce Department, and State Department.

eTel offers integrated CMMI Level 3 processes, tools, and techniques with innovative, cost-efficient, and secure solutions to address complex challenges. eTel also holds ISO 9001:2015, ISO/IEC 27001:2013, and ISO/IEC 20000-1:2018 certifications, and offers dedicated subject matter experts (SMEs) and thought leaders who possess a deep understanding of customers' environments and challenges.

Place of Performance: Remote and/or IRS facilities in Lanham, MD; Martinsburg, WV; Memphis, TN; Washington, D.C.; Austin, TX; Dallas, TX.

Citizenship: US Citizen (MUST)

Security Clearance: Must be eligible to possess MBI (IRS Background Investigation) clearance. Active IRS MBI clearance is preferred.

Role Overview:

The Platform Operations and Site Reliability Lead is responsible for ensuring the reliability, availability, performance, scalability, and operational excellence of the Enterprise Data Platform. The Operations Lead oversees 24x7 platform operations, observability, incident response, disaster recovery, performance optimization, and AI-enabled operational automation across AWS and Databricks environments.



Key Responsibilities:

  • Lead operations and maintenance activities supporting AWS cloud infrastructure and Databricks E2 services.
  • Manage observability frameworks including monitoring, logging, tracing, and alerting.
  • Implement Site Reliability Engineering practices including SLIs, SLOs, error budgets, and reliability metrics.
  • Coordinate incident response, root cause analysis, and service restoration activities.
  • Develop operational runbooks, playbooks, and automated remediation procedures.
  • Lead disaster recovery planning, testing, backup validation, and continuity activities.
  • Support AI-driven operational intelligence and predictive monitoring capabilities.
  • Track and report service levels, uptime metrics, and operational performance indicators.



Minimum Qualifications:

  • Minimum 8 years managing enterprise production environments.
  • Minimum 5 years supporting AWS cloud operations.
  • Experience supporting Databricks, analytics platforms, or enterprise data environments.
  • Experience implementing enterprise monitoring, observability, and Site Reliability Engineering practices.



Preferred Certifications:

  • AWS Certified SysOps Administrator
  • AWS Solutions Architect Associate
  • Databricks Platform Administrator

Commitment to Diversity:

eTelligent Group provides equal employment opportunities (EEO) to all applicants without regard to race, color, religion, gender, sexual orientation, gender identity, national origin, age, disability, genetic information, marital status, amnesty, status as a covered veteran, or any other characteristic protected under applicable federal, state, and local laws.

Salary.com Estimation for Platform Operations and Site Reliability Lead in Lanham, MD
$168,062 to $201,431