What are the responsibilities and job description for the Senior Site Reliability Engineer position at Themesoft Inc.?

POSITION DETAILS:

Role: Cloud SRE

Location: Charlotte, NC; Irving, TX; Chandler, AZ

Duration: 12 months (possible extension, conversion, or direct hire)

Hybrid

Pay rate: $70/hr on W2, with benefits

Seeking a senior engineer for L2/L3 application and middleware production support with an SRE mindset (shifting from reactive to proactive reliability) across VM and container-adjacent/OpenShift (OCP) environments. The role owns incident response, problem management, and runbook-driven ops, and drives observability, automation/IaC, compliance guardrails, and CI/CD-integrated operational automation to reduce toil and improve stability and MTTR.

Core responsibilities: L2/L3 escalation and recovery; reliability signals and alert quality; blameless post-incident learning; logs/metrics/traces/dashboards and actionable alerting; IaC/config-as-code; standardized automation (status/start/stop/restart); intelligent automation/AI-assisted ops with guardrails; drift/compliance checks and remediation; CI/CD integration; runbooks and operational documentation.

Seeking a Senior Systems Operations Engineer in Technology as part of Consumer Lending Operations Technology. This role is focused on application and middleware production support with a Site Reliability Engineering (SRE) mindset: shifting from reactive operations to proactive reliability engineering through strong observability, automation, and continuous improvement. The position supports mission-critical platforms across VM-based and container-adjacent environments, including OpenShift (OCP), and partners closely with application, middleware, infrastructure, network, and security teams to improve stability, reduce toil, and strengthen operational readiness. This includes hands-on ownership of incident response, problem management, and runbook-driven operations, while building automation and standardized patterns that make platform operations repeatable, auditable, and resilient.

In this role, you will:

• Provide senior-level application and middleware support for complex, high-availability services; act as an escalation point for L2/L3 incidents; lead disciplined troubleshooting, recovery, and stabilization.

• Embed SRE practices into day-to-day operations: define reliability signals, improve alert quality, drive blameless post-incident learning, and prioritize systemic fixes and toil reduction.

• Implement and continuously improve observability across applications and middleware (logs, metrics, traces, dashboards, and actionable alerting) to improve detection, diagnosis, and MTTR.

• Design, develop, and maintain infrastructure-as-code and configuration-as-code capabilities supporting VM-based and container-adjacent workloads, including OpenShift (OCP) enablement.

• Build and support automation for operational actions across middleware components (standardized status checks, start/stop/restart patterns) to enable safer self-service and reduce dependency bottlenecks.

• Design and implement intelligent automation for platform and middleware operations, including integrating AI/agent-based approaches into workflows where appropriate (triage assistance, predictive signals, and automated remediation guardrails).

• Monitor configuration drift; support automated compliance checks; implement remediation patterns aligned to enterprise change management, security, and risk controls.

• Integrate infrastructure and operational automation with CI/CD pipelines to enable repeatable, auditable deployments and safer rollouts.

• Support core platform components that enable applications and container platforms, including ingress patterns, load balancing integration, and shared supporting services.

• Develop and maintain runbooks, operational documentation, and validation/testing approaches for automation and platform procedures to ensure operational readiness and consistent execution.

Required Qualifications

• 4 years of Systems Engineering or Technology Infrastructure/Operations Engineering experience, or equivalent demonstrated through work experience, training, military experience, or education.

Desired Qualifications

• 4 years of application and/or middleware production support in complex, high-availability environments, including incident response and problem management with strong root cause discipline.

• 4 years of hands-on automation and configuration management experience (Ansible preferred, or a similar tool), plus strong scripting skills (Python, Bash, PowerShell, or similar).

• 4 years of Linux administration (RHEL preferred) and/or Windows Server administration supporting enterprise production workloads.

• 4 years of Git-based version control practices, including pull requests and peer review, with a focus on repeatability and code quality.

• Working experience with infrastructure-as-code concepts, including modular design and environment consistency.

• Experience supporting hybrid/private cloud platforms and container-adjacent hosting models; familiarity with OpenShift (OCP) or Kubernetes-based platforms.

• Experience implementing SRE operating practices (reliability metrics, reduction of manual toil, continuous improvement via post-incident learnings).

• Experience supporting common middleware platforms and shared services; ability to build automation patterns that standardize operational actions and reduce manual intervention.

• Familiarity with enterprise observability and operational support practices (service health dashboards, alert engineering, actionable telemetry).

• Exposure to responsible AI usage in operations (security, validation, accuracy, and appropriate guardrails for automation/agents).

• Strong cross-functional communication skills; experience operating in regulated environments.

Job Expectations

• Deliver assigned operational engineering and automation outcomes with a strong focus on stability, resiliency, and measurable toil reduction.

• Participate in on-call rotations and operational support coverage as required.

• Follow enterprise change management, risk, and compliance processes.

• Continuously improve platform reliability and automation maturity through standardization, documentation, and repeatable delivery.

• This position offers a hybrid work schedule.

• This position is not eligible for Visa sponsorship.

• Relocation assistance is not available for this position.

• Flexibility to work in a 24/7 environment, including weekends and holidays.

• Flexibility to frequently be on call beyond normal working hours.

Thanks,

Yuvi

Senior Technical Recruiter

Email: yuvi@themesoft.com

Web: www.themesoft.com

