What are the responsibilities and job description for the Senior Site Reliability Engineer position at Themesoft Inc.?
POSITION DETAILS:
Role: Cloud SRE
Location: Charlotte, NC; Irving, TX; Chandler, AZ
Duration: 12 Months (Extension, Conversion, or Direct Hire)
Hybrid
Pay rate: $70/hr on W2 with benefits
Seeking a senior engineer for L2/L3 application and middleware production support with an SRE mindset (shifting from reactive to proactive reliability) across VM and container-adjacent/OpenShift (OCP) environments. The role owns incident response, problem management, and runbook-driven ops, and drives observability, automation/IaC, compliance guardrails, and CI/CD-integrated operational automation to reduce toil and improve stability and MTTR.
Core responsibilities: L2/L3 escalation and recovery; reliability signals & alert quality; blameless post-incident learning; logs/metrics/traces/dashboards and actionable alerting; IaC/config-as-code; standardized automation (status/start/stop/restart); intelligent automation/AI-assisted ops with guardrails; drift/compliance checks and remediation; CI/CD integration; runbooks & operational documentation.
Seeking a Senior Systems Operations Engineer in technology as part of Consumer Lending Operations Technology. This role is focused on application and middleware production support with a Site Reliability Engineering (SRE) mindset, shifting from reactive operations to proactive reliability engineering through strong observability, automation, and continuous improvement. The position supports mission-critical platforms across VM-based and container-adjacent environments, including OpenShift (OCP), and partners closely with application, middleware, infrastructure, network, and security teams to improve stability, reduce toil, and strengthen operational readiness. This includes hands-on ownership of incident response, problem management, and runbook-driven operations, while building automation and standardized patterns that make platform operations repeatable, auditable, and resilient.
In this role, you will:
• Provide senior-level application and middleware support for complex, high-availability services; act as an escalation point for L2/L3 incidents; lead disciplined troubleshooting, recovery, and stabilization.
• Embed SRE practices into day-to-day operations: define reliability signals, improve alert quality, drive blameless post-incident learning, and prioritize systemic fixes and toil reduction.
• Implement and continuously improve observability across applications and middleware (logs, metrics, traces, dashboards, and actionable alerting) to improve detection, diagnosis, and MTTR.
• Design, develop, and maintain infrastructure-as-code and configuration-as-code capabilities supporting VM-based and container-adjacent workloads, including OpenShift (OCP) enablement.
• Build and support automation for operational actions across middleware components (standardized status checks, start/stop/restart patterns) to enable safer self-service and reduce dependency bottlenecks.
• Design and implement intelligent automation for platform and middleware operations, including integrating AI/agent-based approaches into workflows where appropriate (triage assistance, predictive signals, and automated remediation guardrails).
• Monitor configuration drift; support automated compliance checks; implement remediation patterns aligned to enterprise change management, security, and risk controls.
• Integrate infrastructure and operational automation with CI/CD pipelines to enable repeatable, auditable deployments and safer rollouts.
• Support core platform components that enable applications and container platforms, including ingress patterns, load balancing integration, and shared supporting services.
• Develop and maintain runbooks, operational documentation, and validation/testing approaches for automation and platform procedures to ensure operational readiness and consistent execution.
Required Qualifications
• 4 years of Systems Engineering or Technology Infrastructure/Operations Engineering experience, or equivalent demonstrated through work experience, training, military experience, or education.
Desired Qualifications
• 4 years of application and/or middleware production support in complex, high-availability environments, including incident response and problem management with strong root cause discipline.
• 4 years of hands-on automation and configuration management experience (Ansible preferred, or a similar tool), plus strong scripting skills (Python, Bash, PowerShell, or similar).
• 4 years of Linux administration (RHEL preferred) and/or Windows Server administration supporting enterprise production workloads.
• 4 years of Git-based version control practices, including pull requests and peer review, with a focus on repeatability and code quality.
• Working experience with infrastructure-as-code concepts, including modular design and environment consistency.
• Experience supporting hybrid/private cloud platforms and container-adjacent hosting models; familiarity with OpenShift (OCP) or Kubernetes-based platforms.
• Experience implementing SRE operating practices (reliability metrics, reduction of manual toil, continuous improvement via post-incident learnings).
• Experience supporting common middleware platforms and shared services; ability to build automation patterns that standardize operational actions and reduce manual intervention.
• Familiarity with enterprise observability and operational support practices (service health dashboards, alert engineering, actionable telemetry).
• Exposure to responsible AI usage in operations (security, validation, accuracy, and appropriate guardrails for automation/agents).
• Strong cross-functional communication skills; experience operating in regulated environments.
Job Expectations
• Deliver assigned operational engineering and automation outcomes with a strong focus on stability, resiliency, and measurable toil reduction.
• Participate in on-call rotations and operational support coverage as required.
• Follow enterprise change management, risk, and compliance processes.
• Continuously improve platform reliability and automation maturity through standardization, documentation, and repeatable delivery.
• This position offers a hybrid work schedule.
• This position is not eligible for Visa sponsorship.
• Relocation assistance is not available for this position.
• Flexibility to work in a 24/7 environment, including weekends and holidays.
• Flexibility to frequently be on call beyond normal working hours.
Thanks,
Yuvi
Senior Technical Recruiter
Email: yuvi@themesoft.com
Web: www.themesoft.com