What are the responsibilities and job description for the SRE/MQ Infrastructure Lead position at Stash Talent Services?

Position: SRE/MQ Infrastructure Lead

Location: Plano, TX (3 days in office, 2 days remote)

Duration: 12-month contract (extendable up to 36 months)

Job Details:

We are seeking an experienced Site Reliability Engineer (SRE) / MQ Infrastructure Lead for Messaging Services to drive platform reliability, observability, and operational excellence across IBM MQ and Kafka environments. This role combines production engineering and reliability leadership, platform security and resilience engineering, and ownership of large-scale, distributed messaging runtimes. The position has a hybrid schedule requiring a minimum of 3 days per week on-site.

Responsibilities:

Leading reliability engineering for high-scale messaging platforms supporting tens of thousands of runtimes and high-volume message throughput
Driving EOL remediation, patching, and stabilization across MQ queue managers and Kafka clusters
Implementing SRE best practices, including SLIs/SLOs for message delivery, latency, and availability, as well as incident management, escalation, and a postmortem culture
Enhancing observability and monitoring for messaging flows, queue depths, lag, and throughput
Designing proactive fault detection and auto-remediation strategies (e.g., DLQ handling, backlog mitigation, failover recovery)
Building resilient messaging platforms capable of supporting real-time, event-driven workloads
Supporting global production messaging environments with on-call rotation and escalation ownership
Partnering with engineering, application, and security teams to ensure reliability, scalability, and secure message transport

Requirements:

Strong experience in Site Reliability Engineering / Production Engineering
Hands-on expertise with IBM MQ (queue managers, clustering, channels, DLQ management); Kafka / Confluent Platform (topics, brokers, partitions, consumer groups); and large-scale distributed messaging systems and runtime management
Deep understanding of system reliability, scalability, and high availability design; messaging reliability patterns (guaranteed delivery, retry handling, replay, ordering); and incident management, root cause analysis, and problem management
Experience with observability tools (Dynatrace, Splunk, Prometheus, Grafana) for messaging platforms, and with event and anomaly detection in high-volume systems
Strong scripting/automation skills in Shell, Python, PowerShell
Experience managing Linux/Unix and Windows production environments
Knowledge of event-driven architecture and messaging-based integration patterns
Understanding of messaging platform security (TLS, certificates, channel auth, encryption), and of vulnerability remediation and risk mitigation in production systems
Excellent troubleshooting skills in high-pressure, real-time environments (e.g., message backlog, latency spikes, connection failures)

Desired skills:

Experience implementing SRE frameworks (SLIs, SLOs, error budgets) specifically for messaging workloads
Familiarity with Kubernetes / containerized messaging platforms
Experience with Kafka ecosystem components (Schema Registry, Connect, Streams) and IBM MQ advanced features (Native HA, clustering)
Exposure to AI-driven operations (AIOps), anomaly detection, or automated remediation, and to large-scale messaging modernization or migration programs
Messaging or middleware certifications (IBM MQ, Kafka, or equivalent)
Experience in regulated environments (e.g., financial services)

Salary: $60
