Site Reliability Engineer - Monitoring Specialist

xAI
Memphis, TN Full Time
POSTED ON 11/8/2025
AVAILABLE BEFORE 12/6/2025
About xAI

xAI’s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity. We operate with a flat organizational structure. All employees are expected to be hands-on and to contribute directly to the company’s mission. Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important. All engineers are expected to have strong communication skills. They should be able to concisely and accurately share knowledge with their teammates.

About the Role

As an SRE - Monitoring Specialist, you will focus on developing and managing monitoring solutions, with heavy emphasis on Grafana for creating dashboards that provide visibility into datacenter health. You will leverage programming skills to automate monitoring, analyze data, and scale business operations through insightful visualizations. This role requires collaboration with datacenter teams to deliver actionable insights and minimize downtime in xAI's infrastructure.

Responsibilities
  • Design, build, and maintain Grafana dashboards tailored for datacenter technician organizations, providing real-time views into system health, performance metrics, and monitoring alerts.
  • Develop automation scripts and tools using languages such as Java, Golang, Python, C/C++/C#, Bash, or Linux shell scripting to integrate monitoring systems and process data in JSON formats.
  • Collaborate with Datacenter Operations Technicians to identify monitoring needs, troubleshoot issues, and ensure dashboards support efficient incident response and preventive maintenance.
  • Evaluate and optimize existing dashboards for scalability, drawing from past experiences in creating monitoring solutions that have driven business growth.
  • Manage dashboard lifecycle, including version control, updates, and performance tuning to handle large-scale datacenter environments.
  • Participate in on-call rotations, incident analysis, and root cause investigations using monitoring data to improve system reliability.
  • Document monitoring strategies, dashboard designs, and best practices to foster knowledge sharing within the team.
Required Qualifications
  • Bachelor's degree in Computer Science, Software Engineering, or a related field (or equivalent experience).
  • 5 years of experience in site reliability engineering or monitoring roles, preferably in datacenter or cloud environments.
  • Proficiency in at least two of the following programming languages: Java, Golang, Python, C/C++/C#, with strong skills in Linux and Bash scripting.
  • Hands-on experience working with JSON for data parsing, integration, and API interactions.
  • Expert-level knowledge of Grafana, including creating complex dashboards, queries, and integrations with data sources like Prometheus or InfluxDB.
  • Proven track record of developing dashboards that provide health and monitoring views for operational teams, with examples of how they scaled business operations.
  • Experience managing monitoring tools and dashboards, including optimization, alerting, and integration into CI/CD pipelines.
  • Strong problem-solving skills with a focus on data-driven decision-making and collaboration in fast-paced environments.
Preferred Qualifications
  • Experience in AI/ML infrastructure or high-performance computing monitoring.
  • Familiarity with other monitoring tools (e.g., Grafana) and observability practices.
  • Prior work in a startup or tech company like xAI, with contributions to scalable monitoring systems.

xAI is an equal opportunity employer.

Salary.com Estimation for Site Reliability Engineer - Monitoring Specialist in Memphis, TN
$86,232 to $109,341


