What are the responsibilities and job description for the Senior Site Reliability Engineer position at ThoughtSpot?
Job Role: Sr. SRE (Customer-Facing)
About the Role
We are seeking a customer-centric SRE who thrives on solving complex problems and delivering exceptional customer experiences. This role requires a strong balance of technical expertise, problem-solving skills, and clear communication. You will play a key part in helping customers deploy, maintain, and troubleshoot distributed services and applications across both cloud and on-premises environments.
As part of the ThoughtSpot SRE team, you will not only ensure service reliability but also act as a trusted partner for our customers by providing timely updates, meaningful solutions, and proactive improvements. If you are passionate about ownership, customer success, and building resilient systems, this role is for you.
Responsibilities
- Act as the primary point of contact for customer-facing issues, ensuring a customer-first approach to troubleshooting, debugging, and diagnosis.
- Provide timely, accurate, and clear updates to customers, meeting SLAs and driving issues through to resolution.
- Create and maintain knowledge-base articles to empower customer self-service and improve support efficiency.
- Maintain, monitor, and troubleshoot ThoughtSpot cloud infrastructure using tools like Grafana, Prometheus, and other monitoring solutions.
- Collaborate with Engineering teams to define and implement tools that enhance debuggability, supportability, availability, scalability, and performance.
- Participate in on-call rotations, lead incident reviews, and conduct root cause analyses to ensure continuous improvement.
- Develop and implement automation and best practices to streamline operations and strengthen system reliability.
- Understand and apply NetOps and SecOps principles for cloud and on-premises deployments.
- Contribute to improving the overall customer experience by translating complex technical issues into clear, concise updates.
What You’ll Bring
- B.S. in Computer Science or equivalent relevant experience.
- Proven experience in troubleshooting Linux systems and managing virtualization & cloud platforms (VMware, AWS, Azure, GCP).
- Hands-on experience with Grafana or similar monitoring tools (e.g., Prometheus, Datadog, Splunk).
- Strong problem-solving and algorithmic thinking with a solid understanding of system internals.
- Prior experience in enterprise customer support, including on-call rotations and incident management.
- Excellent verbal and written communication skills, with the ability to explain technical concepts clearly to both technical and non-technical stakeholders.
- Familiarity with automation, scripting, and programming languages such as Python, Go, Java, or C/C++.
- Exposure to infrastructure and service monitoring frameworks, and the ability to analyze data to ensure high availability.
- Strong collaboration skills, with the ability to work independently and cross-functionally in fast-paced environments.
Salary: $75,000 to $120,000