We are looking for a Senior Site Reliability Engineer based in Georgia. In this role, you will:
Run the production environment to provide the highest levels of uptime, performance, and reliability
Identify toil in the day-to-day operations and automate whatever can be automated
Work with development teams to make sure the applications are production-ready, scalable, reliable, and observable from day zero
Identify and drive opportunities to improve automation for code deployment, management, and visibility of application services
Establish end-to-end monitoring and alerting on all critical components within the platform
Participate in the on-call rotation, supporting the platform and production applications
Manage end-to-end availability and performance of critical services and build automation
Perform root cause analysis on issues, and participate in blameless post-mortems so we can learn from incidents and prevent their recurrence through automation
Independently troubleshoot complex systems and environments including applications, microservices, DNS, and networking components
Create load test scenarios and streamline their execution so performance regressions can be caught pre-production
Enable developers and product teams to move rapidly with features without sacrificing reliability, availability, and overall performance of our systems
Participate in architecture reviews and work cross-functionally with Engineering teams on operational readiness and tactical day-to-day scenarios
Work with engineering teams to better address needs and enable more effective and efficient developer throughput
Identify performance bottlenecks and triage with Engineering teams to design and implement a secure and performant solution
Guide development teams towards security, reliability, and availability best practices during the SDLC
Daily and Monthly Responsibilities
Gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault finding
Partner with development teams to improve services through rigorous testing and release procedures
Participate in system design consulting, platform management, and capacity planning
Create sustainable systems and services through automation and uplifts
Balance feature development speed and reliability with well-defined service-level objectives (SLOs) and service-level indicators (SLIs) to honor SLAs
If you’re looking for a real challenge in terms of mission criticality, multi-region deployments, diversity of managed services, and the chance to work with cutting-edge technologies like Kubernetes, Kafka, Serverless, ArgoCD, and more, then this might be the position for you!
Experience administering Kubernetes-based microservices, ingress controllers, web servers (nginx), and databases (Postgres, MySQL, MongoDB; desirable: Redis, ClickHouse)
Strong experience with AWS technologies such as EKS, ELB, RDS, S3/EBS/Glacier and VPC
Experience architecting highly scalable, fault tolerant, secure, and available systems within the AWS ecosystem
Strong troubleshooting experience in the realm of networking fundamentals, web applications, and DNS
Hands-on experience developing automation to streamline development processes
Experience working with modern CI/CD tools such as CircleCI, ArgoCD, CodeShip, GitHub Actions, or similar solutions
Experience with Infrastructure as Code tools (e.g. Terraform, CloudFormation)
BS or MS from a top-notch CS program (or equivalent experience)
5 years of professional experience in hands-on engineering roles (DevOps/SRE)
3 years of experience operating high-traffic production environments in public clouds: AWS, GCP, or Azure
Python programming experience in production environments
Experience with modern cloud environments: containerization, infrastructure-as-code, DevOps, CI/CD pipelines, and general automation
Hands-on experience with network security, database systems, and related tools
English speaking and writing
Experience operating Kubernetes clusters in a compliance-regulated environment
Experience performing stress-testing, failure analysis, and load testing apps
Experience with cloud and infrastructure security regulations & compliance programs: SOC2, ISO27001, HIPAA, GDPR, CCPA
Experience with MLOps: Spark, TensorFlow, GPUs
IT Outsourcing & Consulting
MOUNTAIN VIEW, CA
200 - 500
$50M - $200M
Workato develops an AI-based workflow automation platform that enables enterprises to integrate applications and automate business processes.
The job skills required for a Senior Site Reliability Engineer include AWS, Kubernetes, Python, DevOps, Java, and troubleshooting. Having related skills and expertise gives you an advantage when applying for a Senior Site Reliability Engineer position, sets you apart from other candidates, and can affect the salary you are offered. Below are job openings related to the skills required of a Senior Site Reliability Engineer. Select any job title you are interested in to search its job requirements.
The following is the career advancement route for Senior Site Reliability Engineer positions, which can be used as a reference in future career path planning. A Senior Site Reliability Engineer can be promoted into a more senior position, such as Corrosion Engineer III, which is expected to handle more key tasks; people in this role earn a higher salary than a typical Senior Site Reliability Engineer. You can explore the career advancement options for a Senior Site Reliability Engineer below and select a title you are interested in to get hiring information.
If you are interested in becoming a Senior Site Reliability Engineer, you need to understand the job requirements and the detailed related responsibilities. Of course, a good educational background and an applicable major will also help in job hunting. Below are some tips on how to become a Senior Site Reliability Engineer for your reference.
Step 1: Understand the job description and responsibilities of a Senior Site Reliability Engineer.
Quotes from people on Senior Site Reliability Engineer job description and responsibilities
Spend a considerable amount of time fixing cases related to support escalation.
03/01/2022: Carson City, NV
Analyzing, troubleshooting and designing vital services, platforms and infrastructure.
03/20/2022: Indianapolis, IN
Use computers to produce and analyze designs, to simulate and test how a machine, structure or system operates, to generate specifications for parts, to monitor the quality of products and to control the efficiency of processes.
03/02/2022: Philadelphia, PA
Responsible for keeping all user-facing services and other GitLab production systems running smoothly.
03/01/2022: Troy, NY
Ensure the maintainability and reliability of equipment, buildings and facilities to achieve a high level of asset preservation at a reduced total operating cost.
03/22/2022: Dayton, OH
Step 2: Knowing the best tips for becoming a Senior Site Reliability Engineer can help you explore the needs of the position and prepare the job-related knowledge well ahead of time.
Career tips from people on Senior Site Reliability Engineer jobs
With 1 to 4 years of experience.
04/26/2022: Altoona, PA
Bachelor's degree in mechanical or electrical engineering, master's degree preferred.
03/30/2022: Chillicothe, OH
Professional experience in manufacturing/production and reliability.
02/27/2022: Riverside, CA
Familiarity with Six Sigma methodology.
04/18/2022: Davenport, IA
Requires a bachelor’s degree and at least three years of experience.
05/01/2022: Springfield, IL
Step 3: View the best colleges and universities for Senior Site Reliability Engineer.