Location: San Diego, CA
The CXone Expert product is a multi-tenant SaaS platform, designed to handle millions of requests with high performance and reliability. Each Expert site can easily host a complex hierarchy of tens of thousands of pages (articles), with layers of fine-grained permissioning, server- and client-side customizations and branding, and other complex business logic. Our enterprise customers have a global presence, and they expect their content to be delivered with low latency and near-zero downtime anywhere in the world.
CXone Expert is an agile engineering organization, and QA is fully automated. We release new versions of our platform every week through our CI/CD pipeline. Our application infrastructure runs on AWS and is almost entirely containerized and orchestrated by Kubernetes.
We need a DevOps Engineer to round out our Site Reliability / DevOps team. This person will be the go-to resource for research and development of architectural changes from the infrastructure up. We already have AWS, CloudFormation, Kubernetes, and Linux experts on the team, and need someone to partner with them to design and build improvements to our platform that help us scale reliably as we expand our customer base over the coming years. Another important part of this role is helping other engineers on the team design and implement software that scales well and is highly reliable. This is a hands-on role: you will get your hands dirty and refactor existing system and application code yourself.
- Analyze system reliability and performance to address and prevent issues.
- Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning and launch reviews.
- Maintain services once they are live by measuring and monitoring availability, latency and overall system health.
- Scale systems sustainably through mechanisms like automation, and evolve systems by making code and configuration changes that improve reliability and velocity.
- Support the Engineering team by being a member of the on-call rotation and developing a plan for sharing those responsibilities with a larger group.
- Engage in and improve the whole lifecycle of services—from inception and design, through deployment, operation and refinement.
- Practice sustainable incident response, blameless postmortems, and root cause analysis.
- Defining and developing continuous integration and deployment pipelines
- Building Infrastructure as Code
- Coordinating build and release activities with other stakeholders
- Training and mentoring other DevOps engineers
- Working with teams to develop code quality metrics and meters
- Identifying, researching, and prototyping new technologies to improve DevOps processes
- Troubleshooting & responding to downtime, performance degradation and outside attacks
- BS degree in Computer Science or related technical field involving coding (e.g., physics or mathematics), or equivalent practical experience.
- 5 years of experience designing, analyzing and troubleshooting large-scale distributed systems
- Sustained track record of creating major improvements in large business-critical systems around stability, security, performance, and scalability.
- Excellent communication and analytical skills
- Ability to work independently, as well as part of a team, on multiple competing projects
- Ability to debug, profile, and optimize code and automate routine tasks.
- Ability to facilitate cross-team work effectively and to be influential far beyond your own group.
- Strong sense of ownership.
- Lifelong learner able to quickly learn new frameworks, architectures and languages
- Experience running production systems on AWS, Azure or Google Cloud
- A deep understanding of REST and network programming
- Experience scaling high traffic SaaS applications
- Experience with Kubernetes
- Experience with application monitoring and metrics tools (AWS X-Ray, CloudWatch, Datadog, etc.)