What are the responsibilities and job description for the Site Reliability Engineer position at ComTec Information Systems?
Production Engineer
6 months, contract-to-hire (C-H)
Plano, TX
Onsite
Responsibilities:
- Apply 3-4 years of experience in production engineering and site reliability engineering (SRE) to design, implement, and maintain highly available, scalable, and resilient systems.
- Own end-to-end operational responsibilities, including monitoring, incident response, root cause analysis, capacity planning, and automation, to ensure optimal system performance and reliability in production environments.
- Collaborate cross-functionally with development, QA, and infrastructure teams to streamline CI/CD pipelines, automate deployments, and enforce best practices for security, compliance, and disaster recovery.
- Use a broad set of tools and technologies to proactively detect, troubleshoot, and resolve production issues, minimizing downtime and improving performance against service-level objectives (SLOs) and service-level agreements (SLAs).
Requirements:
Java, JavaScript, Cloud-based Microservices, Spring Boot, AWS
- Build, deploy, and maintain cloud-native microservices using Java, Spring Boot, and JavaScript frameworks, ensuring high availability and scalability.
- Design and implement RESTful APIs and event-driven architectures using AWS services such as Lambda, ECS/EKS, SQS, and SNS.
- Develop and maintain CI/CD pipelines with Jenkins, GitLab CI, or AWS CodePipeline for automated testing and deployment.
- Monitor application and infrastructure health using AWS CloudWatch, Prometheus, Grafana, and distributed tracing tools like Jaeger or AWS X-Ray.
- Troubleshoot production issues, perform root cause analysis, and implement fixes to improve system reliability.
- Implement security controls including IAM roles, OAuth2, JWT, and encryption for data in transit and at rest.
- Collaborate with cross-functional teams to design fault-tolerant, resilient systems with automated failover and recovery.
- Optimize cloud resource usage and cost through rightsizing and autoscaling configurations.
- Automate operational tasks and incident response using scripting and infrastructure as code (Terraform, CloudFormation).
- Maintain detailed documentation of system architecture, deployment processes, and operational runbooks.
Salary: $60 - $66