What are the responsibilities and job description for the Application Production Support - Site Reliability Engineer and DevOps position at Smart IT Frame LLC?
Job Title: Application Production Support - Site Reliability Engineer and DevOps
Location: Berkeley Heights, NJ (5 days onsite) - Locals Preferred
Employment Type: Full-time
About Smart IT Frame:
At Smart IT Frame, we connect top talent with leading organizations across the USA. With over a decade of staffing excellence, we specialize in IT, healthcare, and professional roles, empowering both clients and candidates to grow together.
Role Overview:
We are looking for an AppOps Engineer with strong experience in release management, deployment, and production support in a cloud-native environment.
Key Responsibilities:
1). Manage release and deployment activities across environments (especially PPD and Production)
2). Create, review, and manage Merge Requests (MRs) in GitLab as part of deployment workflows
3). Follow Git branching strategies and ensure proper version control practices
4). Monitor deployments using Argo CD (GitOps) and ensure successful rollout of applications
5). Handle production readiness, release coordination, and deployment windows
6). Work closely with Dev teams, QA, and stakeholders during deployments and testing cycles
Technical Skills Required:
1). Cloud & Kubernetes: Strong hands-on experience with AWS (EKS, CloudWatch, and other services)
2). Solid understanding of Kubernetes architecture
3). Solid understanding of Pod lifecycle, Deployments, Services, and ConfigMaps
4). Experience using kubectl commands for troubleshooting
5). Strong experience with GitLab and Git branching strategies
6). Experience with Argo CD for application deployment and monitoring; understanding of the GitOps-based deployment model
7). Hands-on experience with Helm charts and Helm lint; managing application configurations using Helm
8). Monitoring & Observability: Dynatrace (APM), Splunk (logs & debugging), and Moogsoft (alerts & incidents)
9). Scripting & OS: Linux and Shell / Bash / PowerShell
10). Ability to troubleshoot at system and application level
11). Messaging & Integration: Basic knowledge of Kafka (producers, brokers, connectivity issues)
12). Scheduling / Batch: Experience with Control-M for job scheduling and monitoring
13). Experience working in an Agile environment (Scrum / SAFe)
14). Participation in sprint planning and PI planning
15). Use of Jira (task tracking) and Confluence (documentation)
16). Continuous Integration and Continuous Delivery (CI/CD)
17). Familiarity with DAST (Dynamic Application Security Testing) scanning
Good to Have:
1). Understanding of banking/payment systems and environments
2). Experience working with multi-environment setups (QA, PPD, PROD)
3). Exposure to Chaos Testing / resilience validation