What are the responsibilities and job description for the Site Reliability Engineer position at Blankfactor?

This is a full-time position supporting the financial services/payments space. It is fully onsite 5 days per week in Berkeley Heights, NJ, with some on-call support on a rotation basis. Please apply only if you have relevant experience and valid work authorization. Unfortunately, we cannot work via C2C or C2H; this is a W2 position.

About Blankfactor

At Blankfactor, we are dedicated to engineering impact. We build high-quality tech solutions for companies looking to innovate and grow—especially in fast-moving industries like payments, banking, capital markets, and life sciences.

About the Role

As a Site Reliability Engineer, you will ensure the reliability, availability, and performance of mission-critical platforms by building scalable systems, robust automation, and data-driven operations. You will partner closely with development, cloud, infrastructure, and security teams to deliver resilient, high-performing services that support the way people live and work today.

Design and implement solutions that enhance application reliability, performance, scalability, and resilience.
Build and maintain monitoring, alerting, observability, and telemetry to drive proactive detection and rapid incident response.
Lead incident management efforts, perform root cause analysis, and implement action-oriented post-mortem improvements.
Automate operational workflows using scripting, IaC, and configuration management tools.
Analyze capacity, performance, and usage trends to forecast demand and optimize cloud costs.
Collaborate with engineering teams to embed operability, resilience, and security into application and architecture designs.
Support safe, reliable deployments through CI/CD pipelines, release governance, and change control.
Maintain clear runbooks, architecture diagrams, and operational documentation that enable efficient production support.

Required Qualifications

Experience managing Kubernetes and containerized workloads (EKS, AKS, GKE), including scaling, networking, upgrades, and orchestration.
Experience with public cloud platforms (AWS, Azure, or GCP) across compute, storage, networking, IAM, and cost governance.
Experience using observability and APM tools such as Dynatrace, Splunk, Prometheus, Grafana, Datadog, and ExtraHop.
Experience implementing security and compliance controls in regulated environments (e.g., PCI DSS, SOC 2), including secrets management and vulnerability remediation.
Infrastructure as Code experience using Terraform, CloudFormation, Ansible, or similar tools.
Experience designing and maintaining CI/CD pipelines using Jenkins, GitLab CI, GitHub Actions, or Azure DevOps.
Scripting and automation experience using Bash, PowerShell, or Python.
Equivalent combination of education, experience, and/or military background.

Nice to Have

Certifications such as AWS SysOps Administrator, AWS DevOps Engineer, Google Cloud DevOps Engineer, or CKA.
Experience with Premier applications, IBM iSeries, and/or Unisys systems.
Hands-on database operations and performance tuning (Oracle, SQL Server, PostgreSQL).
Proven experience in major incident command, stakeholder communication, and cross-team coordination.
Experience with ITIL and ServiceNow (change, problem, and configuration management).
