Cloud Platform/SRE #11154 (Dallas)

Ecco Select
Dallas, TX Full Time
POSTED ON 4/15/2026
AVAILABLE BEFORE 6/14/2026

Position Title: Cloud Platform/SRE Engineer

Start Date: April/May

Duration: 6-month contract-to-hire

Schedule: Hybrid, 2 days weekly onsite

Location: Dallas

Summary: The Platform Engineer / Site Reliability Engineer is a hands-on, hybrid role responsible for building, operating, and continuously improving the Client’s cloud platform and production reliability posture. You will work at the intersection of infrastructure engineering, DevOps, and SRE: designing and automating AWS-based platforms with Terraform, establishing observability and incident response practices, and embedding reliability into every layer of the stack.

This role also carries a forward-looking mandate: you will evaluate and integrate AI-powered tooling, including large language models and AWS Bedrock, to accelerate operations, improve developer experience, and drive intelligent automation across the platform.

You will collaborate closely with cloud engineers, application teams, and security stakeholders to deliver infrastructure that is secure, observable, cost-effective, and built for scale. As the Client continues its migration to AWS and modernization of its platform engineering practices, this role is central to establishing the reliability and automation standards that will define the next chapter of its infrastructure.

Education & Experience:

We’re targeting a strong hands-on engineer who blends platform/infrastructure depth with SRE discipline and a genuine curiosity for AI-driven operations. Ideal profile:

· Bachelor’s degree preferred; High School Diploma or Equivalent with relevant experience required.

· 4–7 years of combined cloud infrastructure, platform engineering, DevOps, or SRE experience with a primary AWS focus.

· Strong Terraform proficiency: modules, state management, workspaces, CI integration, and policy-as-code patterns.

· Hands-on experience with container orchestration (ECS/Fargate, EKS) and serverless (Lambda, API Gateway) in production environments.

· Demonstrated experience defining SLIs/SLOs, building observability pipelines (CloudWatch, X-Ray, or equivalent), and participating in on-call and incident response.

· Proficiency with Git/GitHub, GitHub Actions or equivalent CI/CD platforms, and infrastructure pipeline design.

· Linux systems administration fundamentals and scripting (Bash and/or Python).

· Docker image lifecycle management, registry operations (ECR), and container security basics.

· Practical understanding of networking: VPCs, subnets, routing, load balancers, DNS, hybrid connectivity, and multi-account architectures.

· Security fundamentals: least-privilege IAM, KMS, WAF, Secrets Manager, tagging discipline.

· Familiarity with AI/ML concepts; hands-on experience with LLMs, prompt engineering, or AWS Bedrock is strongly preferred.

· Experience with AI coding assistants, automation agents, or retrieval-augmented generation (RAG) patterns is a plus.

· FinOps basics: cost allocation, right-sizing, Savings Plans, and budget alerting.

· Azure familiarity (AKS, Key Vault, Firewall, Sentinel, Monitor) is helpful during the near-term transition but not a core requirement.

Tools & Technologies You’ll Use:

· AWS: EC2, ECS/Fargate, EKS, ECR, Lambda, API Gateway, RDS/Aurora, DynamoDB, ElastiCache, S3, EFS, CloudFront, Route 53, ALB/NLB, IAM, KMS, WAF, Secrets Manager, CloudWatch/X-Ray, Organizations/Control Tower, Bedrock, Kendra, OpenSearch, Step Functions.

· IaC/CI: Terraform (primary), CloudFormation (as needed), GitHub/GitHub Actions, OPA/Sentinel policy-as-code.

· SRE/Observability: CloudWatch (metrics, logs, alarms, dashboards), X-Ray, approved third-party APM/logging tools, PagerDuty or equivalent.

· Scripting/OS/Containers: Linux, Bash, Python, Docker, Helm.

· Networking: VPC, Transit Gateway, Direct Connect/VPN, PrivateLink, Route 53, DNS patterns.

· AI/ML: AWS Bedrock, Agents for Bedrock, LLM APIs, prompt engineering frameworks, RAG architectures.

· ITSM/FinOps: ServiceNow, Cost Explorer, Savings Plans/RI planning, budget/anomaly alerting.

Essential Duties & Responsibilities:

· Champion site reliability engineering practices across the organization: define and enforce service level objectives (SLOs) backed by meaningful service level indicators (SLIs) and error budgets; use reliability metrics to balance feature velocity against system stability and drive prioritization of engineering work.

· Build and maintain observability across production systems, encompassing metrics, logs, traces, and dashboards, so that teams have real-time visibility into system health, can detect and diagnose issues quickly, and can make data-driven decisions about capacity, performance, and reliability improvements.

· Author and maintain operational runbooks; participate in on-call rotations and lead post-incident reviews (blameless retrospectives) that drive measurable improvements to MTTR and MTTD.

· Drive reliability improvements through capacity planning, chaos and resilience testing patterns, dependency mapping, and progressive rollout strategies.

· Build and maintain a Terraform-first infrastructure platform: develop reusable modules, enforce state management strategy, implement policy-as-code (OPA/Sentinel), and maintain consistent tagging and resource governance across a multi-account AWS environment (Organizations/Control Tower).

· Design and operate CI/CD pipelines using GitHub Actions; create golden-path templates and paved-road workflows for build, test, scan, and deploy — improving developer experience and reducing friction for application teams.

· Architect and manage core AWS services including VPC networking (subnets, routing, security groups/NACLs, ALB/NLB, PrivateLink, NAT, Transit Gateway, Managed Firewall), compute (EC2, ECS/Fargate, EKS, Lambda), storage (S3, EFS), and data services (RDS/Aurora, DynamoDB).

· Standardize container image build and deployment patterns (blue/green, canary, rolling), autoscaling policies (HPA, target tracking), and serverless deployment models across the organization.

· Implement least-privilege IAM policies, KMS encryption, WAF rules, and Secrets Manager baselines; integrate Security Hub, GuardDuty, and Inspector findings into automated response and remediation workflows.

· Enforce guardrails via Service Control Policies (SCPs), tagging standards, and auditability requirements to meet internal policy and regulatory obligations.

· Evaluate and deploy AI-powered tooling to enhance platform operations and developer experience — including LLM-based assistants, AI coding agents, and retrieval-augmented workflows.

· Leverage AWS Bedrock (including Agents for Bedrock), Lambda, Step Functions, and retrieval services (Kendra/OpenSearch) to build and operate AI agent architectures that support DevOps automation, incident triage, and knowledge management.

· Monitor AI workload performance, including accuracy, latency, token cost, and drift; establish guardrails and evaluation frameworks for production AI integrations.

· Stay current with the rapidly evolving AI/ML landscape and translate emerging capabilities into practical platform improvements.

· Apply FinOps practices: right-size resources, implement autoscaling, plan Savings Plans/Reserved Instances, and configure budget and anomaly alerts.

· Maintain accurate cloud asset and configuration data in ServiceNow CMDB; participate in change, incident, and problem management processes.

· Design resilient hybrid connectivity patterns (Direct Connect/VPN) and DNS architectures (Route 53) supporting private service access, cross-account resolution, and failover.

Certificates, Licenses, Registrations:

· AWS certifications (Solutions Architect Associate, SysOps Administrator, or Developer Associate) preferred; AWS DevOps Professional or Security Specialty a strong plus.

· Kubernetes certifications (CKA/CKAD) or Docker certifications are a plus.

· AWS AI Practitioner or Machine Learning Specialty certification is a differentiator.

· Azure certifications are optional and considered a plus during the transition period.

About Us

ECCO Select is certified as a Women-owned, Minority-owned, Small Business Enterprise. We are a talent acquisition and advisory consulting company, specializing in providing people, process, and technology solutions for our clients’ needs. ECCO Select has experience helping our commercial and government clients successfully manage projects and programs that transform their business operations through a variety of IT solutions. We’re the talent behind the technology. To find out more about ECCO, visit www.eccoselect.com.

Our Commitment
We would love to have you join our team! ECCO Select is committed to hiring and retaining a diverse workforce. ECCO Select’s policy is to provide equal opportunity to all people without regard to race, color, religion, national origin, ancestry, marital status, veteran status, age, disability, pregnancy, genetic information, citizenship status, sex, sexual orientation, gender identity or any other legally protected category.

Equal Employment Opportunity is The Law

This Organization Participates in E-Verify

Job Types: Full-time, Contract

Benefits:

  • 401(k)
  • 401(k) matching
  • Dental insurance
  • Health insurance
  • Paid time off
  • Vision insurance

Application Question(s):

  • Can you work W2 without current sponsorship or future Visa transfer?
  • Are you experienced with AWS Bedrock?
  • Are you experienced with AI/ML in production, LLMs, ML Operations, or RAG?
  • Do you have experience as a Cloud Platform Engineer primarily with AWS and/or a background as a Site Reliability Engineer?
  • Any experience with Terraform?

Work Location: Hybrid remote in Dallas, TX 75254

Salary.com Estimation for Cloud Platform/SRE #11154 (Dallas) in Dallas, TX
$90,157 to $106,721