
DevOps Engineer, AI Infrastructure

Oscar Faye
Manhattan, NY | Full Time
POSTED ON 6/3/2026
AVAILABLE BEFORE 7/2/2026

DevOps Engineer, AI Infrastructure | NYC | Confidential — Global Alternative Asset Manager

We're partnering with a leading alternative asset management holding company to find a DevOps Engineer who will own the infrastructure layer underneath their AI platform. This firm operates across asset management, reinsurance, alternative credit, and energy — and they are actively shipping AI agents into production across all of it.

The modern AWS foundation is in place and the AI platform is taking shape. What's being built now is the deployment, runtime, and operational backbone that the rest of the firm will build on. This is not a support role. You will own meaningful, hard problems at the center of a fast-moving AI engineering org.

The Role

You'll sit on the AI team at the holdings level, embedded at the center of the firm's AI buildout. You'll partner closely with platform engineers on shared infrastructure decisions, with forward-deployed AI engineers on what colleagues across the firm need to ship, and with existing infrastructure and security teams on how AI workloads fit the firm's broader posture.

This is a hands-on, high-ownership role. You will carry real responsibility from day one.

What You'll Own

Deployment Infrastructure: Build and operate deployment pipelines purpose-built for agentic systems, where what's shipping isn't a deterministic service but a system whose behavior depends on prompts, tools, models, and context that all change independently.

Runtime & Orchestration: Build runtime infrastructure for agentic workloads on Kubernetes, including orchestration of long-running multi-step jobs, autoscaling for bursty agent traffic, and the lifecycle management these workloads demand.

Observability: Make agent behavior observable end-to-end. When an agent takes ten steps to accomplish something, you can see every step, every tool call, and every input and output, and you can trace failures to root cause in minutes, not hours.

Security Posture: Own the security posture of agentic systems in production: the secrets and permissioning model that scopes what agents can do, defenses against prompt injection and data exfiltration, tool-use sandboxing, and clear audit trails for every action an agent takes.

Incident Response & On-Call: Carry your share of incident response for AI workloads in production and write the runbooks that let the rest of the team respond as confidently as you do.

Platform Primitives: Spot where infrastructure should be productized into shared tooling, and partner with platform engineers to build it once, well.

What We're Looking For

  • 5 years running production infrastructure that real users depend on, with hands-on experience owning deploys, on-call, and incident response
  • Deep AWS experience and infrastructure-as-code discipline, with the judgment to know when to use a managed service versus building your own
  • Strong Kubernetes fluency — operating clusters in production, debugging workload issues, reasoning about networking, scheduling, and security primitives
  • First-principles debugging instincts: when something fails intermittently, you can trace it through the load balancer, TLS handshake, DNS resolver, OIDC flow, and upstream API without guessing
  • Security mindset built in from the start — you think about blast radius and least privilege before you ship, not after
  • Strong scripting and automation skills, with enough range to read and contribute to application code when the problem calls for it
  • Clear communicator who can translate what AI engineers need into infrastructure that actually serves them

Nice to Have

  • Hands-on experience building agents, including tool-use orchestration and multi-step workflows
  • Familiarity with workflow orchestration for durable, long-running execution (Temporal or similar)
  • Experience deploying and operating LLM applications in production, including evaluation harnesses, guardrails, and rollback strategies
  • Experience building developer platforms or internal tooling that engineers actually enjoy using
  • Familiarity with Snowflake


Salary.com Estimation for DevOps Engineer, AI Infrastructure in Manhattan, NY
$109,230 to $141,894

