Software Development Engineer III - Infrastructure

Valiant Harbor International, LLC
Washington, DC Full Time
POSTED ON 5/15/2026
AVAILABLE BEFORE 6/13/2026
Valiant Harbor International is seeking a Software Development Engineer III – Infrastructure (SDE III) to support the Director’s Office at the Advanced Research Projects Agency for Health (ARPA-H). This candidate will contribute to the General Research Assistant and Content Engine (GRACE) development team in building the next generation of agentic AI to transform how ARPA-H Program Managers accelerate research, make decisions, and ship products at scale. GRACE is ARPA-H’s production AI assistant, and ARPA-H’s intention is to evolve it into an ecosystem of autonomous, multi-agent systems. This is a full-time, remote position. The candidate must be able to travel within the U.S.

Key Responsibilities

  • Manage end-to-end backend infrastructure for GRACE on Microsoft Azure:
    • Azure Functions, Azure API Management, Azure Container Apps, and Azure OpenAI Service.
    • Manage storage, retrieval pipelines, vector databases, and document indexing that power GRACE's internal knowledge search.
    • Authentication and identity integration, including ARPA-H Entra ID and application-level access control.
    • Implement and maintain infrastructure as code for all environments.
    • Own CI/CD pipelines, deployment automation, and release processes including canary and gradual rollouts.
    • Be responsible for production system basics (e.g., monitoring, alerting, logging, distributed tracing, SLOs, and incident response runbooks).
    • Manage secrets, API keys, and credential rotation across all integrations with external providers.
    • Monitor cost efficiency across all LLM providers: track spending, set budgets, build guardrails, and optimize cost-per-query without sacrificing quality.
  • Agentic AI and Protocol Infrastructure:
    • Manage the backend implementation of MCP, including MCP server hosting, tool registration, versioning, and lifecycle management on Azure.
    • Implement and evolve A2A communication patterns to enable GRACE agents to interoperate with internal and external systems.
    • Design and maintain LLM orchestration, routing, and multi-model switching infrastructure across OpenAI GPT, Anthropic Claude, and Google Gemini families.
    • Build and operate RAG pipelines: document ingestion, chunking, embedding, and semantic search.
    • Implement robust fallback, retry, circuit-breaker, and graceful degradation patterns for all AI service dependencies.
    • Manage tool-calling infrastructure, including registration, execution, error handling, and observability for all GRACE tools.
  • Manage observability and production quality:
    • Build and maintain end-to-end observability for agentic workflows: latency, throughput, error rates, token usage, and LLM quality metrics.
    • Implement LLM evaluation pipelines including safety checks, regression monitoring, and grounding assessment.
    • Define and enforce system-level SLOs for availability, response time, and tool call reliability.
    • Manage alerting and on-call runbooks.
  • Collaborate and foster teamwork:
    • Establish and improve coding standards, design review processes, and testing practices.
    • Communicate technical decisions in writing and in conversation to both engineers and non-engineers.
    • Mentor and guide other engineers.
    • Think inventively and consider other perspectives; work backward from the user to understand problems before proposing solutions.
    • Ensure strict privacy, security, and compliance in all systems, integrations, and data handling.
Required Qualifications

  • Bachelor's or Master's in Computer Science, Software Engineering, or related field, or equivalent practical experience.
  • 7 years of professional software engineering experience building and operating production systems.
  • Proven experience in high-velocity environments shipping complex products end-to-end.
  • Strong proficiency in backend languages (including Python); familiarity with modern backend frameworks and async patterns.
  • Solid understanding of distributed systems, APIs, data pipelines, and software design patterns.
  • Hands-on experience on Microsoft Azure: Azure Functions, API Management, Container Apps, and Azure OpenAI Service.
  • Experience with containerization, CI/CD, and infrastructure as code.
  • Strong understanding of authentication and identity systems (OAuth2, OIDC, Azure Entra ID or equivalent).
  • Demonstrated experience operating production systems (e.g., serving on-call, debugging incidents).
  • Excellent communication and team building skills; focused on making others around them better.

Preferred Qualifications

  • Hands-on experience building and operating MCP servers in production, including tool registration, versioning, and hosting on Azure Functions or equivalent serverless infrastructure.
  • Experience implementing A2A communication patterns and multi-agent orchestration frameworks.
  • Significant experience building on top of LLMs in production (tool-calling, RAG, multi-step reasoning, multi-model routing, and context window management).
  • Demonstrated ability to treat cost-per-query, context budgets, and prompt efficiency as first-class engineering concerns.
  • Experience managing multi-provider LLM integrations, including rate limits, fallback routing, and API versioning.
  • Experience in security-conscious engineering within regulated or government environments.
  • Previous track record in startup or early-stage environments (0-to-1 product building, comfort with ambiguity, and a high sense of urgency).
  • Experience in big tech building customer-facing platforms or developer infrastructure at scale.
  • Familiarity with vector databases, embedding pipelines, and semantic search infrastructure.

Salary Range: Negotiable

EEO Statement: Valiant Harbor International, LLC is an Equal Opportunity/Affirmative Action employer. Valiant Harbor International prohibits discrimination with respect to the hiring or promotion of individuals, conditions of employment, disciplinary and discharge practices, or any other aspect of employment on the basis of sex, race, color, age, national origin, religion, disability, marital status, sexual orientation, gender identity, pregnancy, veteran status, or any other protected class. If you are an individual with a disability and require a reasonable accommodation to complete any part of the application process, or are limited in the ability or unable to access or use this online application process and need an alternative method for applying, you may contact (202) 417-6705 for assistance.

Salary.com Estimation for Software Development Engineer III - Infrastructure in Washington, DC
$107,655 to $130,149