What are the responsibilities and job description for the Senior AI Systems Engineer position at ARA Brand?
Essential Functions:
- Lead the deployment, integration, and operational support of AI platforms, tools, and services, ensuring compatibility with existing systems and enterprise processes.
- Design, implement, monitor, and optimize AI infrastructure, working with server, cloud, and platform engineering teams.
- Operationalize machine learning workflows and support AI-enabled applications from development through production deployment and sustainment.
- Build and maintain CI/CD and MLOps pipelines for model packaging, testing, deployment, rollback, and lifecycle management.
- Implement infrastructure automation using scripting, Infrastructure as Code, and configuration management practices.
- Provide ongoing technical support, troubleshooting, root cause analysis, and documentation for AI platforms and user-facing AI services.
- Maintain observability across AI systems through logging, metrics, performance monitoring, alerting, and incident response practices.
- Ensure security, compliance, and governance requirements are met, including participation in audits, vulnerability management, and secure architecture reviews.
- Assess and implement system enhancements to improve performance, scalability, reliability, and cost efficiency.
- Collaborate across divisions to support diverse AI initiatives and align technical implementations with mission and business objectives.
- Evaluate emerging AI tools, frameworks, and infrastructure approaches for operational fit, supportability, and long-term value.
- Develop and maintain technical documentation, runbooks, architecture diagrams, and operational procedures.
Experience and Skills Required:
- Bachelor’s degree in Computer Science, Engineering, Information Technology, or a related STEM field, with 8-10 years of engineering experience.
- 2 years of experience supporting AI/ML platforms, MLOps workflows, model deployment, or AI-enabled infrastructure.
- Strong coding and automation skills in Python, Bash, or similar scripting languages.
- Experience with AI/ML frameworks and tooling such as PyTorch, Hugging Face, or similar ecosystems.
- Proficiency with DevOps and MLOps practices, including CI/CD pipelines, Git-based workflows, containerization, and Kubernetes.
- Experience deploying AI/ML models or AI services into operational environments, including containerized, cloud, or high-performance computing environments.
- Familiarity with security frameworks and compliance standards such as NIST and CMMC.
- Familiarity with AI security functionality in enterprise environments, including OAuth.
- Strong communication skills and the ability to collaborate effectively across technical and non-technical teams.
Preferred:
- Advanced degree or certifications related to AI or machine learning.
- Experience integrating AI models into scientific workflows.
- Familiarity with large language model (LLM) APIs and orchestration frameworks such as OpenAI, Hugging Face, LangGraph, or LangChain.
- Experience with model serving, inference optimization, or AI platform tools such as MLflow, Kubeflow, vLLM, or similar.
- Experience with simulations for scientific or engineering projects, particularly physical systems simulations.
- Experience with GPU-based systems or running AI models in HPC environments.
- Experience writing and deploying MCP servers on Kubernetes.
- DoD experience.
- Secret security clearance (active or inactive).
Education:
- Bachelor’s degree in CS, Software Engineering, or another IT-related field, or equivalent experience.
REMOTE WORK NOTICE: This position may be performed fully remote, hybrid, or onsite at an ARA office. Preference will be given to candidates located in the Albuquerque, NM and Raleigh, NC areas.