What are the responsibilities and job description for the Enterprise Solutions Architect (Cloud) position at Arch Systems?
Job Title: Enterprise Solutions Architect (Cloud)
Company: Arch Systems
Client: Federal
Location: Remote
Position Type: Full-Time

Job Summary
Arch Systems is seeking a hands-on Enterprise Solutions Architect (Cloud) to design, build, and deliver secure, production-grade AI systems supporting federal civilian missions. This is a player-coach role requiring deep technical execution combined with executive-level solutioning and stakeholder engagement.
You will lead end-to-end AI solution delivery, from discovery and architecture through production deployment and ATO, while spending significant time writing code, reviewing designs, and mentoring teams. You will also present solutions to federal stakeholders, clearly articulating mission impact, benefits, risk posture, and ROI.
This role demands strong expertise in cloud-native AI/ML systems on AWS and/or Azure, federal compliance environments, and modern GenAI/RAG architectures.

Core Responsibilities

Hands-On Engineering & Delivery (Player-Coach)
Write production-grade Python and build FastAPI-based microservices for AI/ML and GenAI workloads.
Design and implement GenAI/RAG pipelines, including embeddings, vector databases, prompt orchestration, and evaluation frameworks.
Containerize workloads using Docker and deploy to Kubernetes (EKS and/or AKS) using CI/CD pipelines.
Spend ≥50% of active build phases coding, pairing with engineers, and conducting design and code reviews.

Cloud-Native Architecture & Operations
Design and operate AI systems on AWS and/or Azure, including GovCloud or Azure Government environments when required.
Implement Infrastructure as Code using Terraform and/or CloudFormation/Bicep.
Architect secure cloud networking (VPC/VNet, private endpoints, VPN/ExpressRoute/Direct Connect).
Integrate cloud AI services such as SageMaker, Bedrock, OpenSearch, EKS and/or Azure ML, Azure OpenAI, AKS, Cognitive Search.
Define and implement HA/DR strategies, autoscaling, and reliability patterns across regions and availability zones.

MLOps & Platform Engineering
Stand up and operate MLOps platforms using MLflow, Databricks, SageMaker, or Azure ML.
Manage model lifecycle: experimentation, registry, gated promotions, canary releases, rollback.
Implement automated testing, monitoring, and alerting for model drift, bias, robustness, and performance.

GenAI Safety, Quality & Evaluation
Engineer prompt flows, grounding strategies, guardrails, and policy enforcement.
Define offline and online evaluation using golden datasets and human-in-the-loop workflows.
Monitor and optimize factuality, relevance, toxicity, latency, and token usage.

Security, Compliance & ATO by Design
Embed NIST RMF (800-37), 800-53, 800-171, FISMA, and FedRAMP controls into system design.
Implement IAM, encryption at rest and in transit, secrets management, logging, and auditing.
Contribute to SSPs, POA&Ms, continuous monitoring, and coordinate with ISSOs and 3PAOs.

Observability, SRE & Cost Management
Instrument systems using OpenTelemetry, logs, metrics, and traces.
Define SLIs/SLOs, dashboards, alerts, and conduct game days and post-mortems.
Model total cost of ownership (TCO) and manage cloud spend using FinOps practices.
Optimize performance and cost through right-sizing, autoscaling, caching, batching, quantization, and distillation.

Executive Communication & Capture Support
Build executive-ready decks and demos for federal stakeholders.
Clearly communicate mission value using KPIs (cycle-time reduction, precision/recall, latency, cost per query, compliance posture).
Support RFIs, RFPs, technical volumes, and orals; lead technical Q&A with mixed audiences.

Leadership, Reuse & Documentation
Lead cross-functional agile teams and mentor engineers through pairing and reviews.
Define “definition of done” including tests, documentation, security scans, and performance baselines.
Publish reusable accelerators: reference architectures, IaC modules, pipeline templates, and security baselines.
Maintain ADRs, runbooks, data contracts, user guides, and ensure Section 508 compliance.

Minimum Qualifications (Must-Have)
10 years of software and/or ML engineering experience; 5 years cloud experience.
Hands-on, recent experience building production AI systems using Python, PyTorch or TensorFlow, and FastAPI.
Proven delivery of GenAI/RAG solutions, including embeddings, vector databases (FAISS, Milvus, pgvector), and evaluation frameworks.
Strong cloud experience on AWS and/or Azure, including security, networking, monitoring, and operations.
Production experience running AI workloads on Kubernetes (EKS and/or AKS).
Implemented cloud-native MLOps using SageMaker, Databricks, or Azure ML with CI/CD and model registries.
Federal delivery experience with FISMA/FedRAMP/RMF and ATO processes.
Excellent communication skills with the ability to translate technical solutions into mission impact and ROI.

Preferred Qualifications
Experience supporting HHS, DHS, USDA, NOAA, or IRS.
Active or eligible Public Trust clearance.
Experience in AWS GovCloud and/or Azure Government environments.
Certifications: AWS Solutions Architect (Associate or Professional), Azure Solutions Architect Expert, DP-100, DP-203.
Experience with model risk management, adversarial testing/red teaming, and Section 508 evaluations.

Company Description
High-growth small business.
Check us out at https://archsystemsinc.com/

Arch Systems LLC is committed to diversity in its workforce and is proud to be an equal opportunity employer. Arch Systems LLC considers qualified applicants without regard to race, color, religion, creed, gender, national origin, age, disability, veteran status, marital status, pregnancy, sex, gender expression or identity, sexual orientation, or any other legally protected class. Arch Systems LLC is an Affirmative Action and Equal Opportunity Employer.

Salary: $150,000 - $180,000