What are the responsibilities and job description for the Senior AWS AgentCore Platform Engineer position at InvestM Technology LLC?
Role: Senior AWS AgentCore Platform Engineer
Position Type: Contract, 6 months
Location: Reading, PA or Exton, PA (Hybrid, 2-3 days a week in the office)
Job Description
The role covers four areas: 1. Observability & Distributed Tracing 2. Cost Tracking & TCO (Total Cost of Ownership) 3. Monitoring & Incident Management 4. Security & Governance
1. Observability & Distributed Tracing
- Gap Analysis: Assess AWS CloudWatch, X-Ray, Bedrock logging, and AgentCore traces against agentic workflow requirements; produce a comprehensive gap analysis and lead the setup of observability within Dynatrace.
- Validation Pipelines: Design and implement post-deployment validation pipelines for agents and Model Context Protocol (MCP) servers, ensuring deployment health and successful tool registration.
- Tracing & Logging: Implement distributed tracing and structured logging to capture LLM decision logic, tool selections, sub-agent calls, and MCP interactions.
- Architecture Strategy: Evaluate LangFuse and LiteLLM proxies against AWS-native solutions; deliver a target-state observability architecture recommendation.
2. Cost Tracking & TCO (Total Cost of Ownership)
- Taxonomy Expansion: Extend the tagging taxonomy to capture costs across agent runtimes, MCP servers, vector databases, and Bedrock token consumption per namespace.
- Cost Modeling: Design a granular cost visibility model to aggregate expenses for agents, MCPs, and LLM tokens by team and department.
- Dashboards & Alerting: Build CloudWatch (or equivalent) dashboards for per-team spending; configure AWS Budgets with proactive alerting thresholds.
- Automation: Automate cost reporting via email and Microsoft Teams, incorporating anomaly detection rules to identify spend spikes.
3. Monitoring & Incident Management
- Alerting Framework: Define and implement P1-P4 alerting rules covering deployment failures, runtime errors, tool invocation failures, and MCP connectivity issues.
- Incident Integration: Integrate alert notifications with Microsoft Teams and email, utilizing resource ownership tags for intelligent routing.
- Operational Excellence: Author detailed runbooks for every alert; publish and maintain these in Confluence to facilitate developer self-service resolution.
- Stack Evaluation: Compare AWS-native vs. third-party monitoring stacks to deliver a long-term recommendation aligned with the broader observability architecture.
4. Security & Governance
- Risk Assessment: Evaluate current IAM and tagging strategies for multi-team isolation; identify scalability gaps and potential security risks.
- Policy Engines: Assess the Cedar policy engine (AgentCore) for fine-grained tool access control and document gaps for enterprise-scale deployment.
- Identity Architecture: Design a scalable Attribute-Based Access Control (ABAC) identity model to ensure multi-team isolation without IAM policy sprawl; deliver production-ready Terraform modules.
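
For candidates unfamiliar with the ABAC pattern named above, the following is a minimal sketch of the idea, not the team's actual design. It builds one IAM policy document that allows an action only when the resource's tag matches the caller's principal tag, so a single policy can serve every team instead of one policy per team. The function name `team_isolation_policy`, the tag key `team`, and the example action are illustrative assumptions; whether a given action honors `aws:ResourceTag` conditions depends on the service and resource type and would need to be verified.

```python
import json


def team_isolation_policy(actions, tag_key="team"):
    """Build an IAM policy document (as a dict) that allows `actions`
    only on resources whose `tag_key` tag equals the caller's own
    principal tag. `tag_key` is an assumed tagging convention."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowSameTeamResources",
                "Effect": "Allow",
                "Action": list(actions),
                "Resource": "*",
                "Condition": {
                    "StringEquals": {
                        # IAM resolves the ${aws:PrincipalTag/...} policy
                        # variable per caller at evaluation time.
                        f"aws:ResourceTag/{tag_key}": f"${{aws:PrincipalTag/{tag_key}}}"
                    }
                },
            }
        ],
    }


# Placeholder action for illustration; confirm tag-condition support
# for each action before relying on it.
policy = team_isolation_policy(["bedrock:InvokeModel"])
print(json.dumps(policy, indent=2))
```

The same document could be emitted from a Terraform module rather than Python; the point is only that team membership lives in tags, so onboarding a new team means tagging principals and resources rather than writing new policies.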