Site Reliability Engineer / Platform Engineer
Location: San Francisco, CA (on-site)
Compensation: $200,000 to $250,000 base salary, plus bonus and equity
Overview
Shields Group Search is partnering with a fast-growing, Series A AI infrastructure company building the connective layer between AI agents and the tools people use every day, including GitHub, Gmail, Notion, Salesforce, and more.
The company is building core infrastructure that allows agents to safely and reliably communicate with external tools, execute workflows, manage authentication, run code, trigger actions, and operate across real-world software environments.
They recently raised a $25M Series A from top-tier investors and have seen rapid revenue growth, with customers ranging from early AI-native startups to major technology companies.
This role is for a hands-on Site Reliability Engineer / Platform Engineer who can help scale, harden, and own the company’s infrastructure as usage grows. The team is looking for someone with real production experience managing cloud infrastructure, reliability, observability, deployment systems, and high-availability backend services.
This is an individual contributor role. Management experience is not required.
The ideal candidate has hands-on experience across SRE, DevOps, backend engineering, infrastructure engineering, cloud platforms, distributed systems, and performance optimization. They should be comfortable owning infrastructure in a fast-moving startup environment and should be able to show evidence that they build, experiment, and go deep outside of assigned work.
What You’ll Do
- Own reliability, scalability, observability, and performance across core production infrastructure
- Manage and improve infrastructure across cloud platforms such as AWS, Vercel, and related systems
- Build and improve the platform infrastructure supporting AI agent workflows, tool execution, authentication, triggers, APIs, sandboxes, and runtime orchestration
- Design and operate reliable backend systems that interact with many third-party tools and APIs
- Improve infrastructure supporting high-throughput, distributed, cloud-native services
- Work across cloud infrastructure, Linux systems, containers, deployment pipelines, service orchestration, CI/CD, and observability tooling
- Build automation that reduces operational burden and improves incident response
- Develop internal productivity tooling, runbooks, monitoring, alerting, dashboards, and reliability workflows
- Debug complex production issues across application, infrastructure, network, database, deployment, and runtime layers
- Improve system performance through tracing, CPU and heap profiling, database query optimization, workflow optimization, and deep root-cause analysis
- Help manage and improve multiple execution environments, including serverless runtimes, sandboxed code execution, and related backend systems
- Partner closely with product engineers and customers to support important workloads and improve the platform in the process
- Write clear documentation that explains complex systems, operational patterns, and infrastructure decisions
- Help define the reliability culture, infrastructure standards, and technical bar for a small, high-craft engineering team
What They’re Looking For
- 4 years of experience in software engineering, site reliability engineering, infrastructure engineering, DevOps, platform engineering, or distributed systems preferred, though this is not a hard requirement for exceptional candidates
- Hands-on experience managing production infrastructure across cloud environments
- Experience with AWS, Vercel, Kubernetes, Linux, containers, deployment systems, observability tools, or similar infrastructure
- Strong backend engineering fundamentals and ability to write production-quality code
- Experience with monitoring, tracing, logging, alerting, incident response, and system performance
- Experience scaling and operating distributed systems, microservices, APIs, databases, queues, or high-throughput backend services
- Ability to debug hard production issues across many layers of the stack
- Strong systems thinking and ability to understand how infrastructure, application code, databases, deployments, and customer-facing workflows interact
- Ability to build automation and tooling that makes engineering teams faster and more reliable
- Clear written communication and ability to explain technical decisions simply
- High ownership, high urgency, and comfort operating without a playbook
- Strong engineering taste: simple systems, clean abstractions, pragmatic architecture, and reliable execution
Strong Signals
- Experience managing infrastructure for a fast-growing startup or high-scale technical product
- Experience with AWS, Vercel, Kubernetes, Docker, Terraform, CI/CD, observability, deployment automation, or cloud-native infrastructure
- Experience building or operating developer infrastructure, API platforms, automation platforms, workflow engines, internal tooling, or AI infrastructure
- Experience with performance engineering, capacity planning, service decoupling, cloud migrations, or reliability improvement initiatives
- Strong side projects, open-source work, technical writing, infrastructure tools, hardware experiments, embedded systems projects, or other evidence of building outside of work
- Startup experience or experience in fast-moving, ambiguous technical environments
- Evidence of being internet-native, builder-minded, and deeply curious about how systems work
- Ability to move between SRE, platform, backend, infrastructure, and product engineering work without being overly rigid about title or scope
You May Be a Fit If
- You have owned or managed production infrastructure directly
- You are comfortable debugging cloud systems, backend services, databases, deployments, and networking issues
- You can write code and do not see SRE as separate from engineering
- You care about uptime, performance, observability, developer experience, and clean operational workflows
- You build tools to eliminate repetitive operational work
- You like fast-moving startup environments with high ownership
- You have side projects, open-source contributions, technical writing, or other artifacts that show how you think
- You are comfortable being the person who figures things out when there is no playbook
- You want to work with a small team of intense, high-craft engineers building at the frontier of AI infrastructure