What are the responsibilities and job description for the Infrastructure Solutions Architect - Sr. Infrastructure Engineer position at iFlow Inc?
Job Title: Sr. Infrastructure Engineer
Location: Palo Alto, CA
Duration: 6 Months
Experience: 7-15 Years

Description:

Responsibilities:
- Manage AWS environment using Control Tower, EKS, EC2, S3 and related services.
- Triage and resolve ServiceNow tickets (OS-level and cloud-level troubleshooting, vulnerability remediation) and meet SLA requirements.
- Commission / decommission cloud resources and manage lifecycle activities.
- Plan and execute DR activities with application teams; respond to RCAs and implement corrective actions.
- Perform backups, patching, and maintenance of instances and cloud resources.
- Carry out cloud migration and site externalization tasks.
- Perform cleanup, cost-optimization analysis, and implement cost-saving measures.
- Build small automations and PoCs to improve operational efficiency.
- Create and implement change controls; maintain thorough documentation and standards.
- Collaborate with Windows and Linux platform engineers for OS-level troubleshooting.
- Design, implement, and manage network infrastructure (LAN, WAN, WLAN, VPN, Firewalls, Routers, Switches).
- Architect, deploy, and manage cloud infrastructure across multiple providers (AWS, Azure), including IaaS, PaaS, and SaaS offerings.
- Integrate and maintain Windows-based systems as part of hybrid environments.
- Design and implement disaster recovery and business continuity plans for cloud and hybrid systems.
- Configure and secure network and cloud environments, including firewalls, routers, switches, and VPNs.
- Monitor infrastructure performance, address issues, and optimize for efficiency.
- Collaborate on and enhance existing network and system monitoring tools.
- Scale infrastructure to meet growing research and product demands while maintaining reliability and performance.
- Implement and maintain serverless architectures and container orchestration systems.
- Collaborate with research teams to understand requirements and translate them into robust infrastructure solutions.
- Develop monitoring, alerting, and observability systems to ensure operational excellence.
- Participate in on-call rotations and incident response to maintain system reliability.
- Contribute to infrastructure automation and tooling that improves developer productivity.
- Partner with security teams to ensure production infrastructure maintains appropriate SLAs and SLOs.
- Design, deploy, and manage cloud infrastructure on AWS (EC2, VPC, IAM, S3, RDS, Lambda, ECS/EKS).
- Build and maintain Infrastructure as Code (IaC) using Terraform or CloudFormation.
- Implement and optimize CI/CD pipelines and automated deployment workflows.
- Manage containerized workloads using Docker and Kubernetes.
- Ensure cloud environment security, compliance, and cost optimization.
- Build monitoring and observability dashboards (CloudWatch, Grafana, Prometheus).
- Troubleshoot cloud performance issues and support production environments.
- Collaborate with development, security, and operations teams for smooth delivery.
- Strong proficiency in Python for automation, data handling, and tool development.
- Hands-on experience with monitoring and observability tools such as Prometheus, Grafana, Datadog, CloudWatch, or Splunk.
- Demonstrated expertise in ITSM practices, including incident, problem, and process improvement.
- Ability to implement secure and compliant offboarding procedures and manage access-related tasks.
- Bachelor’s degree in Computer Science, Information Security, or related field (or equivalent experience).
- 3 years of experience in cloud security, with a focus on AWS.
- Strong understanding of AWS services (EC2, S3, VPC, Lambda, RDS, etc.).
- Proficiency in scripting languages (Python, Bash, etc.) and infrastructure-as-code tools (Terraform, CloudFormation).
- Experience with security tools such as AWS WAF, KMS, Inspector, and Macie.
- Familiarity with SIEM tools and log analysis.
- Knowledge of network security, encryption, and identity management.
- Manages the day-to-day support, policy and engineering of the Endpoint Internet Access Control tools, Zscaler Cloud proxy for production and test environments. This includes incident, request and change control tickets, problem ticket response/resolution within ticket SLA, Zscaler support tickets, block and unblock policies, testing and deployment of policy ruleset, Proxy Access Control file management and Zscaler cloud platform clean up and maintenance, Splunk logging, querying and dashboarding.
- One to one work with teammates, teams, help desk, problem management and others to resolve issues or implement new policy or cloud platform scenarios. Team assistance and brainstorming.
- Certificate management, IdP and access control, traffic flow, network and firewall control integration for the proxies, cloud backup and disaster recovery and on-prem device management.
- Zscaler Agent deployment, support and management, with Group Policy Controls, authentication and SCIM/SAML/SSO.
- API support.
- Sanctioned SaaS and other application integration.
- F5 GSLB support.
- Process and procedural documentation creation, revisions to include PPM storage, help desk procedures and runbooks.
- Daily integrations with other teams including Cyber Incident Response, Antivirus, NAC, Firewall and Network.

Requirements:
- AWS Control Tower (operational governance).
- EKS (Kubernetes on AWS) administration and troubleshooting.
- EC2 instance lifecycle, patching, performance troubleshooting.
- Terraform infrastructure as code for provisioning and change management.
- Python scripting/automation for operational tasks.
- S3 lifecycle, permissions, and security best practices.
- Strong Windows and Linux troubleshooting skills (OS-level diagnostics).
- Experience working with ServiceNow (ticketing, SLAs, change management).
- Familiarity with vulnerability remediation workflows and security patching.
- Experience with cost-optimization tools and AWS billing insights.
- Familiarity with additional AWS services (RDS, CloudWatch, IAM, GuardDuty).
- Prior experience running DR exercises and post-mortem RCA work.
- Experience building operational runbooks and playbooks.
- Bachelor’s degree in Computer Science, Information Technology, or a related field (or equivalent experience).
- 7 years of experience in network and cloud engineering with a focus on Windows integration.
- Strong understanding of networking protocols (TCP/IP, BGP, OSPF).
- Proficiency in configuring and administering Windows-based systems in hybrid environments, including Hyper-V, Clustering and Active Directory Services.
- Hands-on experience with major cloud platforms (AWS, Azure).
- Expertise in network security principles and practices.
- Skilled in using PowerShell scripting for automation.
- Strong troubleshooting abilities in complex, hybrid network and system setups.
- Excellent communication, collaboration, and time management skills.
- Experience designing or implementing end-to-end automation pipelines and internal operational tools.
- Prior experience in security-conscious or compliance-heavy environments (financial services, healthcare, SaaS, etc.).
- Expertise in creating comprehensive monitoring solutions, custom dashboards, and automated reporting mechanisms.
- Track record of success in fast-paced, high-growth environments with constantly evolving operational needs.
- Strong documentation habits and demonstrated commitment to continuous improvement and knowledge management.
- Experience operating production systems on AWS using Kubernetes, Terraform, and observability tooling (e.g., Datadog, Prometheus, SumoLogic).
- Strong background in Postgres or other relational databases.
- Bonus for Python or Go scripting.
- Familiarity with compliance frameworks such as FedRAMP, PCI, and SOC2.
- Excellent communication, interpersonal, and problem-solving skills.
- Ability to work effectively in a fast-paced, dynamic environment.
Salary: $127,500 - $173,400