What are the responsibilities and job description for the Wireless Reliability Engineer (AP SRE) - San Jose, CA position at Nile?
Position: Wireless Reliability Engineer (AP SRE)
Location: San Jose, CA
Mission
Help eliminate bad Wi‑Fi experiences by making Nile’s access point platform measurably more reliable before it reaches production.
Nile delivers Connectivity as a Service for enterprise campuses. Instead of one‑off hardware and manual break/fix testing, we operate a service with strong reliability and security guarantees. This role sits at the intersection of wireless, systems, and software engineering to make that possible.
Role Overview
As a Wireless Reliability Engineer on the AP SRE team, you will own the reliability of Nile’s access point platform across performance, correctness, and security, primarily in pre‑production environments. You will:
- Design and evolve the automation, validation, and chaos frameworks that exercise our APs in CI/CD and in the lab.
- Drive deep L1–L7 investigations when complex wireless issues appear and convert those learnings into durable tests and platform changes.
- Partner closely with firmware, cloud SRE, and security to ensure reliability is engineered in, not bolted on.
What You’ll Do
Build the Machine That Tests the Machine
- Architect Python-based automation scripts that integrate into CI/CD to validate AP firmware, drivers, and wireless features at scale.
- Continuously increase automated coverage of functional, performance, and longevity tests for Wi‑Fi features (11ax/11be), roaming, QoS, and management-plane behavior.
- Define test strategy and guardrails: what must be covered on every change, on nightly runs, and on long‑running stress suites.
- Design and run chaos and stress scenarios against Wi‑Fi and datapath stacks: RF impairments, load patterns, roaming storms, congestion, and failure injections.
- Characterize and harden the system under realistic client and traffic mixes using tools like IxChariot/IxANVL, Spirent, or Alethea (or equivalent).
- Turn intermittent or “rare” failures into reproducible automated tests that block regressions.
- Validate 802.1X, WPA3, and Zero Trust campus behavior, including onboarding flows, policy enforcement, and failure/attack scenarios.
- Work with security engineering to translate threat models into repeatable test plans and automation (e.g., auth storms, misbehaving supplicants, rogue AP/client scenarios).
- Use silicon‑level telemetry and debug hooks on Qualcomm (or similar) AP SoCs to understand RF performance, power, and error behavior under load.
- Collaborate with silicon/firmware teams to correlate lab findings with driver/firmware changes and influence roadmap and design decisions.
- Lead deep technical investigations across L1–L7 when Nile or customer scenarios expose weird or hard‑to‑reproduce behavior.
- Produce clear RCAs that tie symptoms to root cause and result in concrete fixes in firmware, cloud controllers, or test systems.
- Feed every critical RCA back into automation, metrics, or specifications so the same class of issue does not recur.
- Collaboration: Work day‑to‑day with AP firmware, wireless systems, cloud SRE, security, and product teams. You will often be the bridge between RF realities, protocol behavior, and software implementation.
- Scope: Primary focus is pre‑production reliability and validation. You will also engage with selected high‑severity production incidents when deep wireless expertise is needed and then codify those learnings back into tests.
- On‑call: This role is not a traditional 24×7 production on‑call rotation, but you may be pulled into critical incident investigations where AP or wireless expertise is required.
- Design and roll out next‑generation AP reliability test scripts in CI/CD for the AP features you own.
- Build and stabilize a set of chaos/stress scenarios that uncover new issues in Wi‑Fi performance, roaming, and security under load.
- Lead multiple complex RCAs on wireless issues (lab or field) that directly result in:
  - Firmware or SoC configuration changes.
  - New automated tests or monitors.
  - Updated best practices for deployment or configuration.
Must‑Have Experience
- Experience: ~5 years in one or more of: wireless systems engineering, AP or client firmware validation, Wi‑Fi performance/reliability, or SRE for networking systems.
- Wireless proficiency:
  - Hands‑on work with 802.11ac/ax (11ax required; 11be familiarity a plus).
  - Comfortable reading and interpreting packet captures (e.g., Wireshark) and RF measurements; you know what a “clean” RF environment looks like and how to recognize common impairments.
- Software & Automation:
  - Strong Python skills building test frameworks, harnesses, or tooling (not just one‑off scripts).
  - Experience integrating tests into CI/CD pipelines (GitLab CI, Jenkins, etc.).
- Debugging mindset:
  - Proven track record debugging complex multi‑component systems (AP, clients, backbone, cloud).
  - Ability to design experiments, isolate variables, and turn qualitative “it seems flaky” reports into measurable hypotheses.
- Security fundamentals:
  - Working knowledge of 802.1X, EAP, WPA2/WPA3, and robust onboarding flows.
  - Comfort validating security behavior (e.g., PMF, key management, misconfigurations, and downgrade/failure scenarios).
- Deep familiarity with 11be/EHT, OFDMA, MU‑MIMO, and multi‑band/multi‑link operation.
- Hands‑on experience with Qualcomm or similar Wi‑Fi SoC platforms and corresponding debug/telemetry interfaces.
- Experience with one or more Wi‑Fi test tools/stacks:
  - Ixia, IxChariot, IxANVL, Spirent, Alethea, or similar.
  - RF impairers, channel emulators, or shielded chambers.
- Prior work in SRE‑style roles for networking/wireless services: SLIs/SLOs, error budgets, on‑call, and incident management.
- Experience using AI‑assisted development tools (Cursor, Copilot, etc.) as part of your daily workflow.
- Problem Space: Work on end‑to‑end campus Wi‑Fi and Zero Trust at scale, not just isolated AP features.
- Impact: Your work ships directly into Nile’s service; every improvement in your lab shows up as higher reliability for customers.
- Environment: Small, senior team, low bureaucracy, and tight feedback loops between design, implementation, and validation.
- Ownership: You’ll have genuine ownership over how we test and qualify our AP platform, and a strong voice in product and architecture decisions that affect reliability.