What are the responsibilities and job description for the Wireless Reliability Engineer (AP SRE) - San Jose, CA position at Nile Global Inc?
Position: Wireless Reliability Engineer (AP SRE)
Location: San Jose, CA
Mission:
Help eliminate bad Wi‑Fi experiences by making Nile’s access point platform measurably more reliable before it reaches production.
Nile delivers Connectivity as a Service for enterprise campuses. Instead of one‑off hardware and manual break/fix testing, we operate a service with strong reliability and security guarantees. This role sits at the intersection of wireless, systems, and software engineering to make that possible.
Role Overview:
As a Wireless Reliability Engineer on the AP SRE team, you will own the reliability of Nile’s access point platform across performance, correctness, and security, primarily in pre‑production environments. You will:
- Design and evolve the automation, validation, and chaos frameworks that exercise our APs in CI/CD and in the lab.
- Drive deep L1–L7 investigations when complex wireless issues appear and convert those learnings into durable tests and platform changes.
- Partner closely with firmware, cloud SRE, and security to ensure reliability is engineered in, not bolted on.
This is an individual contributor role at Senior / Staff level, with high technical ownership and visibility.
What You’ll Do
Build the Machine That Tests the Machine
- Architect Python-based automation scripts that integrate into CI/CD to validate AP firmware, drivers, and wireless features at scale.
- Continuously increase automated coverage of functional, performance, and longevity tests for Wi‑Fi features (11ax/11be), roaming, QoS, and management-plane behavior.
- Define test strategy and guardrails: what must be covered on every change, on nightly runs, and on long‑running stress suites.
Wireless Reliability & Chaos Engineering (Pre‑Production)
- Design and run chaos and stress scenarios against Wi‑Fi and datapath stacks: RF impairments, load patterns, roaming storms, congestion, and failure injections.
- Characterize and harden the system under realistic client and traffic mixes using tools like IxChariot/IxANVL, Spirent, or Alethea (or equivalent).
- Turn intermittent or “rare” failures into reproducible automated tests that block regressions.
Security & Zero Trust Validation
- Validate 802.1X, WPA3, and Zero Trust campus behavior, including onboarding flows, policy enforcement, and failure/attack scenarios.
- Work with security engineering to translate threat models into repeatable test plans and automation (e.g., auth storms, misbehaving supplicants, rogue AP/client scenarios).
Hardware, SoC, and Telemetry
- Use silicon‑level telemetry and debug hooks on Qualcomm (or similar) AP SoCs to understand RF performance, power, and error behavior under load.
- Collaborate with silicon/firmware teams to correlate lab findings with driver/firmware changes and influence roadmap and design decisions.
Deep‑Dive Debugging & RCAs
- Lead deep technical investigations across L1–L7 when Nile or customer scenarios expose weird or hard‑to‑reproduce behavior.
- Produce clear RCAs that tie symptoms to root cause and result in concrete fixes in firmware, cloud controllers, or test systems.
- Feed every critical RCA back into automation, metrics, or specifications so the same class of issue does not recur.
How You’ll Work:
- Collaboration: Work day‑to‑day with AP firmware, wireless systems, cloud SRE, security, and product teams. You will often be the bridge between RF realities, protocol behavior, and software implementation.
- Scope: Primary focus is pre‑production reliability and validation. You will also engage with selected high‑severity production incidents when deep wireless expertise is needed and then codify those learnings back into tests.
- On‑call: This role does not carry a traditional 24×7 production on‑call rotation, but you may be pulled into critical incident investigations where AP or wireless expertise is required.
In Your First 6–12 Months, You Will
- Design and roll out next‑generation AP reliability test scripts in CI/CD for the AP features that you own.
- Build and stabilize a set of chaos/stress scenarios that uncover new issues in Wi‑Fi performance, roaming, and security under load.
- Lead multiple complex RCAs on wireless issues (lab or field) that directly result in:
- Firmware or SoC configuration changes.
- New automated tests or monitors.
- Updated best practices for deployment or configuration.
What You Bring:
Must‑Have Experience
- Experience: ~5 years in one or more of: wireless systems engineering, AP or client firmware validation, Wi‑Fi performance/reliability, or SRE for networking systems.
- Wireless proficiency:
- Hands‑on work with 802.11ac/ax (11ax required; 11be familiarity a plus).
- Comfortable reading and interpreting packet captures (e.g., Wireshark) and RF measurements; you know what a “clean” RF environment looks like and how to recognize common impairments.
- Software & Automation:
- Strong Python skills building test frameworks, harnesses, or tooling (not just one‑off scripts).
- Experience integrating tests into CI/CD pipelines (GitLab CI, Jenkins, etc.).
- Debugging mindset:
- Proven track record debugging complex multi‑component systems (AP, clients, backbone, cloud).
- Ability to design experiments, isolate variables, and turn qualitative “it seems flaky” reports into measurable hypotheses.
- Security fundamentals:
- Working knowledge of 802.1X, EAP, WPA2/WPA3, and robust onboarding flows.
- Comfort validating security behavior (e.g., PMF, key management, misconfigurations, and downgrade/failure scenarios).
Nice to Have:
- Deep familiarity with 11be/EHT, OFDMA, MU‑MIMO, and multi‑band/multi‑link operation.
- Hands‑on experience with Qualcomm or similar Wi‑Fi SoC platforms and corresponding debug/telemetry interfaces.
- Experience with one or more Wi‑Fi test tools/stacks:
- Ixia, IxChariot, IxANVL, Spirent, Alethea, or similar.
- RF impairers, channel emulators, or shielded chambers.
- Prior work in SRE‑style roles for networking/wireless services: SLIs/SLOs, error budgets, on‑call, and incident management.
- Experience using AI‑assisted development tools (Cursor, Copilot, etc.) as part of your daily workflow.
Why Nile:
- Problem Space: Work on end‑to‑end campus Wi‑Fi and Zero Trust at scale, not just isolated AP features.
- Impact: Your work ships directly into Nile’s service; every improvement in your lab shows up as higher reliability for customers.
- Environment: Small, senior team, low bureaucracy, and tight feedback loops between design, implementation, and validation.
- Ownership: You’ll have genuine ownership over how we test and qualify our AP platform, and a strong voice in product and architecture decisions that affect reliability.
If you’re a hands‑on engineer who enjoys combining RF, protocols, and software to make systems robust, and you’d rather build reliable machines that test machines than run the same manual tests twice, we’d like to talk.