What are the responsibilities and job description for the Member of Technical Staff - Compilers position at Architect?

About Us

Architect is a frontier AI lab for chip design. We build AI models and tools for on-demand custom ASICs at scale. Our goal is to co-design custom ASICs alongside evolving ML workloads and to enable a new era of domain-specific chips that unlock capabilities impossible with current hardware paradigms. Born out of Stanford research, we blend AI with silicon, and our founding team comes from Anthropic, Google DeepMind, Meta SuperIntelligence, xAI, Apple, and Intel.

We're looking for staff/principal-level compiler engineers with deep experience building code generation toolchains for custom AI accelerators. Ideal candidates have shipped production compilers at places like Apple, Google (XLA/TPU), Groq, Cerebras, Qualcomm, AMD, or similar.

What You'll Do

As a Member of Technical Staff on the Compilers team at Architect, you'll own the compiler stack targeting our SIMD/VLIW NPU, from graph ingestion through code generation on production silicon. You'll work directly with the NPU architect to co-design the ISA, closing the loop between compiler needs and hardware decisions.

Own the compiler end-to-end: graph ingestion (ONNX, PyTorch) through IR optimization, AI-driven code generation, instruction scheduling, and register allocation for a SIMD/VLIW NPU.
Implement and own the memory management layer: for instance, software-managed on-chip scratchpad memory, with the compiler handling data tiling, bank allocation, DMA scheduling, and double-buffering across SRAM banks.
Design and iterate on mid-end and backend optimization passes: operator fusion, loop transformations, vectorization, and software pipelining to close the gap between peak and achieved throughput.
Co-design the ISA and instruction encoding with the architect and silicon team. Feed real workload performance data back into architectural decisions.
Support quantization and mixed-precision lowering (32-bit floating point or integer, along with lower precisions such as INT8/INT4, BF16, and FP16/FP8/FP4) with correct numerics end-to-end.
Benchmark compiler output against cycle-accurate models, RTL simulation, and FPGA prototypes. Own QoR tracking.
Grow into a compiler team lead as the team scales.

What We'd Like To See

Qualifications & Skills:

Degree: Bachelor's, Master's, or PhD in Computer Science, Computer Engineering, or a closely related field.
Experience: 5 years building compilers or code generation toolchains for custom accelerators. Must have compiler experience targeting ML/AI hardware; general-purpose compiler experience (GCC/LLVM for CPUs) alone is not sufficient.
Domain Background: Hands-on experience with at least one of: Apple Neural Engine compiler, Google XLA / Edge TPU / TPU codegen, Groq TSP compiler (spatial scheduling, IR dialect design), Cerebras compiler stack, Qualcomm Hexagon NN / AI Engine, AMD AIE / Vitis AI, or an equivalent custom accelerator compiler.
Backend Mechanics: Strong grasp of instruction scheduling, register allocation, and software pipelining, especially for SIMD/VLIW or spatial architectures.
ML Optimizations: Experience with tiling strategies, loop nest optimization, and operator fusion for ML workloads (such as convolution, attention, element-wise ops, reductions, and transpositions).
SW-Managed Memory: Experience with scratchpad-style memory allocation, data layout, DMA orchestration, and multi-buffering.
Coding: Strong C++. Python proficiency. Familiarity with MLIR or LLVM infrastructure.
Leadership: Ability to lead and grow the compiler team over time.

Bonus

HW/SW co-design experience: defining ISA features, instruction encodings, or hardware interfaces driven by compiler needs.
IR design for ML accelerators (custom dialects, MLIR-based flows, or graph-level IRs like XLA HLO).
ML framework experience (PyTorch, TensorFlow) and portable graph formats (ONNX).
Experience benchmarking and profiling compiler output on real hardware, FPGA, or cycle-accurate simulators.
Understanding of ML inference systems and workload-level optimizations: FlashAttention, RadixAttention, PagedAttention, continuous batching, speculative decoding, KV cache management, and prefill/decode scheduling.
Contributions to open-source ML compiler projects (TVM, MLIR, Triton, XLA).
Domain-specific expertise: Track record in bring-up of energy-efficient, high-performance HW accelerators.

What We Offer

Competitive salary and meaningful equity stake
Fast-paced startup with autonomy and visible impact
Cutting-edge challenges at the intersection of AI and silicon design
Direct ownership of the compiler stack as we scale
