What are the responsibilities and job description for the Member of Technical Staff - Compilers position at Architect?
About Us
Architect is a frontier AI lab for chip design. We build AI models and tools for on-demand custom ASICs at scale. Our goal is to co-design custom ASICs alongside evolving ML workloads, and enable a new era of domain-specific chips that unlock capabilities impossible with current hardware paradigms. Born out of Stanford Research, our team blends AI with Silicon with a founding team from Anthropic, Google DeepMind, Meta SuperIntelligence, xAI, Apple and Intel.
We're looking for staff/principal-level compiler engineers with deep experience building code generation toolchains for custom AI accelerators. Ideal candidates have shipped production compilers at places like Apple, Google (XLA/TPU), Groq, Cerebras, Qualcomm, AMD, or similar.
What You'll Do
As a Member of the Technical Staff on the Compilers team at Architect, you'll own the compiler stack targeting our SIMD/VLIW NPU — from graph ingestion through code generation on production silicon. You'll work directly with the NPU architect to co-design the ISA, closing the loop between compiler needs and hardware decisions.
Qualifications & Skills:
Architect is a frontier AI lab for chip design. We build AI models and tools for on-demand custom ASICs at scale. Our goal is to co-design custom ASICs alongside evolving ML workloads, and enable a new era of domain-specific chips that unlock capabilities impossible with current hardware paradigms. Born out of Stanford Research, our team blends AI with Silicon with a founding team from Anthropic, Google DeepMind, Meta SuperIntelligence, xAI, Apple and Intel.
We're looking for staff/principal-level compiler engineers with deep experience building code generation toolchains for custom AI accelerators. Ideal candidates have shipped production compilers at places like Apple, Google (XLA/TPU), Groq, Cerebras, Qualcomm, AMD, or similar.
What You'll Do
As a Member of the Technical Staff on the Compilers team at Architect, you'll own the compiler stack targeting our SIMD/VLIW NPU — from graph ingestion through code generation on production silicon. You'll work directly with the NPU architect to co-design the ISA, closing the loop between compiler needs and hardware decisions.
- Own the compiler end-to-end: graph ingestion (ONNX, PyTorch) through IR optimization, AI-driven code generation, instruction scheduling, and register allocation for a SIMD/VLIW NPU.
- Implement and own the memory management layer; for instance SW-managed on-chip scratchpad memory with the compiler handling data tiling, bank allocation, DMA scheduling, and double-buffering across SRAM banks.
- Design and iterate on mid-end and backend optimization passes: operator fusion, loop transformations, vectorization, and software pipelining to close the gap between peak and achieved throughput.
- Co-design the ISA and instruction encoding with the architect and silicon team. Feed real workload performance data back into architectural decisions.
- Support quantization and mixed-precision lowering (32bit single-precision FP or INT, along with lower INT8/4, BF16, FP16/8/4 precisions) with correct numerics end-to-end.
- Benchmark compiler output against cycle-accurate models, RTL simulation, and FPGA prototypes. Own QoR tracking.
- Grow into a compiler team lead as the team scales.
Qualifications & Skills:
- Degree: Bachelor's, Master's, or PhD in Computer Science, Computer Engineering, or a closely related field.
- Experience: 5 years building compilers or code generation toolchains for custom accelerators. Must have targeted ML/AI hardware compiler experience, as general-purpose (GCC/LLVM for CPUs) is not sufficient.
- Domain Background: Hands-on experience on at least one of: Apple Neural Engine compiler, Google XLA / Edge TPU / TPU codegen, Groq TSP compiler (spatial scheduling, IR dialect design), Cerebras compiler stack, Qualcomm Hexagon NN / AI Engine, AMD AIE / Vitis AI, or similar/equivalent custom accelerator compiler(s).
- Backend Mechanics: Strong grasp of instruction scheduling, register allocation, and software pipelining — especially for SIMD/VLIW or spatial architectures.
- ML Optimizations: Experience with tiling strategies, loop nest optimization, and operator fusion for ML workloads (such as convolution, attention, element-wise ops, reduction, transpositions, etc.).
- SW-Managed Memory: Experience with scratchpad type memory allocation, data layout, DMA orchestration, and multi-buffering.
- Coding: Strong C . Python proficiency. Familiarity with MLIR or LLVM infrastructure.
- Leadership: Ability to lead and grow the compiler team over time.
- HW/SW co-design experience: defining ISA features, instruction encodings, or hardware interfaces driven by compiler needs.
- IR design for ML accelerators (custom dialects, MLIR-based flows, or graph-level IRs like XLA HLO).
- ML framework experience (PyTorch, TensorFlow) and portable graph formats (ONNX).
- Experience benchmarking and profiling compiler output on real hardware, FPGA, or cycle-accurate simulators.
- Understanding of ML inference systems and workload-level optimizations: FlashAttention, RadixAttention, PagedAttention, continuous batching, speculative decoding, KV cache management, and prefill/decode scheduling.
- Contributions to open-source ML compiler projects (TVM, MLIR, Triton, XLA).
- Domain-specific expertise: Track record on energy-efficient, high-performance HW accelerator bring-up.
- Competitive salary and meaningful equity stake
- Fast-paced startup with autonomy and visible impact
- Cutting-edge challenges at the intersection of AI and silicon design
- Direct ownership of the compiler stack as we scale