What are the responsibilities and job description for the Software Engineer position at techire.®?
We’re looking for an Inference Engineer to design and optimize the systems that power our models in production.
This role sits at the intersection of:
- ML systems
- distributed systems
- hardware-aware performance engineering
You’ll take cutting-edge models and make them fast, scalable, and efficient in real-world environments.
What You’ll Work On
Inference Systems & Serving
- Design and build low-latency inference pipelines for large multimodal models
- Implement advanced serving techniques such as:
  - continuous batching
  - KV cache optimization
- Work with modern inference frameworks (e.g. vLLM, SGLang, TensorRT-LLM, Triton)
Performance Optimization
- Optimize inference across:
  - model level (quantization, architecture-aware tuning)
  - hardware level (GPU / accelerator utilization, kernel optimization)
- Improve latency, throughput, and cost efficiency for production systems
- Profile and debug bottlenecks using tools such as NVIDIA Nsight Systems (nsys) or similar profilers
Distributed & Real-Time Systems
- Build high-throughput, distributed inference infrastructure
- Design systems for real-time workloads with strict latency constraints
- Optimize multi-GPU / multi-node inference using:
  - tensor parallelism
  - pipeline parallelism
  - distributed scheduling
Infrastructure & Observability
- Develop robust monitoring, benchmarking, and evaluation systems
- Track metrics such as:
  - GPU utilization
- Build tooling to support rapid iteration and production reliability
Research → Production
- Work closely with research teams to productionize new model architectures
- Translate experimental ideas into high-performance serving systems
- Contribute to the design of next-generation inference stacks
Why This Role
- Work on cutting-edge AI systems that go beyond current model limitations
- Solve hard systems problems at the core of how modern AI runs
- Join a team that values:
  - speed
  - ownership
  - technical excellence
Compensation & Benefits
- Competitive salary and equity
- Full medical, dental, and vision coverage
- In-office meals and a highly collaborative environment
How to Apply
If you’re excited about building high-performance inference systems and pushing the limits of real-time AI, we’d love to hear from you.
Salary: $200,000 - $350,000