What are the responsibilities and job description for the Machine Learning Engineer (Staff Level) position at Tykhe Inc?
One of our clients, a leading provider of Revenue Cycle Management (RCM) for the healthcare industry, is looking to fill ML Engineer positions (various levels: Senior, Lead, and Staff) with candidates who have experience owning training and/or serving in production at scale.
Hybrid role (3 days onsite in either San Jose, CA or Austin, TX)
Educational Qualifications:
- Bachelor's in Computer Science, Electrical/Computer Engineering, or a related field required; Master's preferred (or equivalent industry experience).
- Strong systems/ML engineering background with exposure to distributed training and inference optimization.
Industry Experience:
- 3–5 years in ML/AI engineering roles owning training and/or serving in production at scale.
- Demonstrated success delivering high-throughput, low-latency ML services with reliability and cost improvements.
- Experience collaborating across Research, Platform/Infra, Data, and Product functions.
Technical Skills:
- Familiarity with deep learning frameworks: PyTorch (primary), TensorFlow.
- Exposure to large model training techniques (DDP, FSDP, ZeRO, pipeline/tensor parallelism); distributed training experience a plus.
- Optimization: experience profiling and optimizing code execution and model inference, including quantization (PTQ/QAT/AWQ/GPTQ), pruning, distillation, KV-cache optimization, and FlashAttention.
- Scalable serving: autoscaling, load balancing, streaming, batching, caching; collaboration with platform engineers.
- Data & storage: SQL/NoSQL, vector stores (FAISS/Milvus/Pinecone/pgvector), Parquet/Delta, object stores.
- Ability to write performant, maintainable code.
- Understanding of the full ML lifecycle: data collection, model training, deployment, inference, optimization, and evaluation.