What are the responsibilities and job description for the Senior Machine Learning Engineer position at Mathpix?
Location: Brooklyn, NY or Bay Area preferred
Mathpix is looking for a Senior Machine Learning Engineer with deep expertise in computer vision, sequence modeling, and multimodal AI. As a leader on our ML team, you'll play a pivotal role in advancing the state of the art in OCR and related applications, building custom models that push the boundaries of what's possible in text recognition, document understanding, and multimodal learning.
The ideal candidate has a PhD in CS, ML, CV, NLP, or a related field, and many years of experience designing, training, and deploying deep learning models at scale. They have worked on sequence-to-sequence models, attention mechanisms, and large multimodal systems, and are motivated by the challenge of building production-grade AI models for mission-critical applications.
Responsibilities:
- Research, design, and implement custom deep learning models for OCR and multimodal document understanding tasks
- Build and train sequence-to-sequence and attention-based architectures for text recognition, translation, and generation tasks
- Lead development of multimodal language models that combine vision and text for real-world applications (e.g., image-to-text, document parsing)
- Optimize and extend PyTorch-based training pipelines for large-scale datasets and high-performance inference
- Collaborate with product and engineering teams to integrate models into production systems, ensuring scalability, robustness, and efficiency
- Work closely with the in-house data team to define, generate, and curate high-quality training data, enabling rapid iteration on bug fixes and new features
- Mentor junior engineers and provide technical leadership in model architecture, experimentation, and deployment best practices
Required skills:
- PhD in Computer Science, Machine Learning, Computer Vision, NLP, or a related field
- 3+ years of hands-on experience in deep learning research and development
- Strong expertise in sequence-to-sequence models, attention mechanisms, and Transformer-based architectures
- Proven experience building and training custom models in PyTorch, rather than relying on off-the-shelf models
- Track record of work in one or more of the following areas: machine translation, text generation, speech-to-text, OCR, image captioning, or related multimodal tasks
- Deep understanding of core ML concepts: optimization, regularization, model scaling, and distributed training
- Demonstrated ability to take models from research to production in a high-stakes environment
Nice to have:
- Experience with large-scale multimodal foundation models and with techniques for fine-tuning and adapting them
- Knowledge of advanced evaluation methodologies for sequence and multimodal models
- Publications in top ML/AI/vision conferences or journals (e.g., NeurIPS, CVPR, ACL, ICML)
- Experience mentoring teams and driving research agendas in applied AI settings
- Experience working at a startup or high-growth company is a strong plus; bonus points if you've been part of a founding or early engineering team
- Contributions outside of work (personal projects, open-source work, published articles, or blog posts) are a strong plus and speak for themselves