What are the responsibilities and job description for the Machine Learning Engineer position at NextGenPros Inc?
Requirements:
· 8 years of experience building production ML inference systems.
· Demonstrated ownership of deep-learning inference optimization in production (quantization, distillation, compilation, kernel/profile-level performance work) for transformer NLP and/or CV models.
· Experience with both TensorFlow (SavedModel, tf.data, XLA, TFLite) and PyTorch (TorchScript, ONNX, FastAPI/TorchServe).
· Hands-on experience optimizing inference pipelines on AWS infrastructure, ideally across different types of media assets.
· Experience with video frameworks/tools (e.g., FFmpeg) and with large-scale frame-level inference.
· Demonstrated experience monitoring and debugging model latency, memory, and pipeline throughput.
· Experience with hybrid search architectures (BM25, vector search, cross-encoder reranking).
· Familiarity with OpenAI APIs or other foundation model providers.
· Familiarity with open-source Hugging Face LLMs.
· Experience with data pipeline and workflow orchestration tools (e.g., Airflow).