What are the responsibilities and job description for the Python Infrastructure Engineer - Model Evaluation position at Alignerr?
Python Infrastructure Engineer — Model Evaluation (AI Training)
About The Role
What if your Python expertise could directly shape how the world's most advanced AI models are built, evaluated, and improved? We're looking for a Senior Python Infrastructure Engineer to design and build the data pipelines, evaluation harnesses, and annotation tooling that power next-generation AI systems at leading research labs.
This is a fully remote contract role with serious technical depth — the kind of work that ships to production and influences model quality at scale.
- Organization: Alignerr
- Type: Hourly Contract
- Location: Remote
- Commitment: 20–40 hours/week
What You'll Do
- Design, build, and optimize high-performance Python systems supporting AI data pipelines and model evaluation workflows
- Develop full-stack tooling and backend services for large-scale data annotation, validation, and quality control
- Build and maintain evaluation harnesses that integrate with inference frameworks and benchmarking pipelines
- Improve reliability, performance, and safety across existing Python codebases
- Instrument systems with observability tooling and metrics collection to monitor model performance and system health
- Identify bottlenecks and edge cases in data and system behavior, and implement scalable, maintainable fixes
- Collaborate with data, research, and engineering teams through synchronous design reviews and async communication
Who You Are
- Native or fluent English speaker with clear written and verbal communication skills
- 3–5 years of professional experience writing production-grade Python
- Full-stack developer with a strong systems programming background
- Experienced building evaluation harnesses for ML models and integrating with inference frameworks
- Strong grasp of observability, metrics collection, and system reliability practices
- Able to commit 20–40 hours per week with consistent availability
Nice To Have
- Prior experience with data annotation pipelines, data quality systems, or model evaluation infrastructure
- Familiarity with AI/ML workflows, model training, or benchmarking frameworks
- Experience with distributed systems or internal developer tooling
- Background working directly with AI labs or ML research teams
Why Join Us
- Work on real production systems at the frontier of AI development alongside leading research labs
- Fully remote and flexible — work from wherever you do your best work
- Freelance autonomy with the structure of high-impact, technically challenging projects
- Make a direct, measurable contribution to how next-generation AI models are evaluated and improved
- Potential for ongoing work and contract extension as new projects launch
Salary: $50–$75