What are the responsibilities and job description for the Research Intern (LLM) position at abakaai?
Responsibilities
- Design and construct high-quality, sufficiently challenging QA datasets (graduate/PhD level) inspired by GPQA, HLE, and AI4Sci families, collaborating with a global network of talented researchers.
- Evaluate large language models on reasoning, factuality, and problem-solving benchmarks.
- Develop review pipelines and quality-control criteria for expert-level question generation.
- Analyze model outputs, conduct error taxonomy studies, and summarize insights for internal reports and research papers.
- Collaborate with the 2077AI Foundation’s open-source benchmark teams on public dataset releases.
Qualifications
- Strong background in computer science, data engineering, artificial intelligence, or related fields, with hands-on experience in large-scale data systems.
- 1 year of experience with LLMs, prompt engineering, and evaluation frameworks (e.g., LM Eval Harness, OpenCompass).
- Excellent written and verbal English skills and strong analytical reasoning.
- Strong execution and team management skills: able to translate high-level objectives into actionable plans and drive team outcomes.
- (Preferred) Experience with formal methods, chain-of-thought evaluation, or curriculum generation.
- (Preferred) Relevant publications in top conferences.