What are the responsibilities and job description for the Research Intern (Video) position at abakaai?
Responsibilities
- Build and refine datasets for video understanding and multimodal reasoning, including temporal QA, action recognition, event prediction, and spatial understanding.
- Evaluate video-language models (Video-LLMs) and audio-visual datasets, including those derived from large-scale sources such as HowTo100M.
- Conduct experiments analyzing long-context modeling efficiency, compression strategies, and data optimization techniques.
- Contribute to benchmark standardization efforts and assist in setting up public leaderboards for evaluation and comparison.
Qualifications
- Strong background in computer vision, video analytics, or multimodal learning.
- Proficiency in building and managing video data processing pipelines.
- Understanding of transformer-based temporal models (e.g., TimeSformer, VideoGPT).
- (Preferred) Experience with video-QA, action recognition, or multimodal reasoning datasets.
- (Preferred) Relevant publications in top-tier conferences.