What are the responsibilities and job description for the Research Intern (Video) position at Abaka AI?
Responsibilities
- Build and refine datasets for video understanding and multimodal reasoning, including temporal QA, action recognition, event prediction, and spatial understanding.
- Evaluate video-language models (Video-LLMs) and audio-visual datasets, including those derived from large-scale sources such as HowTo100M.
- Conduct experiments analyzing long-context modeling efficiency, compression strategies, and data optimization techniques.
- Contribute to benchmark standardization efforts and assist in setting up public leaderboards for evaluation and comparison.
Qualifications
- Strong background in computer vision, video analytics, or multimodal learning.
- Proficient in building and managing video data processing pipelines.
- Understanding of transformer-based temporal models (e.g., TimeSformer, VideoGPT).
- (Preferred) Experience with video-QA, action recognition, or multimodal reasoning datasets.
- (Preferred) Relevant publications in top-tier conferences.