What are the responsibilities and job description for the Research Intern (Vision / VLM) position at Abaka AI?
Responsibilities
- Design workflows to curate high-quality image editing and generation datasets for controllable diffusion and instruction tuning.
- Conduct evaluations of vision-language models, including image understanding, caption alignment, and editing precision (a minimal alignment-scoring sketch follows this list).
- Assist in the training and evaluation of diffusion models or reward models.
- Explore visual reasoning datasets that bridge images and text prompts.
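The caption-alignment evaluation mentioned above is commonly scored with a contrastive image-text model such as CLIP. The sketch below is an illustration only, not Abaka AI's actual pipeline: the checkpoint name and the `alignment_score` helper are assumptions, chosen to show what a simple image-caption alignment check can look like with the Hugging Face `transformers` CLIP implementation.

```python
# Minimal sketch of a CLIP-based image-caption alignment score.
# Assumptions: the "openai/clip-vit-base-patch32" checkpoint and the
# alignment_score helper are illustrative choices, not part of the job posting.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")


def alignment_score(image: Image.Image, caption: str) -> float:
    """Cosine similarity between CLIP image and text embeddings (higher = better aligned)."""
    inputs = processor(text=[caption], images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    return float((img @ txt.T).item())


if __name__ == "__main__":
    score = alignment_score(Image.open("sample.jpg"), "a photo of a cat on a sofa")
    print(f"CLIP alignment score: {score:.3f}")
```

Scores like this are relative rather than absolute; in practice a filtering threshold is usually calibrated against a held-out set of pairs already known to be well aligned.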
Requirements
- Strong background in computer science, data engineering, artificial intelligence, or related fields, with hands-on experience in large-scale vision data systems.
- 1+ years of experience in computer vision or multimodal machine learning (e.g., PyTorch, Diffusers, CLIP, BLIP).
- Solid understanding of image-text alignment and latent-space editing.
- (Preferred) Familiarity with aesthetic models, diffusion-based editing (an illustrative sketch follows this list), vision-language modeling (VLM), or visual question answering (VQA) tasks.
- (Preferred) Relevant publications in top conferences.
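Instruction-driven image editing of the kind referenced in the preferred qualifications is often prototyped with the `diffusers` library named above. The sketch below is a hypothetical example, not a tool or workflow specified by the posting: the InstructPix2Pix checkpoint, file names, and parameter values are all assumptions, shown only to indicate what applying a text edit instruction to an image looks like in practice.

```python
# Minimal sketch of instruction-based image editing with diffusers.
# Assumptions: the timbrooks/instruct-pix2pix checkpoint and all parameter
# values below are illustrative, not requirements from the job posting.
import torch
from PIL import Image
from diffusers import StableDiffusionInstructPix2PixPipeline

pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

image = Image.open("input.jpg").convert("RGB")
edited = pipe(
    prompt="make it look like a watercolor painting",  # natural-language edit instruction
    image=image,
    num_inference_steps=20,    # fewer steps = faster, rougher edits
    image_guidance_scale=1.5,  # how closely to preserve the source image
    guidance_scale=7.5,        # how strongly to follow the instruction
).images[0]
edited.save("edited.jpg")
```

In a dataset-curation setting, a pipeline like this typically generates candidate edits that are then filtered with alignment and aesthetic scoring before being added to a training set.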