What are the responsibilities and job description for the Research Intern (Vision / VLM) position at abakaai?
Responsibilities
- Design workflows to curate high-quality image editing and generation datasets for controllable diffusion and instruction tuning (see the diffusion-editing sketch after this list).
- Conduct evaluations of vision-language models, including image understanding, caption alignment, and editing precision.
- Assist in training and evaluating diffusion models or reward models.
- Explore visual reasoning datasets that bridge images and text prompts.
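To give a flavor of the instruction-driven image editing this role touches, here is a minimal sketch using the open-source Diffusers library with the publicly available InstructPix2Pix checkpoint. The checkpoint name, input path, and prompt are illustrative assumptions, not details of abakaai's actual stack.

```python
# Minimal sketch: instruction-based image editing with Diffusers.
# Assumes a CUDA GPU; checkpoint, image path, and prompt are placeholders.
import torch
from PIL import Image
from diffusers import StableDiffusionInstructPix2PixPipeline

pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix",  # public checkpoint, used here for illustration
    torch_dtype=torch.float16,
).to("cuda")

image = Image.open("input.jpg").convert("RGB")  # hypothetical source image

edited = pipe(
    "make it look like a watercolor painting",  # free-form edit instruction
    image=image,
    num_inference_steps=20,
    image_guidance_scale=1.5,  # higher values preserve more of the input image
).images[0]

edited.save("edited.jpg")
```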
Qualifications
- Strong background in computer science, data engineering, artificial intelligence, or related fields, with hands-on experience in large-scale vision data systems.
- At least 1 year of experience in computer vision or multimodal machine learning (e.g., PyTorch, Diffusers, CLIP, BLIP).
- Solid understanding of image-text alignment and latent-space editing (see the CLIP scoring sketch after this list).
- (Preferred) Familiarity with aesthetic models, diffusion-based editing, vision-language modeling (VLM), or visual question answering (VQA) tasks.
- (Preferred) Relevant publications in top conferences.
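As a concrete example of the image-text alignment work the qualifications describe, here is a minimal sketch that scores captions against an image with CLIP via the Hugging Face Transformers library. The model name, image path, and captions are illustrative assumptions only.

```python
# Minimal sketch: scoring image-text alignment with CLIP.
# Model checkpoint, image path, and captions are placeholder assumptions.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("example.jpg")  # hypothetical input image
captions = [
    "a photo of a cat on a sofa",
    "a photo of a dog in a park",
]

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Image-text similarity logits, softmaxed into a distribution over captions;
# a higher probability indicates better alignment with the image.
probs = outputs.logits_per_image.softmax(dim=-1)
for caption, p in zip(captions, probs[0].tolist()):
    print(f"{p:.3f}  {caption}")
```

In practice, scores like these can also be thresholded to filter large image-caption corpora, which connects directly to the dataset-curation responsibilities above.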