May 11, 2025

These repositories specialize in images, video, annotations, and multimodal resources. They are well suited to training models for object detection, segmentation, captioning, and scene understanding.

Tools:
Visual Genome – An image dataset with region descriptions, object relationships, and attributes for scene understanding. Commonly used in visual question answering (VQA) and multimodal AI.
Million Song Dataset – A large collection of audio features and metadata for one million tracks, used for recommendation systems, beat analysis, and music intelligence.
Tabby Vision Dataset – Offers labeled images for vision research, especially in medical or environmental contexts.