Visual Representation and Recognition without Human Supervision

Abstract: Visual recognition models have advanced considerably by relying on large-scale, carefully curated datasets with human annotations. Most computer vision models leverage human supervision either to construct strong initial representations (e.g., using the ImageNet dataset) or to model the visual concepts relevant to downstream tasks (e.g., MS-COCO for object detection). In this thesis, we [...]