VASC Seminar
Scene Understanding
Abstract: Accurate and efficient scene understanding is a fundamental task in a variety of computer vision applications, including autonomous driving, human-machine interaction, and robot navigation. Reducing computational complexity and memory use is important for minimizing response time and power consumption on portable devices such as robots and virtual/augmented reality devices. Also, it is beneficial for vehicles [...]
Relating First-person and Third-person Videos
Abstract: Thanks to the availability and increasing popularity of wearable devices such as GoPro cameras, smartphones, and glasses, we have access to a plethora of videos captured from the first-person perspective. Capturing the world from the wearer's own point of view, egocentric videos bear characteristics distinct from the more traditional third-person (exocentric) videos. In [...]
Towards better methods of video generation
Abstract: Learning to generate future frames of a video sequence is a challenging research problem with great relevance to reinforcement learning, planning, and robotics. Existing approaches either fail to capture the full distribution of outcomes, or yield blurry generations, or both. In this talk I will address two important aspects of video generation: (i) what [...]
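A common explanation for the blurry generations mentioned above is mode averaging: a deterministic model trained with an L2 loss on a multimodal future predicts the pixelwise mean of the plausible outcomes rather than any single one of them. A toy numpy illustration (our own construction, not taken from the talk):

```python
import numpy as np

# Two equally likely "future frames" for the same past: all-dark or all-bright.
futures = np.stack([np.zeros((8, 8)), np.ones((8, 8))])

# The single prediction minimizing expected L2 error is the pixelwise mean:
# a uniform grey frame that matches neither outcome, i.e. a "blurry" average.
best_l2 = futures.mean(axis=0)
print(best_l2[0, 0])  # -> 0.5
```

Capturing the full distribution instead requires a model that can output distinct samples, which is one of the aspects the abstract alludes to.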
Acquiring and Transferring Generalizable Vision-based Robot Skills
Abstract: In recent years, there have been great advances in policy learning for goal-oriented agents. However, real-world constraints still pose major challenges for teaching highly generalizable and versatile robot policies in a cost-efficient and safe manner. In this talk, I will argue that instead of aiming to teach large motion repertoires [...]
Learning to localize and anonymize objects with indirect supervision
Abstract: Computer vision has made great strides on problems that can be learned with direct supervision, in which the goal can be precisely defined (e.g., drawing a box that tightly fits an object). However, direct supervision is often not only costly but also challenging to obtain when the goal is more ambiguous. In this talk, I [...]
Video Compression for Recognition & Video Recognition for Compression
Abstract: Training robust deep video representations has proven to be much more challenging than learning deep image representations. One reason is that videos are huge and highly redundant: the 'true', interesting signal often drowns in a sea of irrelevant data. In the first part of the talk, I will show how to train a deep network [...]
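The redundancy claim can be made concrete: in a video whose content mostly translates between frames, the consecutive-frame residuals carry only a small fraction of the raw pixel energy, which is exactly what video codecs exploit. A hedged numpy sketch (the toy "video" is our own construction):

```python
import numpy as np

# Toy "video": a smooth pattern translating one sample per frame.
signal = np.sin(np.linspace(0, 2 * np.pi, 256, endpoint=False))
video = np.stack([np.roll(signal, t).reshape(16, 16) for t in range(8)])

raw_energy = np.abs(video).sum()
resid_energy = np.abs(np.diff(video, axis=0)).sum()

# Consecutive-frame residuals carry a small fraction of the raw energy.
print(resid_energy / raw_energy)
```

For real videos the residuals are not this clean, but the same principle makes the raw pixel stream a highly redundant representation to learn from.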
Tracking Beyond Detection
Abstract: The majority of existing vision-based methods perform multi-object tracking in the image domain. Yet, in mobile robotics and autonomous driving scenarios, pixel-precise object localization and trajectory estimation in 3D space are of fundamental importance. Furthermore, the leading paradigms for vision-based multi-object tracking and trajectory prediction heavily rely on object detectors and effectively limit tracking [...]
Exploiting Deviations from Ideal Visual Recurrence
Abstract: Visual repetitions are abundant in our surrounding physical world: small image patches tend to reoccur within a natural image, and across different rescaled versions thereof. Similarly, semantic repetitions appear naturally inside an object class within image datasets, as a result of different views and scales of the same object. We studied deviations from these [...]
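The cross-scale patch recurrence described above can be quantified directly: in a structured image, patches of a downscaled copy find close matches among the original image's patches, whereas in pure noise they do not. A small numpy sketch; the toy images and the `mean_nn_dist` helper are our own illustrative assumptions, not the speaker's method:

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_patches(img, k=5):
    """All k x k patches of a 2D array, flattened to rows."""
    H, W = img.shape
    return np.array([img[i:i + k, j:j + k].ravel()
                     for i in range(H - k + 1)
                     for j in range(W - k + 1)])

def downscale2x(img):
    """Crude 2x downscale by averaging non-overlapping 2x2 blocks."""
    H, W = img.shape
    return img[:H // 2 * 2, :W // 2 * 2].reshape(H // 2, 2, W // 2, 2).mean(axis=(1, 3))

def mean_nn_dist(img):
    """Mean distance from each coarse-scale patch to its nearest fine-scale patch."""
    fine = extract_patches(img)
    coarse = extract_patches(downscale2x(img))
    return float(np.mean([np.min(np.linalg.norm(fine - p, axis=1)) for p in coarse]))

x = np.linspace(0, 1, 64)
structured = np.outer(np.sin(4 * x), np.sin(4 * x))  # smooth, self-similar
noise = rng.random((64, 64))                         # no internal structure

d_structured = mean_nn_dist(structured)
d_noise = mean_nn_dist(noise)
# Cross-scale recurrence: nearest-neighbour distances are much smaller
# for the structured image than for the noise image.
print(d_structured, d_noise)
```

Deviations from this ideal recurrence, as the abstract notes, are themselves informative about where an image departs from its typical internal statistics.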
Attending to Pixels, Embedding Pixels, Predicting Pixels
Abstract: Today's splashy applications depend heavily on meticulously annotated datasets and on data-driven, learning-based methods; among these, pixel labeling plays an important role yet often lacks interpretability. In this talk, I will discuss how we deal with pixels with better interpretability. First, I'll introduce the pixel embedding framework that allows for clustering pixels into discrete groups [...]
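As a minimal sketch of the idea of clustering pixel embeddings into discrete groups (our own toy construction; the talk's framework learns the embeddings rather than hand-crafting them), here each pixel gets a feature vector of intensity plus scaled coordinates, and plain k-means recovers the image's two regions:

```python
import numpy as np

def kmeans(X, init_idx, iters=20):
    """Plain k-means with deterministic initial centroids taken from rows of X."""
    centers = X[init_idx].astype(float)
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(len(centers)):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Toy 16x16 "image": dark left half, bright right half.
H = W = 16
img = np.zeros((H, W))
img[:, W // 2:] = 1.0

# Hand-crafted per-pixel embedding: (intensity, row, col), with coordinates
# scaled down so intensity dominates the grouping.
ys, xs = np.mgrid[0:H, 0:W]
feats = np.stack([img.ravel(), ys.ravel() / (4 * H), xs.ravel() / (4 * W)], axis=1)

# Initialize one centroid in each half; the clusters recover the two regions.
labels = kmeans(feats, init_idx=[0, H * W - 1]).reshape(H, W)
print(labels[0, 0] != labels[0, W - 1])  # -> True
```

Learned embeddings replace the hand-crafted features so that pixels belonging to the same semantic group land close together before the clustering step.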
Automatically Supervised Learning: Two more steps on a long journey
Abstract: I will talk about two recent pieces of work that attempt to move towards learning with less reliance on labeled data. In the first part, I will talk about how the surrogate task of predicting the motion of objects can induce complex representations in neural networks without any labeled data. In the second part of [...]