ELLIS Life / NCT Data Science Seminar: Andrea Vedaldi

Discovering Actionable Interpretations from Raw Visual Data: From 2D Clustering to 3D Reconstruction

Machines can repeat what they are told; is this intelligence?

Andrea Vedaldi, University of Oxford
July 28, 11:00 AM (CEST)


In this talk, I will discuss the problem of discovering interpretable and actionable representations of visual data without supervision. I will start by looking at recent approaches such as deep clustering that can learn high-quality image and video representations with no labels. Yet, I will show that the resulting representations are interpretable only to a limited extent: they often do not map directly to useful concepts or behaviours. Nor is it reasonable to expect a nuanced and informative understanding of images to emerge from a simple high-level task such as clustering. I will then suggest that first achieving a lower-level understanding rooted in 2D and 3D geometry may be propaedeutic to developing a more abstract interpretation of visual data without resorting to external supervision. I will discuss some of our recent work on learning the 3D shape of object categories and their correspondences from still images, videos and 3D scans, and outline some of the principles that can be used to design such algorithms. I will also introduce CO3D, a new in-the-wild dataset of videos of 3D objects that can support research in this domain.


Andrea Vedaldi is Professor of Computer Vision and Machine Learning and a co-lead of the VGG group at the Engineering Science department of the University of Oxford. He researches computer vision and machine learning methods to understand the content of images and videos automatically, with little to no manual supervision, in terms of semantics and 3D geometry. He is also the lead author of the VLFeat and MatConvNet computer vision and deep learning libraries.

Website: https://www.robots.ox.ac.uk/~vedaldi/