Multi-Modal Self-Supervised Learning from Videos

1,501 views

COMPUTER VISION TALKS

3 years ago

Abstract:
In this talk, I will show how good visual representations can be learned without manual annotations by simply leveraging the multimodal nature of videos. I will illustrate this by going through two of our recent results. First, we demonstrate that a text-video embedding trained on HowTo100M, a large uncurated dataset of narrated videos, leads to state-of-the-art results for text-to-video retrieval and action localization tasks [1]. Second, I will introduce our recent MultiModal Versatile (MMV) Networks [2] that learn state-of-the-art self-supervised representations by leveraging three modalities naturally present in videos: vision, audio, and language.
[1] Antoine Miech, Jean-Baptiste Alayrac, et al. "End-to-End Learning of Visual Representations from Uncurated Instructional Videos", CVPR 2020.
[2] Jean-Baptiste Alayrac et al. "Self-Supervised MultiModal Versatile Networks", NeurIPS 2020.
Short bio:
Jean-Baptiste Alayrac is a senior research scientist at DeepMind working in the Vision group led by Andrew Zisserman. He obtained a Ph.D. from Ecole Normale Superieure in Paris in 2018, an MSc degree in Mathematics, Machine Learning, and Computer Vision from Ecole Normale Superieure in Cachan in 2014, and graduated from the Ecole Polytechnique in France in 2013. His research interests span video understanding, natural language processing, and machine learning. Most recently, he has been focusing on self-supervised learning from multiple modalities present in large collections of videos.

Comments: 1
@sadiaafrinpurba9179, 1 year ago
Concise explanation