
REALM: Retrieval-Augmented Language Model Pre-Training (Paper Explained)

14,872 views

Yannic Kilcher

A day ago

#ai #tech #science
Open Domain Question Answering is one of the most challenging tasks in NLP. When answering a question, the model is able to retrieve arbitrary documents from an indexed corpus to gather more information. REALM shows how Masked Language Modeling (MLM) pretraining can be used to train a retriever for relevant documents in an end-to-end fashion, and it improves over the state of the art by a significant margin.
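To make the retrieve-then-predict setup described above concrete, here is a minimal NumPy sketch (all names, shapes, and numbers are my own illustration, not the paper's code) of REALM's marginalization p(y|x) = Σ_z p(y|x,z) p(z|x) over top-k retrieved documents, with relevance scores given by inner products so that maximum inner product search (MIPS) applies at scale:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_docs, k = 16, 100, 5

doc_embeds = rng.standard_normal((n_docs, d))   # pre-computed document index
query_embed = rng.standard_normal(d)            # embedding of the masked query x

# Retriever: relevance score f(x, z) is an inner product (MIPS-friendly)
scores = doc_embeds @ query_embed
topk = np.argsort(scores)[-k:]                  # approximate top-k retrieval

# p(z|x): softmax over the retrieved documents' scores
logits = scores[topk]
p_z = np.exp(logits - logits.max())
p_z /= p_z.sum()

# p(y|x,z): stand-in for the reader model's answer probability per document
p_y_given_z = rng.random(k)

# Marginal probability of the answer under the retrieve-then-predict model
p_y = float(p_z @ p_y_given_z)
```

In the real system the index holds millions of documents and both encoders are BERT-style networks, but the marginalization itself is this simple weighted sum.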
OUTLINE:
0:00 - Introduction & Overview
4:30 - World Knowledge in Language Models
8:15 - Masked Language Modeling for Latent Document Retrieval
14:50 - Problem Formulation
17:30 - Knowledge Retriever Model using MIPS
23:50 - Question Answering Model
27:50 - Architecture Recap
29:55 - Analysis of the Loss Gradient
34:15 - Initialization using the Inverse Cloze Task
41:40 - Prohibiting Trivial Retrievals
44:05 - Null Document
45:00 - Salient Span Masking
50:15 - My Idea on Salient Span Masking
51:50 - Experimental Results and Ablations
57:30 - Concrete Example from the Model
Paper: arxiv.org/abs/...
Code: github.com/goo...
My Video on GPT-3: • GPT-3: Language Models...
My Video on BERT: • BERT: Pre-training of ...
My Video on Word2Vec: • [Classic] Word2Vec: Di...
Abstract:
Language model pre-training has been shown to capture a surprising amount of world knowledge, crucial for NLP tasks such as question answering. However, this knowledge is stored implicitly in the parameters of a neural network, requiring ever-larger networks to cover more facts.
To capture knowledge in a more modular and interpretable way, we augment language model pre-training with a latent knowledge retriever, which allows the model to retrieve and attend over documents from a large corpus such as Wikipedia, used during pre-training, fine-tuning and inference. For the first time, we show how to pre-train such a knowledge retriever in an unsupervised manner, using masked language modeling as the learning signal and backpropagating through a retrieval step that considers millions of documents.
We demonstrate the effectiveness of Retrieval-Augmented Language Model pre-training (REALM) by fine-tuning on the challenging task of Open-domain Question Answering (Open-QA). We compare against state-of-the-art models for both explicit and implicit knowledge storage on three popular Open-QA benchmarks, and find that we outperform all previous methods by a significant margin (4-16% absolute accuracy), while also providing qualitative benefits such as interpretability and modularity.
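The abstract's claim of backpropagating through retrieval rests on the gradient analysis in the paper (covered at 29:55 in the outline). As I read it, with f(x, z) the inner-product relevance score, the gradient of the log-likelihood with respect to the retriever takes the form:

```latex
\nabla \log p(y \mid x) \;=\; \sum_{z} r(z)\, \nabla f(x, z),
\qquad
r(z) \;=\; \left[ \frac{p(y \mid z, x)}{p(y \mid x)} - 1 \right] p(z \mid x).
```

So a document's score is pushed up exactly when retrieving it helps, i.e. when p(y|z,x) > p(y|x), and pushed down otherwise.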
Authors: Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, Ming-Wei Chang
Links:
KZfaq: / yannickilcher
Twitter: / ykilcher
Discord: / discord
BitChute: www.bitchute.c...
Minds: www.minds.com/...
Parler: parler.com/pro...
LinkedIn: / yannic-kilcher-488534136
If you want to support me, the best thing to do is to share out the content :)
If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: www.subscribes...
Patreon: / yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

Comments: 33
@selfhelp119 3 years ago
Using marginalized probability is a good idea. Brilliant!
@shaz7163 3 years ago
Can you please do "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks"?
@sheetalborar6813 1 month ago
Loved it!😊
@wentianbao5368 4 years ago
Quite detailed and clear explanation. And I like the brief overview of the paper's idea at the beginning of the video~
@jenishah9825 3 years ago
This content here is gold!
@kan_drio 11 months ago
Excellent video man! Thank you so much!
@veedrac 4 years ago
I'd imagine REALM is pronounced like the word ‘realm’ (sounds like ‘relm’), given it seems to be a pun on the definition of realm, ‘a field or domain of activity or interest.’
@quebono100 4 years ago
Love it :)
@shayanroychoudhury9066 4 years ago
Could you do a video on ORQA?
@vimostan269 4 years ago
Regarding salient span masking: BERT-wwm, SpanBERT, ERNIE, and RoBERTa all adopted mask-based modifications to improve BERT.
@DistortedV12 4 years ago
This seems on the surface similar to the idea Yannic had in the GPT-3 ML Street talk video.
@corgirun7892 1 year ago
nice
@LNJP13579 4 years ago
Yannic, how do we get to know which research paper (RP) is most relevant? Only a minuscule fraction of published RPs make an impact. In earlier comments I had requested that you somehow map each RP video to its citations or something similar; that would be great. Otherwise it is difficult to select videos from so many :).
@herp_derpingson 4 years ago
Check out Arxiv Sanity Preserver, made by Karpathy. It's intended to serve this purpose: www.arxiv-sanity.com/
@MrAlextorex 4 years ago
Use www.arxiv-sanity.com/. The paper basically must be accepted to conferences and must have many citations.
@ziquaftynny9285 4 years ago
60 degrees or 300 degrees?
@sandeepunnikrishnan8806 1 year ago
How would it be 300?
@bdennyw1 4 years ago
Great explainer! One thing I'm not sure about: how are the 3 models connected? Is this end to end? How does the retrieval work in that case?
@YannicKilcher 4 years ago
Yes, this is end to end.
@bdennyw1 4 years ago
@YannicKilcher Thanks, I'll have to dig into the paper. The retrieval step doesn't seem like it's differentiable, so there is something I'm missing.
@user-nc5cq9yu2c 2 years ago
@bdennyw1 I don't see how the retrieval step is differentiable, either😂
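On the thread's question of differentiability: the top-k selection itself is discrete, but the training signal flows through the softmax p(z|x) over the retrieved documents' scores, since the marginal answer probability is a weighted sum. A tiny numerical check (illustrative only, with made-up numbers):

```python
import numpy as np

def marginal(scores, p_y_given_z):
    # p(z|x) = softmax(scores); p(y|x) = sum_z p(z|x) * p(y|x,z)
    e = np.exp(scores - scores.max())
    p_z = e / e.sum()
    return float(p_z @ p_y_given_z)

scores = np.array([2.0, 1.0, 0.5])       # retrieval scores of the top-3 docs
p_y_given_z = np.array([0.9, 0.2, 0.1])  # reader probabilities per doc

# Finite-difference gradient of p(y|x) w.r.t. the first document's score
eps = 1e-6
bump = np.array([eps, 0.0, 0.0])
g = (marginal(scores + bump, p_y_given_z)
     - marginal(scores - bump, p_y_given_z)) / (2 * eps)

# A helpful document (high p(y|x,z)) receives a positive gradient on its
# score, so the retriever learns even though top-k selection is discrete.
assert g > 0
```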
@moon-zm8mx 2 years ago
Thank you for sharing your clear paper explanation! Even so, one thing is still unclear to me: I can't understand the equation at 34:17. I read it as saying that the more relevant the document the retriever returns, the higher the r(z) value, and so the gradient goes up too. But in my understanding, the more a model has learned, the smaller the gradient should become. So I'm confused right now. Is my understanding wrong? Help me please.
@Leon-pn6rb 3 years ago
What does marginalizing mean in this context?
@YannicKilcher 3 years ago
I'm not sure what context you mean, could you clarify?
@tarunpaparaju5382 4 years ago
Hey Yannic! Great video! I really appreciate the work you are doing to make research more accessible to everyone! By the way, I don't see a 1080p (HD) option for this video. Is it possible to watch this video in 1080p? Thank you! :)
@YannicKilcher 4 years ago
Thanks for telling me, I didn't even see that. No idea why this happens.
@tarunpaparaju5382 4 years ago
@YannicKilcher Thanks for your reply! Keep making awesome videos, it really helps me a lot :)
@SuperMotzfeldt 3 years ago
Can you do one with ColBERT as well?
@abhilashnandy 4 years ago
Yannic, instead of using a neural retriever, wouldn't a probabilistic retrieval framework such as BM25 give a similar result?
@MrAlextorex 4 years ago
BM25 is too simplistic. BERT also has some understanding of the meaning of words in the context of a document.
@abhilashnandy 4 years ago
@MrAlextorex I have seen BM25 augmented with T5-generated questions perform really well on MS MARCO.
@MrAlextorex 4 years ago
@abhilashnandy MS MARCO is based on questions sampled from Bing searches, which are quite factual and simplistic.
@DistortedV12 4 years ago
It is pronounced "relm". Ah, when knowing only English is actually helpful ;)