Better not Bigger: Distilling LLMs into Specialized Models

2,118 views

Snorkel AI

8 months ago

Jason Fries, a research scientist at Snorkel AI and Stanford University, discussed the challenges of deploying LLMs and presented two variations of one solution: distillation.
The first solution, called “distilling step-by-step,” emerged from a collaboration between researchers at Snorkel AI and Google Research. This approach prompts an LLM to answer a question along with the reasoning behind its answer. Data scientists then use both the answer and the rationale to train a smaller model. In experiments, this allowed researchers to train models on much less data while maintaining similar performance.
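For readers who want to see the shape of this, here is a minimal sketch of the multi-task training loop described above, assuming a Hugging Face-style seq2seq student; the model choice, task prefixes, and `RATIONALE_WEIGHT` are illustrative assumptions, not the exact setup from the talk:

```python
# A minimal sketch of the "distilling step-by-step" objective, assuming
# teacher-generated (answer, rationale) pairs are already available.
# Model choice, task prefixes, and RATIONALE_WEIGHT are illustrative.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
RATIONALE_WEIGHT = 0.5  # mixing weight between the label and rationale tasks

def train_step(question: str, answer: str, rationale: str) -> float:
    """One update: the student learns to predict the label AND the rationale."""
    losses = []
    for prefix, target in (("[label] ", answer), ("[rationale] ", rationale)):
        enc = tokenizer(prefix + question, return_tensors="pt", truncation=True)
        labels = tokenizer(target, return_tensors="pt", truncation=True).input_ids
        losses.append(model(**enc, labels=labels).loss)
    loss = losses[0] + RATIONALE_WEIGHT * losses[1]  # combined multi-task loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Both targets come from prompting the large teacher LLM, not from human labels.
train_step(
    question="Is this review positive or negative? 'The battery died in a week.'",
    answer="negative",
    rationale="A battery failing within a week signals strong dissatisfaction.",
)
```

Because the rationale task forces the student to model the teacher's reasoning, not just its final answers, the student typically needs far fewer examples to reach comparable accuracy.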
Jason also showed how the Snorkel Flow data development platform allows users to effectively distill the expertise of multiple LLMs into a deployable, small-format model.
More related videos: • Foundation Models: The...
More related videos: • Snorkel AI's 2023 Ente...
#airesearch #modeldistillation #largelanguagemodels

Comments: 12
@vivekpadman5248
1 month ago
Very nice, short, informative video. I'm looking to create a distilled model for reasoning tasks in games that could run locally. This will help 😊 thanks
@SnorkelAI
1 month ago
Glad it was helpful!
@riser9644
8 months ago
A link to the blog, code, or slides would be good.
@420_gunna
5 months ago
When you talk about distillation requiring large, unlabeled datasets... to be clear, for my understanding: it's not necessarily that the data is unlabeled; it's more that we don't care about the dataset's labels, and instead use the teacher model's output distribution as the replacement pseudolabel. I guess you COULD create a distilled model by training against some data distribution that the teacher wasn't itself trained on... but I can't imagine why you would want to do that 😄
@SnorkelAI
3 months ago
Sort of. Typically, you would use this for data that is, in fact, unlabeled: think sections of contracts or paragraphs from textbooks. You could also employ this approach for data whose labels don't fit your desired schema, in which case your statement that "we don't care about the dataset's labels" would be 100% correct. As for your second comment, there could be a number of reasons you might want to do that. Perhaps the teacher LLM does quite well on a particular labeling task when given a highly engineered prompt. This approach would let you transfer that performance into a smaller, cheaper model.
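To make that flow concrete, here is a minimal sketch of pseudolabeling unlabeled text with a prompt-engineered teacher; the `teacher_complete` callable, the label schema, and the prompt are hypothetical stand-ins for illustration, not Snorkel Flow's actual API:

```python
# A minimal pseudolabeling sketch: a teacher LLM labels raw, unlabeled text,
# and the (text, pseudolabel) pairs then train a small student model.
# `teacher_complete`, LABELS, and PROMPT are hypothetical, for illustration.
from typing import Callable

LABELS = ["indemnification", "termination", "confidentiality"]

PROMPT = (
    "Classify this contract clause as one of "
    + ", ".join(LABELS)
    + ".\nClause: {clause}\nAnswer:"
)

def pseudolabel(
    clauses: list[str], teacher_complete: Callable[[str], str]
) -> list[tuple[str, str]]:
    """Ask the (possibly prompt-engineered) teacher to label each clause."""
    dataset = []
    for clause in clauses:
        raw = teacher_complete(PROMPT.format(clause=clause)).strip().lower()
        if raw in LABELS:  # keep only outputs that fit the desired schema
            dataset.append((clause, raw))
    return dataset

# The resulting pairs can fine-tune any small student classifier
# (e.g. a DistilBERT head) at a fraction of the teacher's inference cost.
```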
@vivekpadman5248
1 month ago
Is this approach used at all three levels of training: base, instruct, and chat fine-tuning? And are there different things to consider for each?
@SnorkelAI
1 month ago
I'm not 100% clear on your question. Are you referring to pre-training, fine-tuning, and alignment? If so, this approach could be used for fine-tuning and/or alignment. It could also theoretically be used for pre-training, but I suspect that would yield poor results.
@vivekpadman5248
1 month ago
@SnorkelAI Yes, that was exactly my question, thanks 😊. I have one follow-up question: why do you think it would yield poorer results in the pre-training phase? Any insights on that? And in that case, what kind (size and architecture) of pre-trained student model should be used with a specific teacher LLM, or would anything work?
@SnorkelAI
1 month ago
Sorry for the slow reply here. YouTube didn't surface your reply comment the same way it did your initial comment. We're getting a bit outside the bounds of what can be reasonably answered within a YouTube comment, but I think we can reasonably say this: distilling a model means using its output to train a smaller model. For pre-training, that would mean generating an immense volume of raw outputs from the parent model. Several studies have shown that pre-training generative models on other models' generated output tends not to work well. We don't yet fully understand why, but we understand that it is a questionable practice at present.
@vivekpadman5248
1 month ago
@SnorkelAI No worries, getting such a nice detailed reply is all that matters. Ah, I understand it properly now. I also guess parameter-size limits will come into the picture if we use it for pre-training. Clean data plus synthetic data is available now anyway. Thanks again 😊🙏
@lionhuang9209
8 months ago
Where can we get the PPT?
@mechwarrior83
8 months ago
Please.