Developing an LLM: Building, Training, Finetuning

18,599 views

Sebastian Raschka

1 day ago

REFERENCES:
1. Build an LLM from Scratch book: mng.bz/M96o
2. Build an LLM from Scratch repo: github.com/rasbt/LLMs-from-sc...
3. Slides: sebastianraschka.com/pdf/slid...
4. LitGPT: github.com/Lightning-AI/litgpt
5. TinyLlama pretraining: lightning.ai/lightning-ai/stu...

DESCRIPTION:
This video provides an overview of the three stages of developing an LLM: Building, Training, and Finetuning. The focus is on explaining how LLMs work by walking through how each of these stages works.

OUTLINE:
00:00 - Using LLMs
02:50 - The stages of developing an LLM
05:26 - The dataset
10:15 - Generating multi-word outputs
12:30 - Tokenization
15:35 - Pretraining datasets
21:53 - LLM architecture
27:20 - Pretraining
35:21 - Classification finetuning
39:48 - Instruction finetuning
43:06 - Preference finetuning
46:04 - Evaluating LLMs
53:59 - Pretraining & finetuning rules of thumb

Comments: 49
@chineduezeofor2481 · 4 days ago
Thank you Sebastian for your awesome contributions. You're a big inspiration.
@tusharganguli · 1 month ago
Your articles and videos have been extremely helpful in understanding how LLMs are built. Build an LLM from Scratch and Q and AI are resources that I am presently reading, and they provide a hands-on discourse on the conceptual understanding of LLMs. You, Andrej Karpathy, and Jay Alammar are shining examples of how learning should be enabled. Thank you!
@SebastianRaschka · 1 month ago
Thanks for the kind comment!
@box-mt3xv · 1 month ago
The hero of open source
@SebastianRaschka · 1 month ago
Haha, thanks! I've learned so much thanks to all the amazing people in open source, and I'm very flattered by your comment to potentially be counted as one of them :)
@muthukamalan.m6316 · 1 month ago
great content! love it ❤
@tomhense6866 · 1 month ago
Very nice video, I liked it so much that I preordered your new book directly after watching it (to be fair I have read your blog for some time now).
@SebastianRaschka · 1 month ago
Thanks! I hope you are going to like the book, too!
@rachadlakis1 · 1 month ago
Thanks for the great knowledge you are sharing.
@DataChiller · 1 month ago
the greatest Liverpool fan ever! ⚽
@SebastianRaschka · 1 month ago
Haha nice, at least one person watched it until that part :D
@haqiufreedeal · 27 days ago
Oh, my lord, my favourite machine learning author is a Liverpool fan.😎
@SebastianRaschka · 27 days ago
Haha, nice that people make it that far into the video 😊
@ananthvankipuram4012 · 22 days ago
@@SebastianRaschka You'll never walk alone 🙂
@kartiksaini5847 · 1 month ago
Big fan ❤
@tashfeenahmed3526 · 21 days ago
That's great, Dr. Raschka. Hope you are doing well. I wish I could download your deep learning book, which was published recently. If there is an open-source link to download it, please mention it in the comments. Thanks and regards, Researcher at Texas
@sahilsharma3267 · 1 month ago
When is your whole book coming out? Eagerly waiting 😅
@SebastianRaschka · 1 month ago
Thanks for your interest in this! It's already available for preorder (both on the publisher's website and Amazon), and if the production stage goes smoothly, it should be out by the end of August.
@KumR · 24 days ago
Great video. Now that LLMs are so powerful, will regular machine learning and deep learning slowly vanish?
@SebastianRaschka · 24 days ago
Great question. I do think that special-purpose ML solutions still have, and will continue to have, their place, the same way ML didn't make certain more traditional statistics-based models obsolete. Regarding deep learning ... I'd say an LLM is a deep learning model itself. But yeah, almost everything in deep learning nowadays is either a diffusion model, a transformer-based model (vision transformers and most LLMs), or a state space model.
@RobinSunCruiser · 29 days ago
Hi, nice videos! One question for my understanding: when talking about embedding dimensions such as 1280 in "gpt2-large", do you mean the size of the number vector encoding a single token, or the number of input tokens? When comparing gpt2-large and Llama 2, the number is the same for the ".. embeddings with 1280 tokens".
@SebastianRaschka · 23 days ago
Good question. The term is often used very broadly and may refer to the input embeddings or the hidden layer sizes in the MLP layers. Here, I meant the size of the vector that each token is embedded into.
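To make the distinction in this thread concrete, here is a minimal PyTorch sketch using GPT-2-large-style numbers; the values and variable names are illustrative and not taken from the book or video. The embedding dimension (1280) is the length of the vector each token is mapped to, while the context length (1024) is the maximum number of input tokens; they are independent hyperparameters.

```python
import torch
import torch.nn as nn

vocab_size = 50257      # GPT-2 vocabulary size
emb_dim = 1280          # embedding dimension ("gpt2-large"-style)
context_length = 1024   # maximum number of input tokens

tok_emb = nn.Embedding(vocab_size, emb_dim)       # one 1280-dim vector per vocabulary entry
pos_emb = nn.Embedding(context_length, emb_dim)   # one 1280-dim vector per position

token_ids = torch.randint(0, vocab_size, (1, 8))  # a batch with one sequence of 8 tokens
x = tok_emb(token_ids) + pos_emb(torch.arange(8))
print(x.shape)  # torch.Size([1, 8, 1280]) -> 8 tokens, each embedded as a 1280-dim vector
```

Changing the context length would change how many positions the model can attend over, but each token would still be represented by a 1280-dimensional vector.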
@bashamsk1288 · 1 month ago
In instruction finetuning, do we propagate the loss only on the output text tokens, or for all tokens from start to EOS?
@SebastianRaschka · 1 month ago
That's a good question. You can do both. By default, it's all tokens, but it's more common to mask the instruction tokens. In my book, I include the token masking as a reader exercise (it's super easy to do). There was also a new research paper a few weeks ago that I discussed in my monthly research write-ups here: magazine.sebastianraschka.com/p/llm-research-insights-instruction
@bashamsk1288 · 1 month ago
@SebastianRaschka Thanks for the reply. I just have a general question: do we use masking in practice? For example, was masking used during the instruction finetuning of Llama 3, Mistral, or any open-source LLMs? Also, does your book include any chapters on the parallelization of training large language models?
@SebastianRaschka · 1 month ago
@bashamsk1288 Masking is commonly used, yes. We implement it as the default strategy in LitGPT, and in my book we do both. I can't speak about Llama 3 and Mistral regarding masking, because while these are open-weight models, they are not open source, so there's no training code we can look at. My book explains DDP training in the PyTorch appendix, but it's not used in the main chapters because, as a requirement, all chapters should also work on a laptop to make them accessible to most readers.
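To illustrate the masking idea discussed in this thread, here is a minimal, hypothetical PyTorch sketch (not code from the book or LitGPT): target IDs for the prompt/instruction positions are set to -100, which cross_entropy ignores by default, so the loss is computed only over the response tokens.

```python
import torch
import torch.nn.functional as F

# Toy token IDs for "<instruction> <response>" concatenated together.
input_ids = torch.tensor([[11, 12, 13, 14, 21, 22, 23]])  # 4 prompt tokens, 3 response tokens
prompt_len = 4

# For next-token prediction, the targets are the inputs shifted left by one.
targets = input_ids[:, 1:].clone()

# Mask the loss on predictions of prompt tokens by setting those targets to -100
# (F.cross_entropy's default ignore_index).
targets[:, : prompt_len - 1] = -100

# Fake logits standing in for the model output (batch, sequence, vocab).
vocab_size = 100
logits = torch.randn(1, targets.shape[1], vocab_size)

loss = F.cross_entropy(logits.flatten(0, 1), targets.flatten())
print(loss)  # loss is computed only over the response tokens
```

Training on all tokens instead simply corresponds to skipping the masking line.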
@alihajikaram8004 · 26 days ago
Would you make videos about time series and transformers?
@timothywcrane · 1 month ago
I'm interested in SLM RAG with knowledge graph traversal/search for RAG dataset collection and vector-JIT semantic matching for hybrid search. Any repos you think I would be interested in?
@timothywcrane · 1 month ago
Bookmarked, clear and concise.
@SebastianRaschka · 1 month ago
Unfortunately I don't have a good recommendation here. I have only implemented standard RAGs without knowledge graph traversal.
@joisco4394 · 29 days ago
I've heard about instruct learning, and it sounds similar to how you define preference learning. I have also heard about transfer learning. How would you compare/define those?
@SebastianRaschka · 29 days ago
Transfer learning is basically involved in everything you do when you start out with a pretrained model. We don't really name or call it out explicitly anymore because it's so common. In instruction finetuning, mainly the loss function differs from preference tuning. Instruction finetuning trains the model to answer queries, and preference finetuning is more about the nuances of how those queries get answered. All preference tuning methods used today (DPO, RLHF+PPO, KTO, etc.) expect you to have done instruction finetuning on your model before you preference finetune it.
@joisco4394 · 26 days ago
@SebastianRaschka Thanks for explaining it. I need to do a lot more research :p
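As a concrete illustration of the distinction above, here is a minimal, hypothetical sketch of the DPO loss, one of the preference-tuning methods mentioned. It assumes the summed log-probabilities of a preferred ("chosen") and a dispreferred ("rejected") response have already been computed under both the model being tuned and a frozen, instruction-finetuned reference model; the function and variable names are illustrative.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Minimal Direct Preference Optimization (DPO) loss sketch."""
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    # Push the tuned model to favor the chosen answer more strongly than the reference does.
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()

# Toy values standing in for real summed log-probabilities:
loss = dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
                torch.tensor([-13.0]), torch.tensor([-14.5]))
print(loss)
```

The reference model here is exactly why preference tuning presupposes an instruction-finetuned starting point: it anchors how far the tuned model's answer preferences are allowed to drift.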
@ArbaazBeg · 15 days ago
Should we give a prompt to the LLM when finetuning for classification with a modified last layer, or directly pass the input to the LLM like in DeBERTa?
@SebastianRaschka · 15 days ago
Thanks for the comment, could you explain a bit more what you mean by passing the input directly?
@ArbaazBeg · 9 days ago
@SebastianRaschka Hey, sorry for the unclear wording. I meant: should chat formats like Alpaca etc. be applied, or do we give the text as-is to the LLM for classification?
@SebastianRaschka · 9 days ago
@ArbaazBeg Oh, I see now. And yes, you can. I've been wanting to add an example and performance comparison for that to the GitHub repo (github.com/rasbt/LLMs-from-scratch) at some point. For that, I wanted to first instruction-finetune the model on a few more spam classification instructions and examples, though.
@ArbaazBeg · 8 days ago
@SebastianRaschka Can I help with this?
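Related to this thread, here is a minimal, hypothetical sketch of the classification-finetuning setup being discussed (not the repo's actual code): the pretrained LLM's language-modeling head is swapped for a small classification layer, and the input text, with or without a prompt/chat template applied first, is fed through as usual. The stand-in module and dimensions below are purely illustrative.

```python
import torch
import torch.nn as nn

num_classes = 2   # e.g., spam vs. not spam
emb_dim = 768     # hidden size of the pretrained model (illustrative)

class TinyGPTStandIn(nn.Module):
    """Stand-in for a pretrained GPT-like model with an output head."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(emb_dim, emb_dim)   # placeholder for the transformer blocks
        self.out_head = nn.Linear(emb_dim, 50257)     # original language-modeling head

    def forward(self, x):
        return self.out_head(self.backbone(x))

model = TinyGPTStandIn()
model.out_head = nn.Linear(emb_dim, num_classes)      # replace the LM head with a classifier

hidden = torch.randn(1, 6, emb_dim)                   # embeddings for 6 input tokens
logits = model(hidden)[:, -1, :]                      # use the last token's output for the label
print(logits.shape)                                   # torch.Size([1, 2])
```

Whether the input text is wrapped in an Alpaca-style prompt or passed as-is only changes the token sequence fed in; the classification head and the use of the final token's output stay the same.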
@mushinart · 27 days ago
I'm sold, I'm buying your book. Would love to chat with you sometime if possible.
@SebastianRaschka · 10 days ago
Thanks, hope you are liking it! Are you going to SciPy in July by chance, or maybe NeurIPS at the end of the year?
@mushinart · 10 days ago
@SebastianRaschka Unfortunately not, but I'd like to have a Zoom/Google Meet chat with you if possible.
@MadnessAI8X · 1 month ago
What we are seeking is not only fuzzing code.
@SebastianRaschka · 1 month ago
Glad that's useful
@ramprasadchauhan7 · 27 days ago
Hello sir, please also make a version with JavaScript.
@kumarutsav5161 · 1 month ago
🤌
@SebastianRaschka · 1 month ago
I take that as a compliment!? 😅😊
@kumarutsav5161 · 29 days ago
@SebastianRaschka Yes, yes! It was supposed to be a compliment. You are doing great work with your teaching materials :).
@redthunder6183 · 1 month ago
Easier said than done, unless you've got a GPU supercomputer lying around, lol.
@SebastianRaschka · 1 month ago
Ha, I should mention that all chapters in my book run on laptops, too. It was a personal goal for me that everything should work even without a GPU. The instruction finetuning takes about 30 min on a CPU to get reasonable results (granted, the same code takes 1.24 min on an A100).