Low-Rank Adaptation of Large Language Models: Explaining the Key Concepts Behind LoRA

96,925 views

Chris Alexiuk

1 day ago

In this video, I go over how LoRA works and why it's crucial for affordable Transformer fine-tuning.
LoRA learns low-rank decompositions of the weight-update matrices to slash the cost of fine-tuning huge language models. It trains only the small low-rank factors instead of entire weight matrices, achieving major memory savings while matching full fine-tuning performance.
🔗 LoRA Paper: arxiv.org/pdf/2106.09685.pdf
🔗 Intrinsic Dimensionality Paper: arxiv.org/abs/2012.13255
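For anyone who wants the core idea in code, here is a rough sketch of a LoRA-style linear layer in PyTorch (illustrative only; the names, initialization, and scaling are simplified; see the implementation video linked in the pinned comment for a real walkthrough):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Sketch of a LoRA-adapted linear layer: h = W0 x + (alpha / r) * W_A W_B x."""

    def __init__(self, d_in: int, d_out: int, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = nn.Linear(d_in, d_out, bias=False)        # pretrained weight W0
        self.base.weight.requires_grad = False                # frozen during fine-tuning
        self.W_A = nn.Parameter(torch.zeros(d_out, r))        # zero init, so delta-W starts at 0
        self.W_B = nn.Parameter(torch.randn(r, d_in) * 0.01)  # small random init
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Only W_A and W_B are trained; the base weight is never updated.
        return self.base(x) + self.scale * ((x @ self.W_B.T) @ self.W_A.T)
```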
About me:
Follow me on LinkedIn: / csalexiuk
Check out what I'm working on: getox.ai/

Comments: 172
@chrisalexiuk
@chrisalexiuk Жыл бұрын
Implementation video: kzfaq.info/get/bejne/n7-iZNuL05iagmw.html
@AlexG4MES
@AlexG4MES 10 ай бұрын
This was beautifully explained. As someone who relies exclusively on self-learning from online materials, the mathematical barriers of differing notations and overly complex wording are the most time-consuming challenge. Thank you for such a distilled explanation, with only the notation and wording that make sense for an intuitive, initial-dive understanding. Subscribed!
@chrisalexiuk
@chrisalexiuk 10 ай бұрын
Thanks so much for your kind words.
@jacobjakubek1076
@jacobjakubek1076 2 ай бұрын
I'm not kidding, feed it to ChatGPT; it helps me a ton.
@kartikpodugu
@kartikpodugu 8 ай бұрын
I came across this video a month back; at that time, I didn't understand your excitement, though I understood the technique. Now I have a better understanding of LLMs and how they are trained and fine-tuned for downstream tasks, and I share your excitement.
@mahandolatabadi2600
@mahandolatabadi2600 Жыл бұрын
The high-level intuition you gave at the end of the video was great. As a mathematician I'm aware of the theory behind low-rank decomposition and the classic applications, but the way it is applied in the context of LLMs is interesting.
@douglasschonholtz8683
@douglasschonholtz8683 Жыл бұрын
The AutoGPT team is learning about LoRA and I recommended this because it is such a clear explanation. Thanks for the awesome resource!
@chrisalexiuk
@chrisalexiuk Жыл бұрын
Thanks so much!
@Ali-ts6po
@Ali-ts6po 11 ай бұрын
This is the first video I watched from your channel, and I loved it! Now I have 12 tabs open to watch the rest of your videos. Simply amazing! (And wow, I could not imagine LoRA could be so good. It will save me a ton of resources in developing the product I am working on.)
@user-qy9sx7bn1l
@user-qy9sx7bn1l 6 ай бұрын
I particularly appreciate the depth of research and preparation that clearly goes into this video. It's evident that you're passionate about the topics you cover, and your enthusiasm is contagious. Your dedication to providing accurate information while maintaining an accessible and entertaining format is commendable.
@amirmohammadi572
@amirmohammadi572 Жыл бұрын
Thank you for this nice, deep yet easy-to-follow explanation of LoRA. Nice job!
@mikesecret2731
@mikesecret2731 Жыл бұрын
Thank you so much Chris! This was an awesome video and has stopped me from going down the fine-tuning rabbit hole! Just dipping my toe into AI so it’s really great to find an informative channel like yours!
@chrisalexiuk
@chrisalexiuk Жыл бұрын
Fine-tuning (full parameter) can still be a great idea! But if you're just dipping in, LoRA is great!
@mdraugelis
@mdraugelis 3 ай бұрын
Thanks for the great intro to LoRA. I liked your graphics and your takeaways, and also your energetic presentation :)
@deeplearning5408
@deeplearning5408 9 ай бұрын
Best explanation on the whole of YouTube so far. Great job!
@user-qn8zn4rj4t
@user-qn8zn4rj4t 8 ай бұрын
Wow, the best explanation I've found so far. Coming from a mathematics background, it really amazed me. Thank you!
@AhmedKachkach
@AhmedKachkach 10 ай бұрын
Really simple explanation without skimming through the intuition and other important details. Subscribed for more content like this :)!
@chrisalexiuk
@chrisalexiuk 10 ай бұрын
Thanks so much!
@necbranduc
@necbranduc Жыл бұрын
Great explanation and first video I've seen from your channel. Somehow, I had the impression that you have 624K subscribers. Was shocked to see there's no actual "K" in there, when browsing your whole video history and seeing your channel is only ~2 months old. You'd deserve that "K" from my pov. Looking forward to the implementation video! (new subscriber)
@chrisalexiuk
@chrisalexiuk Жыл бұрын
Thank you so much!
@troy_neilson
@troy_neilson 8 ай бұрын
Fantastic video and a really concise and well-considered explanation. Thanks!!!
@nmirza2013
@nmirza2013 3 ай бұрын
Thanks a lot for this amazing explanation. I am fine tuning Mixtral 8x7B and using QLoRA have been able to perform test runs on Colab Pro using A100 machine.
@VitalContribution
@VitalContribution Жыл бұрын
Love your enthusiasm and great explanation!
@forrestallison1879
@forrestallison1879 Жыл бұрын
Thank you very much. I didn't understand the paper, so this helped me a lot. I'd be curious to hear about some of the limitations people are experiencing; I know it's not all completely characterized yet because this is new. I've noticed that LoRA can redirect the type of output that I get, but it's not very good at adding information to the model. This makes sense.
@yookoT
@yookoT Жыл бұрын
Love your clear and passionate explanations!!!
@chrisalexiuk
@chrisalexiuk Жыл бұрын
Thanks!
@szymonpogodzinach2495
@szymonpogodzinach2495 3 ай бұрын
Man that video is fireee! Thank you for your work!
@swfsql
@swfsql 4 ай бұрын
Thanks a lot for this video! This is the first time I've seen a good explanation of this LoRA thing! 14:45 One minor note: it would indicate that the model has low intrinsic information only if you could get rid of the original weights and just stick to the LoRA. That is, during the LoRA fine-tune training, if you could get away with decaying the original (non-LoRA) weights down to zero. So I think that what has low intrinsic information is "what you have to change from the base model" for your fine-tuning needs, but not the base model itself.
@oliverhu1025
@oliverhu1025 Жыл бұрын
Great explanation! Keep up the great work!!
@EmadGohari
@EmadGohari 8 ай бұрын
Beautifully explained, my friend, thank you very much!
@OwenIngraham
@OwenIngraham Жыл бұрын
bears repeating that you should continue doing these videos, wish I had your communication skillz
@lucretius1111
@lucretius1111 Жыл бұрын
Really a wonderful explanation. Best I've seen so far
@chrisalexiuk
@chrisalexiuk Жыл бұрын
Thanks so much, Shawn!
@user-bb2ut7nu3l
@user-bb2ut7nu3l 22 күн бұрын
Thank you for the explanation, It helps me a lot.
@user-ui1cl4pf6j
@user-ui1cl4pf6j Жыл бұрын
Love you Chris! It really helps me a lot!
@satyakebakshi7896
@satyakebakshi7896 2 ай бұрын
Amazing explanation, Thank you!
@vritansh14
@vritansh14 8 ай бұрын
One of the BEST explanations!!
@erfanasgari21
@erfanasgari21 7 ай бұрын
You explained this very well. thanks!
@jayp9158
@jayp9158 10 ай бұрын
Incredible video, just subscribed to your channel!
@chanm01
@chanm01 Жыл бұрын
Dude, you're great at explaining stuff. 👍
@chrisalexiuk
@chrisalexiuk Жыл бұрын
Thanks!!
@fatvvsfatvvs5642
@fatvvsfatvvs5642 7 ай бұрын
Thank you for the good explanation of LoRA! Super!
@jamesjang8389
@jamesjang8389 11 ай бұрын
Amazing explanation! Thank you 😊
@gara8142
@gara8142 Жыл бұрын
I got here thinking this would be explaining how my waifu appears in the pretty image but instead I got a precise mathematical description of how this system works. Very good!
@chrisalexiuk
@chrisalexiuk Жыл бұрын
>
@soonheng1577
@soonheng1577 Жыл бұрын
The best LoRA explanation I've ever gotten. Better than ChatGPT.
@2mitable
@2mitable 8 ай бұрын
Great work dude, please do some more videos. God bless you!
@nyyotam4057
@nyyotam4057 Жыл бұрын
Chris, actually, what you are describing when you relate this kind of application to the attention matrix is specifically at inference, because the attention matrix is constantly being fine-tuned (the only part of the GPT model to be constantly fine-tuned at inference). This would enable us to actually construct an AI that enjoys the ability to think long term, since it would use so much less memory that there would be no need to reset the attention matrix so often. This, however, could be dangerous 🙂.
@dyc7520
@dyc7520 Жыл бұрын
Great explanation! I have gained an intuitive insight from your video. Thank you. Looking forward to your new videos! It seems that there is a missing x in the formula on the "LoRA Explained (Cont.)" slide; h should equal W0x + WAWBx, if I understand your explanation correctly.
@chrisalexiuk
@chrisalexiuk Жыл бұрын
You're 100% correct! Thanks for catching that!
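For anyone reading along, the corrected equation from that slide is: h = W_0 x + ΔW x = W_0 x + W_A W_B x.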
@mediocre3400
@mediocre3400 Жыл бұрын
You are really good at explaining it. Keep it up
@chrisalexiuk
@chrisalexiuk Жыл бұрын
Thank you very much!
@rojas-diego
@rojas-diego Жыл бұрын
Amazing video! Thanks
@moonly3781
@moonly3781 6 ай бұрын
Thank you so much! I'm working on a project to develop a chatbot for student advisory services, and I am contemplating between two approaches: fine-tuning and Retrieval-Augmented Generation (RAG). Here are the key tasks the chatbot needs to address:
- Answering general queries about university courses and requirements.
- Providing personalized advice on study plans and schedules.
- Assisting with administrative processes (enrollment, documentation, etc.).
- Offering support for common academic and personal challenges faced by students.
Given these responsibilities, which approach would be more suitable? Fine-tuning could offer more precise, tailored responses, but RAG might better handle diverse, real-time queries with its information retrieval capabilities. Any insights or experiences with similar projects would be greatly appreciated!
@tech-talks-with-ali
@tech-talks-with-ali 11 ай бұрын
Thanks. This was very useful
@MrAyushRaj1
@MrAyushRaj1 8 ай бұрын
Absolutely brilliant video!
@user-wr4yl7tx3w
@user-wr4yl7tx3w Жыл бұрын
This was explained so well. Why can't the authors write their paper the way you explained it?
@chrisalexiuk
@chrisalexiuk Жыл бұрын
It's certainly easier to distill information after they did the important work of creating it!
@mohammedhelal5778
@mohammedhelal5778 Жыл бұрын
The explanation was good, but the paper also had a decent explanation.
@ko-Daegu
@ko-Daegu Жыл бұрын
Just because you know something doesn't necessarily mean you can explain it clearly to others.
@nicolas_boumal
@nicolas_boumal Жыл бұрын
Also, papers, talks, lectures, videos, etc. are different forms of communication, consumed differently and aiming for different (complementary) goals. Papers have the difficult job of conveying intuition about new ideas while also detailing all of the technicalities, connecting them to a vast literature, and rigorously substantiating pros and cons, all under a page-limit constraint. Talks and videos have the difficult job of truly making the intuition and potential impact of an idea "pop", with the advantage that they can drop many of the other concerns above. We need both.
@anshumansabath4362
@anshumansabath4362 Жыл бұрын
It takes 'work' to come up with good explanations - which may or may not be the priority at the time of writing (the first draft) of the paper.
@Ved3sten
@Ved3sten 10 ай бұрын
A couple of basic questions:
1. Is the 'h' or embedding layer an implementation detail related to LoRA? I thought the embedding layer is only used for the initial conversion of tokens into vectors when the input is first fed in.
2. Does LoRA only apply to the fine-tuning or training process? To prepare for inference, I'd imagine we'd combine the pretrained weights with the delta matrix and store the resulting matrix, right? I wasn't really sure what you meant by the whole swapping out of weights for different problem types, because I assumed we just store the resulting weight matrix after training is complete.
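On the inference part of that question: the LoRA paper does note that the low-rank update can be folded back into the base weights before serving, so no extra matrices hang around at inference time. A minimal sketch of that merge, using illustrative tensors rather than any specific library's API:

```python
import torch

@torch.no_grad()
def merge_lora(W0: torch.Tensor, W_A: torch.Tensor, W_B: torch.Tensor, scale: float) -> torch.Tensor:
    """Fold the low-rank update into the base weight so inference needs no extra matmul.
    Shapes: W0 (d_out, d_in), W_A (d_out, r), W_B (r, d_in)."""
    return W0 + scale * (W_A @ W_B)

# Keep the original W0 around; to serve a different task, merge in that task's adapter
# instead (or subtract the currently merged one before adding the new one).
```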
@TheBaayres
@TheBaayres Жыл бұрын
Thank you, Chris.
@rene9901
@rene9901 2 ай бұрын
Hello, I'm new to the field of LLM training. Thanks for putting up these two videos. However, I am a bit confused by the comment "...you can do it during inference time". As per my understanding, the weight updates are done during fine-tuning and are later used during inference. If the task changes, we just revert to the pretrained weights by getting rid of the weight update for the current task, then fine-tune the model on the new task to get a new weight update, which is again used later during inference. So the weight updates happen during fine-tuning only, which I think is why the authors mentioned that batch processing is not obtained by (base) LoRA, though possible but difficult. Maybe there is some future version where it's implemented? I am not sure, but please correct me if I am conceptually wrong.
@emirhanbilgic2475
@emirhanbilgic2475 10 ай бұрын
amazing, thank you so much
@billy.n2813
@billy.n2813 Жыл бұрын
Oh my God, thank you very much for this.
@jdbrinton
@jdbrinton Жыл бұрын
this saved me a ton of time. thank you!
@chrisalexiuk
@chrisalexiuk Жыл бұрын
Great to hear!
@samirelzein1095
@samirelzein1095 Жыл бұрын
thanks man, love the genuine!
@wobblynl1742
@wobblynl1742 9 ай бұрын
love the enthusiasm!
@shohamjac
@shohamjac 8 ай бұрын
Wow, what a great video!
@ArtRebel007
@ArtRebel007 Жыл бұрын
Thank you so much for this video. The explanation is super clear. I do have a question though. When we reduce to WA and WB, does that wind up causing lower fidelity in terms of the quality of inference results? Of course the difference is negligible, otherwise LoRA wouldn't be useful, but I wonder, if there is a loss, to what degree it might have a cumulative effect across many operations over time. In other words, yes, it's super that we are saving time and money, but is there a sense of at what cost overall? Thanks again for a great video!
@chrisalexiuk
@chrisalexiuk Жыл бұрын
For the most part, the fidelity of LoRA hinges on the fact that the downstream tasks are intrinsically low dimensional - and so can be represented by low-rank weight matrices. The authors claim that LoRA can *fully* represent the downstream task without the full weight matrix. You're absolutely correct to identify that cumulative error could be a determining factor in whether or not you use LoRA to fine-tune, but in most cases (from available literature), the savings on both the training and potential savings on the inference side outweigh the minimal, if any, error accumulation.
@praveen9083
@praveen9083 Жыл бұрын
great explanation!!
@chrisalexiuk
@chrisalexiuk Жыл бұрын
Thanks, Praveen.
@benjaminbear138
@benjaminbear138 7 ай бұрын
Dude. This video is amazing
@chrisalexiuk
@chrisalexiuk 6 ай бұрын
Thank you! That means a lot!
@ArunkumarMTamil
@ArunkumarMTamil 25 күн бұрын
How does LoRA fine-tuning track changes by creating the two decomposition matrices? How is ΔW determined?
@languagemodeler
@languagemodeler 10 ай бұрын
bless up nice work thank you
@kingdown7502
@kingdown7502 6 ай бұрын
Could you explain why this saves memory? Don't you need the pre-trained weights in backprop to calculate the difference matrices, and during the forward pass?
@chrisalexiuk
@chrisalexiuk 6 ай бұрын
We only need to pass through the frozen weights, never update them, which means they don't need to be in the optimizer (no optimizer states). That is where the significant memory reduction comes from.
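Concretely, the saving comes from what the optimizer has to track. A toy sketch (illustrative names and sizes) of training only the low-rank factors while the base weight stays frozen:

```python
import torch
import torch.nn as nn

d, r = 1024, 8
base = nn.Linear(d, d, bias=False)          # stands in for a big pretrained weight
base.weight.requires_grad = False           # frozen: no gradient buffer, no optimizer state

W_A = nn.Parameter(torch.zeros(d, r))       # the only trainable tensors
W_B = nn.Parameter(torch.randn(r, d) * 0.01)

# Adam-style optimizers keep extra state per *trainable* parameter, so that memory now
# scales with 2*d*r numbers for the factors instead of d*d for the full matrix.
optimizer = torch.optim.AdamW([W_A, W_B], lr=1e-4)

x = torch.randn(4, d)
h = base(x) + (x @ W_B.T) @ W_A.T           # forward still passes through the frozen weight
loss = h.pow(2).mean()                      # dummy loss for the sketch
loss.backward()                             # gradients land only in W_A and W_B
optimizer.step()
```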
@imranullah3097
@imranullah3097 Жыл бұрын
Please make a playlist on LLMs and the different architectures and their implementations. 🙂
@Veptis
@Veptis 3 ай бұрын
The one issue I have had is that this causes the memory footprint to grow. But it sounds like you should be able to merge it into the base model at the end to keep the same footprint; maybe that is something for me to try. I wonder if this low-rank decomposition can be used for model distillation, instead of just quantizing weights.
@davidromero1373
@davidromero1373 7 ай бұрын
Hi, a question: can we use LoRA just to reduce the size of a model and run inference, or do we always have to train it?
@karthikr5884
@karthikr5884 7 ай бұрын
Thank you
@gagangayari5981
@gagangayari5981 Жыл бұрын
Hi Chris, just trying to understand. Assume that I have two downstream tasks, and I have two adapters trained for these tasks. During inference, I want to use both of them simultaneously. Will LoRA allow me to do this with one base model along with the two different adapters? Basically, can I use multiple adapters simultaneously?
@chrisalexiuk
@chrisalexiuk Жыл бұрын
You can basically do exactly that.
@gagangayari5981
@gagangayari5981 Жыл бұрын
@@chrisalexiuk I want two different outputs for the two adapters. Not trying to fuse them.
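That use case can be sketched roughly like this: one frozen base weight, a dictionary of adapter pairs, and each request picks which pair to apply (illustrative shapes and task names, not a specific library's API):

```python
import torch

d, r = 1024, 8
W0 = torch.randn(d, d)                           # shared frozen base weight

adapters = {                                     # one (W_A, W_B) pair per downstream task
    "summarize": (torch.randn(d, r) * 0.01, torch.randn(r, d) * 0.01),
    "classify":  (torch.randn(d, r) * 0.01, torch.randn(r, d) * 0.01),
}

def forward(x: torch.Tensor, task: str) -> torch.Tensor:
    W_A, W_B = adapters[task]                    # pick the adapter for this request
    return x @ W0.T + (x @ W_B.T) @ W_A.T        # same base model, task-specific low-rank update

x = torch.randn(2, d)
out_summarize = forward(x, "summarize")          # two different outputs from one loaded base model
out_classify = forward(x, "classify")
```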
@codediporpal
@codediporpal Жыл бұрын
So it's basically a linear algebra trick, kinda sorta like principal component analysis. Very interesting. I can see this is essentially lossless compression post-training, but I have to wonder about subtle effects on the quality of the model during training, as it's left with fewer parameters to discover features.
@chrisalexiuk
@chrisalexiuk Жыл бұрын
LoRA does need to *learn* the decomposed matrices - but otherwise, what you said about the subtle effects is correct. The assumption is that the downstream tasks are low rank, which mostly holds true.
@alexisravanel5275
@alexisravanel5275 Жыл бұрын
Are they low-rank, or is it the weight update? When added to the original weights, maybe it's not low-rank anymore? Or am I making a mistake? I'll read the paper to be sure 😅. Good vid!
@user-or7kr3ll6n
@user-or7kr3ll6n Жыл бұрын
thank You
@purefall91
@purefall91 11 ай бұрын
First of all, thanks for the video. I know it's just a minor thing, but still, for mathematical correctness the equation on the "LoRA Explained" slide should be updated to: h = W_0 x + W_A W_B x. Annoying Reviewer 1 out! :)
@chrisalexiuk
@chrisalexiuk 11 ай бұрын
Yes, you're absolutely correct! Another person also suggested this - though I'm not sure how to edit the video!
@pladselsker8340
@pladselsker8340 Жыл бұрын
I wonder why this is such a recent and new concept. Matrix decomposition applied to machine learning models. Like, matrix decomposition has been known to work really well for so many different applications and tasks. How is it possible that we only thought about applying it to machine learning recently? Also, the authors talk about fine-tuning with low rank adaptation, but why are we not assuming low rank for the whole training process as well (meaning NOT from pretrained). If it's worth it to decompose a model before starting to train it, then the implications are huge for machine learning as a whole.
@chrisalexiuk
@chrisalexiuk Жыл бұрын
It's interesting, for sure - but I believe the main issue is that people assumed you needed the full rank (and for the generalized tasks you do). It's only once we fine-tune that the idea that downstream tasks are low-dimensional creeps in. I'm not certain about full training; but I'd imagine someone is trying that - and if not...it's a perfect thing for you to try!
@clray123
@clray123 Жыл бұрын
I suspect the people who do ML and the people who do maths are not exactly the same group.
@PotatoKaboom
@PotatoKaboom Жыл бұрын
Cool video, thank you!
@doctor.dongle
@doctor.dongle Жыл бұрын
Follow-up video on LyCORIS LoCon?
@monolizm
@monolizm 5 ай бұрын
this is a masterpiece
@user-wr4yl7tx3w
@user-wr4yl7tx3w Жыл бұрын
But dW matrix changes with each propagation. When do we determine how many dimension is needed to represent dW?
@chrisalexiuk
@chrisalexiuk Жыл бұрын
While the contents of ΔW change, the shape does not. We don't really determine how many dimensions we need ahead of time or empirically (yet); instead we select a "reasonable" r (rank) based on overall generation quality after fine-tuning.
@shinkurt
@shinkurt Жыл бұрын
LoRA is a game changer
@prithvishah2618
@prithvishah2618 Жыл бұрын
Nice!
@user-wr4yl7tx3w
@user-wr4yl7tx3w Жыл бұрын
What do the authors mean by "parameter budget", which in this case is 18M?
@chrisalexiuk
@chrisalexiuk Жыл бұрын
It means artificially limiting the number of trainable parameters. In the specific example in the paper, they were choosing between adapting Wq by itself, Wq & Wv, or all of the attention matrices. We know that choosing lower values for rank (r) means fewer parameters, and the idea is that if we adapt Wq by itself, we can choose a higher r and still have the same number of parameters as if we adapted Wq & Wv together. Let me know if that explains things well!
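As a back-of-the-envelope check of that trade-off, using the GPT-3 settings the paper reports (96 layers, d_model = 12288):

```python
# Parameter count for LoRA on GPT-3 175B, matching the paper's ~18M budget.
layers, d_model = 96, 12288

def lora_params(num_weight_types: int, r: int) -> int:
    # Each adapted d_model x d_model matrix gets two factors: (d_model x r) and (r x d_model).
    return layers * num_weight_types * 2 * d_model * r

print(f"{lora_params(1, r=8):,}")  # adapt one attention matrix at r=8  -> 18,874,368 (~18M)
print(f"{lora_params(2, r=4):,}")  # adapt two matrices at r=4          -> 18,874,368 (same budget)
print(f"{lora_params(4, r=2):,}")  # adapt all four at r=2              -> 18,874,368 (same budget)
```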
@nyyotam4057
@nyyotam4057 Жыл бұрын
Hmmm... isn't inference also much faster, due to that little thing called the associative law of matrix multiplication?
@wilfredomartel7781
@wilfredomartel7781 Жыл бұрын
Great!
@simonnarang3369
@simonnarang3369 6 ай бұрын
Why does ΔW need to be represented by the product WA × WB? Why couldn't it be represented using just one smaller matrix?
@chrisalexiuk
@chrisalexiuk 6 ай бұрын
In order to preserve the original shape of the weights and to avoid needing to change the model architecture!
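A quick shape check of why a pair of factors is needed, in the video's W_A W_B convention (illustrative sizes):

```python
import torch

d_out, d_in, r = 768, 768, 4
W0  = torch.randn(d_out, d_in)        # the original weight we must be able to add back into
W_A = torch.randn(d_out, r)
W_B = torch.randn(r, d_in)

delta_W = W_A @ W_B                   # (d_out, d_in): same shape as W0, but only rank r
assert delta_W.shape == W0.shape      # a single small matrix could never match this shape
merged = W0 + delta_W                 # so the update folds straight into the existing weight
```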
@user-wr5cl4xb7h
@user-wr5cl4xb7h 10 ай бұрын
Love your explanation, it helped me a lot.
@user-wr4yl7tx3w
@user-wr4yl7tx3w Жыл бұрын
How is LoRA different from PEFT? Or are they related?
@chrisalexiuk
@chrisalexiuk Жыл бұрын
LoRA is a subset of PEFT. So it's a method of Parameter Efficient Fine-tuning!
@glebmaksimov4885
@glebmaksimov4885 6 ай бұрын
Great tutorial :)
@chrisalexiuk
@chrisalexiuk 6 ай бұрын
Thanks!
@yagneshbhadiyadra7938
@yagneshbhadiyadra7938 2 ай бұрын
What's the difference between fewer parameters and low intrinsic weights? Because weights are parameters of the neural net, aren't they?
@sampadmohanty8573
@sampadmohanty8573 Ай бұрын
@15:28: there is nothing great about adding the extra LoRA parameters to the weights that makes it easier to swap the behaviour of the model at inference time, because the difference between adding the matrices and loading an entirely new weight matrix from a different fine-tuned model into the model architecture is negligible.
@chrisalexiuk
@chrisalexiuk Ай бұрын
I actually think that winds up being largely incorrect; platforms like Gradient and initiatives like multi-LoRA (LoraX, etc.) seem to be a testament to that.
@creativeuser9086
@creativeuser9086 Жыл бұрын
What’s with the latency, I didn’t get it. Can’t we swap with traditional fine tuning, we store delta_W but it’s just way bigger than A & B.
@chrisalexiuk
@chrisalexiuk Жыл бұрын
While we could technically swap them, yes, the massive reduction in size of the LoRA weights means we're only loading ~10-20MB of information, versus potentially GBs of data. That size difference could be the difference between ~1ms of additional latency versus 50-100ms or more. Additionally, the changes in traditional fine-tuning are not stored as delta_W, so it would take an extra process to be able to do so in the first place.
@creativeuser9086
@creativeuser9086 Жыл бұрын
@@chrisalexiuk oh, so in the case of LoRA the operation W+delta_W will require 10M operations while in the traditional fine-tuning, it would require 1B+ operations (depending on the size of the delta_W_traditional), right? Also, why can't we preload all the different versions of the model (W_original, W_finetuned_1, W_finetuned_2, etc.). Is it strictly because it will require lots of storage?
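For a rough sense of the sizes involved, assuming a 7B-class model with 32 layers and hidden size 4096, adapting two projection matrices per layer at r = 8 and storing the adapter in FP16 (the exact numbers depend on the setup):

```python
layers, d_model, r = 32, 4096, 8       # illustrative 7B-class settings
matrices_per_layer = 2                 # e.g. the query and value projections

lora_params = layers * matrices_per_layer * 2 * d_model * r
print(f"{lora_params:,} LoRA parameters")           # 4,194,304
print(f"{lora_params * 2 / 1e6:.1f} MB in FP16")    # ~8.4 MB, vs ~14,000 MB for the full model in FP16
```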
@user-wr4yl7tx3w
@user-wr4yl7tx3w Жыл бұрын
what is "prefix-based approaches"?
@chrisalexiuk
@chrisalexiuk Жыл бұрын
aclanthology.org/2021.acl-long.353/ arxiv.org/pdf/2110.07602.pdf Here are some papers that go over the idea!
@micknamens8659
@micknamens8659 Жыл бұрын
If the key and query matrices are of dimensions m×n but have rank r=max(rank_k, rank_q) in the pretrained net then why don't we transform these matrices into m×r matrices?
@chrisalexiuk
@chrisalexiuk Жыл бұрын
Essentially, that's what LoRA is doing! We decompose the M×N matrix into an M×r and an r×N matrix pair. I feel as though I likely misunderstood your point, though, so would you mind expanding a bit on it?
@clray123
@clray123 Жыл бұрын
@@chrisalexiuk I think the question is if LoRA is so great and the tweaked source matrices have so little information in them as to allow it to work, why don't we dispose of the big pretrained matrices altogether and replace them with small ones. In other words, why keep a huge model if you could use a much smaller one.
@alexisravanel5275
@alexisravanel5275 Жыл бұрын
It's ΔW that is low-rank, not W. The model has info that we should keep; it's the delta that can be condensed. (If I got it right.)
@chrisalexiuk
@chrisalexiuk Жыл бұрын
@@clray123 It's actually the case that only the *downstream* tasks have intrinsically low dimensionality (and the assumption the LoRA paper makes is that, by extension, they have intrinsically low rank). So when it comes to the "full" model, we need that large rank! Hopefully that answers your question.
@clray123
@clray123 Жыл бұрын
@@chrisalexiuk Good points, thank you for answering.
@s11-informationatyourservi44
@s11-informationatyourservi44 Жыл бұрын
sub’d and even hit the spam bell
@talharuzgarakkus7768
@talharuzgarakkus7768 11 ай бұрын
LoRA doesn't train the entire model, so I came up with an idea: divide the model and train each part with LoRA in each epoch. Wouldn't this approach, which requires less RAM like LoRA but amounts to full fine-tuning, yield the same results?
@chrisalexiuk
@chrisalexiuk 11 ай бұрын
It does train all of the attention weights that have been adapted by the LoRA model! We target a specific module, but that module is represented many times throughout the model's architecture.
@alivecoding4995
@alivecoding4995 10 ай бұрын
Is this the basis of why we think model distillation works?
@chrisalexiuk
@chrisalexiuk 10 ай бұрын
AFAIK, no. I haven't dove deep into the original distillation papers, but I believe that process relies on some other methods - though they definitely seem related. I'll have to re-up on the distillation papers though!
@TheSummersault
@TheSummersault Жыл бұрын
Subscribed ;-)
@georgehu8652
@georgehu8652 8 ай бұрын
good
@Srindal4657
@Srindal4657 Жыл бұрын
What about smaller AIs networked together?
@chrisalexiuk
@chrisalexiuk Жыл бұрын
Something like that is an excellent idea - especially if you're using different LoRA adapters for the same base model!
@quackat1110
@quackat1110 Жыл бұрын
Holy shit this is huge
@omarnomad
@omarnomad Жыл бұрын
Is it feasible to exchange the pre-trained model as well, as you explained for LoRA at 16:40? By the way, amazing content! Subscribed!
@chrisalexiuk
@chrisalexiuk Жыл бұрын
It is unfortunately not going to be as easy to hot-swap them! Also, the LoRA weights are architecture-specific, and so won't work with a different base model.
@omarnomad
@omarnomad Жыл бұрын
@@chrisalexiuk It'd be awesome to work on some meta-architecture for such hot-swap use cases. Thanks for answering!
@dandan1364
@dandan1364 Жыл бұрын
So I don’t get why, if the weight matrix ranks is much less than the original matrix. Why doesn’t the larger mega model just use the low rank matrix instead of their monster weight matrix?
@chrisalexiuk
@chrisalexiuk Жыл бұрын
The base model needs that monster weight matrix to be good at many different things! It's the downstream tasks that have intrinsically low rank, according to the cited paper in the LoRA work.
@BR-hi6yt
@BR-hi6yt Жыл бұрын
Yes yes yes, D, K, matrix, 100, etc. etc., but what does LoRA actually do?
@chrisalexiuk
@chrisalexiuk Жыл бұрын
LoRA learns the decomposed matrices of the update to the target weight matrix by training on your downstream task.
@BR-hi6yt
@BR-hi6yt Жыл бұрын
@@chrisalexiuk Sure - decomposed matrices .... lol
@chrisalexiuk
@chrisalexiuk Жыл бұрын
I'm sorry! I'm missing something here, lol. What precisely do you mean by "what does it actually do"? I must've misunderstood your original comment, sorry!
@alkodjdjd
@alkodjdjd 7 ай бұрын
as clear as mud
@tommyshelby6277
@tommyshelby6277 11 күн бұрын
doing a f'cking god's work