Fine-tuning LLMs with PEFT and LoRA

116,199 views

Sam Witteveen

1 day ago

LoRA Colab : colab.research.google.com/dri...
Blog Post: huggingface.co/blog/peft
LoRA Paper: arxiv.org/abs/2106.09685
In this video I look at how to use PEFT to fine-tune any decoder-style GPT model. It goes through the basics of LoRA fine-tuning and how to upload the result to the Hugging Face Hub.
My Links:
Twitter - / sam_witteveen
Linkedin - / samwitteveen
Github:
github.com/samwit/langchain-t...
github.com/samwit/llm-tutorials
00:00 - Intro
00:04 - Problems with fine-tuning
00:48 - Introducing PEFT
01:11 - Other cool PEFT techniques
01:51 - LoRA Diagram
03:25 - Hugging Face PEFT Library
04:06 - Code Walkthrough
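
For a feel of what the code walkthrough covers, here is a minimal sketch of LoRA fine-tuning with the PEFT library. The model name and hyperparameters below are illustrative assumptions rather than the exact values used in the video; the linked Colab has the real configuration.

```python
# Minimal LoRA fine-tuning sketch with Hugging Face PEFT (illustrative values)
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from peft import LoraConfig, get_peft_model

base_model = "bigscience/bloom-560m"   # assumed small model; the video uses a larger one
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

lora_config = LoraConfig(
    r=16,              # rank of the low-rank update matrices
    lora_alpha=32,     # scaling factor for the update
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()     # only a small fraction of weights are trainable

data = load_dataset("Abirate/english_quotes")
data = data.map(lambda x: tokenizer(x["quote"], truncation=True), batched=True)

trainer = Trainer(
    model=model,
    train_dataset=data["train"],
    args=TrainingArguments(
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,
        max_steps=200,
        learning_rate=2e-4,
        output_dir="outputs",
    ),
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.push_to_hub("your-username/bloom-560m-lora-quotes")  # hypothetical repo name
```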

Comments: 114
@autonomousreviews2521 • 1 year ago
You continue to make videos on exactly the things I'm trying to understand more deeply! Fantastic! There are a lot of detailed parameters in this video that you could certainly continue to elaborate on for those of us who aren't programmers...yet :) Looking forward to more of your vids!
@redfield126 • 1 year ago
Perfect balance of theory and hands-on, with a colab attached to most of your videos. Much, much appreciated. I recommend this channel to everyone who wants to follow this crazy trend of LLM releases; it's the best path to keep all of us up to date! I learn so much thanks to you, Sam. Thanks a ton. Keep moving forward.
@impolitevegan3179 • 1 year ago
This is great. Not many channels on YT do this kind of stuff. Would appreciate more like this: other frameworks like DeepSpeed, useful datasets, training parameter experiments, etc. So much interesting stuff that is not covered on YT.
@victarion1571 • 1 year ago
Sam, thanks for giving your audience their requests! The alpaca training video you made makes much more sense now
@kennethleung4487 • 1 year ago
Awesome! Been waiting for your take on this topic
@nacs • 1 year ago
Many have said it but I'll reiterate -- your LLM videos are really great to watch, both the pace and the way you go from high level overviews to the detailed info. I also appreciate that it's not just focused on ChatGPT/GPT-4/hosted-models all the time and talks more about local training/finetuning/inferencing.
@briancase6180 • 1 year ago
So this seems like the basis for a business: offer to train a custom model for product documentation, FAQ, etc with a specific product or company focus. Cool!
@Hypersniper05 • 1 year ago
Or closed-domain semantic search with summarization
@handsanitizer2457 • 1 year ago
@E Marrero can you explain that a bit more? I'm new to the machine learning space
@Hypersniper05 • 1 year ago
@@handsanitizer2457 It's a bit too much to explain here, but search on YouTube for "openai embeddings" or "embedding searches" and you will get a general idea of how models can be used for search, not only with OpenAI but with other open-source models as well. Fine-tuning a model on a closed domain will help it understand your company's data better. You can also fine-tune it to reply back in a certain way, which opens the door to many options. ChatGPT was trained this way, but more for conversational outputs.
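
As a rough illustration of the embedding-search idea described above, here is a sketch using the open-source sentence-transformers library rather than the OpenAI API; the model name and documents are just example assumptions.

```python
# Rough sketch of embedding-based semantic search (model name and documents are illustrative)
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence-embedding model works

docs = [
    "Our refund policy allows returns within 30 days.",
    "LoRA adds small trainable matrices to a frozen base model.",
    "Support is available Monday to Friday, 9am-5pm.",
]
doc_embeddings = encoder.encode(docs, convert_to_tensor=True)

query = "How long do I have to return a product?"
query_embedding = encoder.encode(query, convert_to_tensor=True)

# Cosine similarity between the query and every document
scores = util.cos_sim(query_embedding, doc_embeddings)[0]
best = scores.argmax().item()
print(docs[best], float(scores[best]))
```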
@ArjunKrishnaUserProfile • 1 year ago
Does chatbase use this technique? It does the training on website or file data very fast.
@Hypersniper05 • 1 year ago
@@ArjunKrishnaUserProfile I am pretty sure it doesn't train the model; that would be way more expensive than embedding
@PattersML • 11 months ago
Awesome explanation, this is exactly what I was looking for. Thank you!
@notanape5415 • 10 months ago
Thanks for the awesome explanation. Going to binge your videos.
@coolmcdude • 1 year ago
I would love to see more videos about this, showing people how we could adapt this to our own projects, and maybe even a video about 4-bit tuning.
@kaiman99919 • 1 year ago
Thank you! It would be great to see more on the data section - everyone always seems to gloss over that part, despite the fact that it is clearly the most important part. I've seen a lot of 20-40 minute videos (from different YouTubers) on the configuration that barely mention the actual use of the data.
@saracen9 • 1 year ago
Awesome stuff Sam. I'm in the process of using langchain to build a vector store and - whilst it's fine for now - would be really interested in understanding the best way to then take this and use it to generate a LoRA. Feels like the logical next step.
@Secplavory-Wei • 1 year ago
This is really useful, thank you!
@quebono100 • 1 year ago
Wow, thank you for your work
@autonomousreviews2521 • 1 year ago
I would love a vid covering examples of the differently formatted types of datasets that can be used to train a LoRA, and the types of abilities that the different kinds of dataset training will allow. Put another way: what kinds of behavioral changes in abilities can we use LoRA to fine-tune for in a model, and how do we then know what types of data formatting to use in order to get a chosen outcome? :D
@geekyprogrammer4831 • 1 year ago
Very underrated channel. You deserve more viewers and subs.
@samwitteveenai • 1 year ago
Thanks for the kind words.
@abhirj87 • 1 year ago
very useful!! thanks a ton
@chavita4321 • 1 year ago
So badass. Thanks!
@sundarramanp3057 • 1 year ago
Can you create more videos on instruction-prompt-tuning as well, as a further extension to this video? Amazing work!
@caiyu538 • 8 months ago
Great lectures.
@wilfredomartel7781 • 1 year ago
Excellent!
@definty • 9 months ago
Hey Sam, thanks for the great informative video as always! Do you know of a way to see which neurons get activated during training? I ask because I was thinking of ways to shrink the big models, and the most obvious way I could think of would be to view which neurons are getting activated during training, especially with Falcon 170B; even 32B is too big for me, and considering I don't need multiple languages, I was hoping this would be a good approach to reduce the size of models. It would be cool to see a brain-surgeon-type debugger for LLMs. It would be good to run different training datasets through different LLMs to see which neurons get activated and which ones do not, and ideally have a way to disable them during inference to test and measure the differences in the output.
@JonathanYankovich • 1 year ago
I’d love a quick video like this on how to use checkpoints from PEFT training to do inference. When I’m training, I’m never sure how much is too much, and I can save checkpoints automatically easily to resume in case training stops. What I need to learn is how to use these checkpoints with the base model to do inference so I can test output quality against several checkpoints. Ideally I’d like to be able to do inference on a base model plus checkpoint, and then once I find a good result, merge the checkpoint into the base model so I can use it in production and keep VRAM low. (I am assuming inference on base model + checkpoint will use more vram)
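
A minimal sketch of what that inference step can look like with PEFT, assuming the checkpoint directory contains the saved adapter files; the model name, prompt, and paths are placeholders.

```python
# Sketch: run inference with a base model plus a saved LoRA checkpoint (paths/names are placeholders)
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_name = "bigscience/bloom-560m"      # assumed base model
adapter_dir = "outputs/checkpoint-200"   # hypothetical checkpoint with saved adapter files

tokenizer = AutoTokenizer.from_pretrained(base_name)
base = AutoModelForCausalLM.from_pretrained(base_name)

model = PeftModel.from_pretrained(base, adapter_dir)  # attach adapter weights to the frozen base
model.eval()

inputs = tokenizer("Quote: Imagination is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```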
@user-lw1zt4ov3i • 8 months ago
This is a great way to understand how we can fine-tune a text classification task using an LLM. I want to know if there is a method through which we can make the LLM learn from data in JSON format, where there are multiple labels for information retrieval or conversational recommendation tasks.
@JonathanYankovich • 1 year ago
These fine-tuning-related topics are especially relevant to me right now. Currently training llama-30b variants at 4-bit. I’m very interested in how to roll adapters/checkpoints back into base models to keep VRAM usage down during inference (under 24GB)
@MridulSharmaMID • 1 year ago
Hi, I am also interested. Can we connect via email?
@PavanAtGrowexx • 6 months ago
Hey, I am also facing the same issue. Did you find any update, and could you help me out please?
@rajivmehtapy • 1 year ago
Very few videos are found on YouTube for this topic.
@samwitteveenai • 1 year ago
This is the first of a few on the topic.
@joshmabry7572 • 1 year ago
This is gold! Thank you!
@joshmabry7572 • 1 year ago
I'm looking to train the Wizard-Vicuna models but run into `ValueError: The following `model_kwargs` are not used by the model: ['token_type_ids']`
@samwitteveenai • 1 year ago
This could be because they have already folded a LoRA in there or the base model setup is different.
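
One possible workaround, assuming the error comes from the tokenizer returning token_type_ids that the model's forward pass doesn't accept; the tokenizer, model, and prompt here are placeholders.

```python
# Possible workaround: stop the tokenizer from returning token_type_ids
# (this is an assumption about the cause; tokenizer/model/prompt are placeholders)
inputs = tokenizer("Hello, my name is", return_tensors="pt", return_token_type_ids=False)
output = model.generate(**inputs, max_new_tokens=20)

# Equivalent: remove the field after tokenizing
inputs = tokenizer("Hello, my name is", return_tensors="pt")
inputs.pop("token_type_ids", None)
```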
@bookfastorg • 1 year ago
How do you re-train it with additional data? Great video!
@nguyenanhnguyen7658 • 1 year ago
Great quick tutorial. This is good for English-only pretraining/fine-tuning. What about non-English? What steps should we take to (1) extend the vocab, (2) pretrain (with or without LoRA) on a free, unstructured text corpus, and (3) fine-tune with LoRA for each task? Would love to have your tutorial on this road; it would be great. Thanks, Steve.
@pawe460 • 1 year ago
How does LoRA differ from transfer learning? If I understand correctly, TL means adding additional layers onto a frozen pre-trained network and training them on a new dataset, right?
@MariuszWoloszyn • 1 year ago
LoRA is not adding additional weights. Although it might seem so while training, at inference there are no additional parameters. It acts more like diff and patch (though in vector space).
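
For reference, the standard formulation from the LoRA paper: the pretrained weight matrix stays frozen and only a low-rank update is learned, which can be folded back into the original matrix after training, so inference sees a single weight matrix.

```latex
% LoRA update for one weight matrix (standard formulation from the paper)
W' = W + \Delta W = W + \frac{\alpha}{r}\, B A,
\qquad W \in \mathbb{R}^{d \times k},\;
B \in \mathbb{R}^{d \times r},\;
A \in \mathbb{R}^{r \times k},\;
r \ll \min(d, k)
```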
@yth2011 • 11 months ago
Thanks a lot~
@clementvanpeuter1742 • 1 year ago
Love It.
@selinatian6607 • 1 year ago
Very great tutorial! With the saved pretrained model, how do we make predictions for classification problems?
@samwitteveenai • 1 year ago
You can do that with a much simpler model like BERT etc. or a T5, or structure the data to do it with the causal LM.
@Aldraz • 1 year ago
How many examples are necessary in the dataset for it to learn a certain pattern? With OpenAI you are fine with just 200 examples, which I don't think would work here.
@ronyosef3806 • 1 year ago
Hi Sam, thanks for the great video. I have a general question you might know the answer to. If I freeze pre-trained model weights (for example, BERT) and then train a classifier on top of its embeddings, is that called fine-tuning? If the weights are unfrozen, I know this can be called fine-tuning.
@samwitteveenai • 1 year ago
You can freeze some of the weights and tune the top layer etc., and yes, that is fine-tuning.
@ArunkumarMTamil • 2 months ago
How does LoRA fine-tuning track changes by creating two decomposition matrices? How is ΔW determined?
@edd36 • 1 year ago
Hey, sorry I'm late to the party. I tried to load my LoRA model, but when I checked the weights, they are the same as the original model's. Is it supposed to do that? I already checked with my model right after training, and yes, the weights are different.
@wilfredomartel7781 • 1 year ago
Maybe a tutorial on integrating LangChain with Flan, but accessing a REST API to query data.
@tawnkramer • 1 year ago
Does anyone know the proper settings for generation with the story model? Mine tends to start OK and then becomes word spew halfway through.
@theunknown2090 • 1 year ago
Hey man, great video. I had a question: do you think a 500M or 1B model could give good results similar to Alpaca? What would be the smallest size at which a model can follow instructions?
@samwitteveenai • 1 year ago
It's a really interesting question and something I am currently doing research on. 500M is probably too small. At 1.5B things get a bit more interesting. The big challenge with smaller models is you can't expect them to know facts correctly. So you want to use them more as retrieval-generation models. They can do language but need to have the facts and context fed in at generation time etc.
@theunknown2090 • 1 year ago
The Cerebras-GPT models are really fast in inference compared to GPT-2 and GPT-Neo; a Cerebras 2.7B's inference speed is almost equal to GPT 1.5B and GPT-Neo 1.3B.
@ifeanyiidiaye1889 • 10 months ago
How do you handle a "CUDA out of memory" error in the free Colab notebook?
@tisajokt7676 • 1 year ago
If the only difference is in these added-on weights, is it possible to run multiple distinct finetuned models at the same time without duplicating the shared base pretrained model in memory?
@samwitteveenai • 1 year ago
Yes, this is a trick we are working on for production. You have multiple LoRA weights for different tasks etc. Very much beyond the scope of this video though.
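
For the curious, a rough sketch of how adapter swapping can look with the PEFT library, assuming two adapters have already been trained and saved; the names and paths are placeholders, and real serving setups differ.

```python
# Sketch: one shared base model, several LoRA adapters swapped at runtime (names/paths assumed)
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")

# Load the first adapter and give it a name
model = PeftModel.from_pretrained(base, "lora-adapters/summarization", adapter_name="summarization")
# Load a second adapter into the same wrapped model
model.load_adapter("lora-adapters/sql-generation", adapter_name="sql")

model.set_adapter("summarization")  # route requests through one task's weights
# ... run generation ...
model.set_adapter("sql")            # switch tasks without reloading the base model
```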
@thisurawz • 6 months ago
Can you do a video on fine-tuning a multimodal LLM (Video-LLaMA, LLaVA, or CLIP) with a custom multimodal dataset containing images and text for relation extraction or a specific task? Could you do it using an open-source multimodal LLM and multimodal datasets, like Video-LLaMA, so anyone can further their experiments with the help of your tutorial? Can you also talk about how we can boost the performance of the fine-tuned model using prompt tuning in the same video?
@yashjain6372 • 11 months ago
Very informative!!!! Does fine-tuning with QLoRA/LoRA support this kind of dataset? If not, what changes should I make to my output dataset?
Review (col 1): Nice cell phone, big screen, plenty of storage. Stylus pen works well.
Analysis (col 2): [{"segment": "Nice cell phone", "Aspect": "Cell phone", "Aspect Category": "Overall satisfaction", "sentiment": "positive"}, {"segment": "big screen", "Aspect": "Screen", "Aspect Category": "Design", "sentiment": "positive"}, {"segment": "plenty of storage", "Aspect": "Storage", "Aspect Category": "Features", "sentiment": "positive"}, {"segment": "Stylus pen works well", "Aspect": "Stylus pen", "Aspect Category": "Features", "sentiment": "positive"}]
@debashisghosh3133 • 1 year ago
In the LoraConfig() method, r is not the number of attention heads; instead, it is the rank of the matrices you are decomposing into, going from high rank to low rank. Here the rank is 16.
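
To make that concrete, this is roughly what the config looks like; everything besides r=16 is an illustrative assumption.

```python
from peft import LoraConfig

config = LoraConfig(
    r=16,               # rank of the decomposition: ΔW is approximated as B (d x 16) times A (16 x k)
    lora_alpha=32,      # scaling factor applied to the low-rank update
    lora_dropout=0.05,
    target_modules=["query_key_value"],  # which layers get LoRA matrices (model-dependent assumption)
    task_type="CAUSAL_LM",
)
```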
@returncode0000 • 11 months ago
5:23 Is it possible to train this on an Nvidia RTX 4090 FE (24GB RAM)?
@ShlomiSchwartz • 11 months ago
Hi Sam, thank you for the video. I'm getting RuntimeError: expected scalar type Half but found Float running in Colab with GPU 0: Tesla V100-SXM2-16GB (UUID: GPU-a971aa0c-5408-727a-3b72-48b1926b5f66) on the training loop. What am I missing?
@ShlomiSchwartz • 11 months ago
It was a GPU issue, switching GPUs fixed it
@samwitteveenai • 11 months ago
Yeah, I don't think bitsandbytes fully supports V100 GPUs; I have had issues with it in the past.
@richardrgb6086 • 11 months ago
Hello! Can you fine-tune T5?
@micbab-vg2mu • 1 year ago
In the past, I tried fine-tuning some GPT models, but the results weren't good. Maybe this new technique will give me a better outcome.
@samwitteveenai • 1 year ago
Fine-tuning comes down a lot to what you are tuning on, how much, etc. LoRA has a lot of advantages and is certainly worth a try.
@haticeobuz9081 • 1 year ago
Hi, I have a question for you. When will you be uploading the video about seq2seq models? I would like to see that one as well!
@samwitteveenai • 1 year ago
Yes I promised this and I will get to it. Will try to do it this week. Please remind me if I don't. Too many new LLMs and cool papers being released :D
@haticeobuz9081 • 1 year ago
@@samwitteveenai Okay, thank you so much.
@Fearfulful • 1 year ago
Can you edit LoRa to LoRA in the title? I was really confused for a second, asking myself what long-range radio has to do with LLMs.
@samwitteveenai • 1 year ago
lol done, thanks for pointing it out.
@JosePablo2008 • 10 months ago
What is the minimum GPU memory needed to run this code? I think I need a new GPU to run this on my local machine.
@caiyu538 • 9 months ago
👍
@yth2011 • 9 months ago
What is the difference between LoRA and embeddings?
@IzittoCh • 1 year ago
Would it be practical to train a small model on a 1660 Super 6GB? I just want to add a personality for a home voice assistant.
@samwitteveenai • 1 year ago
Probably not for training; you might be able to do some inference with that, but train it on something with more VRAM etc.
@ranu9376 • 1 year ago
Great! Can we merge the PEFT weights with the actual weights and use that for inference? Any downside to it other than the size? Also, wouldn't the weights get tampered with if we save them locally instead and use them for inference?
@samwitteveenai • 1 year ago
Yes, you can do that. I might show that in a future video. There's no big downside for most use cases. As for saving the LoRA weights locally: when you load them, they will load the original weights as well. Not sure what you mean by tampered.
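
A small sketch of that merge step using PEFT's merge_and_unload, assuming a saved LoRA adapter; the model name and paths are placeholders.

```python
# Sketch: fold LoRA weights into the base model so inference needs no PEFT wrapper (paths assumed)
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")
peft_model = PeftModel.from_pretrained(base, "your-username/bloom-560m-lora-quotes")

merged = peft_model.merge_and_unload()  # W <- W + B@A for every adapted layer; returns a plain model
merged.save_pretrained("bloom-560m-lora-merged")
# The merged folder loads like any normal transformers checkpoint; no peft dependency at inference
```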
@kutilkol • 1 year ago
Awesome video! 12:10 The loss was not going down though, brother... Try to update the video with the model training converging. This one clearly did not.
@user-rw5sk8fv4s • 11 months ago
I have two sample datasets like below:
1) [{ "en": "Hello, how are you today?", "fr": "Bonjour, comment ça va aujourd'hui ?" }, ...]
2) [{ "text": "Ravi is a young man from India who loves panipuri." }, ...]
So how can I fine-tune on the above datasets using a Falcon LLM? Please help me.
@user-wr4yl7tx3w • 1 year ago
Why is it called causal?
@nayakdonkey • 1 year ago
@samwitteveenai I encounter RuntimeError: expected scalar type Half but found Float while running the training script specified in the colab notebook. Can you please help me with pointers to solve the error? I am running in Colab (GPU 0: Tesla V100-SXM2-16GB).
@samwitteveenai • 1 year ago
OK, V100s had some problems with the 8-bit part in the past, so it could be that.
@nayakdonkey • 1 year ago
@@samwitteveenai Thanks for the acknowledgement
@SubhamKumar-eg1pw • 1 year ago
@@nayakdonkey Were you able to solve the above RuntimeError? I am facing the same with a V100 machine.
@shivamkumar-qp1jm • 1 year ago
Can I train any LLM from Hugging Face, like a LLaMA model?
@samwitteveenai • 1 year ago
Yes, with most of them.
@desrucca • 7 months ago
I fine-tuned BART, but the model output was exactly the same as the input IDs. What's possibly wrong?
@RushikeshTade • 2 months ago
Did you merge weights?
@biswachat8521 • 1 year ago
At 10:19, why did you pass in data['train'] as train_dataset? How is the training process going to know that data['train']['quote'] is the feature and data['train']['prediction'] is the target?
@PavanAtGrowexx • 6 months ago
Did you find any solution? I have the same query.
@dolby360 • 1 month ago
I also have the same query
@Dygit • 1 year ago
bitsandbytes seems to have lots of issues in terms of compatibility with various CUDA versions, and it outright doesn't support Windows directly.
@samwitteveenai • 1 year ago
Yes, they don't support the older GPUs that well either.
@hosseinaboutalebi9998 • 7 months ago
Hi Sam, can you provide more videos on fine-tuning? Especially with the Mistral-Orca model. I like your videos very much. Thanks for sharing them.
@samwitteveenai • 7 months ago
Yeah I have been meaning to do this for a while. Next week will do some new ones.
@hosseinaboutalebi9998 • 7 months ago
@@samwitteveenai Thanks so much Sam.
@bilalpenbegullu2851 • 11 months ago
Finally something real...
@limitlesslife7536 • 11 months ago
Great video! I actually was hitting an error while trying to fine-tune the Dolly 2.0 model: RuntimeError: Expected to mark a variable ready only once. This error is caused by one of the following reasons: 1) Use of a module parameter outside the forward function. Please make sure model parameters are not shared across multiple concurrent forward-backward passes 2) Reused parameters in multiple reentrant backward passes. For example, if you use multiple checkpoint functions to wrap the same part of your model, it would result in the same set of parameters been used by different reentrant backward passes multiple times, and hence marking a variable ready multiple times. DDP does not support such use cases yet. This was fixed by commenting out model.gradient_checkpointing_enable(). Do you know why that might be the issue?
@samwitteveenai • 11 months ago
That video is quite old now, I think they have updated the library. I will try to take a look at it at some point. I am currently making some new Fine tuning vids so they should be out within a week.
@hilmiterzi3847 • 1 year ago
Hey Sam, is there a chance I can reach out to you personally?
@samwitteveenai • 1 year ago
Just reaching out on LinkedIn is easiest.
@anhluunguyen2869 • 11 months ago
How do you customize your dataset?
@samwitteveenai • 11 months ago
I am planning to make some vids on fine tuning LLaMA2 so I will go more into that there. Basically you just want to feed it strings.
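
As a rough illustration of "feeding it strings": a custom dataset can simply be a list of text examples that gets tokenized before training. The field names and prompt format below are assumptions, not a required schema.

```python
# Sketch: turning your own text examples into a trainable dataset (field names/format are assumptions)
from datasets import Dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-560m")

examples = [
    {"text": "### Instruction: Summarize the ticket.\n### Response: Customer wants a refund."},
    {"text": "### Instruction: Translate to French.\n### Response: Bonjour tout le monde."},
]
dataset = Dataset.from_list(examples)
dataset = dataset.map(lambda x: tokenizer(x["text"], truncation=True, max_length=512), batched=True)
# `dataset` can now be passed as train_dataset to the Trainer, with a causal-LM data collator
```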
@ojaskulkarni8138 • 1 month ago
11:43 max_steps here is 200; how high do we usually set max_steps for proper fine-tuning? Please, someone help me.
@samwitteveenai • 1 month ago
Set it to full epochs rather than steps, or calculate the number of steps in an epoch etc.
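
A small sketch of both options with TrainingArguments; the numbers and the train_dataset variable are assumed.

```python
from transformers import TrainingArguments

# Option 1: train for whole epochs and let the Trainer work out the step count
args = TrainingArguments(output_dir="outputs", num_train_epochs=3, per_device_train_batch_size=4)

# Option 2: compute the steps in one epoch yourself and set max_steps explicitly
steps_per_epoch = len(train_dataset) // (4 * 4)  # batch_size * gradient_accumulation_steps (assumed)
args = TrainingArguments(output_dir="outputs", max_steps=3 * steps_per_epoch, per_device_train_batch_size=4)
```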
@vortechksm • 1 year ago
This should be a seq2seq model, because you are tagging (classifying) text. Actually a sequence-to-tag (sequence classification) task.
@ericlawrence9060 • 1 year ago
LoRa is a low-power wireless data transmission technology...