Fine-Tuning Meta's Llama 3 8B for IMPRESSIVE Deployment on Edge Devices - OUTSTANDING Results!

4,698 views

Scott Ingram

16 days ago

This video demonstrates an innovative workflow that combines Meta's open-weight Llama 3 8B model with efficient fine-tuning techniques (LoRA and PEFT) to deploy highly capable AI on resource-constrained devices.
We start by using a 4-bit quantized version of the Llama 3 8B model and fine-tune it on a custom dataset. The fine-tuned model is then exported in the GGUF format, optimized for efficient deployment and inference on edge devices using the GGML library.
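Before the LoRA pass, the custom dataset has to be rendered into Llama 3's chat template. A minimal sketch of that step, assuming a simple question/answer dataset (the field names and sample row below are hypothetical, not from the video's dataset):

```python
def to_llama3_prompt(question: str, answer: str) -> str:
    """Render one Q&A pair using Llama 3's special chat-template tokens."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{question}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
        f"{answer}<|eot_id|>"
    )

# Hypothetical custom dataset rows
dataset = [
    {"question": "What format is used for edge deployment?",
     "answer": "GGUF, consumed by GGML-based runtimes."},
]
prompts = [to_llama3_prompt(r["question"], r["answer"]) for r in dataset]
```

Each rendered string then becomes one training example for the LoRA fine-tune.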
Impressively, the fine-tuned Llama 3 8B model accurately recalls and generates responses based on our custom dataset when run locally on a MacBook. This demo highlights the effectiveness of combining quantization, efficient fine-tuning, and optimized inference formats to deploy advanced language AI on everyday devices.
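Once the GGUF export is on the MacBook, a GGML-based runtime such as Ollama can load it with a short Modelfile. The sketch below is an assumption for illustration (the file name, model name, and parameter are not from the video):

```
# Modelfile — load the fine-tuned GGUF export
FROM ./llama3-8b-finetuned-q4_k_m.gguf
PARAMETER temperature 0.7
```

With that file in place, `ollama create llama3-custom -f Modelfile` registers the model and `ollama run llama3-custom` starts an interactive session against it.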
Join us as we explore the potential of fine-tuning and efficiently deploying the Llama 3 8B model on edge devices, making AI more accessible and opening up new possibilities for natural language processing applications.
Be sure to subscribe to stay up-to-date on the latest advances in AI.
My Links
Subscribe: / @scott_ingram
X.com: / scott4ai
GitHub: github.com/scott4ai
Hugging Face: huggingface.co/scott4ai
Links:
Colab Demo: colab.research.google.com/dri...
Dataset: github.com/scott4ai/llama3-8b...
Unsloth Colab: colab.research.google.com/dri...
Unsloth Wiki: github.com/unslothai/unsloth/...
Unsloth Web: unsloth.ai/

Comments: 21
@israelcohen4412 14 days ago
So I never post comments, but the way you explained this was by far the best I have seen online. I wish I had found your channel 8 months ago :) Please keep posting videos; your explanations are very well thought out and put together.
@EuSouAnonimoCara 6 days ago
Awesome content!
@ratsock 13 days ago
Absolutely fantastic! Really appreciate the detailed, clear breakdown of concrete steps that let us drive value, rather than the clickbait hype train everyone else is on.
@tal7atal7a66 13 days ago
I like the thumbnails, the topics, the way methods are explained, and the man who explains them. Nice channel, very valuable info ❤
@andrepamplona9993 13 days ago
Super, hyper fantastic! Thank you.
@Danishkhan-ni5qf 3 days ago
Wow!
@gustavomarquez2269 12 days ago
You are amazing! This is the best explanation of this topic I've seen. I liked the video and just subscribed. Thank you very much!!!
@scott_ingram 12 days ago
Thank you so much for the kind words and for subscribing, I really appreciate it! I'm so glad you found the video helpful in explaining how to fine-tune LLaMA 3 and run it on your own device. It's a fascinating topic and technology with a lot of potential. I'm looking forward to sharing more content on large language models and AI that you'll hopefully find just as valuable. Stay tuned!
@15ky3 12 days ago
Amazing video, thanks for the best explanation I've ever seen on YouTube. Could you also please make a video on how to fine-tune the Phi-3 model? 🙏
@scott_ingram 12 days ago
Great suggestion! I will look into that.
@RameshBaburbabu 14 days ago
Thank you so much for sharing that fantastic clip! It was really informative. I'm currently looking into fine-tuning a model with my ERP system, which handles some pretty complex data. Right now, I'm creating dataframes and using PandasAI for analytics. Could you guide me on how to train and make inferences with this row/column data? I really appreciate your time and help!
@scott_ingram 13 days ago
Thanks for your question, and for watching the video. I'm glad you found it informative! Your approach largely depends on your use case and the kind of insights you're looking to derive from your data. Generally, you'll want to follow these steps to train a model on complex data:
1. Decide how you plan to interact with the model. For instance, maybe you're doing text generation; natural language understanding tasks like sentiment analysis, named entity recognition, and question answering; text summarization; or domain-specific queries (legal, medical, corporate).
2. Choose a model with high benchmarks for the specific requirements of your task, the nature of your data, and the desired output format. A model is more likely to train well if the base model's capabilities are already very strong for the task you intend to use it for. Consider factors like model performance, computational resources, and the availability of pre-trained weights for your specific domain or language.
3. Prepare and preprocess your dataframes: remove or fill missing values, encode variables numerically, and normalize the data. The cleaner the data, the better the training will be.
4. Split the data into a training set and a validation set. The validation set is data you haven't trained the model on, used to see how the model performs on unseen data.
5. Fine-tune with your dataset, test the model out, then iterate on the process: tweak the data, add more data, try different training parameters, even try different models.
Hope this helps guide you in your endeavor!
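The train/validation split in that reply can be sketched in a few lines of plain Python (the rows here are placeholders standing in for preprocessed dataframe records, not real ERP data):

```python
import random

def train_val_split(rows, val_fraction=0.2, seed=42):
    """Shuffle and hold out a validation set the model never trains on."""
    rng = random.Random(seed)
    shuffled = rows[:]  # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]

# Placeholder rows standing in for preprocessed dataframe records
rows = [{"text": f"record {i}", "label": i % 2} for i in range(100)]
train, val = train_val_split(rows)
print(len(train), len(val))  # 80 20
```

Fixing the seed keeps the split reproducible across fine-tuning iterations, so validation scores stay comparable between runs.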
@15ky3 11 days ago
Is the output from Ollama on your MacBook in real time? Or did you speed it up in the video? On my 2014 iMac, it is significantly slower. It's about time for a new one. What are the technical specifications of your Mac?
@scott_ingram 11 days ago
Except for the download, which I sped up significantly, everything in the terminal was shown in real time. The demo was done on a MacBook Pro M3 Pro Max. YMMV with other hardware.
@madhudson1 14 days ago
Rather than using Google Colab compute for training, what are your thoughts on using a local machine with a GPU?
@guyvandenberg9297 14 days ago
Good question. I am about to try that. I think you need an Ampere-architecture GPU (A100 or RTX 3090). Scott, thanks for a great video.
@guyvandenberg9297 14 days ago
Ampere architecture is for BF16 as opposed to FP16, per Scott's explanation in the video.
@scott_ingram 13 days ago
Thanks for your question! The notebook is designed to do the training on Colab, but you can run it locally for training if you have compatible hardware; I haven't tested it locally, though. The RTX 3090 does support brain float (bfloat16).

Install Python, then set up a virtual environment:

python3 -m venv venv
source venv/bin/activate

Next, install and start the Jupyter notebook service:

pip install jupyter
jupyter notebook

That will run a local Jupyter notebook service; open the notebook and connect it to a Python 3 kernel. Then test GPU availability:

import torch
print(torch.cuda.is_available())
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

Here's how you would create a tensor with PyTorch on the RTX 3090 and tell it to use brain float:

tensor = torch.randn(1024, 1024, dtype=torch.bfloat16)

Some cells in the notebook won't run correctly, such as the first cell that sets up text wrapping; it's designed for Colab specifically and is not relevant for training. There may be other compatibility issues, but I haven't tested it running locally. This should get you started on seeing whether your GPU could potentially work. Let me know how it works out!
@PreparelikeJoseph 7 days ago
@scott_ingram I'd really like to get some AI agents running locally on a self-hosted model. I'm hoping two RTX 3090s can combine just via PCIe and load a full 70B model.
@azkarathore4355 2 days ago
Hi, I want to fine-tune Llama 3 for English-to-Urdu machine translation. Can you guide me regarding this? The dataset is OPUS-100.