How To Run Llama 3 8B, 70B Models On Your Laptop (Free)

10,610 views

School of Machine Learning

1 month ago

Written guide: schoolofmachinelearning.com/2...
Unlock the power of AI right from your laptop with this comprehensive tutorial on how to set up and run Meta's latest Llama 3 models (8B and 70B). We will use Ollama to run these models locally on your laptop, entirely for free.
What You'll Learn:
- An overview of the Llama 3 models and their capabilities.
- Step-by-step instructions on setting up your system for Llama 3.
- Tips on optimizing performance for both the 8B and 70B models.
- Troubleshooting common issues to ensure a smooth operation.
#LLaMA3 #MetaAI #AITutorial #MachineLearning #Coding #TechTutorial

Comments: 38
@PJ-hi1gz · 1 month ago
Informative and straight to the point, thank you!
@SchoolofMachineLearning · 1 month ago
thank you :)
@mustafamohsen · 1 month ago
Thank you for the guide, great stuff! Just a heads up, there's a slight error in the command table within the written guide. The command for the 70B should be `ollama run llama3:70b` instead of `ollama run llama3:8b`
@SchoolofMachineLearning · 1 month ago
Thanks, fixed!
@MiraclesofCreation · 27 days ago
Nice guide with easy written instructions, thanks!
@SchoolofMachineLearning · 27 days ago
Glad you liked it
@sphansel3257 · 1 month ago
Most underrated channel. You deserve way more, dude! ☺
@SchoolofMachineLearning · 1 month ago
thank you :)
@dosomethingwild4999 · 16 days ago
NEAT!
@gamersdepo3892 · 1 day ago
Hey, I want to use the Ollama model in my Jupyter notebook. Just like we use other models through an API, I want to call it from my notebook for a continuous task; how do I do that? Also, running it on a GPU would be much faster, like with models from Transformers, but I don't want to use Transformers, just the model I already downloaded through Ollama, like in the video, because I think that will save time and downloads. Can we do that?
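For anyone with the same question: Ollama serves a local REST API (by default at http://localhost:11434) whenever the desktop app or `ollama serve` is running, so a notebook can call the already-downloaded model without Transformers. A minimal sketch using the `requests` library; it assumes you pulled `llama3` as in the video:

```python
import requests

# Ollama's local REST API; assumes the Ollama server is running
# (the desktop app, or `ollama serve` in a terminal).
OLLAMA_URL = "http://localhost:11434/api/generate"

def ask_llama(prompt: str, model: str = "llama3") -> str:
    """Send one prompt to the locally running model and return its reply."""
    resp = requests.post(
        OLLAMA_URL,
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=300,  # local generation can be slow on modest hardware
    )
    resp.raise_for_status()
    return resp.json()["response"]

# A simple "continuous task": process a batch of inputs in a loop.
for question in ["What is quantization?", "Name the two Llama 3 sizes."]:
    print(ask_llama(question))
```

Ollama handles GPU acceleration itself where supported, so nothing extra is needed on the Python side.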
@nqaiser · 1 month ago
Hello, what would be the recommended hardware specs to run Llama 3 70B with good performance for multiple users (~5 users)?
@SchoolofMachineLearning · 1 month ago
For what you require, it makes more sense to call Llama via an API, as it will be much cheaper: currently $0.64/1M input and $0.80/1M output tokens on Groq (the cheapest I've seen). For hardware, I haven't built anything like that, so I'm not sure; maybe an A100? :D For a single user, from what I've seen online, good specs are an Apple M2 Ultra with a 24-core CPU, 60-core GPU, and 128GB RAM (costs $8,000 with the monitor), which runs Meta-Llama-3-70B-Instruct.Q4_0.llamafile at 14 tok/sec (prompt eval is 82 tok/sec).
@nqaiser · 1 month ago
@SchoolofMachineLearning The sort of application I'm considering requires an on-premise deployment, so deploying in the cloud or consuming via API isn't an option. I'm more inclined toward the Linux/Windows ecosystem. What would be the total VRAM/RAM required for the 70B model? Also, does using a 4-bit quantized model result in some loss of accuracy, and is that noticeable in the output?
@qtUnluckyThreshh · 1 month ago
Does it have an endpoint I can access from localhost, so I can make my own HTML interface?
@SchoolofMachineLearning · 1 month ago
Meta doesn't directly provide API access, but you can access Llama 3 via Groq/Replicate/Microsoft/Databricks.
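On the localhost part of the question: the Ollama server itself exposes a local HTTP endpoint (by default http://localhost:11434) that a hand-rolled HTML/JS front end can call. A minimal sketch of its chat endpoint, shown in Python for brevity; a browser would issue the same POST with `fetch`:

```python
import requests

# Ollama's local chat endpoint; assumes the server is running on the default port.
resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3",
        "messages": [{"role": "user", "content": "Hello from my own interface!"}],
        "stream": False,
    },
    timeout=300,
)
print(resp.json()["message"]["content"])
```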
@WatsitTooyah · 22 days ago
Open WebUI already exists too.
@Muzick · 1 month ago
I've installed the 70B model on my desktop, which has 64GB of memory, but it's running super slow. Any tips? Thanks!
@SchoolofMachineLearning · 1 month ago
The short answer is to get a more powerful GPU :D
@swarupkumar2 · 1 month ago
@SchoolofMachineLearning What should the minimum GPU be? Is an RTX 3060 12GB enough?
@SchoolofMachineLearning · 1 month ago
I don't think that is going to be enough. By default, Ollama downloads a 4-bit quant, which for Llama 3 70B is 40 GB. Your GPU has only 12 GB of VRAM, so the rest has to be offloaded into system RAM, which is much slower. You have two options:
- Use the 8B model instead (`ollama run llama3:8b`)
- Use a smaller quant (`ollama run llama3:70b-instruct-q2_K`)
@schmutz06 · 1 month ago
I ran into the same, and having looked around, it appears £20-30K GPUs with ~40GB of VRAM are the kind you'd need to manage the 70B model. It is, after all, 40GB of data; where your GPU is insufficient, it will be loaded into your RAM, which is far slower than video card memory at this work.
@schmutz06 · 1 month ago
@SchoolofMachineLearning What is that q2_K? I have a 12GB 3080 Ti; is that the best option for me? I read that some who attempted this found the 7B model superior.
@ElcoolMo · 1 month ago
Forgive me, I'm new to coding, but could I get it running outside the terminal so it can have a nice GUI?
@SchoolofMachineLearning · 1 month ago
Yes, you can. Here is a tutorial for a nice interface using Open WebUI: github.com/open-webui/open-webui. You can also use Llama 3 directly on Meta.ai.
@thesattary · 18 days ago
I'm jealous of your internet speed, bro :(
@SchoolofMachineLearning · 18 days ago
haha :)
@hunterking4228 · 1 month ago
Can I run the 8B model with my 8GB of memory? Will it work? I don't mind it being slow.
@SchoolofMachineLearning · 1 month ago
It will have extremely poor performance, and even then I don't think you will be able to run it. But you can give it a shot.
@nastastic · 1 month ago
I tried it and it's a waste of time. The computer freezes on simple prompts and takes ages to recover. M3 MacBook Pro with 8GB RAM.
@juritronics · 1 month ago
Doesn't it have an API that we can use instead of installing it on our own PCs?
@SchoolofMachineLearning · 1 month ago
Meta doesn't provide a Llama 3 API directly, AFAIK, but if you want to try out Llama 3 you can do so on Meta.ai. A lot of other companies provide a Llama 3 API, such as Databricks, Replicate, Microsoft, etc.
@maizizhamdo · 1 month ago
Groq offers Llama 3 70B for free with an API.
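For reference, Groq's API is OpenAI-compatible, so the standard `openai` client works by pointing it at Groq's base URL. A minimal sketch; the model id below (`llama3-70b-8192`) is the name Groq listed at the time and may change, so check their docs:

```python
import os
from openai import OpenAI

# Groq exposes an OpenAI-compatible endpoint; the API key comes from
# signing up at console.groq.com (free tier available at the time of writing).
client = OpenAI(
    base_url="https://api.groq.com/openai/v1",
    api_key=os.environ["GROQ_API_KEY"],
)

reply = client.chat.completions.create(
    model="llama3-70b-8192",  # Groq's Llama 3 70B id at the time; may change
    messages=[{"role": "user", "content": "In one sentence, what is Llama 3?"}],
)
print(reply.choices[0].message.content)
```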
@Ahduciekwndnbbbsvvvghhhyyyyy · 1 month ago
How slow is the 70B model on your laptop?
@SchoolofMachineLearning · 1 month ago
The requirements are:
- 16GB memory for the 8B model.
- 32GB memory for the 70B model (even then it is very slow).
I have not tried the 70B model on my laptop, but I'm assuming it is almost unusable.
@behunkydory9966 · 1 month ago
@SchoolofMachineLearning How can I check the memory requirements for the Llama 3 models? I especially want to know the requirements for the 70B model.
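A rough way to estimate it yourself: the weights of a quantized model take roughly parameter count × bits per weight / 8 bytes, plus headroom for the KV cache and runtime buffers. A back-of-the-envelope sketch; the bits-per-weight figures are approximations for common quant formats, not official numbers:

```python
def approx_weight_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate in-memory size of quantized weights in GB."""
    return params_billions * bits_per_weight / 8

# Approximate effective bits per weight for common GGUF quants (assumed values).
quants = [
    ("llama3:8b  (q4_0)", 8, 4.5),   # ~4.5 GB, close to Ollama's ~4.7 GB download
    ("llama3:70b (q4_0)", 70, 4.5),  # ~39 GB, matching the ~40 GB quoted above
    ("llama3:70b (q2_K)", 70, 2.6),  # ~23 GB; the actual file is somewhat larger
]
for name, params, bits in quants:
    print(f"{name}: ~{approx_weight_gb(params, bits):.1f} GB of weights, plus overhead")
```

You can also just check the listed download sizes on the Ollama model library (ollama.com/library/llama3) or run `ollama list` after pulling a model.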
@WatsitTooyah · 22 days ago
The 70B model on a 32GB Mac M1 Max takes about a minute per word... the 8B model is very fast.