Run LLAMA-v2 chat locally

  35,127 views

Abhishek Thakur

1 year ago

In this video, I'll show you how to run LLaMA-v2 13B locally on an Ubuntu machine and also on an M1/M2 Mac. We will be using llama.cpp for this video.
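The steps from the video can be sketched roughly as follows. The llama.cpp repository URL is real, but the model repo, filename, and exact flags below are assumptions for illustration; check the llama.cpp README for current build instructions.

```shell
# Clone and build llama.cpp (on an M1/M2 Mac, LLAMA_METAL=1 enables the
# Metal GPU backend; on Linux, a plain `make` gives a CPU-only build).
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
LLAMA_METAL=1 make

# Fetch a quantized GGML chat model (hypothetical example: TheBloke's
# q4_0 quant of Llama-2 13B chat on the Hugging Face Hub).
wget https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML/resolve/main/llama-2-13b-chat.ggmlv3.q4_0.bin

# Interactive chat: -t sets CPU threads, -ngl 1 offloads to the GPU,
# -c sets the context size.
./main -m llama-2-13b-chat.ggmlv3.q4_0.bin -t 8 -ngl 1 -c 2048 --color -i
```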
Please subscribe and like the video to help me keep motivated to make awesome videos like this one. :)
My book, Approaching (Almost) Any Machine Learning Problem, is available for free here: bit.ly/approachingml
Follow me on:
Twitter: / abhi1thakur
LinkedIn: / abhi1thakur
Kaggle: kaggle.com/abhishek

Comments: 60
@abhishekkrthakur · 1 year ago
Please subscribe and like the video to help me keep motivated to make awesome videos like this one. :)
@anandteerthrparvatikar5359 · 1 year ago
This was fast and quite helpful. Thanks, Abhishek!
@hanspeter-oi2ku · 1 year ago
Instruct mode, from 4:44 in the video:
./main -ins \
  -f ./prompts/alpaca.txt \
  -t 8 \
  -ngl 1 \
  -m llama-2-13b-chat.ggmlv3.q4_0.bin \
  --color \
  -c 2048 \
  --temp 0.7 \
  --repeat_penalty 1.1 \
  -s 42 \
  -n -1
@ddamyanov · 1 year ago
Thanks a lot!
@NicholasRenotte · 6 months ago
You're the real MVP, Abhishek!! Mixtral is crazy fast with a 2-bit quant on an M1; I couldn't believe it.
@codex6634 · 1 year ago
Thank you so much for this material. I was wondering: is it possible to run llama.cpp + Llama 2 + Hugging Face chat? If that's okay with you, would you mind making a video about it?
@mathematicalninja2756 · 11 months ago
examples/server has a frontend too; I deployed that behind a REST API.
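A minimal sketch of what the comment above describes, using llama.cpp's server example. The server binary and its /completion endpoint come from examples/server, but the model path, port, and JSON fields here are assumptions to verify against that example's README.

```shell
# Start the HTTP server built from llama.cpp's examples/server.
./server -m ./llama-2-13b-chat.ggmlv3.q4_0.bin -c 2048 --port 8080 &

# Query the completion endpoint with a JSON request.
curl -s http://localhost:8080/completion \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Explain BLAS in one sentence.", "n_predict": 64}'
```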
@2dapoint424 · 1 year ago
I get this error: "error loading model: unknown (magic, version) combination: 4f44213c, 50595443; is this really a GGML file?" Any idea how to fix it?
@computerai1247 · 11 months ago
@abhishekkrthakur do these .bin files contain the model weights?
@niazhimselfangels · 1 year ago
I think we're missing a window you intended to share at around 3:20 on how to download the file, but this is great! 🤩
@abhishekkrthakur · 1 year ago
yeah, it was just the model page. shouldn't matter if you follow all the commands
@1ofallkind · 1 year ago
How do I prepare a dataset for tabular question answering for fine-tuning? Please assist.
@rsjenwar · 9 months ago
@abhishekkrthakur llama.cpp also introduced finetuning a few days back. Can you please make a tutorial on finetuning with llama.cpp on a local machine (e.g. a Mac M1)?
@kumariyengar8894 · 1 year ago
I just tried this on my MacBook Pro (2.6 GHz 6-core Intel Core i7). When I run the command, I get the error "error loading model: llama.cpp: tensor 'layers.4.ffn_norm.weight' is missing from model". Is this because it is an Intel core?
@abhishekarvind · 1 year ago
How do I input images to the model?
@user-ef5nt7bl3j · 1 year ago
Can the 7B model run on edge devices like a Raspberry Pi or a Jetson Nano?
@SatishKumar-fr8mj · 11 months ago
Is it possible?
@RG-ik5kw · 1 year ago
Awesome stuff! Are you "TheBloke"? :) Also, what are your Mac specs? Will it work on an Intel Mac from 2019?
@abhishekkrthakur · 1 year ago
No, I'm not 🙂 I ran it on an M2 Mac. not sure about Intel. I don't think it will work. but do try and let us know :)
@jennilthiyam1261 · 5 months ago
What can we do if we want to make it run on a GPU?
@SANJIVRAI6693 · 11 months ago
Is there a Windows tutorial?
@Rohit-dp3bu · 10 months ago
Hi, I am using llama-cpp-python but want to utilize my GPU and CPU to the maximum extent. What parameters should I set (like n_threads, n_gpu_layers, etc.)? I have an RTX 3050 4 GB + Ryzen 7 4800, using TheBloke's Llama 2 7B q4_0 quantized model.
@jennilthiyam1261 · 5 months ago
Were you able to use it on a GPU?
@sarnathk1946 · 1 year ago
cuBLAS: BLAS stands for Basic Linear Algebra Subprograms, and the "cu" prefix stands for NVIDIA CUDA technology.
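As a follow-up to the cuBLAS note, building llama.cpp with CUDA acceleration can be sketched like this. It assumes an NVIDIA GPU with the CUDA toolkit installed, and the -ngl value is an illustrative guess that should be tuned to your VRAM.

```shell
# Rebuild with the cuBLAS backend enabled.
make clean
LLAMA_CUBLAS=1 make

# -ngl offloads that many transformer layers to the GPU; more layers
# means faster inference but more VRAM use.
./main -m ./llama-2-13b-chat.ggmlv3.q4_0.bin -ngl 32 -t 8 -c 2048 --color -i
```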
@ashu- · 1 year ago
"dash" vs "minus" :p
@kimi6299 · 1 year ago
Which GPU did you use? Is it possible to run 70B locally by using two GPUs?
@greenbillugaming2781 · 11 months ago
No, you at least need an A100 GPU.
@AnkitSingh-gw6fj · 1 year ago
Sir, can you also do a working demo of building a local chatbot by training a model on private datasets like PDF, OneNote, Word, etc.?
@abhishekkrthakur · 1 year ago
previous videos already handled that. please take a look 🙂
@AnkitSingh-gw6fj · 1 year ago
@abhishekkrthakur That's great. Will take a look now. Thank you
@joe7843 · 11 months ago
Is there a Python version of llama.cpp? How would you approach making this accessible via an API?
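There is indeed a Python binding, llama-cpp-python, which also ships an OpenAI-compatible HTTP server. A sketch (the package and module names are from its documentation; the model path is a placeholder):

```shell
# Install the binding with the optional server dependencies.
pip install 'llama-cpp-python[server]'

# Serve a local quantized model over an OpenAI-compatible REST API.
python3 -m llama_cpp.server --model ./llama-2-13b-chat.ggmlv3.q4_0.bin
```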
@sarnathk1946 · 1 year ago
make -j 8 will compile using 8 cores in parallel, for much faster compilation.
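A portable variant of the tip above derives the job count from the machine's core count instead of hard-coding 8 (nproc on Linux, sysctl on macOS):

```shell
# Count available CPU cores, falling back to sysctl on macOS.
JOBS="$(nproc 2>/dev/null || sysctl -n hw.ncpu)"
echo "use: make -j $JOBS"
```

Then run `make -j "$JOBS"` to compile with every core.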
@developonetwork · 1 year ago
Facing this error:
cc: fatal error: cannot execute ‘cc1obj’: execvp: No such file or directory
compilation terminated.
make: *** [Makefile:286: ggml-metal.o] Error 1
@averytheloftier · 1 month ago
Did you ever figure this out?
@InspireAndGrow · 1 year ago
I'm getting: "error loading model ... is this really a GGML file?"
@abhishekkrthakur · 1 year ago
invalid file?
@InspireAndGrow · 1 year ago
@abhishekkrthakur I deleted my file and ran wget again and it worked 👍
@robosergTV · 1 year ago
Link to the repo?
@abhishekkrthakur · 1 year ago
just commands. no repo. sorry
@yeahyes55 · 1 year ago
Are you using a VM?
@abhishekkrthakur · 1 year ago
nope
@manavshah9062 · 1 year ago
How can I run it on Linux with just a CPU?
@abhishekkrthakur · 1 year ago
just don't use the cuda flag. everything else remains the same
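The CPU-only path in the reply above amounts to the following sketch; the model filename and thread count are illustrative, not prescriptive.

```shell
# Plain make: no LLAMA_CUBLAS or LLAMA_METAL flags, CPU-only build.
make

# Run without -ngl so nothing is offloaded to a GPU.
./main -m ./llama-2-13b-chat.ggmlv3.q4_0.bin -t 8 -c 2048 --color -i
```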
@deepakkajla · 1 year ago
The encoding bitrate seems too low; during scrolling the screen becomes pixelated.
@abhishekkrthakur · 1 year ago
used a different device to record today. everything except scrolling should be fine. good that I didn't scroll too much lol
@moonlight7684 · 1 year ago
I am running this on a MacBook Air M2; the responses are really slow.
@abhishekkrthakur · 1 year ago
did you compile with LLAMA_METAL=1 param?
@moonlight7684 · 1 year ago
@abhishekkrthakur Yes I did, but I changed ngl to -1, else it was giving an error.
@abhishekkrthakur · 1 year ago
ngl=1?
@christopheprat378 · 1 year ago
@abhishekkrthakur Got the same problem. I compiled using the LLAMA_METAL=1 param, and when I set -ngl 1 the program crashed because too much memory was used by Metal (on a MacBook Air M2 with 8 GB of RAM). I think Metal can't use more than half of the RAM. Is there something I am missing?
@NehilSood · 1 year ago
Yes! Responses are really slow. I even tried this on an Ubuntu machine with an NVIDIA A10G, same delay. @abhishekkrthakur please help us through this once
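The Metal discussion above boils down to two steps, sketched here; the behavior notes are assumptions drawn from the thread, not verified claims, so check them against the llama.cpp README.

```shell
# Rebuild with the Metal backend enabled.
make clean
LLAMA_METAL=1 make

# A positive -ngl value enables GPU offload on the Metal backend (the
# thread suggests -ngl 1 is the intended setting). On low-RAM machines
# such as an 8 GB MacBook Air, a smaller quant may be needed to keep
# Metal from running out of memory.
./main -m ./llama-2-13b-chat.ggmlv3.q4_0.bin -ngl 1 -t 8 -c 2048 --color -i
```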
@staviq · 1 year ago
nvidia-smi locks your GPU when you run it, so running it in a loop causes a giant hiccup every second. Don't do it :)
@sarnathk1946 · 1 year ago
Can we finetune on a local machine?
@abhishekkrthakur · 1 year ago
yes
@sarnathk1946 · 1 year ago
@abhishekkrthakur Wow! That's a great thing! Thanks for the video! You are awesome!
@longervisiontechnology3568 · 1 year ago
Python version?
@abhishekkrthakur · 1 year ago
3.9, 3.10. both should work