Run LLAMA-v2 chat locally

  35,127 views

Abhishek Thakur

1 year ago

In this video, I'll show you how to run LLaMA-v2 13B locally on an Ubuntu machine and also on an M1/M2 Mac. We will be using llama.cpp for this video.
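The steps from the video can be sketched roughly as follows. The llama.cpp repository URL is real, but the model repo, filename, and exact flags below are assumptions for illustration; check the llama.cpp README for current build instructions.

```shell
# Clone and build llama.cpp (on an M1/M2 Mac, LLAMA_METAL=1 enables the
# Metal GPU backend; on Linux, a plain `make` gives a CPU-only build).
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
LLAMA_METAL=1 make

# Fetch a quantized GGML chat model (hypothetical example: TheBloke's
# q4_0 quant of Llama-2 13B chat on the Hugging Face Hub).
wget https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML/resolve/main/llama-2-13b-chat.ggmlv3.q4_0.bin

# Interactive chat: -t sets CPU threads, -ngl 1 offloads to the GPU,
# -c sets the context size.
./main -m llama-2-13b-chat.ggmlv3.q4_0.bin -t 8 -ngl 1 -c 2048 --color -i
```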
Please subscribe and like the video to help me keep motivated to make awesome videos like this one. :)
My book, Approaching (Almost) Any Machine Learning Problem, is available for free here: bit.ly/approachingml
Follow me on:
Twitter: / abhi1thakur
LinkedIn: / abhi1thakur
Kaggle: kaggle.com/abhishek

Comments: 60
@abhishekkrthakur · 1 year ago
Please subscribe and like the video to help me keep motivated to make awesome videos like this one. :)
@anandteerthrparvatikar5359 · 1 year ago
This was fast and quite helpful. Thanks, Abhishek!
@hanspeter-oi2ku · 1 year ago
Instruct mode, from 4:44 in the video:
./main -ins \
  -f ./prompts/alpaca.txt \
  -t 8 \
  -ngl 1 \
  -m llama-2-13b-chat.ggmlv3.q4_0.bin \
  --color \
  -c 2048 \
  --temp 0.7 \
  --repeat_penalty 1.1 \
  -s 42 \
  -n -1
@ddamyanov · 1 year ago
Thanks a lot!
@NicholasRenotte · 6 months ago
You're the real MVP, Abhishek!! Mixtral is crazy fast with a 2-bit quant on an M1; I couldn't believe it.
@codex6634 · 1 year ago
Thank you so much for this material. I was wondering: is it possible to run llama.cpp + Llama 2 + Hugging Face chat? If that's okay with you, would you mind making a video about it?
@mathematicalninja2756 · 11 months ago
examples/server has a frontend too; I deployed that behind a REST API.
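A minimal sketch of what the comment above describes, using llama.cpp's server example. The server binary and its /completion endpoint come from examples/server, but the model path, port, and JSON fields here are assumptions to verify against that example's README.

```shell
# Start the HTTP server built from llama.cpp's examples/server.
./server -m ./llama-2-13b-chat.ggmlv3.q4_0.bin -c 2048 --port 8080 &

# Query the completion endpoint with a JSON request.
curl -s http://localhost:8080/completion \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Explain BLAS in one sentence.", "n_predict": 64}'
```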
@2dapoint424 · 1 year ago
I get this error: "error loading model: unknown (magic, version) combination: 4f44213c, 50595443; is this really a GGML file?" Any idea how to fix it?
@computerai1247 · 11 months ago
@abhishekkrthakur do these .bin files contain the model weights?
@niazhimselfangels · 1 year ago
I think we're missing a window you intended to share at around 3:20 on how to download the file, but this is great! 🤩
@abhishekkrthakur · 1 year ago
yeah, it was just the model page. shouldn't matter if you follow all the commands
@1ofallkind · 1 year ago
How do I prepare a dataset for tabular question answering for fine-tuning? Please assist.
@rsjenwar · 9 months ago
@abhishekkrthakur llama.cpp also introduced finetuning a few days back. Can you please make a tutorial on finetuning with llama.cpp on a local machine (e.g. a Mac M1)?
@kumariyengar8894 · 1 year ago
I just tried this on my MacBook Pro (2.6 GHz 6-core Intel Core i7). When I run the command, I get the error "error loading model: llama.cpp: tensor 'layers.4.ffn_norm.weight' is missing from model". Is this because it is an Intel core?
@abhishekarvind · 1 year ago
How do I input images to the model?
@user-ef5nt7bl3j · 1 year ago
Can the 7B model run on edge devices like a Raspberry Pi or a Jetson Nano?
@SatishKumar-fr8mj · 11 months ago
Is it possible?
@RG-ik5kw · 1 year ago
Awesome stuff! Are you "TheBloke"? :) Also, what are your Mac specs? Will it work on an Intel Mac from 2019?
@abhishekkrthakur · 1 year ago
No, I'm not 🙂 I ran it on an M2 Mac. not sure about Intel. I don't think it will work. but do try and let us know :)
@jennilthiyam1261 · 5 months ago
What can we do if we want to make it run on a GPU?
@SANJIVRAI6693 · 11 months ago
Is there a Windows tutorial?
@Rohit-dp3bu · 10 months ago
Hi, I am using llama-cpp-python but want to utilize my GPU and CPU to the maximum extent. What parameters should I set (like n_threads, n_gpu_layers, etc.)? I have an RTX 3050 4 GB + Ryzen 7 4800, using TheBloke's Llama 2 7B q4_0 quantized model.
@jennilthiyam1261 · 5 months ago
Were you able to use it on a GPU?
@sarnathk1946 · 1 year ago
cuBLAS: BLAS stands for Basic Linear Algebra Subprograms, and the "cu" prefix stands for NVIDIA CUDA technology.
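As a follow-up to the cuBLAS note, building llama.cpp with CUDA acceleration can be sketched like this. It assumes an NVIDIA GPU with the CUDA toolkit installed, and the -ngl value is an illustrative guess that should be tuned to your VRAM.

```shell
# Rebuild with the cuBLAS backend enabled.
make clean
LLAMA_CUBLAS=1 make

# -ngl offloads that many transformer layers to the GPU; more layers
# means faster inference but more VRAM use.
./main -m ./llama-2-13b-chat.ggmlv3.q4_0.bin -ngl 32 -t 8 -c 2048 --color -i
```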
@ashu- · 1 year ago
"dash" vs "minus" :p
@kimi6299 · 1 year ago
Which GPU did you use? Is it possible to run 70B locally by using two GPUs?
@greenbillugaming2781 · 11 months ago
No, you at least need an A100 GPU.
@AnkitSingh-gw6fj · 1 year ago
Sir, can you also do a working demo of building a local chatbot by training a model on private datasets like PDF, OneNote, Word, etc.?
@abhishekkrthakur · 1 year ago
previous videos already handled that. please take a look 🙂
@AnkitSingh-gw6fj · 1 year ago
@abhishekkrthakur That's great. Will take a look now. Thank you
@joe7843 · 11 months ago
Is there a Python version of llama.cpp? How would you approach making this accessible via an API?
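There is indeed a Python binding, llama-cpp-python, which also ships an OpenAI-compatible HTTP server. A sketch (the package and module names are from its documentation; the model path is a placeholder):

```shell
# Install the binding with the optional server dependencies.
pip install 'llama-cpp-python[server]'

# Serve a local quantized model over an OpenAI-compatible REST API.
python3 -m llama_cpp.server --model ./llama-2-13b-chat.ggmlv3.q4_0.bin
```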
@sarnathk1946 · 1 year ago
make -j 8 will compile using 8 cores in parallel, for much faster compilation.
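A portable variant of the tip above derives the job count from the machine's core count instead of hard-coding 8 (nproc on Linux, sysctl on macOS):

```shell
# Count available CPU cores, falling back to sysctl on macOS.
JOBS="$(nproc 2>/dev/null || sysctl -n hw.ncpu)"
echo "use: make -j $JOBS"
```

Then run `make -j "$JOBS"` to compile with every core.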
@developonetwork · 1 year ago
Facing this error:
cc: fatal error: cannot execute ‘cc1obj’: execvp: No such file or directory
compilation terminated.
make: *** [Makefile:286: ggml-metal.o] Error 1
@averytheloftier · 1 month ago
Did you ever figure this out?
@InspireAndGrow · 1 year ago
I'm getting: "error loading model ... is this really a GGML file?"
@abhishekkrthakur · 1 year ago
invalid file?
@InspireAndGrow · 1 year ago
@abhishekkrthakur I deleted my file and ran wget again and it worked 👍
@robosergTV · 1 year ago
Link to the repo?
@abhishekkrthakur · 1 year ago
just commands. no repo. sorry
@yeahyes55 · 1 year ago
Are you using a VM?
@abhishekkrthakur · 1 year ago
nope
@manavshah9062 · 1 year ago
How can I run it on Linux with just a CPU?
@abhishekkrthakur · 1 year ago
just don't use the cuda flag. everything else remains the same
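The CPU-only path in the reply above amounts to the following sketch; the model filename and thread count are illustrative, not prescriptive.

```shell
# Plain make: no LLAMA_CUBLAS or LLAMA_METAL flags, CPU-only build.
make

# Run without -ngl so nothing is offloaded to a GPU.
./main -m ./llama-2-13b-chat.ggmlv3.q4_0.bin -t 8 -c 2048 --color -i
```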
@deepakkajla · 1 year ago
The encoding bitrate seems too low; during scrolling the screen becomes pixelated.
@abhishekkrthakur · 1 year ago
used a different device to record today. everything except scrolling should be fine. good that I didn't scroll too much lol
@moonlight7684 · 1 year ago
I am running this on a MacBook Air M2; the responses are really slow.
@abhishekkrthakur · 1 year ago
did you compile with LLAMA_METAL=1 param?
@moonlight7684 · 1 year ago
@abhishekkrthakur Yes I did, but I changed ngl to -1, else it was giving an error.
@abhishekkrthakur · 1 year ago
ngl=1?
@christopheprat378 · 1 year ago
@abhishekkrthakur Got the same problem. I compiled using the LLAMA_METAL=1 param, and when I set -ngl 1 the program crashed because too much memory was used by Metal (on a MacBook Air M2 with 8 GB of RAM). I think Metal can't use more than half of the RAM. Is there something I am missing?
@NehilSood · 1 year ago
Yes! Responses are really slow. I even tried this on an Ubuntu machine with an NVIDIA A10G, same delay. @abhishekkrthakur please help us through this once
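The Metal discussion above boils down to two steps, sketched here; the behavior notes are assumptions drawn from the thread, not verified claims, so check them against the llama.cpp README.

```shell
# Rebuild with the Metal backend enabled.
make clean
LLAMA_METAL=1 make

# A positive -ngl value enables GPU offload on the Metal backend (the
# thread suggests -ngl 1 is the intended setting). On low-RAM machines
# such as an 8 GB MacBook Air, a smaller quant may be needed to keep
# Metal from running out of memory.
./main -m ./llama-2-13b-chat.ggmlv3.q4_0.bin -ngl 1 -t 8 -c 2048 --color -i
```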
@staviq · 1 year ago
nvidia-smi locks your GPU when you run it, so running it in a loop causes a giant hiccup every second. Don't do it :)
@sarnathk1946 · 1 year ago
Can we finetune on a local machine?
@abhishekkrthakur · 1 year ago
yes
@sarnathk1946 · 1 year ago
@abhishekkrthakur Wow! That's a great thing! Thanks for the video! You are awesome!
@longervisiontechnology3568 · 1 year ago
Python version?
@abhishekkrthakur · 1 year ago
3.9, 3.10. both should work