New Tutorial on LLM Quantization w/ QLoRA, GPTQ and llama.cpp, Llama 2

  14,709 views

code_your_own_AI

10 months ago

LLM Quantization: GPTQ - AutoGPTQ
llama.cpp - ggml.c - GGUF - C++
Comparison with HF Transformers 4-bit quantization.
Download Web UI wrappers for your heavily quantized LLM to your local machine (PC, Linux, Apple).
LLM on Apple Hardware, w/ M1, M2 or M3 chip.
Run inference of your LLMs on your local PC, with heavy quantization applied.
Plus: 8 Web UIs for GPTQ, llama.cpp, AutoGPTQ, ExLlama, or GGUF
koboldcpp
oobabooga text-generation-webui
ctransformers
lmstudio.ai/
github.com/marella/ctransformers
github.com/ggerganov/ggml
github.com/rustformers/llm/bl...
huggingface.co/TheBloke/Llama...
github.com/PanQiWei/AutoGPTQ
cloud.google.com/model-garden
huggingface.co/autotrain
h2o.ai/platform/ai-cloud/make...
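At its core, the 4-bit quantization that QLoRA, GPTQ, and llama.cpp's q4 formats build on maps floating-point weights to 4-bit integers with a shared per-block scale. Here is a minimal, illustrative Python sketch of that idea (not any library's actual implementation; the block size and absmax scaling are assumptions, loosely in the style of llama.cpp's q4_0):

```python
# Illustrative block-wise 4-bit quantization (absmax style).
# Each block of weights shares one float scale; weights are stored
# as signed 4-bit integer codes in [-8, 7].

def quantize_q4(weights, block_size=32):
    blocks = []
    for i in range(0, len(weights), block_size):
        block = weights[i:i + block_size]
        # Scale so the largest-magnitude weight maps near the 4-bit range edge.
        scale = max(abs(w) for w in block) / 7.0 or 1.0
        codes = [max(-8, min(7, round(w / scale))) for w in block]
        blocks.append((scale, codes))
    return blocks

def dequantize_q4(blocks):
    # Reconstruct approximate float weights from (scale, codes) pairs.
    return [c * scale for scale, codes in blocks for c in codes]

weights = [0.12, -0.53, 0.34, 0.07, -0.91, 0.45, 0.02, -0.18]
blocks = quantize_q4(weights, block_size=8)
restored = dequantize_q4(blocks)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(blocks[0][1])  # 4-bit integer codes for the first block
print(max_err)       # rounding error, bounded by about scale/2
```

Real formats pack two 4-bit codes per byte and use float16 scales (plus, for the K-quants, scales of scales over superblocks), but the storage-versus-precision trade-off is the same: one cheap integer per weight, one scale per block.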
#quantization
#ai
#webui

Comments: 22
@jacehua7334
@jacehua7334 10 months ago
I've been busy with work, but it's so great to see absolutely great content from you on the weekend, like always!
@ViktorFerenczi
@ViktorFerenczi 10 months ago
Excellent video, as always! Thank you. - It would be nice to have a video comparing AWQ with the quantization methods discussed here.
@code4AI
@code4AI 10 months ago
Activation-aware Weight Quantization (AWQ)? Great idea!
@hoangnam6275
@hoangnam6275 9 months ago
You're the best, best content every week
@ChrisBrock-mh8qq
@ChrisBrock-mh8qq 5 months ago
Really Great Videos!
@ctejada-0
@ctejada-0 10 months ago
Happy to see llama.cpp taking off. Since the beginning of this new wave of AI driven by LLM advancements, I've been rooting for llama.cpp, as it is (in my opinion) the best approach to enable everyone to have their own LLM, and it enables a plethora of software solutions (open and closed source) that were never possible before. Thank you for this video focused on it.
@code4AI
@code4AI 10 months ago
Thank you for your comment. Maybe I'll do another video on the latest llama.cpp ...
@henkhbit5748
@henkhbit5748 9 months ago
Great explanation of the different quantization methods. It would be nice to compare, for example, Llama 2 7B models in different formats — normal, QLoRA 4-bit, GPTQ 4-bit, GGUF 4-bit — on different inference questions, with and without RAG...
@amparoconsuelo9451
@amparoconsuelo9451 10 months ago
Can subsequent SFT and RLHF with different, additional, or less content change the character of, improve, or degrade a GPT model?
@akashkarnatak3014
@akashkarnatak3014 10 months ago
Okay, so GPTQ is a quantization technique and GGUF is a format for storing quantized weights. Can't we quantize a model using the GPTQ algorithm, store it in GGUF format, and run it using llama.cpp?
@junzhengge407
@junzhengge407 4 months ago
I have the same question😢 need help
@yusufkemaldemir9393
@yusufkemaldemir9393 9 months ago
Thanks. Does llama.cpp with a 4-bit quantized Llama 2 support backpropagation while running on an M2 MacBook? If yes, would you mind providing a reference notebook?
@surajrajendran6528
@surajrajendran6528 4 months ago
Quantised models cannot be back-propagated. All training should be done in floating point precision.
@AK-ox3mv
@AK-ox3mv 4 months ago
What does the K mean in q4_K_M? What's the difference between q4 and 4-bit? Are they the same thing?
@spencerfunk6697
@spencerfunk6697 5 months ago
need a tutorial on quantizing vision models
@devyanshrastogi
@devyanshrastogi 8 months ago
Trust me, after 20 seconds of your intro I was about to skip this video 🤣🤣 The intro was terrific (literally).
@gileneusz
@gileneusz 10 months ago
0:08 oh... so maybe I'll watch your next video, sorry....
@code4AI
@code4AI 10 months ago
You are the lucky one ...
@gileneusz
@gileneusz 10 months ago
@@code4AI no, no that's just my dream 😢