how the tokenizer for gpt-4 (tiktoken) works and why it can't reverse strings

1,905 views

Chris Hay

5 months ago

Chris breaks down the ChatGPT (GPT-4) tokenizer and shows why large language models such as GPT, Llama 2 and Mistral struggle to reverse words. He looks at how words, programming languages, non-English languages and even Morse code are tokenized, and shows how tokenizers tend to be biased towards English and towards programming languages.
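Here is a minimal sketch of byte-pair-encoding (BPE) style tokenization that makes the reversal problem concrete. The merge table is tiny and hypothetical, invented for illustration. GPT-4's real `cl100k_base` encoding in tiktoken has a vocabulary on the order of 100k tokens, learned from data, and it works on bytes rather than characters. `bpe_encode` is a hypothetical helper, not part of tiktoken. The idea it shows is that a common word like "hello" collapses into one token, so the model never directly sees its letters, while the reversed string breaks into unfamiliar fragments.

```python
# A toy BPE encoder. MERGES is a HYPOTHETICAL, hand-written merge table;
# real tokenizers learn tens of thousands of merges from a large corpus.
MERGES = [
    ("l", "l"),
    ("h", "e"),
    ("he", "ll"),
    ("hell", "o"),
]
# Lower rank = merge learned earlier = applied first.
RANKS = {pair: rank for rank, pair in enumerate(MERGES)}


def bpe_encode(word: str) -> list[str]:
    """Split a word into characters, then repeatedly apply the
    lowest-ranked merge among adjacent pairs until none applies."""
    parts = list(word)
    while len(parts) > 1:
        # Rank every adjacent pair; unknown pairs get infinity.
        pairs = [(RANKS.get((a, b), float("inf")), i)
                 for i, (a, b) in enumerate(zip(parts, parts[1:]))]
        best_rank, i = min(pairs)
        if best_rank == float("inf"):
            break  # no learned merge applies any more
        parts[i:i + 2] = [parts[i] + parts[i + 1]]
    return parts


for word in ["hello", "olleh"]:
    print(word, "->", bpe_encode(word))
# hello -> ['hello']
# olleh -> ['o', 'll', 'e', 'h']

# Reversing the token sequence is not the same as reversing the characters:
# "hello" is a single token, so there is nothing to reorder.
print("".join(reversed(bpe_encode("hello"))))  # hello
print("hello"[::-1])                           # olleh
```

With the real library, `tiktoken.encoding_for_model("gpt-4").encode(text)` returns the token IDs. The actual splits it produces for any given word will differ from this toy table.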

Comments: 6
@ernestuz 2 months ago
The funny thing is that the more complete the vocabulary, the less pressure there is on the upper layers, so it's cheaper not only because there are fewer tokens but also in processing. I wonder if somebody has built a semi-handcrafted tokenizer where, say, the first 30K tokens come from a dictionary and the rest are generated.
@chrishayuk 1 month ago
exactly. tbh, i wouldn't be surprised if someone goes in that direction
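The token-count part of that idea can be illustrated with a hypothetical sketch of the "semi-handcrafted" layout. Dictionary words take the first IDs, generated subword pieces follow, and single letters act as a fallback. The word lists are tiny stand-ins for a real 30K-word dictionary. For simplicity it tokenizes with greedy longest-match, which is closer to WordPiece than to tiktoken's BPE. It only demonstrates the "fewer tokens" effect and says nothing about the claimed reduction in pressure on the upper layers.

```python
# HYPOTHETICAL "semi-handcrafted" vocabulary: dictionary words get the
# first IDs, generated subword pieces come next, and single letters are a
# fallback so that every lowercase word can be tokenized.
DICTIONARY_WORDS = ["token", "reverse", "string"]
GENERATED_PIECES = ["to", "ken", "re", "verse", "str", "ing", "s"]
SINGLE_CHARS = list("abcdefghijklmnopqrstuvwxyz")


def build_vocab(pieces: list[str]) -> dict[str, int]:
    """Assign consecutive IDs in list order, skipping duplicates."""
    vocab: dict[str, int] = {}
    for piece in pieces:
        if piece not in vocab:
            vocab[piece] = len(vocab)
    return vocab


def tokenize(word: str, vocab: dict[str, int]) -> list[int]:
    """Greedy longest-match tokenization (a simplification, not BPE)."""
    ids: list[int] = []
    i = 0
    while i < len(word):
        # Try the longest vocabulary entry that starts at position i.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                ids.append(vocab[word[i:j]])
                i = j
                break
        else:
            raise ValueError(f"no vocabulary entry covers {word[i]!r}")
    return ids


with_dictionary = build_vocab(DICTIONARY_WORDS + GENERATED_PIECES + SINGLE_CHARS)
without_dictionary = build_vocab(GENERATED_PIECES + SINGLE_CHARS)

for w in ["tokens", "reverse", "strings"]:
    print(w, len(tokenize(w, with_dictionary)), len(tokenize(w, without_dictionary)))
# tokens 2 3
# reverse 1 2
# strings 2 3
```

Each word costs one token fewer when it, or its stem, is a dictionary entry, because the fallback has to assemble it from smaller generated pieces.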
@feniyuli 3 months ago
It is very helpful to understand how tokenization works. Thanks! Will the data we encode using tiktoken be sent to the AI?
@chrishayuk 1 month ago
definitely not, it's all local
@ilyanemihin6029 4 months ago
Thanks, very interesting information
@chrishayuk 4 months ago
glad it was useful