LongRoPE & Theta Scaling to 1 Mio Token (2/2)

  1,340 views

code_your_own_AI

1 day ago

LongRoPE & Theta Extrapolation/Scaling of RoPE for extreme context lengths, explained in scientific detail. To increase the context lengths of modern LLMs, we evaluate the performance and methods of LongRoPE and Theta Extrapolation/Scaling for extreme context-length extension, from an 8K to a 4M context length for a Llama 3-7B LLM.
RoPE encoding works well within the training context length but faces challenges when the sequence length during inference exceeds the training length, leading to a performance drop. This is primarily because the positional encodings become out-of-distribution (OOD), causing instability in the attention mechanism.
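A minimal sketch (not from the video; the function name, head_dim, and base values are my own illustrative choices) of how RoPE rotation angles are computed, and of why inference positions beyond the training length push the low-frequency dimensions into angle ranges the model never saw during training:

import numpy as np

def rope_angles(position, head_dim=128, base=10000.0):
    # RoPE rotates each (even, odd) dimension pair by position * inv_freq,
    # where inv_freq = 1 / base^(2i / head_dim) for pair index i.
    pair_idx = np.arange(0, head_dim, 2)
    inv_freq = 1.0 / (base ** (pair_idx / head_dim))
    return position * inv_freq

train_len, infer_len = 8_192, 32_768
# The slowest-rotating pair never completes its first period during training
# (its angle at position 8192 is well below 2*pi); at position 32768 that
# angle is 4x larger, i.e. out-of-distribution for the attention mechanism.
print(rope_angles(train_len)[-1], rope_angles(infer_len)[-1])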
To overcome this issue, theta scaling is introduced. The idea is to adjust the "rotary base," which is a key parameter in RoPE. By increasing this base value, the model can extend its effective context length, allowing it to handle longer sequences more accurately. This adjustment aligns the positional encodings with the longer input texts, improving the model's ability to extrapolate and maintain performance.
Interestingly, decreasing the rotary base can also enhance the model's extrapolation capabilities. With a smaller base, the rotations are more tightly packed, so the model can fully learn the periodic positional patterns within the training context. This helps the model generalize better to sequences longer than its training data. Both increasing and decreasing the rotary base therefore offer ways to extend the context length that RoPE-based models can handle effectively, providing a versatile solution to improve their performance on longer texts.
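A minimal sketch of the knob both approaches turn (the context lengths and base values below are illustrative, not the video's exact settings): the rotary base is just a scalar in the cos/sin cache, so theta scaling amounts to rebuilding that cache with a different base, usually followed by fine-tuning on long sequences.

import numpy as np

def rope_cache(seq_len, head_dim=128, base=10000.0):
    # Precompute the cos/sin tables RoPE applies to queries and keys.
    pair_idx = np.arange(0, head_dim, 2)
    inv_freq = 1.0 / (base ** (pair_idx / head_dim))
    angles = np.outer(np.arange(seq_len), inv_freq)  # shape (seq_len, head_dim // 2)
    return np.cos(angles), np.sin(angles)

# Baseline: base 10k for an 8K training context.
cos_std, sin_std = rope_cache(8_192, base=10_000.0)
# Increased base: rotations slow down, so the low-frequency dimensions at 32K
# stay within the angle range they covered at 8K (extrapolation by stretching).
cos_up, sin_up = rope_cache(32_768, base=500_000.0)
# Decreased base: rotations speed up, so every frequency completes full periods
# inside the 8K training window and the periodic pattern is fully learned.
cos_down, sin_down = rope_cache(8_192, base=5_000.0)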
#airesearch
#aieducation

Comments: 4
@MattJonesYT 3 months ago
Cutting edge stuff, this is great!!
@manslaughterinc.9135 2 months ago
On the topic of attention and context, would love to see a video on needle-in-a-haystack and multi-needle-in-a-haystack performance of these different kinds of context extension approaches.
@simonstrandgaard5503 3 months ago
Excellent topic. Fine-tuning with a longer context length.
@joelvalim 3 months ago
It seems they are doing the very opposite of quantization (I am being very visual here, ok?). Quantization is kind of squashing while preserving proportions and shape. LongRoPE seems to act as a kind of holographic projection... and a little bit of a hammer to adjust the edges... The final fine-tuning would be a way to fill the voids created by the projection, which is imperfect by nature, because it can only project a shadow, not a perfect picture. Final fine-tuning would fill these voids, connecting the points in that weak blueprint created by the rescaled new hyperdimensional space.