LongRoPE & Theta Scaling to 1 Million Tokens (2/2)

1,164 views

code_your_own_AI

1 month ago

LongRoPE & Theta Extrapolation/Scaling of RoPE for extreme context lengths, explained in scientific detail. To increase the context lengths of modern LLMs, we evaluate the performance and methods of LongRoPE and Theta Extrapolation/Scaling for extreme context-length extensions, from 8K to 4M tokens for a Llama 3 8B LLM.
RoPE encoding works well within the training context length but faces challenges when the sequence length at inference exceeds the training length, leading to a performance drop. This is primarily because the positional encodings become out-of-distribution (OOD), destabilizing the attention mechanism.
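To make the mechanism concrete, here is a minimal NumPy sketch of RoPE (an illustration, not the video's code): each pair of dimensions in a query or key vector is rotated by an angle proportional to the token position, with per-pair frequencies set by the rotary base theta = 10,000 from the original RoPE paper.

import numpy as np

def apply_rope(x, base=10_000.0):
    """Apply rotary position embedding to x: (seq_len, head_dim) query or key vectors."""
    seq_len, head_dim = x.shape
    # One frequency per dimension pair: inv_freq[i] = base^(-2i / head_dim)
    inv_freq = 1.0 / (base ** (np.arange(0, head_dim, 2) / head_dim))
    angles = np.outer(np.arange(seq_len), inv_freq)   # (seq_len, head_dim // 2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]                   # pair up dimensions
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin                # 2-D rotation per pair
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

q = apply_rope(np.random.randn(16, 64))              # position-encoded queries

At inference positions beyond the training length, the slow-rotating pairs reach angles the model never saw during training, which is exactly the OOD effect described above.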
To overcome this issue, theta scaling adjusts the "rotary base," a key parameter in RoPE. Increasing this base value extends the model's effective context length, allowing it to handle longer sequences more accurately. The adjustment aligns the positional encodings with the longer input texts, improving the model's ability to extrapolate while maintaining performance.
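As a hedged sketch of the increase direction (assuming the NTK-style rescaling rule base' = base * s^(d/(d-2)); the video's exact formula may differ), the key property is that after rescaling, the slowest dimension's rotation angle at position s*L matches the angle originally produced at position L, so extended positions stay in-distribution:

import numpy as np

def slowest_angle(pos, base, head_dim):
    """Rotation angle of the slowest RoPE dimension pair at token position pos."""
    return pos * base ** (-(head_dim - 2) / head_dim)

head_dim, train_len, scale = 64, 8_192, 4                # illustrative: 8K -> 32K
base = 10_000.0
new_base = base * scale ** (head_dim / (head_dim - 2))   # NTK-style base increase

# The rescaled base reproduces, at the far end of the extended context,
# the angle the model saw at the far end of the training context:
print(slowest_angle(scale * train_len, new_base, head_dim))
print(slowest_angle(train_len, base, head_dim))          # same value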
Interestingly, decreasing the rotary base can also enhance the model's extrapolation capabilities. With a smaller base, the positional encodings rotate faster, so every frequency completes full periods within the training context and the model can fully learn the positional patterns. This helps it generalize to sequences longer than its training data. Both increasing and decreasing the rotary base therefore offer ways to extend the context length that RoPE-based models can handle effectively, providing a versatile lever for improving their performance on longer texts.
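A complementary sketch for the decrease direction (the small base of 500 is an illustrative assumption, not a figure from the video): a smaller base shortens the longest rotation period, so even the slowest dimension completes full cycles inside the training window and every positional frequency is fully observed during training.

import numpy as np

def longest_period(base, head_dim):
    """Period in tokens of the slowest-rotating RoPE dimension pair."""
    slowest_inv_freq = base ** (-(head_dim - 2) / head_dim)
    return 2 * np.pi / slowest_inv_freq

train_len, head_dim = 8_192, 64
for base in (10_000.0, 500.0):                       # default vs. a small base
    period = longest_period(base, head_dim)
    seen = "fully seen" if period <= train_len else "only partially seen"
    print(f"base={base:>8.0f}: longest period ~ {period:,.0f} tokens ({seen} in training)")

With base 10,000 the slowest pair needs roughly 47K tokens for a full cycle, far beyond an 8K training window; with base 500 it cycles in about 2.6K tokens, so the pattern is fully covered.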
#airesearch
#aieducation

Comments: 4
@manslaughterinc.9135 · 1 month ago
On the topic of attention and context, I would love to see a video on needle-in-a-haystack and multi-needle-in-a-haystack performance of these different context-extension approaches.
@MattJonesYT · 1 month ago
Cutting-edge stuff, this is great!!
@simonstrandgaard5503 · 1 month ago
Excellent topic. Fine-tuning with a longer context length.
@joelvalim · 1 month ago
It seems they are doing the very opposite of quantization (I am being very visual here, ok?). Quantization is kind of squashing while preserving proportions and shape. LongRoPE seems to act as a kind of holographic projection... and a little bit of a hammer to adjust the edges... The final fine-tuning would be a way to fill the voids created by the projection, which is imperfect by nature, because it can only project a shadow, not a perfect picture. Final fine-tuning would fill these voids, connecting the points in that weak blueprint created by the rescaled new hyper-dimensional space.