
In-Context Learning: EXTREME vs Fine-Tuning, RAG

  4,037 views

code_your_own_AI

1 day ago

Fill the complete context length with many-shot examples and evaluate the performance! Great new insights, although the extreme scaling here happens on open-source LLMs with extended context lengths, not on the latest models with 1M-token contexts like Gemini 1.5 Pro (see my new, upcoming video in the next few days).
A comprehensive analysis of in-context learning (ICL) when extended to long-context models, specifically examining its performance and characteristics as the number of demonstrations grows. A key finding is that ICL performance continues to improve with the inclusion of hundreds or even thousands of demonstrations, surpassing traditional fine-tuning methods in certain scenarios. This improvement is not merely additive but is driven in large part by the model's ability to attend to more relevant examples during inference.
For datasets with extensive label spaces, the gains from increasing context length are particularly pronounced. Additionally, the study highlights that while retrieval methods show diminishing returns with extended contexts, a large, randomly selected set of demonstrations in long-context ICL remains surprisingly effective, suggesting that the sheer volume of context can sometimes compensate for the lack of carefully curated example selection.
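To make the many-shot setup concrete, here is a minimal Python sketch of how such a prompt could be assembled: labeled demonstrations are sampled at random (no retrieval) and packed until a token budget is reached. The prompt template, dataset fields, budget, and the crude token estimate are illustrative assumptions, not the paper's exact setup.

```python
import random

def build_many_shot_prompt(demos, query, token_budget=120_000, count_tokens=None):
    """Pack randomly selected labeled demonstrations into one long ICL prompt.

    demos: list of (text, label) pairs.
    count_tokens: any callable estimating token counts; a rough whitespace
    split is used as a placeholder if none is given.
    """
    if count_tokens is None:
        count_tokens = lambda s: len(s.split())  # rough placeholder estimate

    demos = list(demos)
    random.shuffle(demos)                # random selection, no retrieval
    header = "Classify the text. Answer with the label only.\n\n"
    footer = f"Text: {query}\nLabel:"
    used = count_tokens(header) + count_tokens(footer)

    parts = [header]
    for text, label in demos:
        block = f"Text: {text}\nLabel: {label}\n\n"
        cost = count_tokens(block)
        if used + cost > token_budget:   # stop once the context budget is full
            break
        parts.append(block)
        used += cost

    parts.append(footer)
    return "".join(parts)

# Toy usage:
demos = [("great movie", "positive"), ("terrible plot", "negative")] * 500
prompt = build_many_shot_prompt(demos, "surprisingly good acting", token_budget=2_000)
```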
Another critical insight from the study is the reduced sensitivity of long-context ICL to the order of examples, alongside the negative impact of grouping same-label examples, which suggests that optimal performance relies on a diverse, mixed set of in-context demonstrations rather than clustered or carefully ordered ones. The research also finds that the performance gains in long-context ICL are primarily attributed to the model's ability to reference relevant examples rather than to refining task-specific decision boundaries through extensive encoding. This finding is supported by experiments showing that performance saturates before reaching the maximum context length for many datasets, indicating that current models have not yet fully exploited the potential of long-context ICL.
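As a toy illustration of the ordering point, the two orderings below contain the same demonstrations: one clusters all same-label examples together (the arrangement reported to hurt), the other mixes labels by shuffling (reported to be robust). The data and labels here are made up for illustration.

```python
import random
from itertools import groupby

demos = [("great movie", "positive"), ("terrible plot", "negative"),
         ("loved it", "positive"), ("boring", "negative")] * 50

# Ordering A: all same-label examples clustered together
grouped = sorted(demos, key=lambda d: d[1])

# Ordering B: labels mixed throughout the prompt
shuffled = list(demos)
random.shuffle(shuffled)

# How "blocky" each ordering is: number of consecutive same-label runs
runs = lambda xs: sum(1 for _ in groupby(xs, key=lambda d: d[1]))
print("label runs, grouped:", runs(grouped))    # 2 long blocks
print("label runs, shuffled:", runs(shuffled))  # many short runs
```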
Furthermore, long-context models exhibit robust performance across various datasets, maintaining efficiency and accuracy, and offering a promising alternative to traditional fine-tuning, especially when computational efficiency and rapid adaptability are paramount.
All rights w/ authors:
In-Context Learning with Long-Context Models: An In-Depth Exploration
arxiv.org/pdf/...
#airesearch
#ai
#newtechnology

Comments: 10
@DaveRetchless 3 months ago
Your content is the best detailed explanation of new papers and topics as they evolve. Thank you for the exceptional work.
@code4AI 3 months ago
Appreciate your comment!
@AaronALAI 3 months ago
I built a rig at home which lets me run Mixtral 8x22B quantized to 8-bit using exllamav2. It has a 64k context length, and I've found that in-context learning works very well. This is just my subjective observation, but I have my setup so that new conversations on specific topics first incorporate a bunch of context. It's a small upfront time cost (about a minute of initial setup), but the model responds much better this way. Additionally, I've tried giving the model a bunch of context up front via RAG with similar results. I think in-context learning is going to dominate RAG and fine-tuning because of its simplicity, its dynamic nature, and the fact that it needs fewer resources to have a larger impact on the model's output.
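For anyone wanting to replicate the "load a block of topic context up front, then chat" pattern described in this comment, here is a minimal hedged sketch against a local OpenAI-compatible endpoint (many local runners, including exllamav2-based servers, expose one). The URL, model name, and context file are assumptions, not the commenter's actual setup.

```python
from openai import OpenAI

# Assumed local OpenAI-compatible endpoint; adjust URL and model to your server.
client = OpenAI(base_url="http://localhost:5000/v1", api_key="not-needed")

# Load the topic-specific context once at the start of a new conversation.
with open("topic_context.txt") as f:
    icl_context = f.read()

messages = [
    {"role": "system",
     "content": "Use the reference material below when answering.\n\n" + icl_context},
]

def chat(user_msg: str) -> str:
    """Append the user turn, query the local model, and keep the history."""
    messages.append({"role": "user", "content": user_msg})
    reply = client.chat.completions.create(model="mixtral-8x22b", messages=messages)
    answer = reply.choices[0].message.content
    messages.append({"role": "assistant", "content": answer})
    return answer

print(chat("Summarize the key points of the reference material."))
```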
@kevinpham6658 3 months ago
Really interesting approach. Using sglang's RadixAttention and the fork primitive right after all the ICL examples would make this strategy extremely viable and fast, because you only have to evaluate the examples once. Multiple forks == multi-LoRA, but without the hassle of fine-tuning?
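A rough sketch of that idea using SGLang's frontend primitives could look like the following: the many-shot prefix is encoded once, and each fork reuses its KV cache via RadixAttention. The prefix file, endpoint, and questions are placeholders; treat this as an approximation of the pattern, not a tested recipe.

```python
import sglang as sgl

# Hypothetical file holding the many-shot demonstrations.
ICL_PREFIX = open("many_shot_examples.txt").read()

@sgl.function
def answer_many(s, questions):
    s += ICL_PREFIX                       # encoded once; shared by all forks
    forks = s.fork(len(questions))        # each fork reuses the cached prefix
    for i, (f, q) in enumerate(zip(forks, questions)):
        f += "\nQ: " + q + "\nA:" + sgl.gen(f"answer_{i}", max_tokens=64)
    for i, f in enumerate(forks):
        s += f"\nAnswer {i}: " + f[f"answer_{i}"]

# Assumes an SGLang server is already running on this port.
sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))
state = answer_many.run(questions=["Does ordering matter?", "How many shots help?"])
print(state.text())
```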
@frankschilder8974 3 months ago
Very nice summary. I particularly liked your insights at the end of the video. I'm wondering, however, about the additional cost of ICL+ in a production system compared to a fine-tuned model. It would be nice to see a chart comparing inference costs to answer the question of which approach would be more cost-effective in the long run, especially for high-throughput scenarios.
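To make that cost question concrete, a crude break-even estimate could be sketched as follows: long-context ICL pays for the (possibly cached) demonstration prefix on every call, while fine-tuning pays a one-off training cost plus cheap short prompts afterwards. All prices and token counts below are made-up placeholders, not measured numbers.

```python
def icl_cost(requests, prefix_tokens, query_tokens, price_per_1k, cache_discount=0.1):
    """Long-context ICL: the demonstration prefix is (partially) re-billed per call;
    cache_discount models prefix/KV caching (1.0 = no caching at all)."""
    per_request = (prefix_tokens * cache_discount + query_tokens) * price_per_1k / 1000
    return requests * per_request

def finetune_cost(requests, training_cost, query_tokens, price_per_1k):
    """Fine-tuned model: one-off training cost, then short prompts only."""
    return training_cost + requests * query_tokens * price_per_1k / 1000

# Hypothetical numbers, for illustration only.
for n in (1_000, 100_000, 10_000_000):
    print(n,
          round(icl_cost(n, prefix_tokens=50_000, query_tokens=500, price_per_1k=0.001), 2),
          round(finetune_cost(n, training_cost=500.0, query_tokens=500, price_per_1k=0.001), 2))
```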
@kevinpham6658 3 months ago
If you use sglang's fork primitive, you cache the ICL tokens and get them essentially for free on subsequent calls.
@MrJucazila 1 month ago
Thanks so much for this content, it's super clear, thanks! 🙂
@code4AI 1 month ago
You're very welcome!
@covertassassin1885 3 months ago
@code4AI How could we apply this to longer problems? Having more examples where each example is long would fill up the context window very rapidly (e.g., summarization). How would you recommend we balance those? My ideas:
1. Simply use RAG-based ICL to get the most relevant examples until the context is nearly filled.
2. If the output of the model is short but the input is long, show many examples of the output/answer and just omit the long input that would take up many tokens. There are a few reasons behind this: the answer is typically a condensed form of the information, so much of the utility of the example is in the answer; the answer carries the formatting the model should follow; and it prevents dilution of the context window. (If you fill a lot of the context window with tokens from the actual problem inputs, the model has fewer tokens of example answer text to attend to.)
3. Potentially the inverse of #2 could also be useful: if the output is long for a short input, e.g., writing a piece of code to solve a problem, then give the LLM multiple examples of the input so it knows the general types of things it should be solving for.
What are your thoughts on #2? I think it would still be very important to give a variety of examples to make sure you get a good distribution. However, maybe a fourth option would be even better:
4. Hybrid ICL: use RAG to retrieve a few full-length examples, but then append many short examples (e.g., just the output if the input is long). This way, the model can look at a few full problems and solutions, but then has many more examples of the answer to reference. These output answers, if written as chain-of-thought, could be similar to the many examples of reasoning you referenced at the end.
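One way to prototype idea #4 from the comment above (a few full retrieved examples plus many output-only examples) could be a prompt builder along these lines. The retrieval callable, field names, and budgets are hypothetical; this is only a sketch of the hybrid layout, not a recommended implementation.

```python
def build_hybrid_icl_prompt(query, corpus, retrieve_top_k, k_full=3, n_output_only=200):
    """Hybrid ICL: a few full (input, output) demonstrations chosen by retrieval,
    followed by many output-only demonstrations to anchor format and reasoning.

    corpus: list of dicts with "input" and "output" keys.
    retrieve_top_k: any callable (query, corpus, k) -> list of items.
    """
    full_examples = retrieve_top_k(query, corpus, k_full)
    output_only = [item["output"] for item in corpus[:n_output_only]]

    parts = ["Here are a few complete worked examples:\n"]
    for ex in full_examples:
        parts.append(f"Input:\n{ex['input']}\nOutput:\n{ex['output']}\n\n")

    parts.append("Here are additional example outputs showing the expected style:\n")
    for out in output_only:
        parts.append(out + "\n\n")

    parts.append(f"Now solve the new problem.\nInput:\n{query}\nOutput:\n")
    return "".join(parts)

# Toy usage with a trivial "retriever" that just returns the first k items.
corpus = [{"input": "long document ...", "output": "short summary ..."}] * 300
prompt = build_hybrid_icl_prompt("another long document ...", corpus,
                                 retrieve_top_k=lambda q, c, k: c[:k])
```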
@RickySupriyadi 3 months ago
In the end, they (Google and Anthropic) also learn from open source and from the research of the wider community to create their products.