Do not use Llama-3 70B for these tasks ...

Views: 3,052

2 months ago

A detailed analysis of the 1 million votes cast by the AI community on LLM performance opens up new insights into areas where LLMs outperform, and areas where you had better not use a particular LLM but opt for a better-performing one.
All rights with the authors:
What’s up with Llama 3? Arena data analysis
lmsys.org/blog/2024-05-08-lla...
#airesearch #ai #newtechnology

Comments: 13

@gileneusz 2 months ago

This is a great video! Really amazing explanation.

@code4AI 2 months ago

One of the best comments today! 😊

@martinsherry 2 months ago

“of course, those people were wrong”…..hahahaha.

@code4AI 2 months ago

Finally, someone is laughing! Success! 😂

@Sl15555 2 months ago

Summarization might be low because of Llama 3's context length; that's my best guess. I'll have to test it more, as I like using LLMs to summarize YouTube videos (though I watched this one). I have found some areas where Llama 3 works well and use it for those. One is creative writing / poems, where the result is then used to produce creative lists for other tasks; that works really well.

@henkhbit5748 2 months ago

If an open-source LLM performs well for your particular use case, then for me it will always be preferable to a big monolithic closed-source LLM from ClosedAi!

@IdPreferNot1 2 months ago

Love how your critiques shred the populist AI community while providing useful info.

@thedoctor5478 2 months ago

I couldn't care less about friendliness. We can get that from low param models and use them to reform texts. Larger models should just care about reasoning above all else.

@TheReferrer72 2 months ago

Now I know you are tripping. Unless I can't read that graph properly, you are trying to tell us that a 44-45% win rate is a big loss! Especially as this is a 70B open-weights model, while the others are all closed weights. And as another commenter noted, Llama 3 has only a 4k context window, so of course it will be poor at summarisation and other tests that rely on a long context. We will be getting longer-context versions from Meta, multimodal and with huge parameter counts.

@code4AI 2 months ago

Llama 3 was trained with a context length of 8192 tokens 😂

@TheReferrer72 2 months ago

@code4AI OK, it has an 8k token length; GPT-4 Turbo has 128k, Claude 200K, Gemini 1000K+, so at least 16 times longer. My point still stands. And I notice you did not address my first point. Like I said, you are tripping.
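
The "16 times" figure can be checked with a few lines of arithmetic. The sketch below uses only the nominal context sizes quoted in this comment (in thousands of tokens); they are taken from the thread as given, not verified against any vendor documentation, and the names `quoted_context_k` and `context_ratios` are made up for illustration.

```python
# Nominal context sizes in thousands of tokens, as quoted in the comment.
# These are the commenter's figures, not verified specifications.
LLAMA3_CONTEXT_K = 8
quoted_context_k = {"GPT-4 Turbo": 128, "Claude": 200, "Gemini": 1000}


def context_ratios(baseline_k, others_k):
    """Return how many times larger each quoted context is than the baseline."""
    return {name: size / baseline_k for name, size in others_k.items()}


ratios = context_ratios(LLAMA3_CONTEXT_K, quoted_context_k)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:g}x the Llama 3 context")
```

On these figures, 16x is the smallest of the three ratios.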

@peterbell663 2 months ago

I found it essentially useless and a waste of my time. I gave it a dataset of 10,000 lines with 22 variables and asked for summary statistics in cumulative blocks of 1000, 10 blocks in total. I re-posed this question about 8 times over hours, and each time the answer was DRIVEL. And that was a very easy task. Imagine giving it a slightly more difficult task like time series modelling. I will check the alternatives.

@dennisestenson7820 2 months ago

Maybe you should choose an appropriate tool for the task.
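
For the task described in the parent comment (summary statistics over cumulative blocks of 1000 rows), a few lines of ordinary code are probably a better fit than a chat model. Here is a minimal sketch using only the Python standard library. The commenter's dataset is not available, so the data is randomly generated as a stand-in, and the function name `cumulative_block_stats` and the choice of statistics are assumptions for illustration.

```python
import random
import statistics


def cumulative_block_stats(rows, block_size):
    """Summarise rows[:block_size], rows[:2 * block_size], ... per column.

    Trailing rows that do not fill a complete block are ignored.
    """
    results = []
    for end in range(block_size, len(rows) + 1, block_size):
        columns = list(zip(*rows[:end]))  # transpose the first `end` rows
        results.append({
            "n": end,
            "mean": [statistics.fmean(col) for col in columns],
            "stdev": [statistics.stdev(col) for col in columns],
            "min": [min(col) for col in columns],
            "max": [max(col) for col in columns],
        })
    return results


if __name__ == "__main__":
    # Hypothetical stand-in data: 10,000 rows with 22 numeric variables.
    random.seed(0)
    data = [[random.gauss(0, 1) for _ in range(22)] for _ in range(10_000)]
    for block in cumulative_block_stats(data, 1000):
        # Show the first variable only, to keep the output short.
        print(block["n"], round(block["mean"][0], 3), round(block["stdev"][0], 3))
```

Each cumulative block is recomputed from scratch, which is simple and fast enough at this size; a running-sum approach would scale better to much larger files.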