Autonomous Open Source LLM Evaluator (Ollama) - Full Guide

Views: 3,947

All About AI

1 day ago

👊 Become a member and get access to GitHub and Code:
/ allaboutai
🤖 Great AI Engineer Course:
scrimba.com/learn/aiengineer?...
🔥 Open GitHub Repos:
github.com/AllAboutAI-YT/easy...
📧 Join the newsletter:
www.allabtai.com/newsletter/
🌐 My website:
www.allabtai.com
Today I take a look at my Autonomous Open Source LLM Evaluator using Ollama and GPT-4. This is a neat tool for testing open source LLMs on different tasks, such as problems and code.
00:00 Ollama LLM Eval Intro
00:21 Ollama LLM Eval Flowchart
01:28 LLM Evaluator Code 1
06:24 Test 1
08:30 LLM Evaluator Code 2
09:13 Test 2
10:53 Conclusion
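
The flow described above (local models answer a task, a stronger model grades the answers) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code shown in the video: `evaluate_models`, `best_model`, `generate`, and `judge` are hypothetical names, and the two callables stand in for real calls to a local model server and to the judge model.

```python
from typing import Callable, Dict, List


def evaluate_models(
    models: List[str],
    problem: str,
    generate: Callable[[str, str], str],
    judge: Callable[[str, str], float],
) -> Dict[str, float]:
    """Ask every model to solve `problem`, then let the judge score each answer.

    `generate(model, problem)` and `judge(problem, answer)` are hypothetical
    hooks: in a real setup the first would query a local model and the second
    would query a stronger judge model.
    """
    scores: Dict[str, float] = {}
    for model in models:
        answer = generate(model, problem)
        scores[model] = judge(problem, answer)
    return scores


def best_model(scores: Dict[str, float]) -> str:
    """Return the model name with the highest judge score."""
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Stub callables so the sketch runs without any model server.
    def fake_generate(model: str, problem: str) -> str:
        return "4" if model == "model-a" else "5"

    def fake_judge(problem: str, answer: str) -> float:
        return 1.0 if answer == "4" else 0.0

    result = evaluate_models(
        ["model-a", "model-b"], "What is 2 + 2?", fake_generate, fake_judge
    )
    print(result, best_model(result))
```

Keeping the model call and the judge call as plain callables means the loop can be tested with stubs first and wired to real endpoints later.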

Comments: 18
@gunabalang9543 10 days ago
I love the way you are using AI...
@MrSuntask 10 days ago
Livestream was great!
@Ms.Robot. 10 days ago
This is a useful idea🎉
@CarrotStick2165 10 days ago
aya:35b blows everything out of the water. It's not ten times better than ChatGPT but one hundred times better. It's slow, since it's a 35B model run locally, but I love it. Besides that, I use llama3 for most everyday tasks.
@ArseniyPotapov 9 days ago
I've built a similar system, but I noticed that the judge model sometimes hallucinates and gives high marks to obviously wrong solutions. I tried a jury of multiple judges (different big models); this improved judging quality but made it 8x slower. Also, with multiple judges you need to fuse their judgments into some consensus. It's just pretty slow, and all models do hallucinate.
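
One simple way to fuse several judges' marks into a consensus is to take the median per candidate, so a single judge that hallucinates a very high or very low mark cannot swing the result. The comment does not say how the judgments were actually fused; the median, the function name `fuse_scores`, and the sample numbers below are assumptions for illustration only.

```python
from statistics import median
from typing import Dict, List


def fuse_scores(judge_scores: Dict[str, List[float]]) -> Dict[str, float]:
    """Collapse each candidate's per-judge scores into one consensus score.

    The median is used (an assumption, not the commenter's method) because
    it ignores a single outlier judge. Candidates with no scores are skipped.
    """
    return {
        candidate: median(scores)
        for candidate, scores in judge_scores.items()
        if scores
    }


if __name__ == "__main__":
    # Hypothetical marks from three judges for two candidate solutions.
    marks = {
        "solution-1": [9, 2, 3],  # one judge wrongly gave a high mark
        "solution-2": [7, 8, 6],
    }
    print(fuse_scores(marks))
```

This addresses only the fusion step; the extra latency of querying several judges remains.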
@ArseniyPotapov 9 days ago
One of the problems almost all models suck at is the puzzle "a fox, a chicken and a sack of grain" (or the "wolf, goat and cabbage" problem). All models recognize that it's a classic puzzle, but only a few can give a coherent solution without weird glitches.
@thenarrowgate3063 8 days ago
In the bubble sort evaluation, all the models that were evaluated as wrong (Mistral, Codestral, etc.) had a syntax error on line 1 because the output text was included as a line of code. The code itself was sound in all of them. So it is not a proper eval: you need to check your code to see why it worked for a couple of models but not the others, since a simple syntax error that came from your code rather than the LLM's does not make for a proper eval. Other than that, it's a cool idea.
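
If the harness does execute the whole model reply, as this comment suggests, one possible fix is to pull out only the fenced code before running or checking it. The sketch below is an assumption about how that could look, not the evaluator's actual code; `extract_code` and `is_valid_python` are hypothetical helper names.

```python
import ast
import re

# Built from single backticks so the marker does not clash with this block's own fence.
FENCE_MARK = "`" * 3
FENCE = re.compile(FENCE_MARK + r"[^\n]*\n(.*?)" + FENCE_MARK, re.DOTALL)


def extract_code(response: str) -> str:
    """Return the first fenced block's contents, or the whole reply if there is none."""
    match = FENCE.search(response)
    return match.group(1).strip() if match else response.strip()


def is_valid_python(source: str) -> bool:
    """Check that `source` parses as Python without executing it."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


if __name__ == "__main__":
    reply = (
        "Here is the code:\n"
        + FENCE_MARK + "python\nprint('hi')\n" + FENCE_MARK
        + "\nHope this helps."
    )
    print(is_valid_python(reply))                # raw reply with prose mixed in
    print(is_valid_python(extract_code(reply)))  # code only
```

Checking the extracted code with `ast.parse` before execution separates harness errors (prose mixed into the code) from genuine mistakes in the model's solution.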
@JohnDoe-zx8bu 10 days ago
What is the point of rating many models with a more powerful model if this is required for each problem? It would be much faster to just ask GPT-4 for the answer to the problem.
@TwoWayOrbitalStation 9 days ago
Because ChatGPT cannot be run locally. If you can evaluate what the best small local model is for a task, then you can use that model locally on your PC. If you have sensitive code or sensitive information, you don't want to pass it through ChatGPT since OpenAI will take your data, so you run locally. Not to mention, running locally is completely free, whereas using the ChatGPT API is going to cost you. The whole point of the test is to use basic test examples so that you can then pick the model to do a similar, more complex task.
@JohnDoe-zx8bu 9 days ago
@TwoWayOrbitalStation The issue I see is that results might be different for slightly changed tasks. That is, for the current task you get the right result, but if you ask about a similar but different task, the answer might be wrong. So if you want to use small models locally, you need some different way to evaluate results without ChatGPT.
@BinxNet 7 days ago
@JohnDoe-zx8bu In this system, ChatGPT (being the best model) provides a rough approximation of “best” solutions from those provided, saving you tokens on getting it to provide its own lengthy results. You can also use this in a multi-agent loop with whatever LLM it picked to improve output entirely on the user side, with no additional tokens. This is just where documentation and an understanding of what you’re asking of the LLMs come into play. If you’re allowing the blind to lead the blind, of course it’s going to be horrible. However, if you need a bit of help doing this One Thing you just can’t get right, then problem solved. We are not at the stage of these tools being autonomous fix-alls for every problem. I’ve seen many people saying things like “ChatGPT almost broke my computer because I tried to get it to help me do this thing”, but the reality is, THEY almost broke their computer bc they had no fking idea what ChatGPT was telling them. Accountability is on the end-user here to determine what is and is not a useful output, and how it then can be applied.
@delta-gg 7 days ago
3:20 What if Kayley is a boy?
@tonywhite4476 10 days ago
May I ask what is your roadmap for this channel?
@j0hnny_R3db34rd 10 days ago
Yes, you may.
@Ms.Robot. 10 days ago
Who? Who are you?
@watchdog163 9 days ago
Roadmap? This is not a crypto coin. 😂