LLaMA 3 “Hyper Speed” is INSANE! (Best Version Yet)

74,041 views

Matthew Berman · 1 month ago

What happens when you power LLaMA with the fastest inference speeds on the market? Let's test it and find out!
Try Llama 3 on TuneStudio - The ultimate playground for LLMs: bit.ly/llama-3
Referral Code - BERMAN (First month free)
Join My Newsletter for Regular AI Updates 👇🏼
www.matthewberman.com
Need AI Consulting? 📈
forwardfuture.ai/
My Links 🔗
👉🏻 Subscribe: / @matthew_berman
👉🏻 Twitter: / matthewberman
👉🏻 Discord: / discord
👉🏻 Patreon: / matthewberman
Media/Sponsorship Inquiries ✅
bit.ly/44TC45V
Links:
groq.com
llama.meta.com/llama3/
about.fb.com/news/2024/04/met...
meta.ai/
LLM Leaderboard - bit.ly/3qHV0X7

Comments: 591
@matthew_berman · 1 month ago
Reply Yes/No on this comment to vote on the next video: How to build Agents with LLaMA 3 powered by Groq.
@MoosaMemon. · 1 month ago
Yesss
@StephanYazvinski · 1 month ago
yes
@hypercoder-gaming · 1 month ago
YESSSS
@MartinMears · 1 month ago
Yes, do it
@paulmichaelfreedman8334 · 1 month ago
F Yesssss
@marcussturup1314 · 1 month ago
The model got the 2a-1=4y question correct, just so you know.
@Benmenesesjr · 1 month ago
Yes, if that's a "hard SAT question" then I wish I had taken the SATs.
@picklenickil · 1 month ago
American education is a joke! That's what we solved in 4th standard, I guess!
@matthew_berman · 1 month ago
That's a different answer from what was shown on the SAT website
@yonibenami4867 · 1 month ago
The actual SAT question is: "If 2/(a-1) = 4/y, where y isn't 0 and a isn't 1, what is y in terms of a?" And then the answer is:
2/(a-1) = 4/y
2y = 4(a-1)
y = 2(a-1)
y = 2a-2
My guess is he just copied the question wrong.
@hunga13 · 1 month ago
@matthew_berman The model's answer is correct. If the SAT site is showing a different one, they're wrong. You can do the math yourself to check it.
@floriancastel · 1 month ago
4:55 The answer was actually correct. I don't think you asked the right question because you just need to divide both sides of the equation by 4 to get the answer.
@asqu · 1 month ago
4:55
@floriancastel · 1 month ago
@asqu Thanks, I've corrected the mistake.
@R0cky0 · 1 month ago
Apparently he wasn't using his brain, but just copying & pasting and then looking for the answer imprinted in his mind.
@Liberty-scoots · 1 month ago
Ai will remember this treacherous behavior in the future 😂
@notnotandrew · 1 month ago
The model does better when you prompt it twice in the same conversation because it has the first answer in its context window. Without being directly told to do reflection, it seems that it reads the answer, notices its mistake, and corrects it subconsciously (if you could call it that).
@splitpierre · 1 month ago
Either that, or it just has to do with temperature. From the Groq documentation, I believe their platform does not implement memory like ChatGPT; temperature defaults to 1 on Groq, which is medium and will give varying responses, so I believe it has to do with temperature. Try again for deterministic results with temperature zero.
@juanjesusligero391 · 1 month ago
I'm confused. Why is the right answer to the equation question "2a-2"? If I understand it correctly and that's just an equation, the result should be what the LLM is answering, am I wrong? I mean:
2a-1 = 4y
y = (2a-1)/4
y = a/2 - 1/4
@marcussturup1314 · 1 month ago
You are correct
@vickmackey24 · 1 month ago
4:28 You copied the SAT question wrong. This is the *actual* question that has an answer of y = 2a - 2: "If 2/(a − 1) = 4/y , and y ≠ 0 where a ≠ 1, what is y in terms of a?"
@albertakchurin4746 · 1 month ago
Indeed👍
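A quick way to verify vickmackey24's corrected question is to solve it symbolically; a small check (assuming sympy is installed):

```python
# Verify the corrected SAT question: solve 2/(a-1) = 4/y for y.
import sympy as sp

a, y = sp.symbols("a y")
print(sp.solve(sp.Eq(2 / (a - 1), 4 / y), y))  # -> [2*a - 2], i.e. y = 2a - 2
```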
@tigs9573 · 1 month ago
Thank you, I really appreciate your content, since it is really setting me up for when I'll get the time to dive into LLMs.
@geno5183 · 1 month ago
Heck yeah, Matt - let's see a video on using these as Agents. THANK YOU! Keep up the amazing work!
@OccamsPlasmaGun · 1 month ago
I think the reason for the alternating right and wrong answers is that it assumes that you asked it again because you weren't happy with the previous answer. It picks the most likely answer based on that.
@fab_spaceinvaders · 1 month ago
Absolutely a context-related issue.
@Big-Cheeky · 1 month ago
PLEASE MAKE THAT VIDEO! :) This one was also great
@matteominellono · 1 month ago
Agents, agents, agents! 😄
@DeSinc · 1 month ago
The hole digging question was made not to be a maths question, but to see if the model can fathom the idea of real-world space restrictions when cramming 50 people into a small hole. The point of the question is to trick the model into saying 50 people can fit into the same hole and work at the same speed, which is not right. I would personally only consider it a pass if it addresses the space requirements of a hole for that number of people. Think: if you said 5,000 people digging a 10 foot hole, it would not take 5 milliseconds. That's not how it works. That's what I would be looking for in that question.
@phillipweber7195 · 1 month ago
Indeed. The first answer was actually wrong. The second one was better, though not perfect. Although that still means it gave one wrong answer. Another factor to consider is possible exhaustion: one person working five hours straight is one thing, but more people could work not simultaneously but on a rotating basis...
@MrStarchild3001 · 1 month ago
Randomness is normal. Unless the temperature is set to zero (which is almost never the case), you'll be getting stochastic outputs from an LLM. This is actually a feature, not a bug. By asking the same question 3 times, 5 times, 7 times etc. and then reflecting on it, you'll get much better answers than by asking just once.
@roelljr · 1 month ago
Exactly. I thought this was common knowledge at this point. I guess not.
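What MrStarchild3001 describes is essentially self-consistency sampling; a rough majority-vote sketch, reusing the assumed Groq client and model ID from the earlier snippet:

```python
# Self-consistency sketch: sample the same question n times at a nonzero
# temperature, then majority-vote over the answers. Exact-match voting is
# naive; a real eval would extract/normalize the final answer first.
from collections import Counter

def majority_answer(client, question, n=5, temperature=0.7):
    answers = []
    for _ in range(n):
        resp = client.chat.completions.create(
            model="llama3-70b-8192",  # assumed model ID
            messages=[{"role": "user", "content": question}],
            temperature=temperature,
        )
        answers.append(resp.choices[0].message.content.strip())
    return Counter(answers).most_common(1)[0][0]
```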
@taylorromero9169 · 1 month ago
The variance in T/s can be explained by it being a shared environment. Try the same question repeatedly after clearing the prompt and I bet it ranges from 220 to 280. Also, yes, too lenient on the passes =) Maybe create a Partial Pass to indicate something that doesn't zero-shot it? It would be cool to see the pass/fails in a spreadsheet across models, but right now I couldn't trust the "Pass" based on the ones you let pass.
@collectivelogic · 1 month ago
Your chat window is "context". That's why it's "learning". We need to see how they have the overflow setting configured; then you'll be able to know if it's a rolling or cut-the-middle sort of compression. Love your channel!
@existenceisillusion6528 · 1 month ago
4:49 Setting '2a-2' equal to (2a-1)/4 implies a = 7/6, via substitution. However, it cannot be incorrect to say y = (2a-1)/4, because the implication otherwise would be that all of mathematics is inconsistent.
@AINEET · 1 month ago
The guys from Rabbit really need Groq hardware running the LLM on their servers.
@wiltedblackrose · 1 month ago
My man, in what world is y = 2a - 2 the same expression as 4y = 2a - 1? That's not only a super easy question, but the answer you got is painfully, obviously wrong! Moreover, I suspect you might be missing part of the question, because the additional information you provide about a and y is completely irrelevant.
@matthew_berman · 1 month ago
I used the answer on the SAT webpage
@wiltedblackrose · 1 month ago
@matthew_berman Well, you can see it's wrong yourself. Also, the other SAT question is wrong too; look at my other comment.
@dougdouglass6126 · 1 month ago
@matthew_berman This is alarmingly simple math. If you're using the answer from an SAT page, then there are two possibilities: you copied the question incorrectly, or the SAT page is wrong. It's most likely that you copied the question wrong, because the way the second part of the question is worded does not make any sense.
@elwyn14 · 1 month ago
@dougdouglass6126 Sounds like it's worth double-checking, but saying things like "this is alarmingly simple math" is a bit disrespectful and assumes Matt has any interest in checking this stuff. No offense, but math only becomes interesting when you've got an actual problem to solve; if the answer is already there from the SAT webpage, as he said, he's being a totally normal person not even looking at it.
@wiltedblackrose · 1 month ago
@elwyn14 That's nonsense. "Alarming" is very fitting, because this problem is so easy it can be checked for correctness at a glance, which is what we all do when we evaluate the model's response. And this is A TEST, meaning the correctness of what we expect as an answer is the only thing that makes it valuable.
@ministerpillowes · 1 month ago
8:22 Is the marble in the cup, or is the marble on the table: the question of our time 🤣
@Sam_Saraguy · 1 month ago
and the answer is: "Yes!"
@ideavr · 1 month ago
At the marble and cup prompt: if we consider that Llama 3 recognizes successive prompts as successive events, then Llama 3 may have interpreted the events as follows: (1) inverting the cup on the table, so the marble falls onto the table; the cup goes into the microwave and the marble stays on the table. (2) In a second response to the same prompt, when we turn the cup over, Llama may have interpreted it as "going under the table". Thus the marble, due to gravity, would be at the bottom of the cup. Then the cup goes into the microwave with the marble inside. And so on.
@AtheistAdam · 1 month ago
Yes, and thanks for sharing.
@MeinDeutschkurs · 1 month ago
I can't help myself, but I think there are 4 killers in the room: 3 alive and one dead.
@sbacon92 · 1 month ago
"There are 3 red painters in a room. A 4th red painter enters the room and paints one of the painters green."
How many painters are in the room? vs. How many red painters are in the room? vs. How many green painters are in the room?
From this perspective you can see there is another property of the killers being checked, whether they are living, that wasn't asked for, and it doesn't specify if a killer stops being a killer upon death.
@LipoSurgeryCentres · 1 month ago
Perhaps the AI understands about human mortality? Ominous perception.
@matthew_berman · 1 month ago
That’s a valid answer also
@henrik.norberg · 1 month ago
For me it is "obvious" that there are only 3 killers. Why? Otherwise we would still count ALL killers that ever lived. When does someone stop counting as a killer? When they have been dead for a week? A year? A hundred years? A million years? Never?
@alkeryn1700 · 1 month ago
@henrik.norberg Killers are killers forever, whether dead or alive. You are not gonna say some genocidal historical figure is not a killer because he's dead. You may use "was" because the person no longer is, but the killer part is unchanged.
@ps0705 · 1 month ago
Thanks for a great video as always, Matthew! Would you consider running your questions 10 times (not on video), if the inference speed is reasonable of course, to check how often it gets each question right or wrong?
@dropbear9785 · 1 month ago
Yes, hopefully exploring this 'self-reflection' behavior. It may be less comprehensive than "build me a website" type agents, but showing how to leverage Groq's fast inference to make agents "think before they respond" would be very useful... and provide some practical insights. (Also, estimating the cost of some of these examples/tutorials would be nice to know, since it's the first thing I'm asked when discussing LLM use cases.) Thank you for your efforts... great content as usual!
@christiandarkin · 1 month ago
I think when you prompt a second time it's reading the whole chat again and treating it as context. So when the context contains an error, there's a conflict which alerts it to respond differently.
@Kabbinj · 1 month ago
Groq is set to cache results. Any prompt + chat history gives you the same result for as long as the cache lives. So in your case, both the first and second answers are locked in place by the cache. Also keep in mind that Groq's default is a temperature higher than 0. This means there will be variations in how it answers (assuming no cache). From this we can conclude that it's not really that confident in its answer, as even the small default temperature will trip it. May I suggest you run these non-creative prompts with temperature 0?
@I-Dophler · 1 month ago
For sure! I'm astonished by the improvements in Llama 3's performance on Groq. Can't wait to discover what revolutionary advancements lie ahead for this technology!
@ThaiNeuralNerd · 1 month ago
Yes, an autonomous video showing an example using Groq and whatever agent model you choose would be awesome.
@Artificialintelligenceo · 1 month ago
Great video. Nice speed.
@JimMendenhall · 1 month ago
YES! This plus Crew AI!
@chrisnatale5901 · 1 month ago
Re: how to decide which of multiple answers is correct, there's been a lot of research on this. Off the top of my head, there's "use the consensus choice, or failing consensus, choose the answer for which the LLM has the highest confidence score." That approach was used in Google's Gemma paper, if I recall correctly.
@djglxxii · 1 month ago
For the microwave marble problem, would it be helpful if you were explicit in stating that the cup has no lid? Is it possible it doesn't quite understand that the cup is open?
@easypeasy2938 · 1 month ago
YES! I want to see that video! Please start from the very beginning of the process. Just found you and I would like to set up my first agented AI. (I have an OpenAI pro account, but I am willing to switch to whatever you recommend... looking for AI to help me learn Python, design a database and web app, and design a Kajabi course for indie musicians. Thanks!)
@csharpner · 1 month ago
I've been meaning to comment regarding these multiple different answers: you need to run the same question 3 times to give a more accurate judgement. But clear it every time and make sure you don't have the same seed number. What's going on: the inference injects random numbers to prevent it from repeating the same answer every time. Regarding not clearing and asking the same question twice: it uses the entire conversation to create the new answer, so it's not really asking the same question, it's ADDING the question to a conversation, and the whole conversation is used to trigger a new inference. Just remember, there's a lot of randomness too.
@victorc777 · 1 month ago
As always, Matthew, love your videos. This time, though, I followed along running the same prompts on the **Llama 3 8B FP16 Instruct** model on my Mac Studio. I think you'll find this a bit interesting, if not you then some of your viewers. When following along, if both your run and mine failed or passed, I am ignoring them, so you can assume if I'm not bringing it up here then mine did as well or as badly as the 70B model on Groq, which is saying something! I almost wonder if Groq is running a lower quantization, which may or may not matter, but considering the 8B model on my Mac is nearly on par with the 70B model, it is strange to say the least. The only questions that stick out to me are the Apple prompt, the Diggers prompt, and the complex math prompt (answer is -18).
- The very first time I ran the Apple prompt it gave me the correct answer, and I re-ran it 10 times with only one run producing an error of a single sentence not ending in "apple".
- Pretty much the same thing with the Diggers prompt: I ran it many times over and got the same answer, except for once. It came up with a solution that digging the hole would not take any less time, which would almost make sense, but the way it explained it was hard to follow and made it seem like 50 people were digging 50 different holes.
- The first time I ran the complex math prompt it got it wrong, close to the same answer you got the first time, but the second time I ran it I got the correct answer. It was bittersweet, since I re-ran it another 10 times and could never get the same answer again.
I'm beginning to wonder if some of the prompts you're using are uniquely too hard or too easy for the Llama 3 models, regardless of how many parameters they have.
EDIT: When running math problems, I started to change some inference parameters, which to me seems necessary, considering math problems can have a lot of repetitiveness. So I started reducing the temperature, disabling the repeat penalty, and adjusting Min P and Top P sampling. Although I am not getting the right answer, or at least I think I'm not, since I don't know how to complete the advanced math problems, for the complex math prompt where -18 is supposedly the answer I continue to get -22. Whether or not that is the wrong answer is not my point, but that by reducing the temperature and removing the repetition penalty it at least becomes consistent, which for math problems seems like it should be the goal. Through constant testing and research, I THINK the function should be written with the "^" symbol, according to Wolfram, like this: f(x) = 2x^3 + 3x^2 + cx + 8
@TheColonelJJ · 1 month ago
Which LLM that can be run on a home computer would you recommend for helping refine prompts for Stable Diffusion (text to image)?
@MagnusMcManaman · 1 month ago
I think the problem with the cup is that LLaMA "thinks" that every time you write "placed upside down on a table" you are actually turning the cup upside down, i.e. the opposite of whatever it was before. So, as it were, every other time you put the cup down "normally" and every other time upside down. LLaMA takes the context into account, so if you delete the previous text, the position of the cup "resets".
@ThePawel36 · 1 month ago
I'm just curious. What is the difference in quality of responses between, for example, q4 and q8 models? Does lower quantization mean lower quality or a higher possibility of errors?
@TheHardcard · 1 month ago
One important factor to know is the parameter specification. Are they floating point or integer? How many bits: 16, 8, 4, 2? If fast inference speeds are coming from heavy quantization, it could affect the results. This would be fine for many people a lot of the time, but it should also always be disclosed. Is Groq running full precision?
@joepavlos3657 · 1 month ago
Would love to see the CrewAI with Groq idea. I would also love to see more content on using CrewAI agents to train and update models. Great content as always, thank you.
@KiraIsGod · 1 month ago
If you ask the same somewhat-hard question 2 times, I think the LLM assumes the first answer was incorrect, so it tries to fix the answer, leading to an incorrect answer the 2nd time.
@EnricoRos · 1 month ago
Is llama3-70B on Groq running quantized (8-bit?) or FP16? To understand if this is the baseline or less.
@hugouasd1349 · 1 month ago
Giving the LLM the question twice, I would suspect, works due to it not wanting to repeat itself. If you had access to things like the temperature and other params you could likely get a better idea of why, but that would be my guess.
@WINDSORONFIRE · 1 month ago
I ran this on Ollama (70B) and I get the same behavior. In my case, and not just for this problem but for other logic problems too, it would give me the wrong answer. Then I tell it to check the answer and it always gets it right the second time. This model is definitely one that would benefit from self-reflection before answering.
@falankebills7196 · 1 month ago
Hi, how did you run the snake Python script from Visual Studio? I tried but couldn't get the game screen to pop up. Any hints/help/pointers much appreciated.
@Luxiel610 · 1 month ago
It's so insane that it actually wrote "Flappy Bird" with a GUI. It had errors in the first and 2nd outputs, and the 3rd was flawless. Daang.
@mhillary04 · 1 month ago
It's interesting to see an uptick in "chain-of-thought" responses coming out of the latest models. Possibly some new fine-tuning/agent implementations behind the scenes?
@steventan6570 · 1 month ago
I think the model always gives a different answer when the same prompt is asked due to the frequency penalty and presence penalty.
@TheFlintStryker · 1 month ago
Let’s build the agents!!
@frederick6720 · 1 month ago
That's so interesting. Even Llama 3 8B gets the "Apple" question right when prompting it twice.
@TheReferrer72 · 1 month ago
Yes, and on the first prompt it only got the 6th sentence wrong: "6. The kids ran through the orchard to pick some Apples."
@mlsterlous · 1 month ago
Not only that question. It's crazy smart overall.
@user-cw3jg9jq6d · 1 month ago
Thank you for the content. Do you think you could point to the procedure for running LLaMA 3 on Groq, please? Also, I might have missed something, but why did you fail LLaMA 3 on the question about breaking into a car? It told you it cannot provide that info, which is what you want, no?
@andyjm2k · 1 month ago
Did you modify the temperature setting? It defaults to 1, which can increase your variance.
@asgorath5 · 1 month ago
Marble: I assume that it doesn't clear your context and that the LLM assumes the cup's orientation changes each time. That means on every "even" occasion the cup's opening faces downwards, and hence moving the cup leaves the marble on the table. On every "odd" occasion the cup's opening faces upwards, and hence the marble is held in the cup when the cup is removed. I therefore assume the LLM is interpreting the term "upside down" as a continual oscillation of the orientation of the opening of the cup.
@wiltedblackrose · 1 month ago
Also, your other test with the function is incorrect (or unclear) as well. As a simple proof, check that if c = -18, then the function f doesn't have a root at x = 12: f(12) = 2·12^3 + 3·12^2 - 18·12 + 8 = 3680.
Explanation:
f(-4) = 0 => 2·(-4)^3 + 3·(-4)^2 + c·(-4) + 8 = 0 => -72 - 4c = 0, which in and of itself would imply that c = -18.
f(12) = 0 => 2·12^3 + 3·12^2 + c·12 + 8 = 0 => 3896 + 12c = 0, which on the other hand implies that c = -974/3 ≈ -324.7.
Therefore there is a contradiction. This would actually be an interesting test for an LLM, as not even GPT-4 sees it immediately, but the way you present it, it's nonsense.
@Sam_Saraguy · 1 month ago
garbage in, garbage out?
@wiltedblackrose · 1 month ago
@Sam_Saraguy That refers to training, not inference.
@d.d.z. · 1 month ago
Absolutely, I'd like to see the AutoGen and CrewAI video ❤
@zandor117 · 1 month ago
I'm looking forward to the 8B being put to the test. It's absolutely insane how performant the 8B is for its size.
@mshonle · 1 month ago
It's possible you are getting different samples when you prompt twice in the same session/context due to a "repetition penalty" that affects token selection. The kinds of optimizations that Groq performs (as you made reference to in your interview video) could also make the repetition penalty heuristic more advanced/nuanced. Cheers!
@davtech · 1 month ago
Would love to see a video on how to set up agents.
@HarrisonBorbarrison · 1 month ago
1:53 Haha Comic Sans! That was funny.
@rezeraj · 1 month ago
The second problem was also incorrectly copied. It's: "The function f is defined by f(x) = 2x³ + 3x² + cx + 8, where c is a constant. In the xy-plane, the graph of f intersects the x-axis at the three points (−4, 0), (1/2, 0), and (p, 0). What is the value of c?" Not 2x3+3x2+cx+8, and in my tests it was solved correctly.
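With the correctly copied question, the value of c (and the third root p) falls out symbolically; a short check assuming sympy:

```python
# Solve for c using the known root x = -4, then recover all three roots.
import sympy as sp

x, c = sp.symbols("x c")
f = 2 * x**3 + 3 * x**2 + c * x + 8
c_val = sp.solve(f.subs(x, -4), c)[0]        # f(-4) = 0  ->  c = -18
print(c_val, sp.solve(f.subs(c, c_val), x))  # roots -4, 1/2, 2  ->  p = 2
```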
@Mr_Tangerine609 · 1 month ago
Yes, please Matt, I would like to see you put Llama 3 into an agent framework. Thank you.
@airedav · 1 month ago
Thank you, Matthew. Please show us the video of Llama 3 on Groq
@jp137189 · 1 month ago
@matthew_berman A quantized version of Llama 3 is available on LM Studio. I'm hoping you get a chance to play with it soon. There was an interesting nuance to your marble question on the 8B Q8 model: "The cup is inverted, meaning the opening of the cup is facing upwards, allowing the marble to remain inside the cup." I wonder how many models assume 'upside down' means the cup opening is up, but just don't say it explicitly?
@zinexe · 1 month ago
Perhaps the temperature settings are different in the online/Groq version. For math it's probably best to have a very low temp, maybe even 0.
@user-zh3zb7fw2j · 1 month ago
In the case where the model gives wrong answers alternating with correct answers: if we give the model an additional prompt like "Please think carefully about your answer to the question," I think what happens to the answer would be interesting, Mr. Berman.
@JanBadertscher · 1 month ago
Thanks Matthew for the eval. Some thoughts, ideas and comments:
1. For an objective eval, I always remove the history.
2. If I didn't set temp to 0, I run every question multiple times to stochastically get more comparable results, and especially measure the distribution to get a confidence score for my measured results.
3. Trying exactly the same prompt multiple times over an API like Groq? I doubt they use LLM caching or that temp is set to 0. Better check twice whether they cache things.
@tvwithtiffani · 1 month ago
The reason you get the correct answer after asking a 2nd and 3rd time is the same reason chain-of-thought, chain-of-whatever works. The subsequent inference requests take the 1st output and use it to reason, finding the mistake and correcting it. This is why the agent paradigm is so promising. Better than zero-shot reasoning.
@tvwithtiffani · 1 month ago
I think you are aware of this though, because you mentioned getting a consensus of outputs. This is the same thing in a different manner.
@CNCAddict · 1 month ago
I may be mistaken, but on the marble question the previous answer is now part of the context... my guess is that the model reads this answer, sees that it's mistaken, and corrects it.
@mojowebs · 1 month ago
The marble and cup question might be having issues due to some cups having lids.
@StefanEnslin · 1 month ago
Yes, would love to see you doing this, still getting used to the CrewAI system.
@tvwithtiffani · 1 month ago
Out of curiosity, does anyone know how much heat Groq hardware outputs?
@HaraldEngels · 1 month ago
Yes, I would like to see the video you proposed 🙂
@roelljr · 1 month ago
A new logic/reasoning question for your tests that is very hard for LLMs:
Solve this puzzle: There are three piles of matches on a table - Pile A with 7 matches, Pile B with 11 matches, and Pile C with 6 matches. The goal is to rearrange the matches so that each pile contains exactly 8 matches.
Rules:
1. You can only add to a pile the exact number of matches it already contains.
2. All added matches must come from one other single pile.
3. You have only three moves to achieve the goal.
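The puzzle is small enough to brute-force; a minimal depth-first search sketch (the `(src, dst, n)` move encoding is my own, not part of the puzzle):

```python
# Brute-force the match puzzle: a legal move doubles pile `dst` by taking
# exactly state[dst] matches from one other pile `src`, within 3 moves.
from itertools import permutations

def solve(state=(7, 11, 6), goal=(8, 8, 8), depth=3, moves=()):
    if state == goal:
        return moves
    if depth == 0:
        return None
    for src, dst in permutations(range(3), 2):
        need = state[dst]          # dst must receive its own current size
        if state[src] >= need > 0:
            nxt = list(state)
            nxt[src] -= need
            nxt[dst] += need
            found = solve(tuple(nxt), goal, depth - 1, moves + ((src, dst, need),))
            if found:
                return found
    return None

# Prints a sequence of (src, dst, n) moves; one valid solution is
# B->A 7, then A->C 6, then C->B 4, ending at (8, 8, 8).
print(solve())
```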
@jimbo2112 · 1 month ago
Could the multi-inference output options serve you a random version of any one of its answers? This does not, however, explain how, when it explains the physics of the actions of the marble, it's inconsistent. Very bizarre...
@dewijones92 · 1 month ago
Do you know what quant Groq is using? I'd love it if you tested the unquantized version :D
@JimSchreckengast · 1 month ago
Write ten sentences that end with the word "apple" followed by a period. This worked for me.
@ollantaytambur2165 · 1 month ago
4:56 Why is y = (2a-1)/4 not the correct answer?
@SanctuaryLife · 1 month ago
With the marble-in-the-cup dilemma, could it be that the temperature settings are a little too high on the model, leading it to be creative?
@roelljr · 1 month ago
That's exactly what it is. Randomness is normal. Unless the temperature is set to zero (which is almost never the case), you'll be getting stochastic outputs from an LLM. This is actually a feature, not a bug. By asking the same question 3 times, 5 times, 7 times etc. and then reflecting on it, you'll get much better answers than asking just once.
@NoahtheGameplayer · 1 month ago
I'm not sure if it's only me, but when trying to log in with a Facebook account, it sent me back to the original page, and when I click "Try Meta AI" it keeps sending me back to the original page. Any help with that? Because I do want to save my history with the chatbot.
@arka7869 · 1 month ago
Here is another criterion for reviewing models: reliability, or consistency. Does the answer change if the prompt is repeated? I mean, if I don't know the answer and I have to rely on the model (like with the math problems), how can I be sure the answer is correct? We need STABLE answers! Thank you for your testing!
@KEKW-lc4xi · 1 month ago
Can it make card games, or is that still too advanced? I think card games would be the next step up, as they involve a sense of UI, drawing the ASCII representations of the cards, etc.
@mazensmz · 1 month ago
Hi Nooby, you need to consider the following:
1. Any statements or words added to the context will affect the response, so ensure only directly relevant context.
2. When you ask "How many words are in the response?", the system prompt affects the number given to you; you may ask the LLM to count and list the response's words, and you will be surprised.
Thx!
@thelegend7406 · 1 month ago
Some ready-made coffee cups have lids, so Llama gambles between both responses.
@tlskillman · 1 month ago
I think this is a poorly constructed question, as you point out.
@8eck · 1 month ago
Check the temperature setting. Temperature adds randomness to the output.
@micknamens8659 · 1 month ago
5:20 The given function f(x)=2×3+3×2+cx+8, taken literally, is f(x)=6+6+cx+8=cx+20. Hence it is linear and can cross the x-axis only once. Certainly you mean instead f(x)=2x^3+3x^2+cx+8, which is a cubic function and hence can cross the x-axis 3 times.
When you solve f(-4)=0, you get c=-18. But when you solve f(12)=0, you get c=-974/3 ≈ -324.7. So obviously 12 can't be a root of the function. The other roots are 2 and 1/2.
@theflightwrightsprogrammin4410 · 1 month ago
4:50 How is it 2a-2? The answer it gave is spot on. Probably there was some error while pasting the question from the SAT, but the answer it gave is right.
@AntonioSorrentini · 1 month ago
I asked Llama 3 70B in LM Studio on my machine if it is multimodal and it said yes. How can I use it in a multimodal way on my local machine, either with LM Studio or some other way?
@abdelouahabtoutouh9304 · 1 month ago
You had to remove the system prompt from the parameters on Groq, as it pollutes the input and thus affects the output.
@MrEnriqueag · 1 month ago
I believe that by default the temperature is 0, which means that with the same input you are always gonna get the same output. If you ask the question twice, though, the input is different because it contains the original question; that's why the response is different. If you increase the temperature a bit, the output should be different every time, and then you can use that to generate multiple answers via the API, then ask another time to reflect on them, and then provide the best answer. If you want, I can create a quick script to test that out.
@dhruvmehta2377 · 1 month ago
Yess, I would love to see that
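Since it was asked for: a quick sketch of what that generate-then-reflect script could look like (client and model ID assumed as in the earlier snippets; the prompts are illustrative):

```python
# Generate-then-reflect sketch: draft an answer, then show the model its
# own draft and ask it to check and correct it in a second pass.
def answer_with_reflection(client, question, model="llama3-70b-8192"):
    draft = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": question}],
        temperature=1,  # nonzero, so drafts vary run to run
    ).choices[0].message.content
    review = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "user", "content": question},
            {"role": "assistant", "content": draft},
            {"role": "user", "content": "Check your answer above step by step and correct any mistakes."},
        ],
        temperature=0,  # make the final pass near-deterministic
    )
    return review.choices[0].message.content
```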
@Scarage21 · 1 month ago
The marble thing is probably just the result of reflection. Models often get stuff wrong because an earlier more-or-less-random token pushes them down the wrong path. Models cannot self-correct during inference, but they can on a second iteration. So it probably spotted the incorrect reasoning of the first iteration and never generated the early tokens that pushed it down the wrong path again.
@dudufusco · 1 month ago
You must create a video demonstrating the easiest way to get it working with agents using just the local machine or free services (including a free API key).
@OscarTheStrategist · 1 month ago
The prompt question becomes invalid because the model takes the system prompt into account as well. You could argue it should know where the user's question starts and its original system prompt ends. Also, the reason you see better answers on the second shot is probably because the context of the inference is clear the second time around. This is why agentic systems work so well: they give the model clearly defined roles outside of the "you are an LLM and you aren't supposed to say XYZ" system prompt that essentially pollutes the first shot. It's still amazing how these models can reason so well. Yes, I'm aware of the nature of transformers also limiting this, but I wouldn't give a model a fail without a fair chance, and it doesn't have to be two shots at the same question; it can simply be inference post-context (after the model has determined the system prompt and the following inference is pure).
@mikemoorehead92 · 1 month ago
Jeez, I wish you'd drop the snake challenge. These models are probably being trained on it.
@DWJT_Music · 1 month ago
I've always interacted with LLMs with the assumption that multi-shot prompting, or recreating a prompt, is stored in a 'close but not good enough' parameter, so the reasoning (logic gates) will attempt to construct a slightly different answer using the history of the conversation as a guide to achieve the user's goal, with the most recent prompt carrying the heaviest weights. One-shot responses are the icing on the cake, but to really enjoy your dessert you might consider baking, which is also a layered process and can also involve fine-tuning. Note: leaving ingredients on the table will slightly alter your results and may contain traces of nuts...
@Maltesse1015 · 1 month ago
Looking forward to the agent video with Llama 3 🎉!
@PatrickHoodDaniel · 1 month ago
I wonder if there is a random seed, like with image generation, that causes unpredictable output. I do believe you are being too lenient, as the scores should be objective.
@kabauny · 1 month ago
Can someone explain to me why Groq's responses are different from Meta's responses if they are using the same model weights?
@seupedro9924 · 1 month ago
Please make a video using agents in a graphical interface. It would be really interesting
@MahadevanIyer · 1 month ago
It says "you are in queue" and takes some time. Then it generates at 300 tok/s (it doesn't count the queue time).
@chriszuidema · 1 month ago
Snake is getting better every week!