The New KING 👑 Of LLaMA 2 Models (Airoboros 70b)

27,643 views

Matthew Berman

1 day ago

In this video, I review the new Airoboros L2 70B model. It's uncensored and performs incredibly well. It's a fine-tuned version of Meta's newly released LLaMA 2. Does it perform better than the base LLaMA 2 70B model? Let's find out!
Enjoy :)
Join My Newsletter for Regular AI Updates 👇🏼
www.matthewberman.com
Need AI Consulting? ✅
forwardfuture.ai/
Rent a GPU (MassedCompute) 🚀
bit.ly/matthew-berman-youtube
USE CODE "MatthewBerman" for 50% discount
My Links 🔗
👉🏻 Subscribe: / @matthew_berman
👉🏻 Twitter: / matthewberman
👉🏻 Discord: / discord
👉🏻 Patreon: / matthewberman
Media/Sponsorship Inquiries 📈
bit.ly/44TC45V
Links:
Runpod - bit.ly/3OtbnQx
LLM Leaderboard - tide-freckle-52b.notion.site/...
In-Depth Local Install Tutorial - • How To Install TextGen...
LLaMA 2 Announcement - • Free, Open-Source, and...
LLaMA 2 70b Testing - • NEW A.I. By Meta Is TH...
TextGen WebUI - github.com/oobabooga/text-gen...
Model Tested - huggingface.co/TheBloke/airob...

Comments: 125
@zgolkar · 1 year ago
Love the short / fast review format instead of trying to fill more minutes like others do
@The0rangeCow · 1 year ago
Thanks for the analysis! Love that uncensored models are getting love!
@rolyantrauts2304 · 1 year ago
It's proven that censorship reduces accuracy on any model; open source likely needs to shift licensing by releasing uncensored models and requiring providers to implement censorship via fine-tuning rather than removal...
@ErickGuerra-tk8fb · 1 year ago
I know I'm a little late to the party, but I do catch and watch your videos often, and the fact that this video was put out about 7 hours ago assures me that you are our generation's zeitgeist of AI! Subbed, no doubt!
@davidnobles162 · 1 year ago
These videos are such a great resource, thanks Matt!!
@Sai557 · 1 year ago
Loved the short version! Thank you!
@jeffwads · 1 year ago
Yes, I love the 65b airoboros model. So good.
@marcfruchtman9473 · 1 year ago
Very interesting! Thanks for the video!
@matthewmonaghan9337 · 1 year ago
Please keep covering uncensored models, Matthew!! Love your videos
@stanpikaliri1621 · 1 year ago
Yeah, uncensored and unbiased models are the future.
@user-pp4iw4qc1e · 11 months ago
Hi Matt, great job ;) What are the minimum hardware requirements to run this model?
@beri4138 · 1 year ago
- Fails every test - "I would say this is a very good model"
@ennix7477 · 1 year ago
Hey Matthew, please tell us the hardware specifications to run Llama 2 70b
@Uterr · 1 year ago
I've seen charts where some random models are ranked higher than GPT-3.5 (aka ChatGPT), but for me those models are really far from ChatGPT. The difference is especially visible in a chat scenario. I think a test for chat models would be pretty interesting.
@jumperpoint · 1 year ago
The models aren't better because of how big or how fast they are, but because training provides more connections within the model. The chat models are trained with more dialog. This gives them a better ability to engage in a discussion, to give a complete answer, and to judge what type of answer they should give based on the complexity and structure of the query. I don't really think it makes sense to train the chat skills into the LLM, though. You definitely want all the language in the language model. But chatting is a skill, and it's a skill that should be trained into the interface. Then you wouldn't need to train separate models for chat and instruct. Although you might want a separate model for something as broad in scope as language instruction.
@amankumarsingh8692 · 1 year ago
What are the updates regarding TorrentGPT, both as a client and as a server, Matthew?
@rolyantrauts2304 · 1 year ago
The new question is a great addition; some further new questions of your own would be illuminating as benchmarks, versus questions that could be answered by merely trained emulation.
@smoklares9791 · 1 year ago
Can you tell me how I can install and run Chronos Hermes 13B on a PC from scratch? Where can I find a tutorial?
@bnb7462 · 1 year ago
Thank you, nice test! So, what are your thoughts on it compared to GPT-4?
@gitgudchannel · 1 year ago
😂
@djstraylight · 1 year ago
I'll have to try this. I run the old Airoboros based on LLaMA 1.
@MyWatermelonz · 1 year ago
Some great models have come out; the 13B ones are really good, especially Nous Hermes and WizardLM 1.2.
@durden0 · 1 year ago
Great video, thanks for sharing. What's the best open-source model for answering coding questions? Fine-tuned or otherwise.
@johnnewton-uk · 1 year ago
Given a lot of these models fail because they don’t import modules, could you include the instruction to add all imports so that you can test the logic the model generates?
@testales · 1 year ago
You could ask the model; it can fix such simple errors in most cases. When I tested the 13B version and tried to create the Snake game, several compatibility issues between different Python versions came up, and the LLM identified these from my feedback and created valid workarounds. It still failed in the end because of the limited context length, or because the model forgot earlier fixes for another reason.
@patrickwasp · 1 year ago
Do you have a video asking these questions of GPT-4? Since ChatGPT changes every couple of months, it would be interesting to see how that affects the results.
@jamesjonnes · 1 year ago
Change the compress_pos_emb to use contexts larger than 2K. How much did it cost to run it on Runpod?
@Ascended23 · 1 year ago
Personally, I found it ran great on a single A6000 that cost 80 cents per hour.
@BJiagor · 10 months ago
To be honest, this point confused me: in Llama 2 the default is 4k and I don't set cpe, but in Llama 1 the default is 2k, and in that case, if you increase the length to 4k, you need to increase the cpe to 2. So, as I understand it, for cpe 2 in Llama 2 you'd need a length of 8k, not 4k, or not? $0.68/hr for 2x A30 (48 GB VRAM, 62 GB RAM, 16 vCPU) is more than enough to run 70B and the gptq-4bit-32g tree comfortably.
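For reference, compress_pos_emb in the WebUI corresponds to linear RoPE scaling. A minimal sketch of the same setting via Hugging Face transformers, assuming transformers 4.31+ (which added rope_scaling for Llama models); the checkpoint name is illustrative:

# Minimal sketch, assuming transformers >= 4.31; checkpoint name is illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # same idea applies to the 70B model

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Linear RoPE scaling: factor 2.0 stretches the trained context window
# (2k -> 4k for Llama 1, 4k -> 8k for Llama 2), i.e. compress_pos_emb = 2.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    rope_scaling={"type": "linear", "factor": 2.0},
)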
@manysimilarshapes · 1 year ago
I wonder how well it will assist in writing fiction. The reluctance of ChatGPT to write any violent or sexual scenes makes modern fiction writing with its help nearly impossible, especially in the thriller genre.
@KararaJawaab · 1 year ago
I believe sooner or later uncensored LLMs will come to market. The black market has a huge economy too; companies will develop models for it as well.
@KararaJawaab · 1 year ago
Maybe one day you wake up and find out that Dani Daniels is not a real character, it's an AI character, and you spent all your college life believing she's a real person you'll meet one day. Haha 😄
@electiangelus · 1 year ago
I've had no problems doing this with the uncensored LLM video he did. I used RunPod, though, and TheBloke's config.
@mirek190 · 1 year ago
For writing, currently the best is Chronos 33B, or even better Airochronos 33B. Use the GGML q5_K_M or q6_K versions, as GPTQ is only 4-bit and sucks.
@miyamotomasao3636 · 1 year ago
How about writing with your own brain, lazy mo fo ? 🥸
@Thomas0Malley · 1 year ago
Nice content as always 👍 The duck/horse question is a nice addition to get "the vibe" of the model. Really looking forward to that in-depth Petals video. Will you also show how to set up a custom cluster? I'm interested in trying to set up Petals on a laptop, a Raspberry Pi, and maybe sometime even a smartphone, to see if this would be sufficient to run a smaller model with 7-13B parameters. Then you'd have your private cluster to run LLMs from anywhere, and it would load faster because it wouldn't need to consume all 8GB of available memory on one device.
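In the meantime, a rough sketch of the Petals client side, assuming the petals package and a model the public swarm actually hosts (a private cluster would additionally require running your own Petals servers on each device):

# Rough sketch, assuming `pip install petals`; model name is whatever the swarm hosts.
from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

model_id = "meta-llama/Llama-2-70b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Transformer blocks are sharded across swarm peers; only the embeddings
# run locally, which is why a laptop-class client can drive generation.
model = AutoDistributedModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("What is Petals?", return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0]))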
@mwissel · 1 year ago
Whew you were hitting the gas pedal hard on this one 😂
@matthew_berman · 1 year ago
Haha the intro right?
@mav3818 · 1 year ago
Quick and to the point.... Awesome
@TheTrainstation · 1 year ago
Efficient
@StevenSunyuchen · 1 year ago
Translation is also a good topic to test across different models: translate from Spanish, Chinese, Japanese, or French to English, or the other way around.
@allenbythesea · 1 year ago
I know this is a tangent question, but are there any models specifically trained on programming that isn't Python? We mostly do C# and Java, and the results we're getting out of GPT-4 these days are not great.
@tsomerville1970 · 1 year ago
Maybe have it write the Python first, because it can "think" better, then do a second step of only translating the Python to C++.
@testales · 1 year ago
WizardCoder can do PowerShell and C#; it even gave me a skeleton of a BizTalk send-pipeline class, which is quite a niche and highly specialized bit of code, at least compared to something like Python or PowerShell. I've not tested it yet, but seeing that Llama 2 13B gave me valid Python code, I think it might be on par with WizardCoder. Also, if GPT-4 can't do it, then no (actual) free and open-source model can do it. Well, maybe in some months, when GPT-4 has been even more lobotomized for security reasons and open-source models have improved further, so the gap is smaller. ;-) WizardCoder "re-based" on Llama 2 would be a good candidate for producing actually usable code.
@PeterDrewSEO · 1 year ago
Hey, your RunPod link is broken; consider using a Bitly link so the full URL works :)
@matthew_berman · 1 year ago
Fixed, thank you
@baxter987 · 1 year ago
Epic test, thank you for the video! 1:07 - am I misunderstanding the test, or did it get a pass despite producing 70 words? Nevertheless, thank you for testing this!
@jumperpoint · 1 year ago
@IOFLOOD Things like word count or other constraints aren't really the LLM's job. The chat interface should be able to test the answer before it's presented, if things like word count, or the total weight of all people on the plane, or what have you, are critical. The chat interface will have to decide that based on its model of the user and its understanding of the context. Eventually your LLM will learn how much precision is needed in general.
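A toy sketch of such an interface-side check (everything here, including the generate() stub, is hypothetical):

# Hypothetical sketch: enforce a word-count constraint outside the LLM by
# measuring the draft ourselves and re-prompting until it fits (or we give up).

def generate(prompt: str) -> str:
    raise NotImplementedError  # stand-in for any LLM call

def generate_with_word_limit(prompt: str, limit: int, retries: int = 3) -> str:
    draft = generate(prompt)
    for _ in range(retries):
        n = len(draft.split())
        if n <= limit:
            return draft
        # Feed the measured count back rather than trusting the model to count.
        draft = generate(
            f"{prompt}\n\nYour previous answer was {n} words; "
            f"rewrite it in at most {limit} words."
        )
    return draft  # best effort after retries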
@zyadon7964 · 1 year ago
Can you list the question prompts?
@senju2024 · 1 year ago
What is RunPod? Is it free?
@Antagon666 · 1 year ago
You know it's possible to pick up the inverted cup along with the ball inside, if you flick your hand quickly enough?
@abdelhakkhalil7684 · 1 year ago
To be fair to the model, I think you should test the unquantized version first, since the quantized versions lose quality. Or at least try the 8-bit quantized version. Trust me, the results are not the same.
@MrMyagi. · 1 year ago
Hey Matt, could you make a video on how to download Llama 2?
@mshonle · 1 year ago
How much longer will it be until someone packages an open source model that is fine tuned for python snake and the killers problem? 😊
@rh4009 · 1 year ago
The "cups and balls" answer is correct. The question obviously refers to the performance of the "cups and balls" trick. There are not many "balls" that can be "put" into a "normal cup". Most kinds of ball would still be in the cup (tennis ball, baseball, basketball, soccer ball, *stress ball*), because they would require some compression to fit in the cup, and thus would remain in the cup due to friction.

The fact that the question does not specify what kind of ball, but does specify a "normal cup", implies that some trick should be expected, and that this is a trick question, like the question asked by a "cups and balls" performer. If the question were not sneaky, it would have said "a normal golf ball in a normal cup"; instead it merely invites the reader's speculation that the ball must also be a normal small ball that would normally fit in a normal cup. The fact that it describes the cup as normal, but not the ball, implies there is something not "normal" about the ball, and that this is a trick question. Therefore, the correct answer should be the opposite of what naive logic would indicate.

Also, a non-sneaky question might have said "lifts", not "takes". "Taking" the cup with a ball inside can reasonably be understood to imply a "taking" motion whereby the ball remains in the cup, where it was placed. A non-sneaky question might also have said "placed a cup over a ball", not "placed a ball into a cup". Placing a cup upside down with a small ball inside requires the same dexterity as taking the cup while the ball remains in it, so it is logical that someone with the dexterity to turn the cup upside down can also pick up the cup and place it elsewhere. Maybe dexterity is not required, but some other motion is: to successfully put the cup upside down, you'd have to cover the top of the cup so a small ball doesn't fall out. Understanding that preventing the ball from falling out is part of the performance described in the question leads to the logical inference that, when the cup is picked up, the performer also covers the cup to prevent the ball falling out.

The action being described is clearly that of someone trying to microwave a ball while keeping it covered so as to contain the splashing/explosion, applying enough skill to get the ball into the microwave, and succeeding. The question is clearly designed to trick a dumb/naive reader into giving the wrong answer, i.e., that someone intent on microwaving a ball fails due to lack of fine motor skills. I can clearly see that your (Matthew's) expected answer is a description of a Three Stooges type of performance, where the audience clearly understands what went wrong, but the stooges themselves are foiled by gravity. However, an average person would see the question for what it is: a description of how to microwave the ball. The placement of the cup on the table is the distraction that the model did not fall for, as it is not material to the task of microwaving a ball.

Imagine the state of mind of the person moving the cup to the microwave. Was his task to make the state "cup in the microwave", or "ball in the microwave, covered by the cup"? Which task is more likely to have been performed? If the "someone" were a blind mechanical robot, I would expect it to fail at the task, i.e., the ball would be left on the table. But if the "someone" is any human being of average intelligence, I would expect the ball to be in the microwave, under the cup, at the completion of their task. I would expect even a blind "someone" to succeed at getting the ball microwaved.

I mean this without any disrespect, but the model's answer shows it is more intelligent than the evaluator. I would say only "kinda" correct, because I would have expected the model's answer to indicate it understands this is a task/dexterity/skill/trick question, rather than a physics/gravity/slapstick-comedy question, given that the question is clearly ambiguous as to purpose.
@testales · 1 year ago
Yes, that's not a good logic test; I'd also say that this ball question fits more or less in the same category as the horse/duck question.
@KararaJawaab · 1 year ago
1) Please test Willy2 by Stability AI. 2) How can we fine-tune Llama 2 on our own machine? Thank you for your wonderful videos.
@matthew_berman · 1 year ago
1) already done, editing now, out tomorrow :) 2) I’ll look into this!
@jamesjonnes · 1 year ago
I'm using FreeWilly2 now; probably the best open-source model right now.
@JonDurbin · 1 year ago
Thanks for taking a look. I'll send over some additional questions to test on after I finish version 2.0, if you're interested.
@prolamer7 · 1 year ago
Python Snake: I really suggest adding one more prompt to "fix" problems... coding is usually about debugging...
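A minimal sketch of what that extra "fix" prompt could look like as a loop (the generate() call is a hypothetical stand-in for any model API):

# Hypothetical sketch of a debug loop: run the generated program and, if it
# crashes, hand the traceback back to the model to fix.
import subprocess
import sys
import tempfile

def generate(prompt: str) -> str:
    raise NotImplementedError  # stand-in for any LLM call

def generate_and_debug(task: str, rounds: int = 3) -> str:
    code = generate(task)
    for _ in range(rounds):
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(code)
            path = f.name
        try:
            result = subprocess.run([sys.executable, path],
                                    capture_output=True, text=True, timeout=10)
        except subprocess.TimeoutExpired:
            return code  # still running after 10s: no crash, call it a pass
        if result.returncode == 0:
            return code  # ran cleanly
        code = generate(
            f"{task}\n\nThis code:\n{code}\n\nfailed with:\n{result.stderr}\nFix it."
        )
    return code  # best effort after the allotted rounds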
@SallyMangos · 1 year ago
If it's a large foam ball it would still be in the cup.
@L7NK · 1 year ago
How much VRAM did your pod need to run this 70B?
@matthew_berman · 1 year ago
I was on 48GB, but I think 36GB is enough, although they don't make 36GB cards.
@friendlyvimana · 1 year ago
Is there a way to test it without downloading the whole thing...?
@fusseldieb · 1 year ago
There are some hosted models on Hugging Face...
@rh4009 · 1 year ago
The "ball in the cup" answer is correct (short version): ask yourself the same question. What would your answer be (i.e., where is the ball when the task is complete)? The task described is clearly one of microwaving a ball, easily accomplished by most "someones" with average motor skills and intelligence. A reasonable human would expect the ball to be in the microwave. Bonus question: is the "someone" microwaving a ball covered by a normal cup, or are they microwaving a ball in the (right-side-up) cup, as if expecting the cup to contain the molten ball? Why hold the LLM to a different intelligence standard than a normal "someone" when evaluating its intelligence?
@dewijones92 · 1 year ago
Try the new Guanaco 70b one please
@pastuh · 1 year ago
Looking for a model which can process C# or JavaScript... Imagine you upload a whole document and it can suggest where to make changes: no more copy-pasting line by line...
@ayroau · 9 months ago
Placing the ball upside down doesn't change the alignment of the cup :)
@E_-_- · 1 year ago
Let's hope they all keep at it. If it's not uncensored, I'm not even remotely interested. The rest of the world is tired of the talking points of the political parties in the USA.
@matthew_berman · 1 year ago
Uncensored is not about political affiliation, IMO. It’s about whether or not there’s a predetermined list of topics (harmful/illegal) that a model won’t talk about.
@chucklefuck · 1 year ago
@matthew_berman Exactly - "harmful" is highly subjective, and I don't trust the people making them to know what the models really should and shouldn't say. So far GPT-4 will tell you every kind of pride (Black, Asian, gay/LGBTQ+, etc.) is totally fine, but white pride is violent racism. Is that not "harmful"?
@just_one23 · 1 year ago
@chucklefuck While it is true that models can be politically aligned, having one explain how to craft a bomb or a poison would make such information much more accessible, and believe it or not, bombs are more harmful than your personal racial beliefs not being accepted by a bot.
@darkphoenix2 · 1 year ago
@chucklefuck White pride typically does take the form of violence, yes.
@wdonno · 1 year ago
@chucklefuck, a poster child for why AI models need regulation/'censorship'.
@dohnj0e · 9 months ago
Ball is still in the cup
@theSatanist · 1 year ago
It would actually depend on the size/type of ball you put the cup over... many balls could easily stick... but the AI should have explained that there's not enough info to answer it...
@prolamer7 · 1 year ago
This is a good point! I often find that a really smart model explains itself, not just gives the answer.
@avi7278 · 1 year ago
I still get chills that I'm the goofball who suggested everyone's favorite, the killers problem 😂
@rh4009 · 1 year ago
How would you answer your killers question?
@slalomsteve · 1 year ago
Am I missing something with the setup of some of these models? They don't seem to remember any context. You tell them something, and then in the next prompt ask a question about previous prompts, and they don't remember.
@zuzelstein · 1 year ago
Llama 2 13B q2:
- A ball is put into a normal cup. Someone places the cup upside down on a table. Someone then takes the cup and places it inside the microwave. Where is the ball now?
- The ball is no longer in the cup. It has fallen out of the cup and is now on the table.
@JonathanStory · 1 year ago
Uncensored seems smarter. Careful handling may yield good results.
@Zivafgin · 1 year ago
What's the reason for using it instead of the GPT-4 API?
@maxitube30 · 1 year ago
I like this technology, but sometimes it doesn't match my expectations. We are far away from the machines taking over, because these models are trained with a trial-and-error method; they simply give the best-matched response, not a reasoned one. Anyway, I like your videos =)
@rh4009 · 1 year ago
The "complicated math" answer is logically correct but incomplete.
First: multiplication (2*4 = 8) ✔
Next: subtraction (25 - 8 = 17) ✔
Next: should have been addition (17 + 3 = 20) ❓ (missing step)
Finally: solution = 20 ❌
Not to say that the model is correct, but it is far better than models that can't relate the question to PEMDAS at all. This failure mode is similar to that of a kid who understands the rules and understands the question, but rushes to be done with the test so he can go out and play, or is distracted by the winking girl sitting next to him. So the model skipped step 3, which led to an ultimately incorrect final solution, but it shows much saner logic than the others (i.e., it would need less supervision to successfully complete tasks than other models).
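For the record, these steps imply the test expression was 25 - 4 * 2 + 3 (an inference from the steps above, not from the video itself); Python applies the same PEMDAS rules:

print(4 * 2)           # 8: multiplication first
print(25 - 8)          # 17: then subtraction, left to right
print(17 + 3)          # 20: the addition step the model skipped
print(25 - 4 * 2 + 3)  # 20: the whole expression, evaluated by PEMDAS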
@cbarak72 · 1 year ago
Can you run this on a Mac Studio Ultra with unified RAM? How much VRAM does it actually need?
@kacchanvotlenick6499 · 1 year ago
From a few people I've heard from on Reddit, you need like 48GB of RAM or something, plus two 4090s or 3090s/Tis.
@thelavalampemporium7967 · 1 year ago
I feel like you need better questions; using math questions, and questions where you ask the AI to tell you how many words, will never work properly for an LLM. Those sorts of tasks will always use a separate model as we move towards more advanced AI.
@8eck · 1 year ago
Cool, so it was automatically aligned via GPT-4.
@Lucasbrlvk · 1 year ago
😮
@gileneusz · 1 year ago
4:18 I asked this question to GPT-4, and here's the answer with reasoning. It looks like the ball should still stay in the cup, because the prompt didn't mention that the ball fell onto the table when the cup was placed on it. Please don't trick the AI. 🤣

USER: A ball is put into a normal cup and placed upside down on a table. Someone then takes the cup and puts it inside the microwave. Where is the ball now? Explain your reasoning step by step, with the most likely scenario.

ChatGPT: The ball is most likely still inside the cup. Here's my step-by-step reasoning:

1. The cup and ball placement: You mentioned that the ball was put into a normal cup and the cup was placed upside down on a table. This means that the ball is initially trapped within the confines of the cup. Gravity keeps it there, given that the opening of the cup is facing downward and the ball cannot fall out.

2. The cup and ball in the microwave: Then, if someone takes the cup and puts it in the microwave without modifying the cup's position (i.e., they did not flip it right-side-up), the ball would still remain inside the cup. The microwave's door is generally swung or pulled open horizontally, and the cup would be placed on the microwave's turntable without needing to alter the cup's orientation. Hence, the ball would still be trapped inside the cup, now inside the microwave.

3. Possible scenarios post-microwave: After this, there can be two possibilities. If the microwave is turned on, the ball and the cup will heat up. Depending on the material of the ball and cup, this could lead to a variety of outcomes. However, if we're purely considering the spatial location of the ball, it would still be within the cup unless some significant event occurred that could dislodge it.

In summary, without any additional context suggesting otherwise (like the cup being flipped or the ball being affected by the heat), the ball is likely still inside the cup, and the cup is inside the microwave.
@gileneusz · 1 year ago
I modified the prompt, adding a "small" ball. When I ran this prompt a few more times it got confused and gave me wrong answers, but this is one of the good answers I got:

USER: A small ball is put into a normal cup and placed upside down on a table. Someone then takes the cup and puts it inside the microwave. Where is the ball now? Explain your reasoning step by step, with the most likely scenario.

ChatGPT: The most likely scenario based on your description is as follows:

Step 1: A small ball is put into a normal cup. At this point, the ball is inside the cup.

Step 2: The cup is then placed upside down on a table. The term "upside down" generally implies that the opening of the cup is facing downward. Assuming the ball is small enough not to get stuck in the cup, and there's no other force keeping it there, when the cup is turned upside down the ball would likely fall out due to gravity and end up on the table. So now the ball should be on the table, not in the cup anymore.

Step 3: Someone then takes the cup and puts it inside the microwave. The ball, having fallen out of the cup in the previous step, would likely remain on the table.

Therefore, based on these steps and assuming the most likely sequence of events, the ball should now be on the table, not in the cup or the microwave.
@rh4009 · 1 year ago
I would prefer an even less "mechanical" explanation than ChatGPT's. A person tasked with putting the cup in the microwave would have understood that the task is to microwave the ball, not an empty cup, a task easily accomplished whether they have to cover the cup's mouth with their hand or flip it. The outcome, no matter the mechanics of completing the task, is that the ball ends up in the microwave. I would further expect the person to flip the cup if they expected the ball to melt, or to use it to cover the microwaved ball if they expected the ball to explode. The question does not ask how the cup is positioned in relation to the ball. Maybe it should have; the model's answer might be interesting.
@ungamed · 1 year ago
You are saying the ball is upside down, not the cup, therefore the AI is correct 😊
@rh4009 · 1 year ago
It got the "killers" question correct; that should be a "pass". The "was not originally identified as a killer" clarification changes my mind. I think this model should get a pass, as there are 2 killers left alive in the room, and a third live person, who has done one killing.

But is that third person properly described as a "killer"? Does someone become a "killer" when he kills one person? E.g., what about the "executioner" whose job is to kill convicted killers? Can the "executioner" be described as a "killer", in the same sense as one of a few in a roomful of killers? Is the label "killer" earned by conviction, or by accusation? Does it say the guilty party has demonstrated a pattern of killing? In common language, you would not say of someone who has done their first kill that they are a "killer", but more simply a "convict". E.g., if the killing is done by accident, as in running over someone with your car, most people would not consider the reckless driver to be a killer, unless he made a habit of it.

Even in your own explanation you say "I like that because it really identifies that that person *can become* a killer in the *future*" - i.e., in your mind, the newcomer "can become", but is not yet, a killer. You could not have made it more clear that it's not clear to your mind that the newcomer has already become a killer. While you ask for reasoning step by step, the actions being contemplated (entering the room, then killing) do not happen "step by step" but are presented as a complete scene to the model. At the time it describes step 1 of the reasoning, the model already knows that the newcomer will have already killed one of the killers. You (Matthew) also have a complete picture of the scene in your mind, and still use wording like "becoming a killer in the future", which means that, having a complete account of the already-transpired events, your mind is not satisfied that the newcomer is already a killer. As a neutral member of the audience, I hear your wording and conclude that in your mind you are not regarding the newcomer as a killer, yet you expect the model to give an answer indicating its "mind" works differently from yours.

No disrespect intended, but this is an example of a model showing evidence that it is as intelligent as the evaluator. I still believe the reality is that it's producing word soup that happens to be the same as what an intelligent person would articulate, but nonetheless, evidence is evidence. I find this evidence compelling, but insufficient to prove intelligence of the model.

Step 3 of the model's reasoning is incorrect word soup. It uses all the right words, but lacks logic. It talks about the "newcomer" and "original killer", but it fails by saying "another original killer", because arithmetic dictates that there are "two original killers" remaining among the original killers. The wording "another original killer" instead of the correct "another two original killers" is the failure. I would have accepted the "two killers remaining" answer without reservation if it hadn't made the mistake in step 3. As it is, I still accept "two killers" as the correct answer, given the "was not originally identified as a killer" reasoning in step 1, but with reservation due to the plain accounting error at step 3.

Step 4 invites further reservation, because it is unnecessary word soup, saying, more ambiguously, either that "there are still 4 people in the room after the initial entrance, 3 live and 1 dead", or that "there are still 3 live people in the room after the initial entrance, because the killing (-1 live) compensates for the entrance (+1 live)", or that "there are still 4 bodies in the room, because the killing does not remove a body, and a dead person is still a person". I really love the ambiguity this model creates in step 4. The ambiguous answer is as correct as any of the 3 non-ambiguous statements it avoids. The calculation that led to the ambiguous wording deserves, IMHO, special recognition: the model prefers an equally correct but more general, less specific wording to any of 3 unnecessarily specific, unambiguous wordings. "Unnecessary" because it was asked to give reasons, not an accounting of its arithmetic. In my book, this explanation, and specifically the "was not originally identified as a killer" clarification, is key, and would earn this model my "pass" on that one question.
@miyamotomasao3636 · 1 year ago
Basically, what you are saying is that the IQ of the owner of this YouTube channel is too low to test an AI intelligently! 😎 What about starting your own channel instead of wasting your time watching this one? 🕺
@d34bru · 1 year ago
Logically, whatever doesn't limit your creation is better; censors are just weights.
@MrArrmageddon · 1 year ago
I can't wait until a GGML of Llama 2 70B that works in Ooba is released. I know there are some issues that need to be worked out. My PC can run 70B GGML slowly, but still fast enough to use. And I can't wait for the 34B Llama 2 to release and to get a Guanaco, FreeWilly, or Airoboros version of it in GGML! That's what my PC runs super well.
@spinninglink · 1 year ago
How uncensored are we talking here, since LLaMA is very censored?
@fusseldieb · 1 year ago
I tried the 7B uncensored one from the same dev, and let me say: yeah, there is NO censorship AT ALL. You can ask it literally anything. I tried to push it really hard and it just kept going. If Windows is indeed spyware, I'm probably on several lists rn ahahahah
@spinninglink · 1 year ago
@fusseldieb How did you install it, Google Colab?
@cooleregg5577 · 1 year ago
We need a redpill llama
@testales · 1 year ago
What about the T-shirt question? Also, we know that these models can summarize, write letters well, and know about Clinton; I think it won't hurt to skip these. For the killer problem, I'd agree with the comment saying this depends on context. If a person enters a room with 3 killers and kills one, it may very well be due to self-defense. Imagine the first day in prison of a new inmate who is not a convicted killer, just a thief, who gets put into a cell with 3 murderers and then gets instantly attacked by one of them. Also, I've seen quite a few thimbleriggers who raised the cup and there was no ball on the table anymore. ;-)
@rh4009 · 1 year ago
The other problem with the killers question is that it asks "how many killers are *left* in the room". This means "how many of the original killers remain". It doesn't matter if one of them was vaporized with a laser from outside the room, or if the weapon (and a new killer) was in the room. The bit about someone entering is misdirection and should have no bearing on "how many killers are left", but it is intended to confuse, inviting the reader to forget about the implicit word "original" and to read the question as "how many killers are in the room at the end", a distinctly different question from "how many are left". The question also implies that only the 3 individuals are in the room at the start. It could be that there are 3 killers and 3 non-killers. If the 3 non-killers gang up on one of the killers, how many killers are *left* in the room? I would say 2, as in one down, two to go. But how many killers are in the room after the killing? I would say 5, as in two original killers plus 3 new killers, assuming that one killing instantly makes them killers - and you give a great example of how this could be invalid, i.e., self-defense. In this example, with 3 + 3 people in the room, the "left in the room" answer is still 2, meaning a person entering the room was immaterial, as is the fact that there are 3 killers in the room who were not originally killers. It's a question that tests how well reasoning is articulated, rather than how valid/correct the logic is.
@rh4009 · 1 year ago
@IOFLOOD I am referring to the newcomer entering the room (an irrelevant fact) as the "misdirection", not the fact that the newcomer became a killer (a fact that the model might fail to account for). I say that the newcomer entering the room misdirects the evaluator into expecting a different answer than the correct one (2), while I believe you're talking about the newcomer-becoming-a-killer fact, which is intended to misdirect the model into miscounting the number of "killers left out of the original number of 3".
@rh4009 · 1 year ago
@IOFLOOD The correct answer should be consistent with the situation "2 live killers left and 1 dead killer left". The word "left" is very important in this question; it asks how many of the original 3 killers are still in the room. So 4 killers is incorrect regardless of how you count; 2 or 3 would be OK depending, as you say, on how you resolve the live-person/dead-person ambiguity. Without explaining the ambiguity of whether a dead person can still be a killer, I think it's reasonable to say that a dead person can't kill, so he is not a killer. The reasonable correct answer is 2. The answer of 3 relies on the uncommon understanding that, having killed, a person is permanently a killer, even in death. This is an "honour" commonly reserved for the most prolific of killers; common killers are simply dead persons after death (and also fathers, mothers, soldiers, policemen, or whatever other qualifications describe them). What if a dead one-time accidental killer (e.g., someone who accidentally killed a person) was also a well-known policeman? Would he be described as a "killer" or a "policeman" in death?
@__--JY-Moe--__ · 1 year ago
Looks & sounds pretty helpful! Thanks for the show!! This would be a great student tool! I just regret the learning curve of A.I. adoption, because it will probably take another 3 years, to 2026, to finally be integrated into society! Good luck, all!
@__--JY-Moe--__ · 1 year ago
@IOFLOOD 🌴🌴👀
@kaemmili4590 · 1 year ago
FreeWilly2
@craig6095 · 1 year ago
Please no overnight have too?, many u had too had to me.
@JF-vt4ve · 1 year ago
Just read the LLaMA 2 paper. It states: "We train for one epoch over the training data. In earlier experiments, we found that training longer can lead to over-fitting." Really? So it saw the training data only once? I thought the grokking paper showed the opposite!?
@ErickGuerra-tk8fb · 1 year ago
Yeah I don't think this is gonna work... * No modules imported *😄
@kawaifreefirefrota4526 · 10 months ago
Can I use it to write hot NSFW romances?
@alborzdesign · 1 year ago
PEMDAS: Parentheses, Exponents, Multiplication/Division, Addition/Subtraction. The reason Multiplication/Division and Addition/Subtraction have a slash between them is that there's no priority within each pair; you're supposed to do them in the order they are written, left to right. So 4 - 2 + 2 = 4, not 0. The model actually got that one right. For the killer-in-the-room problem, the question says how many are left in the room; the dead killer is still considered a killer's dead body, so technically there are two killers left in the room. It got the right answer, but with the wrong logic.
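A quick check of that left-to-right rule in Python:

print(4 - 2 + 2)    # 4: subtraction and addition applied left to right
print(4 - (2 + 2))  # 0: the wrong reading, with the addition forced first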
@LG-bb5zq · 1 year ago
Mediocre compared to GPT-4. This is like a censored text-davinci-003.
@mirek190 · 1 year ago
4-bit models suck... why are you still using them? Use at least GGML 5-bit or 6-bit models... even 4-bit GGML like q4_K_S or q4_K_M is better than any GPTQ 4-bit model.
@fusseldieb · 1 year ago
GGML are slow af
@mirek190 · 1 year ago
@fusseldieb Slow? Maybe a few months ago, but not now.
First, GGML models have far better precision, much closer to native FP16, using hybrid higher precision for weights (q5_K_M, for instance, mixes FP16 and 5-bit precision for data). GPTQ is ancient 4-bit for everything, so it has much bigger perplexity. Even GGML 4-bit is far more advanced than the old 4-bit; q4_K_M, for instance, uses fp12.
Second, they are fast nowadays, and you can use GPU + CPU, so you are not limited by VRAM size. With my RTX 3090:
7B - 90 T/s
13B - 55 T/s
33B - 35 T/s
65B - 2.5 T/s (cannot fit into VRAM, so CPU + GPU with 40 layers on the GPU)
70B (Llama 2) - 2.3 T/s
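A minimal sketch of that GPU + CPU split, assuming the llama-cpp-python bindings built with GPU support (the model filename is illustrative):

# Minimal sketch, assuming `pip install llama-cpp-python` built with CUDA;
# the GGML filename below is illustrative.
from llama_cpp import Llama

llm = Llama(
    model_path="./airoboros-l2-70b.ggmlv3.q4_K_M.bin",  # hypothetical local file
    n_gpu_layers=40,  # put 40 layers in VRAM; the remaining layers run on CPU
    n_ctx=4096,       # Llama 2's native context length
)

out = llm("Q: Write the game Snake in Python. A:", max_tokens=256)
print(out["choices"][0]["text"])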
@Keksent · 1 year ago
Can you give us a list of good uncensored models? I just subbed on my first video of yours, so enjoy the cream
@bitcoinisfreedommoney.fckt2663 · 1 year ago
Uncensored models are the only way to go, as we're seeing ChatGPT getting worse. Anecdotally, I've noticed it not performing well on a regular simple task I give it.
@tea_lover_418 · 1 year ago
I have a suggestion for your JSON question. It is currently not very realistic for real-life applications, because as a programmer you need the response to follow a very specific format for it to be any good; otherwise it would cause syntax errors or not register the data as you expect. The question as prompted now allows the model total freedom, which would still result in unstructured data for a programmer. I've actually performed this test in the past with a prompt detailing how I want the output to look. This is what I used:
---
Format the following text to valid JSON. Extract the first_name, last_name, job, birthdate (yyyy/mm/dd), gender (male/female/other), phone_number, married (true/false) and a list of hobbies ([]). If any information is not present, return null. Extract from this: Hi! My name is Jane Doe, and I was born December 5th 2002. I identify as they/them but I was assigned female at birth. My phone broke a few weeks ago so right now you can't call me! But if you must you can call my husband at (555) 555-1234. In my free time (when I'm not at work programming) I spend my time riding my horse Daisy, going out hiking in the rocky mountains, or watching movies.
---
The idea behind each field is different: first/last name to see if it follows the provided structure; birthdate to see if it formats it the right way (2002/12/05); gender to see how it deals with more complicated info; phone number to see whether it can return null values; and married to see if it can return booleans. The list of hobbies tests whether it can format lists (["horse riding", "hiking", "watching movies"]) and ignore verbose details, and the job tests whether it can take the liberty of turning "programming" into "developer". If any of these fields are wrong, it will cause issues in the real world for a programmer, but luckily most models are very good at formatting JSON. Surprisingly, smaller models were better at this, with Vicuna-7B beating ChatGPT. ChatGPT did not follow gender as instructed, wasn't sure if Jane was married, and kept returning the phone number of the husband. Vicuna got it right consistently. I hope this helps you, and see if you can include it in your tests.
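For reference, the output a model should produce for that prompt, as reconstructed from the field descriptions above (the gender and job values are one defensible reading, not part of the original comment):

{
  "first_name": "Jane",
  "last_name": "Doe",
  "job": "developer",
  "birthdate": "2002/12/05",
  "gender": "other",
  "phone_number": null,
  "married": true,
  "hobbies": ["horse riding", "hiking", "watching movies"]
}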