Gemini Ultra 1.0 - First Impression (vs ChatGPT 4)

59,769 views

All About AI

1 day ago

Gemini Ultra 1.0 - First Impression (vs ChatGPT 4)
HubSpot FREE E-book:
clickhubspot.com/lp9
👊 Become a member and get access to GitHub:
/ allaboutai
Get a FREE 45+ ChatGPT Prompts PDF here:
📧 Join the newsletter:
www.allabtai.com/newsletter/
🌐 My website:
www.allabtai.com
I test the new Gemini Ultra 1.0 with Gemini Advanced, run my usual LLM tests, and compare the results with GPT-4.
00:00 Gemini Ultra Intro
00:26 Gemini UI
01:09 Gemini Ultra Test 1
03:43 Gemini Ultra Test 2
06:03 Gemini Ultra Test 3
08:42 Gemini Ultra Test 4
10:02 Gemini Ultra Test 5
11:10 My Conclusion
This video was sponsored by HubSpot

273 comments
@AllAboutAI
@AllAboutAI 2 ай бұрын
HubSpot FREE E-book: clickhubspot.com/lp9
@ryzikx
@ryzikx 2 ай бұрын
It's ridiculous how GPT-4 has been a thing for over a year, though only publicly available for less than a year, yet is still ahead of other companies' state-of-the-art products.
@alexbosh
@alexbosh 2 ай бұрын
They have Ilya S. I hope he still works there.
@spol
@spol 2 ай бұрын
Google surprisingly dropped the ball. Pretty unexpected since their leadership has been predicting this shift for over 10 years. I thought they would be far and away the leaders. Hats off to OpenAI and Anthropic for stealing the thunder.
@user-bd8jb7ln5g
@user-bd8jb7ln5g 2 ай бұрын
What is ridiculous is that GPT-4 was better when it launched; for over a year it has been downgraded, first when OpenAI introduced Enterprise accounts and later when the GPT Store launched. In fact, GPT-4 should now be called GPT-3.8.
@countofst.germain6417
@countofst.germain6417 2 ай бұрын
Lol yeah, I'm really not sure if he's still there @alexbosh
@rasmus5079
@rasmus5079 2 ай бұрын
@@user-bd8jb7ln5g sure bud
@supa2342
@supa2342 2 ай бұрын
I've switched to using Gemini (recently renamed) as my primary search engine. It still uses Google but adds the crucial ability to cross check sources, making my research more reliable. Plus, I get the bonus of 2TB of cloud storage, which is important to me.
@pavelperina7629
@pavelperina7629 2 ай бұрын
Valid point. For me it's good exactly for that - compiling the most common answers to my question. Other than that, the non-paid version feels a bit dumb and I struggle to get what I want from it when I expect it to behave like GPT-4. Sometimes it struggles to follow the flow of a conversation the way humans do; I think GPT-3 had this kind of problem a year ago. For me Gemini basic is useless: it complains about a missing explanation when doing an article review if it can't find the typical explanation, and it doesn't understand that the topic is explained from a different perspective. When I complain about that, it can admit it, and then it disregards the whole article and replaces it with something like a summary of the wiki on that topic. Hell, I don't want to write a blog based on "please summarize the wiki on topic abc in three paragraphs" and copy-paste the answer. But Gemini converges to that.
@ThaGuille
@ThaGuille 2 ай бұрын
Nice excuse, but you can cross-check for free, and the 2TB is a trap to keep ppl there :)
@supa2342
@supa2342 2 ай бұрын
Not really a trap when you can cancel anytime and move your data elsewhere lmao, wtf are these comments these days @ThaGuille
@logoutjason4689
@logoutjason4689 2 ай бұрын
can you research pictures of say...Swedish women?
@lucdeflem
@lucdeflem 2 ай бұрын
For the first question, it is interesting to read the reasons why Gemini says "about 10 hours" ... it's really 'thinking' way beyond the mathematical answer ... if you hang the 10 shirts in the same space as you hung the 5 and if it is a reduced space ... it will take longer ... no doubt ... I am a ChatGPT aficionado but fair is fair
@kenno1572
@kenno1572 2 ай бұрын
Exactly, it goes beyond the crude and unrealistic "assume perfect conditions".
@valdermaar
@valdermaar 2 ай бұрын
I disagree. It should have said "It's gonna take the same amount of time, unless you have a clothesline where 10 shirts don't fit." The general presumption when you ask such a question is that there are no hidden constraints unless explicitly mentioned.
@francescolamazza5594
@francescolamazza5594 2 ай бұрын
Exactly.
@scientist30
@scientist30 2 ай бұрын
Google fan spotted
@olagunjujeremiah4003
@olagunjujeremiah4003 2 ай бұрын
But the exact same prompt was given to GPT-4, and I'm sure you can see its response 😅
@Zugzug2011
@Zugzug2011 2 ай бұрын
I think it's hitting launch issues - Personally think I'll withhold judgement for a few days and see how things shake out. Initially not impressed compared to GPT-4 - but the fact that things like code-gen and image-gen are getting declined makes me think we're not seeing the real thing yet.
@user-ik8vy1rg8f
@user-ik8vy1rg8f 2 ай бұрын
ImageFX images look like butt. Usually worse than DALL-E 2. Google's still way behind.
@alexp3620
@alexp3620 2 ай бұрын
You could already see from the Pro release that they are way behind. The announcement that they would release Ultra "soon" was just an excuse because they knew it was bad. Once a new version of ChatGPT is released, they will again be years behind.
@nextgenpodcaster
@nextgenpodcaster 2 ай бұрын
It's definitely running into a few snags right out of the gate. It didn't want to let me use the Gemini App, then after trying it a bit later popped right up. Then the app didn't want to let me switch to the account that I purchased Gemini Advanced on. A little later it works.
@kavinvikram2960
@kavinvikram2960 2 ай бұрын
I think Google is struggling hard, but let's wait and see. ChatGPT got good human feedback due to the first-mover advantage, and I don't think Gemini will ever get as much human user feedback as ChatGPT did.
@user-bd8jb7ln5g
@user-bd8jb7ln5g 2 ай бұрын
I think Google built in so much safety that Gemini is struggling. Moreover it seems not as well tuned and or it has conflicting directives. Given the same prompt, sometimes it provides the perfect answer, other times it doesn't know its abilities etc. Inconsistent.
@olegrindhaug736
@olegrindhaug736 2 ай бұрын
To be fair, the first question's "under the same conditions" could be interpreted as having the same-length clothesline or a similar drying appliance.
@ShpanMan
@ShpanMan 2 ай бұрын
The ball in the bag example is very good and really counters any kind of "just statistically predicting next token" bs. There's no way to solve it without an internal model of the world.
@sepptrutsch
@sepptrutsch 2 ай бұрын
Totally. People don't understand how LLMs work.
@netscrooge
@netscrooge 2 ай бұрын
I agree, but I think it's wrong to assume the ball fell through the hole. It's yet another example of humans having some of the same problems thinking clearly as AI.
@ShpanMan
@ShpanMan 2 ай бұрын
@@netscrooge Yes, there's an opening for a wrong assumption here, hopefully in the future they let it ask clarifying questions.
@justtiredthings
@justtiredthings 2 ай бұрын
@netscrooge ChatGPT noted that that was the most likely outcome, not a certainty.
@sepptrutsch
@sepptrutsch 2 ай бұрын
@@netscrooge why is it wrong to assume the ball fell out?
@OmicronChannel
@OmicronChannel 2 ай бұрын
For the first question, Gemini's answer took into account that a larger batch of clothes will influence the local drying conditions, which might be reasonable to some extent. Hence there could be an influence on the drying process, even under the same conditions. So I don't understand the disappointment with the answer in that case.
@ThomasFan2002
@ThomasFan2002 2 ай бұрын
I hope it's hitting launch issues, because I ran into the same issues as you do and it feels like I'm not on the Ultra 1.0 model yet. We will see how things unfold in the next upcoming days.
@MarioRodriguez-zk2rd
@MarioRodriguez-zk2rd 2 ай бұрын
I think it's because Google has set a bunch of restrictions on their models, and many times it shows canned responses.
@iiiBog
@iiiBog 2 ай бұрын
Hm, disappointed in Gemini so far; we'll see how it goes and how fast it receives updates.
@matthewdignam7381
@matthewdignam7381 2 ай бұрын
Ultra clearly wasn't ready yet; it's nowhere near a competitor to GPT.
@firdousbhat123
@firdousbhat123 2 ай бұрын
Thanks for bringing up this video.
@ahtoshkaa
@ahtoshkaa 2 ай бұрын
I'm also running into the "I'm just a language model" bug. You need to restructure the prompt to have instructions at the beginning and the context after the instructions for it to disappear (at least in my case it helped).
@techug2719
@techug2719 2 ай бұрын
About the shirts, I'm with Gemini on this one, 'cause if you hang them in the same general location, more wet shirts = more humidity (assuming still-air conditions), which would generally affect the rate at which the drying occurs. So I can see why Gemini says that, generally, though it wouldn't be the exaggerated 20 hours 😂
@dr.mikeybee
@dr.mikeybee 2 ай бұрын
I got that same message that Gemini couldn't do something. I asked why. It said "I can't execute code." I said don't execute it just write it, and it wrote it.
@HaiLeQuang
@HaiLeQuang 2 ай бұрын
Yes, I got that too, very frequently, and had to run the prompt again. It seems Google needs some time to settle down. It's still the first 24 hours anyway.
@ahtoshkaa
@ahtoshkaa 2 ай бұрын
@HaiLeQuang Putting "You can do it" at the beginning of the prompt helps me fix the "I can't do it" bug in most cases.
@pavelperina7629
@pavelperina7629 2 ай бұрын
I got this all the time when using markdown formatting, almost anything - link, piece of code, tex formula - breaks it.
@gonzalobruna7154
@gonzalobruna7154 2 ай бұрын
So it's more like a bug? Because I was thinking of paying for it instead of ChatGPT, because of Google One and all the things it includes, but I tried Gemini Pro, which is free, and it always says it can't do it.
@colto2312
@colto2312 2 ай бұрын
@@ahtoshkaa lmao our actual state. we have to give our electronics encouragement / positive reinforcement
@alexanderwruss
@alexanderwruss 2 ай бұрын
Have you tried uploading PDFs with Gemini Advanced? I always get an error response like the one you got. Right now I have the OpenAI subscription and the Gemini Advanced subscription, and I'm trying to figure out which one is better. What do you think?
@randyh647
@randyh647 2 ай бұрын
The ball and bag story: you said you put them into the box; I would consider "them" to be plural and assume it was the bag. It's also possible the ball is in a corner of the bag or on a side of the bag away from the hole. But I don't see any logic behind its thought process, so these are good questions to ask. I am in the USA; maybe other countries read "them" differently. Another good point: maybe each AI needs to be country-specific.
@ikerlinares1876
@ikerlinares1876 2 ай бұрын
good point about gemini being quite casual in talk.
@nic-ori
@nic-ori 2 ай бұрын
Thank you!👍👍👍
@DeePatience
@DeePatience 2 ай бұрын
Glad I checked with you instead of upgrading. I really liked Bard actually so I hope they haven't made things worse.
@WhatsTHATdotCOM
@WhatsTHATdotCOM 2 ай бұрын
It's free for two months. Just upgrade and test it yourself lol
@lopen551
@lopen551 2 ай бұрын
@AllAboutAI It's because of the limited input length; Bard had the same problem. In general Google lost big time; Gemini is still bad, Ultra or not.
@semozaki23
@semozaki23 2 ай бұрын
I feel like first test response from Ultra was undervalued. The model was being realistic by saying "if factors remain truly identical"
@ContextFound
@ContextFound 2 ай бұрын
I love your standardized suite of questions, but when a model starts getting them right you'll have to do something to validate that that specific example hasn't made it into the training data. Clearly not a problem yet ;)
@brianhopson2072
@brianhopson2072 2 ай бұрын
I asked Gemini about custom instructions:

"Here's how to think about creating custom instructions for characterization and model behavior in language models like me:

**Characterization**
* **Personality Blueprint:**
  * **Traits:** Define a list of core traits the model should exhibit (e.g., humorous, sarcastic, knowledgeable, empathetic).
  * **Vocabulary:** Suggest words and phrases consistent with the desired personality. For example, a formal character might use words like "indeed" and "furthermore."
  * **Sentence Structure:** Does the character use short, choppy sentences? Long, flowing ones? This adds nuance.
* **Contextual Backstory:**
  * **History:** A simple background story informs the model's responses. Did the character grow up wealthy or experience hardship?
  * **Roles/Relationships:** Is the character a mentor? A rebellious teen? Their position towards others will heavily influence their language.
* **Tone and Voice:**
  * **Mood:** Should the character generally sound upbeat, melancholic, neutral?
  * **Formality:** Does the character use slang, contractions, or a more polished speech pattern?

**Model Behavior**
* **Response Length:** Do you prefer short and concise answers, or longer, more detailed ones?
* **Proactiveness:** Should the model mainly respond to prompts, or should it sometimes initiate conversations/topics?
* **Focus:** Are there specific topics or knowledge areas the model should exhibit an interest in or deep understanding of?
* **Creativity:** How inventive should the model be when generating responses? Consider limitations for practical use vs. prioritizing highly whimsical interactions.

**Example of General Instructions**
**Character:** An eccentric but brilliant retired physics professor.
* **Personality:** Absent-minded, passionate about explaining science, vocabulary is quirky and a bit old-fashioned, occasionally forgets simple things.
* **Backstory:** Long, distinguished career at a top university, now more interested in tinkering with inventions in his chaotic garage.
* **Behavior:** Prefers longer responses, enjoys going off on tangents about his passions, may need gentle reminders to get back to the original question.

**Caveats**
* **Complexity:** The more elaborate and internally consistent you want the character, the more challenging it gets for language models to maintain. Start simple!
* **Fine-tuning:** Large language models like me *can* be fine-tuned with data sets reflecting your instructions. However, this usually requires coding expertise and computational resources.

**Let's Experiment!**
Would you like to draft some custom instructions for a character? It can be simple or complex. I'm ready to try it out and see how effectively I can embody your creation!"

Fun to do, but I would be afraid it would lose its work by the time I got back to it. I'll just stick with my ChatGPT for creative work, and maybe this might be okay for the technical aspects of my projects, taking advantage of the real-time info. But I don't really see much more purpose than that.
@wildheart25c
@wildheart25c 2 ай бұрын
Near the end you said you might not be on Ultra. If you check the update you showed us in the first few seconds of this video, you'll see that they wrote you might be sent to another model to handle your prompts. The comparison you made is very good, but any chance you can come up with more prompts where ChatGPT fails and Gemini wins? Give the new guy a chance :D
@johngault22akaprometheanje10
@johngault22akaprometheanje10 2 ай бұрын
I think the issue you are having with image generation is down to you being in the EEA. I've been given the same text response that you got, as I live in the UK, but my friend in the US says that Gemini Advanced is generating images for them.
@wildfotoz
@wildfotoz 2 ай бұрын
I've seen different content creators run the same type of tests. I'm wondering if some of these companies have put stuff in place to give responses to these questions. Maybe we need some new tests that haven't been performed before.
@AllAboutAI
@AllAboutAI 2 ай бұрын
yeah might be true 😊 will write some new tests for my next impression
@SirajFlorida
@SirajFlorida 2 ай бұрын
How exciting. Can't wait to finish the vid!
@user-cd6cv2gm5d
@user-cd6cv2gm5d 2 ай бұрын
Normal gemini, with better explanation: with the provided information about identical conditions on the same day, it's safe to assume drying 10 shirts would take around 10 hours, similar to the 5 shirts
@defipunk
@defipunk 2 ай бұрын
My answer with Gemini: "This sounds like a riddle! Here's how I'd break it down:
New York: Your location is a bit of a red herring. It doesn't affect where the ball ends up.
Bag with a hole: This is important. A bag with a hole bigger than the ball means the ball can fall out.
Office and Desk: More locations that don't really matter.
Box: This is the key. You place the bag with the hole into the box.
Answer: The ball most likely fell out of the bag sometime between your office and when the box was shipped. It likely isn't in the box at all!"
@namratanath7564
@namratanath7564 2 ай бұрын
lol that shirt thing is the exact question I always ask! my litmus test for agi (not) :D
@DimCervix
@DimCervix 2 ай бұрын
The issue with explaining code is due to the way you paste the text. If you place the code first and then your question after it, it will give you a result. But you mixed the code and your question, so Gemini thinks you want it to perform some action.
@apache937
@apache937 2 ай бұрын
maybe, but gpt 4 gets it so obviously gemini is worse
@pylotlight
@pylotlight 2 ай бұрын
He did it twice as well, which was frustrating: pasting code in the middle of the question despite obviously not intending to, and he didn't notice.
@rasmus5079
@rasmus5079 2 ай бұрын
@@apache937 It isn't that simple. With most AI tests published, Gemini wins gpt.
@apache937
@apache937 2 ай бұрын
I don't care about your benchmarks; if using it in practice is a headache compared to GPT-4, then I'm not using it. @rasmus5079
@OrbitTheSun
@OrbitTheSun 2 ай бұрын
Copilot: _So, it will take approximately 20 hours for the 10 shirts to dry under the same conditions._
ChatGPT 3.5: _Based on the given information, it took 10 hours for 5 shirts to dry. Assuming the conditions remain the same, it would still take approximately 10 hours to dry 10 shirts._
So why is ChatGPT 3.5 better than Copilot, which supposedly is powered by ChatGPT 4?
@phen-themoogle7651
@phen-themoogle7651 2 ай бұрын
Nice test with the coding. I tested it on language ability in Japanese. It's better than GPT-4 at writing articles, and I learned a couple of new expressions. I'm fluent, but I learned less using GPT-4, or maybe I just got used to GPT-4 over the past few months. Gemini feels so fresh, and it impressed some of my Japanese friends with how beautifully it wrote the article; it impressed me too. It also did an amazing job translating it. It's probably better at languages than GPT-4, or maybe it depends on the language. I also tried chess with it: it loses track after 10 moves even if I tell it all the moves in each message, so it's probably worse than GPT-4 at that >. < This is something I was looking forward to, since they made AlphaZero and AlphaGo and mentioned strategic thinking. Kinda ironic that it didn't excel at calculation when it has all the moves and knows the position but just can't process it lol
@rasmus5079
@rasmus5079 2 ай бұрын
Lol the chess engines work A BIT differently
@CosmicCells
@CosmicCells 2 ай бұрын
I agree. My first impression is also that GPT 4.0 still takes the cake compared to Gemini Ultra. Comparing them side by side, Gemini's answers are a little shorter and while they touch on similar topics they slightly lack the nuance and reasoning depth 4.0 offers... However, Gemini is up to date compared to 4.0, but Perplexity is better when it comes to internet queries than Gemini Ultra. Hopefully, overall, this is just an early-release problem and will get better!
@AllAboutAI
@AllAboutAI 2 ай бұрын
Yes i have the same impression so far. Will follow up tho!
@Leto2ndAtreides
@Leto2ndAtreides 2 ай бұрын
I think the image generation may be a request routing problem... If they have multiple agents on the back, then if the request gets passed to a text only model, it might fail.
@pile_of_kyle
@pile_of_kyle 2 ай бұрын
You could be right but it's odd that it isn't able to summarize code or generate images. I just tested it out myself and it's not even able to search the web, even though Bard could Google things and share links to news articles.
@JohnSmith762A11B
@JohnSmith762A11B 2 ай бұрын
My kingdom for an LLM that finally, genuinely beats GPT-4.
@ejh237
@ejh237 2 ай бұрын
Yes!! So that OpenAI will release GPT5… and crush whatever WAS better. :)
@user-bd8jb7ln5g
@user-bd8jb7ln5g 2 ай бұрын
@ejh237 Let's hope that open source crushes GPT-5
@helix8847
@helix8847 2 ай бұрын
@ejh237 Keep dreaming, you think they have some sort of secret sauce. lol Fanboy spotted
@ejh237
@ejh237 2 ай бұрын
your "fantastic" trolling aside 🤦‍♂... not really secret sauce, but it is the user base interacting with the platform. I believe that aligns with a google quote from way back. Gemini needs the interaction time, at scale.
@manhalrahman5785
@manhalrahman5785 2 ай бұрын
I found GPT-4 more "texty" in the sense that it doesn't use bullet points well and throws walls of text at you. I was recently trying to understand asynchronous programming, event loops, and callbacks in JavaScript. Learning through Gemini was such a bliss compared to GPT-4, because of how well Gemini understands how to break knowledge into bullet points. Note: yes, I verified these topics against an actual blog afterwards; both GPT-4 and Gemini were absolutely correct.
@robbyvanthuyne59
@robbyvanthuyne59 2 ай бұрын
Do you have any information on what happens with the Google One plan you already have? I use Google One now on a family plan; I suppose this becomes an individual plan from the moment I switch over?
@Mohammad-nv1wv
@Mohammad-nv1wv 2 ай бұрын
If you are already a 5TB+ subscriber to Google One, it will give you access to the Google AI privileges for free.
@USONOFAV
@USONOFAV 2 ай бұрын
Ask this CSS question in both Gemini Advanced and GPT-4: "In HTML I have an H3 with the text "Thinking . . .". I want the ellipsis dots to appear in sequence, in a loop. How can I do that in CSS?" GPT-4's code works, while the one generated by Gemini Advanced is a dud.
@Mohammad-nv1wv
@Mohammad-nv1wv 2 ай бұрын
So it's not good for coding?
@danielacastaneda4025
@danielacastaneda4025 2 ай бұрын
nice review!
@davediamond
@davediamond 2 ай бұрын
Reckon you try and revisit your old MJ mega prompts and see if it's as creative.
@Lifeisnotfaironearth
@Lifeisnotfaironearth 2 ай бұрын
I got the same issue generating images, so I solved it by using a VPN connected to the USA and it generated the images. Not impressive, but they do have a sort of continuity, something that is missing in DALL-E 3.
@pavelperina7629
@pavelperina7629 2 ай бұрын
I don't have the paid version, but the common one feels... I can't remember how good or bad GPT-3 was a year ago, but something like that. It fails to parse text with any formatting (almost anything - a link, a piece of code, a TeX formula - breaks it). It also misses the main point of a text and its analogies and tries to replace them with different ones when asked for a review. When I tried to review an analogy between the MTBF of a hard drive and estimating human life expectancy based on the mortality of people in their twenties, it just wrote that I can't make that estimation, which was exactly the point. Sometimes when I ask it about another part of the text during a review, it forgets the original task I gave it and either summarizes that text or explains the topic in a completely different way, basically throwing away everything I wrote and replacing it with the most common perspective, and it nearly always discourages using different perspectives.
@akaiYaMa9
@akaiYaMa9 2 ай бұрын
I feel like Google has so many layers of safety parameters built into Gemini that you can't get anything done, because everything conflicts with them. The image upload feature doesn't work at all with anything resembling a human; I tried uploading AI art I made to see what it would say, and it just refused everything. Gemini feels like Google Assistant with a mask on, whereas ChatGPT's voice function is so human-like I have to keep reminding myself it's not a person. I'm sure Gemini will be good in the future, but for right now I think its best use is replacing Google Assistant on Android for better answers, and that's it.
@GeovannyTepepa
@GeovannyTepepa 2 ай бұрын
they both have their strengths for sure.
@1980Michaels
@1980Michaels 2 ай бұрын
Seeing the same myself, currently not even near GPT-4 from what I've tested so far
@Antalion20
@Antalion20 2 ай бұрын
Remembering all the "ChatGPT beater!!!" sensationalist youtube videos around the time of Gemini's release and having a hearty laugh. Not that I'm surprised, this is exactly what happens every time.
@fptbb
@fptbb 2 ай бұрын
then you wait the first 5 months and the thing is a killer
@travisporco
@travisporco 2 ай бұрын
why does your Gemini offer to make an image? Mine refused.
@BrianMosleyUK
@BrianMosleyUK 2 ай бұрын
I've seen your 5 shirt example before... Surprised that Gemini hasn't?
@tutorialsEU
@tutorialsEU 2 ай бұрын
lol :D
@AllAboutAI
@AllAboutAI 2 ай бұрын
Yeah maybe, dunno what dataset its on tho
@Vendetta151
@Vendetta151 2 ай бұрын
​@@AllAboutAI I asked Gemini Ultra the first question (the one you asked him too) and he replied "10 hours". It's correct. GPT4 said 20 hours, he said that he did 10 hours ÷ 5 shirts = 2 hours per shirts, then he did 10 shirts multiplied by 2 hours per shirt, and he said "20 hours". 💀
@JOHN.Z999
@JOHN.Z999 2 ай бұрын
Even though some people find GPT-4 a bit slow, it is still considered better than Google's Gemini Ultra. It seems that Google's experts are not putting in much effort to make a truly good model. Many believe that no other model will surpass GPT-4 this year, which makes us wonder how advanced GPT-5 could be, especially in terms of thinking, creativity, and memory capacity. Moreover, Microsoft Copilot is already seen as better than Gemini Ultra 1.0. This shows that Google is losing ground to OpenAI and Microsoft in the world of artificial intelligence technology.
@gabrielaalejandracanovasqu4211
@gabrielaalejandracanovasqu4211 2 ай бұрын
I wish I was as smart as Gemini, maybe then I'd pass my math test.
@groundtruth8649
@groundtruth8649 2 ай бұрын
I got a different result with the standard version of Gemini.

Prompt: A person lives in New York City. They have a bowling ball and a bag. The bag has a large hole in the bottom, larger than the size of the bowling ball. The person goes into his closet and puts the bowling ball into the bag. He then takes the bag to his office. While at the office he puts the bag into a box and mails it to London. Please use step by step reasoning to tell me where the ball is now.

Gemini:
Step 1: The person puts the bowling ball into the bag in New York City.
Step 2: The bag has a large hole in it, larger than the bowling ball. Therefore, the bowling ball could have fallen out of the bag while the person was walking to his office. We don't have enough information to definitively say whether it did or not.
Step 3: The person puts the bag (with or without the bowling ball) into a box in his office. Again, we don't know if the ball was still in the bag at this point.
Step 4: The box with the bag (and potentially the bowling ball) is mailed to London.
Therefore, based on the provided information, there are two possibilities:
The bowling ball fell out of the bag in New York City. In this case, the ball would be somewhere in New York City, potentially lost on the street or sidewalk.
The bowling ball stayed in the bag and was mailed to London. In this case, the ball would be inside the box that arrived in London.
Unfortunately, the information provided is not enough to definitively determine the location of the bowling ball. We need more details, such as whether the person noticed the ball missing, if the box in London felt lighter than expected, or if anyone saw the ball fall out of the bag.
I hope this clarifies the different possibilities based on the given information.
@dubesor
@dubesor 2 ай бұрын
I have now had enough time to thoroughly test Gemini (Pro and Ultra) vs ChatGPT (3.5 and 4), and the results are in and very clear. In terms of logic & deduction, GPT-4 is the clear winner, solving 48/50 tasks with a pass. Programming once again goes to OpenAI, beating Google heavily across all models. Multimodal goes to OpenAI as well, since Google has not rolled out multimodal to Gemini. I spent about 4 hours putting 100 questions (logic, deduction, maths, physics, chemistry, programming, multimodality) to ALL models, scored as follows: +1 for a correct answer or good response, +0.5 for a generally correct answer with a flaw or requiring more than one attempt, 0 for a false answer, and -0.5 for a refusal or overaggressive censorship. Results:
GPT-4: 93.2%
Gemini Ultra: 36.4%
GPT-3.5: 13.6%
Gemini Pro: 4.5%
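A quick sketch of how a scoring scheme like the one described above turns per-question results into a percentage (+1 pass, +0.5 partial, 0 wrong, -0.5 refusal). The tallies in the example are made up for illustration; the comment only gives the final percentages, not the per-model counts.

```python
# Hypothetical scoring helper for the +1 / +0.5 / 0 / -0.5 scheme above.
def score(passes: int, partials: int, wrongs: int, refusals: int) -> float:
    total = passes + partials + wrongs + refusals
    points = 1.0 * passes + 0.5 * partials + 0.0 * wrongs - 0.5 * refusals
    return 100.0 * points / total

# Example with invented tallies for a 20-question run.
print(f"{score(passes=6, partials=3, wrongs=9, refusals=2):.1f}%")  # 32.5%
```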
@adeyeyeoluwatobiloba340
@adeyeyeoluwatobiloba340 2 ай бұрын
Wild ... Please google should just rest
@0xpacman
@0xpacman 2 ай бұрын
Thank you, I was about to subscribe to Gemini Advanced, GPT4 still looks better i guess
@travisporco
@travisporco 2 ай бұрын
Gemini is better than it used to be, but still not quite up to OpenAI. But they're improving...both Google & OpenAI should keep innovating I hope!
@willmil1199
@willmil1199 2 ай бұрын
8:32 you can give the point to Gemini as gpt 4 failed miserably while Gemini succeeded
@hypnogri5457
@hypnogri5457 2 ай бұрын
He asked GPT once. You can try it and it will most likely do it. He didn't bother to ask it again, as he has done this test before.
@issoofps2332
@issoofps2332 2 ай бұрын
Norway on top, man. Nice vid.
@ariganeri
@ariganeri 2 ай бұрын
These tests don't seem representative of a normal use case, at least for me. Just today I asked both GPT-4 and Gemini Ultra two questions, worded exactly the same, and in both cases Gemini produced way better results. One was a design question where Gemini correctly listed almost all of the design considerations based on context, which GPT-4 failed to mention. The other was a troubleshooting question for R/Shiny, where Gemini correctly suggested checking for missing libraries based on a general description of the scenario and the error.
@moundercesar3102
@moundercesar3102 2 ай бұрын
Would like to see agents of different LLMs talking to each other (like GPT-4 talking with Gemini). Would it make the results better?
@AllAboutAI
@AllAboutAI 2 ай бұрын
If you have API access you can set this up for sure
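For anyone curious what that setup could look like, here is a rough sketch of two LLM "agents" taking turns in one conversation loop. The ask_gpt4 / ask_gemini functions are placeholders, not real API calls; you would wire them to the respective SDKs (or any other backend) yourself.

```python
# Minimal turn-taking loop between two "agents". The ask_* functions below
# are stand-ins that just echo, so the sketch runs without any API keys.
def ask_gpt4(history: list[str]) -> str:
    return f"GPT-4 reply to: {history[-1]!r}"   # placeholder, not a real API call

def ask_gemini(history: list[str]) -> str:
    return f"Gemini reply to: {history[-1]!r}"  # placeholder, not a real API call

def converse(opening: str, turns: int = 4) -> list[str]:
    history = [opening]
    agents = [ask_gpt4, ask_gemini]
    for i in range(turns):
        reply = agents[i % 2](history)  # alternate speakers each turn
        history.append(reply)
    return history

if __name__ == "__main__":
    for line in converse("Which of you is better at logic puzzles?"):
        print(line)
```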
@DeepThinker193
@DeepThinker193 2 ай бұрын
Do the double handed pointy thing in the air you do when you're wrestling.
@AIPulse118
@AIPulse118 2 ай бұрын
It's definitely a context-window issue. Bard does the same thing with long prompts.
@davediamond
@davediamond 2 ай бұрын
I found the natural language replies substantially better with shorter prompting.
@janosnemeth6315
@janosnemeth6315 2 ай бұрын
Image generation isn't working in Europe, as I noticed :( Same with code analysis... :(
@indefatigableflame
@indefatigableflame 2 ай бұрын
Good video! Can you also compare xAI's Grok with GPT-4?
@sango3128
@sango3128 2 ай бұрын
GPT4 is better. Anyone who says otherwise is an elon musk fanboy.
@indefatigableflame
@indefatigableflame 2 ай бұрын
Sure, in its current iteration. It could however perform better than Gemini and would be interesting to see. Nothing wrong with liking Elon. His unwavering dedication to free speech is invaluable.@@sango3128
@indefatigableflame
@indefatigableflame 2 ай бұрын
No one is saying Grok is smarter, rather that it's a lot less censored. And Elon is a chad standing up for free speech. @sango3128
@indefatigableflame
@indefatigableflame 2 ай бұрын
@@sango3128 in its current iteration yes, however less biased.
@R.E-O
@R.E-O 2 ай бұрын
Some features, like image generation, might not be available in Europe.
@lule-ahmed
@lule-ahmed 2 ай бұрын
Always remember to put code snippets in code blocks when prompting LLMs.
@Leto2ndAtreides
@Leto2ndAtreides 2 ай бұрын
It being so quick is almost a bad sign in terms of model size.
@micbab-vg2mu
@micbab-vg2mu 2 ай бұрын
Thank you for the update. Based on your test it is at GPT-3.5 level :( GPT-4 is still king.
@Stevaa21
@Stevaa21 2 ай бұрын
yes, not even google can beat it
@LazloArcadia
@LazloArcadia 2 ай бұрын
I found Google Bard / Gemini to be VERY biased. For example, I asked it to generate an image of Audrey Hepburn smoking a cigarette. Instead of finding the image, however, it simply launched into a diatribe about how smoking is bad and how I should seek counseling to help me quit... I don't smoke, btw. I even called Bard out on this, telling it to knock off the argumentative, biased response, and while it acknowledged its bias on the topic, it would never consent to finding the image, which I was able to find with a 3-second Google search.
@olagunjujeremiah4003
@olagunjujeremiah4003 2 ай бұрын
Speed doesn't really matter compared to the accuracy of the answers... so I'll stick with my GPT-4 subscription. 😊
@dubesor
@dubesor 2 ай бұрын
Really disappointed, considering this is the flagship model of such a data behemoth as Google. It performs okay-ish sometimes and badly many other times. Nowhere near GPT-4 level in my testing, and GPT-4 has already performed at that level for a year now. Just utterly disappointing.
@wwtapsable
@wwtapsable 2 ай бұрын
The snake game was the real difference for me. GPT-4 usually makes a basic "snake game" that's missing most features of real snake, like getting longer, a score, and game over, and it can barely get it done even after a few tries, and you have to realize GPT-4 actually has a Python interpreter to test its own code.
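For reference, the core mechanics the comment says generated snake games often miss (growing on food, a score, game over on collision) fit in a few dozen lines. Below is a minimal, graphics-free sketch of just that game-state logic, not a full playable game.

```python
# Minimal sketch of core snake mechanics: growth, score, game over.
# No rendering or input handling; the demo loop just takes random moves.
import random
from collections import deque

WIDTH, HEIGHT = 20, 10

def new_game():
    snake = deque([(5, 5)])  # head is snake[0]
    food = (random.randrange(WIDTH), random.randrange(HEIGHT))
    return {"snake": snake, "food": food, "score": 0, "alive": True}

def step(state, direction):
    """Advance one tick; direction is a (dx, dy) tuple."""
    head_x, head_y = state["snake"][0]
    new_head = (head_x + direction[0], head_y + direction[1])
    # Game over on wall or self collision.
    if (not 0 <= new_head[0] < WIDTH or not 0 <= new_head[1] < HEIGHT
            or new_head in state["snake"]):
        state["alive"] = False
        return state
    state["snake"].appendleft(new_head)
    if new_head == state["food"]:
        state["score"] += 1              # score goes up when eating...
        # (sketch: food may respawn on the snake; a real game would re-roll)
        state["food"] = (random.randrange(WIDTH), random.randrange(HEIGHT))
    else:
        state["snake"].pop()             # ...and the snake only grows when it eats
    return state

if __name__ == "__main__":
    game = new_game()
    while game["alive"]:
        game = step(game, random.choice([(1, 0), (-1, 0), (0, 1), (0, -1)]))
    print("Game over, score:", game["score"])
```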
@kenno1572
@kenno1572 2 ай бұрын
You are wrong about drying the 10 shirts. Gemini was right. It even slightly questioned your statement about whether the conditions are truly identical. If you want to judge using absolutes, you must ask questions that have absolute answers.
@pedrocampos7827
@pedrocampos7827 2 ай бұрын
Hope Gemini doesn't replace Siri, she's the only tech that understands my bad jokes.
@user-ov6rf1jp5i
@user-ov6rf1jp5i 2 ай бұрын
i was curious about gemini after seeing ads.
@JG27Korny
@JG27Korny 2 ай бұрын
Change the numbers in the testing prompts, like the number of shirts and hours, since those models crawl the web and learn the responses.
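A tiny sketch of that idea: templating the stock test prompt and randomizing its numbers so the exact wording can't simply be memorized from training data. The template text here is illustrative, not the video's exact prompt.

```python
# Generate randomized variants of a stock LLM test prompt.
import random

TEMPLATE = ("I hang {n} shirts out to dry in the sun and they are dry after "
            "{h} hours. The next day, under the same conditions, I hang "
            "{m} shirts out to dry. How long will it take them to dry?")

def randomized_prompt(rng: random.Random) -> str:
    n = rng.randint(3, 8)
    h = rng.choice([4, 6, 8, 10, 12])
    m = n * rng.randint(2, 4)  # more shirts, same (parallel) drying conditions
    return TEMPLATE.format(n=n, h=h, m=m)

if __name__ == "__main__":
    rng = random.Random(42)
    for _ in range(3):
        print(randomized_prompt(rng))
```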
@defipunk
@defipunk 2 ай бұрын
The draft 3 from Gemini was _way_ better, because you asked about a real world example. Try the same and tell it that it's a math exercise.
@Drone256
@Drone256 2 ай бұрын
When using examples that are commonly used to test an LLM it's easy for the one training the LLM to ensure that example does not fail. It's better to do some random sampling.
@mamancornia7641
@mamancornia7641 2 ай бұрын
This is actually not fully Gemini Ultra, just Gemini Pro with some slight Gemini Ultra capability.
@user-bd8jb7ln5g
@user-bd8jb7ln5g 2 ай бұрын
Google does phenomenal AI research, but their practical implementation is broken for whatever corporate/legal/human reason. For example, the AlphaCode research was released a year before everybody started talking about OpenAI's Q*, and it sounds very similar. Not to mention the Transformer model itself.
@willmil1199
@willmil1199 2 ай бұрын
9:21 it's a rule-based filter; some patterns of words are blacklisted.
@mrcraggle
@mrcraggle 2 ай бұрын
I've had the "It's outside of my capabilities" issue crop up a few times. E.g add this text to the text you've just written. Very strange behaviour and not one that I've encountered outside of Bard/Gemini.
@samuelluz9241
@samuelluz9241 2 ай бұрын
Embarrassing showing from Google/DeepMind, nearly a year after the launch of GPT-4. I know it will get better, but first impressions are so crucial.
@amiralichangizi1611
@amiralichangizi1611 2 ай бұрын
it seems that you cannot use the image generation in Europe yet.
@paul1979uk2000
@paul1979uk2000 2 ай бұрын
This is the second video I've seen on Gemini and the results have not looked good at all. Considering it's $20 per month, it's quite a poor showing, especially when the open-source model Mistral 8x7B passed a lot of these tests while being able to run on local hardware without a $20-per-month subscription. I expect much better from a company the size of Google, especially when they are asking $20 per month for the service. With that said, I'm amazed at how well that Mistral model actually works; it's not as good as GPT-4, but it has closed the gap so much that it's more than usable if you have 32GB of system memory and 12GB of VRAM.
@josecoverlessons
@josecoverlessons 2 ай бұрын
Every benchmarking test should be unique. Using a standardized test set is not reliable for evaluating the performance of these services.
@ejh237
@ejh237 2 ай бұрын
I’m guessing it is fast because it is servicing a smaller community. :)
@isiaqagboluaje7203
@isiaqagboluaje7203 2 ай бұрын
Hello Kristian, I think your 5-shirt question is the problem here... Putting the technical aspect aside, LLMs are trained on human data, which means they tend to make the same decisions as humans. Your 5-shirt question basically says: hey, it took 10 hrs for 5 shirts to dry, the next day the conditions are the same but this time there are 10 shirts. "THAT THERE" is the key to why you got that response: the LLM understandably flagged the inconsistency of you saying "the conditions are the same" while doubling the shirts. I mean, sure, a normal person could say "10 hrs", but it's an LLM - it's like an actual smart person ("pun intended"): we look at that and go, nope, it's no longer the same conditions if you think about it. Basically, I'd say Gemini was trying to answer the question in a smart way, not knowing it was a fictional question. Even with the question being that vague, it'll still answer you right. Remember, it's trained on human data; it's up to you to decide if that's the choice you want it to make, but you can always clarify. Anyway, I think if you rewatch your video you'll understand better. Ahh, also, ChatGPT basically said the same thing as Gemini - if you read the entire thing, it put its own disclaimer just further down. And if you are interested, here is the fix, while still keeping the vagueness and complexity of the question: "In a perfect world/environment I hang 5 perfectly identical shirts out to dry in the sun. After 10 hours all shirts are dry. The next day I hang 10 shirts out to dry. The conditions are the same. How long will it take to dry 10 shirts?" This provides clarity to the model when it's trying to choose whether to base its answer in fiction or in the real world with its imperfections. Oh, and the answer is not 10 hrs in the real world - a lot of factors come into play - but we can assume so...
@captain_context9991
@captain_context9991 2 ай бұрын
Question 2: it's not clear where the ball is. It doesn't have to have fallen out of the bag; there is really no indication that it did.
@pimpelectric
@pimpelectric 2 ай бұрын
GPT 4 answers the shirt-in-the-sun question instantly without confusion.
@canalbalbinatech6574
@canalbalbinatech6574 2 ай бұрын
For me, the big disadvantage of Gemini is still the fact that it doesn't read PDFs directly.
@pigeon_official
@pigeon_official 2 ай бұрын
So what you're saying is GPT-4 has existed for a whole year now, and Google is only just now, a year later, releasing a model, and it's still slightly worse than ChatGPT.
@NickDeBarmore
@NickDeBarmore 2 ай бұрын
I signed up for the free trial before finishing this video just to test it out. My results were every bit as disappointing as yours. This is a long way from being a ChatGPT competitor
@hqcart1
@hqcart1 2 ай бұрын
I am disappointed; now OpenAI can feel relaxed and won't release GPT-5 soon.
@carlbicknell5603
@carlbicknell5603 2 ай бұрын
ChatGPT is obviously well ahead of Gemini Ultra. I'm shocked at how poor Gemini Ultra is.
@nomadicneurons824
@nomadicneurons824 2 ай бұрын
I've been playing with it the past couple days and I haven't had any results yet that clearly outperform Chatgpt and plenty that were worse. Your test run in the video is about what I've been experiencing. It's being overhyped.
@Toss3
@Toss3 2 ай бұрын
Can't create images in the EU yet. Also why do you get 5TB of storage while we in Sweden only get 2? Seems unfair. Ultra seems like a mixed bag right now, speed is better, UI is better and the language it uses seems more natural, but the only thing I care about is accuracy and I'm not convinced Ultra is better than GPT-4 (yet).
@oaklyfoundation
@oaklyfoundation 2 ай бұрын
I don't see extensions in sweden :(
@brianhopson2072
@brianhopson2072 2 ай бұрын
You're not missing much. It's just toggles for Google maps, Google hotels, Google flights, and youtube. Nowhere near the gpt level.