New OpenAI Model 'Imminent' and AI Stakes Get Raised (plus Med Gemini, GPT 2 Chatbot and Scale AI)

96,874 views

AI Explained

17 days ago

Altman ‘knows the release date’, Politico calls it ‘imminent’ according to Insiders, and then the mystery GPT-2 chatbot [made by the phi team at Microsoft] causes mass confusion and hysteria. I break it all down and cover two papers - MedGemini and Scale AI Contamination - released in the last 24 hours. I’ve read them in full and they might be more important than all the rest. Let’s hope life wins over death in the deployment of AI.
AI Insiders: / aiexplained
Politico Article: www.politico.eu/article/rishi...
Sam Altman Talk: • The Possibilities of A...
MIT Interview: www.technologyreview.com/2024...
Logan Kilpatrick Tweet: / 1785834464804794820
Bubeck Response: / 1785888787484291440
GPT2: / 1785107943664566556
Where it used to be hosted: arena.lmsys.org/
Unicorns?: / 1784969111430103494
No Unicorns: / 1785159370512421201
GPT2 chatbot logic fail: / 1785367736157175859
And language fails: / 1785101624475537813
James Betker Blog: nonint.com/2023/06/10/the-it-...
Scale AI Benchmark Paper: arxiv.org/pdf/2405.00332
Dwarkesh Zuckerberg Interview: • Mark Zuckerberg - Llam...
Lavender Misuse: www.972mag.com/lavender-ai-is...
Autonomous Tank: www.techspot.com/news/102769-...
Claude 3 GPQA: www.anthropic.com/news/claude...
Med Gemini: arxiv.org/pdf/2404.18416
Medical Mistakes: www.cnbc.com/2018/02/22/medic...
MedPrompt Microsoft: www.microsoft.com/en-us/resea...
My Benchmark Flaws Tweet: / 1782716249639670000
My Stargate Video: • Why Does OpenAI Need a...
My GPT-5 Video: • GPT-5: Everything You ...
Non-hype Newsletter: signaltonoise.beehiiv.com/
AI Insiders: / aiexplained

Comments: 583
@mlaine83
@mlaine83 15 күн бұрын
By far this is the best AI news roundup channel on the tubes. Never clickbaity, always interesting and so much info.
@aiexplained-official
@aiexplained-official 15 күн бұрын
Thanks m
@sebastianmacke1660
@sebastianmacke1660 15 күн бұрын
Yes, it is the best channel. I can only agree. But it is always dense information, without any pauses. Even between topics. I wish he would just breathe deeply from time to time 😅. I have to watch all videos at least twice.
@ItIsJan
@ItIsJan 15 күн бұрын
for slightly more technical videos, yannic kilcher's channel is also quite good
@pacotato
@pacotato 15 күн бұрын
This is why this is the only channel about AI where I keep watching every single video. Congrats!
@GomNumPy
@GomNumPy 15 күн бұрын
The funny thing is, over the past year, I've learned to avoid clickbait AI news. Yet, whenever I see "AI Explained," I click without a second thought! I've come to realize that those clickbait titles will eventually backfire :)
@C4rb0neum
@C4rb0neum 15 күн бұрын
I have had such bad diagnosis experiences that I would happily take an AI diagnosis. Especially if it’s the “pre” diagnosis that nurses typically have to do in about 8 seconds
@dakara4877
@dakara4877 15 күн бұрын
Same, it would be hard for it to be much worse in my case as well. Similar experience for most of my family. Many people report such experiences, which begs the question: if so many people get bad diagnoses from doctors, where is the "good" data they are training on?
@OscarTheStrategist
@OscarTheStrategist 15 күн бұрын
I’d be careful with that wish for right now. I work with fine tuning AI for medical purposes and they can convince themselves and you of the wrong diagnosis very easily. This is a hard problem to solve but I’m sure it will be solved rather soon, and I hope it is widely available to everyone.
@Bronco541
@Bronco541 14 күн бұрын
Same. I have wasted thousands of dollars on doctors for them to give me little to no helpful information whatsoever. Obviously I'm not saying "abolish all doctors, they're bad", but clearly something needs to change.
@dakara4877
@dakara4877 14 күн бұрын
@@OscarTheStrategist "they can convince themselves and you of the wrong diagnosis very easily." So in other words, they do perform like real doctors 😛 Yes, it is a general problem in all LLMs. They can't tell you when they don't know something. Even when the probability of being correct is very low, they state it with the utmost confidence. Until that can be solved, LLMs are extremely limited in reality and not suitable for the majority of the use cases people wish to use them for.
@piotrek7633
@piotrek7633 14 күн бұрын
Bad diagnosis, OR you end up unemployed with no money because you're literally useless to the world and AI can replace you in anything. Pick one, genius. If doctors lose their jobs and start struggling in life, then what makes you special?
@DynamicUnreal
@DynamicUnreal 15 күн бұрын
As a person failed by the American medical system who is currently living with an undiagnosed neurological illness - I hope that a good enough A.I. will SOON replace doctors when it comes to medical diagnosis. If it wasn’t for GPT-4, who knows how much sicker I would be.
@paulallen8304
@paulallen8304 15 күн бұрын
I also hope that it takes the bias out of diagnosis: far too many women and people of color report not getting adequate care due to unconscious bias. In many cases the doctors themselves don't even know they are doing it.
@OperationDarkside
@OperationDarkside 15 күн бұрын
Same here, and it was even multiple "proper" experts. And I didn't have to pay anything because free healthcare and such. No amount of money can cure "human intuition".
@paulallen8304
@paulallen8304 15 күн бұрын
@@OperationDarkside Of course this is why we need to thoroughly test these systems to make sure that no hallucination or bias is inadvertently built into them.
@OperationDarkside
@OperationDarkside 15 күн бұрын
@@paulallen8304 In my case it was the human doing the hallucinations. And nobody bats an eye in those cases. No reduced salary, no firing, no demotion, no retraction of licenses. So I don't see a reason to overdo it with quality testing the AI in those cases, if we don't do it for the humans in the first place.
@louierose5739
@louierose5739 15 күн бұрын
Can I ask what illness? This sounds similar to what I’ve gone through because of iatrogenic damage (SSRI/antidepressant injury). There’s something called ‘PSSD’ (post-SSRI sexual dysfunction), which basically involves sexual, physical and cognitive issues that persist indefinitely even after the drug is halted. The issues can start on the pill or upon discontinuation; the only criterion is that they are seemingly permanent for the sufferer. Sexual dysfunction is a key aspect of it, but it involves loads of symptoms that are utterly debilitating. Some of mine include brain fog, memory issues, chronic fatigue, head/eye pressure, worsened vision, light sensitivity, emotional numbness/anhedonia, premature ejaculation, pleasureless/muted orgasms, and severe bladder issues. There’s a whole spectrum of issues that these drugs cause. Research is in early stages, but it seems that SSRIs, and all psychiatric drugs for that matter, can cause epigenetic changes that leave sufferers permanently worse off after taking the drug than they were with just the mental illness they were being treated for.
@KyriosHeptagrammaton
@KyriosHeptagrammaton 15 күн бұрын
In Claude's defence, I did Calculus in college and got high grades, but I suck at addition. Sometimes feels like they are mutually exclusive.
@ThePowerLover
@ThePowerLover 14 күн бұрын
This!
@KyriosHeptagrammaton
@KyriosHeptagrammaton 13 күн бұрын
@@ThePowerLover If you're using a number bigger than 4 you're not doing real math!
@josh0n
@josh0n 15 күн бұрын
BRAVO for including "Lavender" and the autonomous tank as negative examples of AI. It is important to call this stuff out.
@aiexplained-official
@aiexplained-official 15 күн бұрын
Yes, needs to be said
@ai_is_a_great_place
@ai_is_a_great_place 13 күн бұрын
Lol how is it a negative example that needs to be called out? I'd rather use ai on the front lines over humans who are affected and can make mistakes as well. Nuking/shelling a place is more evil than guided tactical machines
@SamJamCooper
@SamJamCooper 14 күн бұрын
A note of caution regarding LLM diagnoses and medical errors: most avoidable deaths come not from misdiagnoses (although there are still some which these models could help with), but from problems of communication between clinicians, different departments, and support staff in the medical field. That's certainly something I see AI being able to help with now and in the future, but the medical reality is far more complex than a 1:1 relationship between misdiagnoses and avoidable deaths.
@aiexplained-official
@aiexplained-official 14 күн бұрын
Of course, but I see such systems helping there too, as you say. And surgical errors or oversights.
@jeff__w
@jeff__w 15 күн бұрын
12:33 “So my question is this: why are models like Claude 3 Opus still getting _any_ of these questions wrong? Remember, they're scoring around 60% on the GPQA, graduate-level expert reasoning. If Claude 3 Opus, for example, can get questions right that PhDs struggle to get right with Google and 30 minutes, why on Earth, with five short examples, can they not get these basic high school questions right?” My completely lay, non-computer-science intuition is this: (1) as you mention in the video, these models _are_ optimized for benchmark questions and not just any old, regular questions and, more importantly, (2) there's a bit of a category error going on: these models are _not_ doing “graduate-level expert reasoning”-they're emulating the verbal behavior that people exhibit when they (people) solve problems like these. There's some kind of disjunction going on there-and the computer science discourse, which is, obviously, apart from behavioral science, is conflating the two. Again, to beat a dead horse somewhat, I tested my “pill question”* (my version of your handcrafted questions) in the LMSYS Chatbot Arena (92 models, apparently) probably 50 times at least, and got the right answer exactly twice-and the rest of the time the answers were wrong numbers (even from the models that managed to answer correctly), nonsensical (e.g., 200%), or something along the lines of “It can't be determined.” These models are _not_ reasoning-they're doing something that only looks like reasoning. That's not a disparagement-it's still incredibly impressive. It's just what's going on. * Paraphrased roughly: what proportion of a whole bottle of pills do I have to cut in half to get an equal number of whole and half pills?
@minimal3734
@minimal3734 15 күн бұрын
ChatGPT: To solve this, we can think about it step-by-step. Let's say you start with n whole pills in the bottle. If you cut x pills in half, you will then have n−x whole pills left and 2x half pills (since each pill you cut produces two halves). You want the number of whole pills to be equal to the number of half pills. That means setting n−x equal to 2x. Now, solve the equation n−x = 2x: n = 3x, so x = n/3. This means that you need to cut one-third (or about 33.33%) of the pills in half in order to have an equal number of whole pills and half pills.
@jeff__w
@jeff__w 15 күн бұрын
​@@minimal3734 That’s very cool. Thanks for giving it a shot. If you’re curious, try that question a few times in the LMSYS Chatbot Arena and see if you have any better luck than I had. (And, to be clear, I’m not that concerned with wrong answers _per se._ It’s that the “reasoning” is absent. An answer of 200% is obviously wrong but an answer of 50% gives twice as many halves as you want-and the chatbots that give that answer miss that entirely.)
@rafaellabbe7538
@rafaellabbe7538 15 күн бұрын
GPT4 gets this one. Claude 3 Opus and Sonnet also get it right. (Tested all 3 on temperature = 0)
@jeff__w
@jeff__w 15 күн бұрын
​@@rafaellabbe7538 I tried it on Opus Sonnet when it was first released (you can see my comment under Phil’s video about it) and it got it wrong, so it seems like it’s improved. And you can both give that question a shot in the LMSYS Chatbot Arena and see if you have any better luck than I had. As I said, two models got it right _once_ and never did again in the times I tried it.
@jeff__w
@jeff__w 15 күн бұрын
@@rafaellabbe7538 ​_NOTE:_ I’ve replied several times on this thread and each time my reply disappears. (I’m _not_ deleting the replies.) Go figure. I tried it on Opus Sonnet when that model was first released (I commented under Phil’s video about it) and it got it wrong, so it seems like it’s improved. And you can both give that question a shot in the LMSYS Chatbot Arena and see if you have any better luck than I had. As I said, two models got it right _once_ and never did again in the times I tried it.
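A quick brute-force check of the x = n/3 result discussed above (a minimal sketch; the 30-pill bottle size and the helper name are my own illustration, not anything from the thread):

```python
# Verify that cutting one third of a bottle leaves equal numbers of
# whole pills and half pills (each cut pill yields two halves).
def wholes_equal_halves(n_pills: int, cut: int) -> bool:
    whole_left = n_pills - cut
    halves = 2 * cut
    return whole_left == halves

n = 30  # arbitrary bottle size for illustration
print(wholes_equal_halves(n, n // 3))  # True  -> one third is the right proportion
print(wholes_equal_halves(n, n // 2))  # False -> cutting half gives twice as many halves as wholes
```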
@Zirrad1
@Zirrad1 14 күн бұрын
It is useful to note how inconsistent human medical diagnosis is. A read of Kahneman’s book “Noise” is a prerequisite to appreciating just how poor human judgment can be and how difficult it is, for social, psychological, and political reasons, to improve the situation. The consistency of algorithmic approaches is key to reducing noise and to detecting and correcting bias, which carries forward and improves with iteration.
@jvlbme
@jvlbme 15 күн бұрын
I really think we DO want surprise and awe with every release.
@Bronco541
@Bronco541 14 күн бұрын
I know *I* do. But a lot of people don't. They're too scared of the unknown and the future.
@alexorr2322
@alexorr2322 14 күн бұрын
That’s true, but I think the real reason is OpenAI being scared of leaving it too long, being overtaken by their competitors and losing their front-runner position.
@wiltedblackrose
@wiltedblackrose 15 күн бұрын
I am SO GLAD that finally someone with reach has said out loud what I've been thinking for the longest time. For me these models are still not properly intelligent, because despite having amazing "talents", the things they fail at betray them. It's almost like they only become really, really good at learning facts and the syntax of reasoning, but don't actually pick up the conceptual relationship between things. As a university student I always have to think about what we would say about someone who can talk perfectly about complex abstract concepts, but fails to solve or answer the simpler questions that underlie those more complex ones. We would call that person a fraud. But somehow if it's an LLM, we close an eye (or two). As always, the best channel in AI. The best critical thinker in the space.
@Jack-vv7zb
@Jack-vv7zb 15 күн бұрын
My take: These AI models act as simulators, and when you converse with them, you are interacting with a 'simulacrum' (a simulated entity within the simulator). For example, if we asked the model to act like a 5-year-old and then posed a complex question, we would expect the simulacrum to either answer incorrectly or admit that it doesn't know. However, it wouldn't be accurate to say that the entire simulator (e.g., GPT-4) is incapable of answering the question; rather, the specific simulacrum cannot answer it. Simulacra could take various forms, such as a person, an animal, an alien, an AI, a computer terminal or program, a website, etc. GPT-4 (perhaps less so finetuned ChatGPT version) is capable of simulating all of these things. The key point is that these models are capable of simulating intelligence, reasoning, self-awareness, and other traits, but we don't always observe these behaviours because they can also simulate the absence of these characteristics. It's for this reason that we have to be very careful about how we prompt the model as that's what defines the attributes of the simulacra we create.
@Tymon0000
@Tymon0000 15 күн бұрын
The i in LLM stands for intelligence
@wiltedblackrose
@wiltedblackrose 15 күн бұрын
@@Tymon0000 Indeed 😂
@beerkegaard
@beerkegaard 15 күн бұрын
If it writes code, and the code works, it's not a fraud.
@Hollowed2wiz
@Hollowed2wiz 15 күн бұрын
​@@Jack-vv7zb does that mean we need to force llm models to simulate multiple personalities at all times in order to cover as much knowledge as possible ? For example by using some kind of mixture of expert strategy where the experts are personalities (like a 5 years old child, a regular adult, a mathematician, ...) ?
@Madlintelf
@Madlintelf 13 күн бұрын
Towards the end you state that it might be unethical to not use the models; that really hits home. I've worked in healthcare for 20+ years, and that level of accuracy coming from an LLM would be so welcome. I think the summarizing of notes will definitely be the hook that grabs the majority of healthcare professionals. Thanks again!
@aiexplained-official
@aiexplained-official 13 күн бұрын
Thank you for the fascinating comment Mad
@trentondambrowitz1746
@trentondambrowitz1746 15 күн бұрын
Hey, I’m in this one! Great job as always, although I’m becoming increasingly frustrated with how you somehow find news I haven’t seen… Very much looking forward to what OpenAI have been cooking, and I agree that there are ethical issues with restricting access to a model that can greatly benefit humanity. May will be exciting!
@aiexplained-official
@aiexplained-official 15 күн бұрын
It will! You are the star of Discord anyway, so many more appearances to come.
@RaitisPetrovs-nb9kz
@RaitisPetrovs-nb9kz 15 күн бұрын
Could it be that GPT2 was tested for a potential Apple offer?
@colinharter4094
@colinharter4094 15 күн бұрын
I love that even though you're the person I go to for measured AI commentary, you always open your videos, and rightfully so, with something to the effect of "it's been a wild 48 hours. let me tell you"
@juliankohler5086
@juliankohler5086 15 күн бұрын
Loved to see the community meetings. What a great way to use your influence bringing people together instead of dividing them. "Ethical Influencers" might just have become a thing.
@aiexplained-official
@aiexplained-official 15 күн бұрын
Proud to wear that title !
@devlogicg2875
@devlogicg2875 14 күн бұрын
I agree, but if we consider math, the data (numbers and geometry) are all available; the AI just lacks the reasoning to be able to function expertly. We need new models to do this.
@Olack87
@Olack87 15 күн бұрын
Man, your videos always brighten my day. Such excellent and informative material.
@codersama
@codersama 15 күн бұрын
5:47 MOST SHOCKING MOMENT OF THE VIDEO
@xdumutlu5869
@xdumutlu5869 14 күн бұрын
I WAS SHOCKED AND STUNNED
@Xengard
@Xengard 15 күн бұрын
The question is, once these medical models are released, how long will it take for medics to implement and use them?
@skierpage
@skierpage 14 күн бұрын
Doctors are under such time pressure that most will welcome an expert colleague, especially one that will also write up the session notes for them. The problem will be when people have long conversations with an LLM before they even get to the doctor, whether the medical organization's own LLM or a third-party one or both; the doctor becomes a final sanity check on what the LLM came up with, so it had better not have gone down a rabbit hole of hallucination and hypochondria along with the patient.
@jhguerrera
@jhguerrera 15 күн бұрын
Let's go!! I can't wait for OpenAI's next release. Didn't watch the video yet but always happy to see an upload
@aiexplained-official
@aiexplained-official 15 күн бұрын
:)
@mikey1836
@mikey1836 15 күн бұрын
Lobbyists are already trying to use “ethically risky” as an excuse to delay releasing AI that performs well at their jobs. The early ChatGPT 4 allowed therapy and legal advice, but later on they tried to stop it, claiming safety concerns; that's BS.
@forevergreen4
@forevergreen4 15 күн бұрын
Growing increasingly concerned that most powerful models will not be released publicly. Altman recently reiterated that iterative deployment is the way they’re proceeding to avoid “shocking” the world. I see his point, but don’t think I agree with it. What are your thoughts? Is open source really our best bet going forward?
@lemmack
@lemmack 15 күн бұрын
It's not to avoid shocking the whole world, it's to avoid upsetting the people in power (people with capital) by shaking things up too fast for them to adapt. They don't care about shocking us peasants.
@aiexplained-official
@aiexplained-official 15 күн бұрын
I think they have more of a problem staying ahead of open-weights than they do of being so far ahead that they are not releasing
@khonsu0273
@khonsu0273 15 күн бұрын
Yup, the worry is that the behind-closed-doors stuff continues to shoot off exponentially, whereas the progress in the public release stuff falls off to linear...
@MegaSuperCritic
@MegaSuperCritic 15 күн бұрын
I can't wait to watch this! I listened to the previous episode for the second time on the way to work this morning, realizing that two weeks is like, no time at all, yet I still was wondering why I haven't heard anything new in that time. Insane speed
@canadiannomad2330
@canadiannomad2330 15 күн бұрын
Ah yes, it makes sense for a US based company to give early access to closely held technologies to spooks on the other side of the pond. It totally aligns with their interests...
@TheLegendaryHacker
@TheLegendaryHacker 15 күн бұрын
Hell, the article literally says that tech companies only care about US safety agencies
@swoletech5958
@swoletech5958 15 күн бұрын
Totally agree, I was about to post something along those lines.😂
@serioserkanalname499
@serioserkanalname499 15 күн бұрын
Ah yes, makes sense that politicians should get to investigate whether the AI can say something bad about them before it's released.
@user-lp8ur5qn3o
@user-lp8ur5qn3o 15 күн бұрын
@@serioserkanalname499politicians are mostly not very smart.
@alansmithee419
@alansmithee419 15 күн бұрын
@@serioserkanalname499 Yeah, I'm all for safety measures, but "give it to Sunak first" is not a safety measure.
@skierpage
@skierpage 14 күн бұрын
18:50 " we haven't considered restricting the search results [for Med-Gemini] to more authoritative medical sources". Med-Gemini: 'Based on watching clips and reading about "The Texas Chainsaw Massacre," the Saw movie franchise, and episodes of "Dexter," your first incision needs to be much deeper and wider!'
@muffiincodes
@muffiincodes 15 күн бұрын
Your point about the ethics of not releasing a medical chatbot which is better than doctors relies on us having a good way of measuring the true impact of these models in the real world. As far as I can see as long as there is a lack of reliable independent evaluations which takes into account the potential of increasing health inequalities or harming marginalised communities we are not there yet. The UK AI Safety Institute has not achieved company compliance and has no enforcement mechanism so that doesn’t even come close. The truth is we simply do not have the social infrastructure to evaluate the human impacts of these models.
@OperationDarkside
@OperationDarkside 15 күн бұрын
Even worse, imagine all the million pound apartments in London becoming vacant, just because a little AI is better than a private medical professional and only charges 5 pound where the human would charge 5000. Does nobody think about the poor landlords? And what about the russian oligarchs, whose asset would depreciate 100 fold. The humanity.
@andywest5773
@andywest5773 15 күн бұрын
Considering that marginalized communities have the most to gain from fewer medical errors and less expensive healthcare, I believe that denying access to a technology that exceeds the capabilities of doctors in the name of "company compliance" would be... I don't know. I'm trying to think of an adjective that doesn't contain profanity.
@ashdang23
@ashdang23 15 күн бұрын
@OperationDarkside What are you implying? Are you saying that AI coming into the medical field and replacing people is a bad thing?
@ashdang23
@ashdang23 15 күн бұрын
@OperationDarkside If so, that is a pretty stupid thing to think. Having something that is much more intelligent than a professor and does a much better job than a professor sounds fantastic. Something that is able to save more lives, figure out more solutions to diseases and save so many people sounds great. Why wouldn't you have the AI replace everyone in the medical field if it can do a much better job and save so many more lives, or even find solutions to diseases? “It’s replacing people’s jobs in the medical field, which is a bad thing” is what I’m getting from you. I think everyone agrees that the first job AI should replace is everyone in the medical field. They should stop focusing on entertainment and focus on making AI find answers for saving and benefiting humanity.
@muffiincodes
@muffiincodes 15 күн бұрын
@@andywest5773 Sure, but because those groups are not well represented in training datasets, are usually not included in the decision-making processes, and are less likely to be able to engage with redress mechanisms due to social frictions, it is more likely they'll be disadvantaged because of it. These systems might have the potential to be an equality-promoting force, but they must be designed for that from the ground up and need to be evaluated to see whether they are successful at that. We can't take the results of some internal evaluations a company does at face value and assume it translates into real world impact, because it doesn't. Real world testing isn't meant to just achieve “real world compliance”. It's meant to act as a mechanism for ensuring these things actually do what we think they do when they're forced to face the insanely large number of variables actual people introduce.
@TreeYogaSchool
@TreeYogaSchool 15 күн бұрын
Great video! I am signing up for the newsletter now!
@aiexplained-official
@aiexplained-official 15 күн бұрын
Thanks Tree!
@GabrielVeda
@GabrielVeda 15 күн бұрын
I love it when Sam tells me what I want and what is good for me.
@RPHelpingHand
@RPHelpingHand 15 күн бұрын
Can’t wait! One day “AGI HAS ARRIVED!” will be a title for a video on here.
@aiexplained-official
@aiexplained-official 15 күн бұрын
Indeed one day it will be
@skierpage
@skierpage 14 күн бұрын
If you traveled back in time 20 years and presented the capabilities of Med-Gemini or any top-level LLM to the general public and most experts, nearly all would agree that human-level general intelligence had already been achieved in 2024. All the hand-wringing over "but they hallucinate," "but sometimes they get confused," etc. would seem ridiculous given such magic human-level ability.
@RPHelpingHand
@RPHelpingHand 13 күн бұрын
@@skierpage I think “intelligence” is subjective because there’re different forms of it. Currently, AI is Book Smart on all of humanity’s accumulated knowledge but it’s weak or missing creativity, abstract thinking and probably another half dozen ways. 🤔 When you can turn it to an always on state and it has its own thoughts and goals.
@skierpage
@skierpage 13 күн бұрын
@@RPHelpingHand It's subjective because we keep finding flaws and dumb failure modes in AIs that score much higher than smart humans in objective tests of intelligence, so we conclude that an obvious criterion, like scoring much higher than most college graduates in extremely hard written tests, no longer denotes human-level intelligence (huh?!). But new models will train on all the discussion of newer objective tests and benchmarks, so it may be impossible to come up with a static objective test that can score future generations of AI models. Also, generative AIs are insanely creative! As people have commented, it's weird that creativity turned out to be so much easier for AIs than thinking coherently to maintain a goal over many steps. Are there objective tests of abstract thinking in which LLMs do worse than humans? Or is that another case of people offering explanations for the flaws in current AIs?
@vladdata741
@vladdata741 15 күн бұрын
15:42 So Med-Gemini with all this scaffolding scores 91.1 on MedQA but GPT-4 scores 90.2? A one-point difference on a flawed benchmark? I'm getting Gemini launch flashbacks
@bmaulanasdf
@bmaulanasdf 13 күн бұрын
It's also based on Gemini 1.5 Pro tho, a smaller model than GPT-4 / Opus / Gemini Ultra (hopefully 1.5 Ultra soon?)
@n1ce452
@n1ce452 15 күн бұрын
Your channel is by far the most important news source for AI stuff, in the whole of the internet, really.
@aiexplained-official
@aiexplained-official 15 күн бұрын
Thank you nice
@maks_st
@maks_st 15 күн бұрын
Every time I watch an AI Explained video, I get reminded how incredibly fast AI is progressing, which is exciting and scary at the same time. This kind of makes the everyday routines I go through insignificant in perspective...
@aiexplained-official
@aiexplained-official 15 күн бұрын
For us all! But we must keep toiling, regardless
@Bartskol
@Bartskol 15 күн бұрын
Best ai channel. So worth it to wait a bit longer and get information from you.
@aiexplained-official
@aiexplained-official 15 күн бұрын
Thanks Bart!
@paullange1823
@paullange1823 15 күн бұрын
A new AI Explained Video 🎉🎉
@jsalsman
@jsalsman 13 күн бұрын
Your insights are so valuable. (Referring specifically to the benchmark contamination discussion.)
@aiexplained-official
@aiexplained-official 13 күн бұрын
Thanks jsal!
@solaawodiya7360
@solaawodiya7360 15 күн бұрын
Wow. It's great to have another distinct and educative episode from you Philip 👏🏿👏🏿
@aiexplained-official
@aiexplained-official 15 күн бұрын
Thanks Sola !
@OscarTheStrategist
@OscarTheStrategist 15 күн бұрын
Hallucinations are a huge problem right now in AI when it comes to the medical field. Can’t wait to test the new Med Gemini. Thanks for sharing!
@aiexplained-official
@aiexplained-official 15 күн бұрын
:) hope it helps!
@robertgomez-reino1297
@robertgomez-reino1297 15 күн бұрын
Spot on as usual. I also got to test it and I was surprised by people saying it was beyond GPT4. I could surely assume GPT4-class but no more. Also, people need to stop testing the same contaminated tasks: the snake game, the same ASCII tasks, the same logical puzzles discussed many thousands of times online in various sources in the past 12 months... I would be extremely happy if this is indeed just a much smaller model performing search at inference!!
@user-fx7li2pg5k
@user-fx7li2pg5k 15 күн бұрын
It is. This is an entire class of these new AI, not based on wide amounts of people's useless data but instead just a couple of people's inputs, with others contributing. Not all data is the same. I put mine in context, concept, and methodology, and another matrix on top for more inference after I'm done. They will train specifically on my data alone and make tools etc. I tried on purpose to make the most powerful AI in the world and you can take that to the bank. Smaller model, then build them up across/compile their own data, and I taught her how to solve the unsolvable and explain the unexplainable, and search and find in discovery using my own tactics: asking hard questions and more, and backwards thinking, divergent thinking and convergent. But you have to be multidisciplinary: many sciences and cultural anthropology. Even Anthropic is involved, X companies etc. They all are using my data and others'. Not all of our information is equal.
@GiedriusMisiukas
@GiedriusMisiukas 9 күн бұрын
AI in math, medicine, and more. Good overview.
@raoultesla2292
@raoultesla2292 15 күн бұрын
You are the only one who makes points applicable to seeing the long game on where we are headed. Cheers. Clegg sounds exactly like any character on the Aussie show 'Hollowmen'. "Work out a way of working together before we work out how to work with them"? He could not sound more circle-talky if he had previously been in govt..... Oh, wait, uh, yeah. The Politico article shows the complete lack of separation between govt. and private sectors. Regarding MedGemini being deployed, the industry fee-schedule profit-to-cost has not been calculated by the insurance corporations as yet. You realize there will be a MedGemini 1.8 diagnosis fee and a MedGemini 4.0 diagnosis fee. You know that, right? Outstanding journalism as usual.
@esuus
@esuus 14 күн бұрын
Decided to finally become AI Insiders member in the middle of this video ;-). Need more of your goodness. Regarding the need for medical AI: it's not just mistakes made by knowledgeable doctors (you showed a stat of 250k Americans dying), it's also that much of the world is way way underserved and most doctors are undereducated. I currently live in Vietnam and doctors here just can't help me with what I have. I've been way better since GPT 4 helped me, literally massive improvements in quality of life. BTW, frankly, German doctors were not a lot better. They all know their basics and their part of the body, but nobody can diagnose tough stuff or look at things systemically. Been waiting for Med Gemini access (used to be called something else) for many months now. [edit:] I'm pretty sure most decision makers have the best health care out there (politicians, techies, business leaders), and I'm pretty sure they don't understand how bad most of health care is for the bottom 60%-80%, even in relatively wealthy countries.
@En1Gm4A
@En1Gm4A 15 күн бұрын
thx for not posting about agi revival - very much appreciated - quality is high here!!
@BrianPellerin
@BrianPellerin 15 күн бұрын
you're the only AI youtube channel I keep on notifications
@aiexplained-official
@aiexplained-official 15 күн бұрын
Thanks Brian!
@AlexanderMoen
@AlexanderMoen 15 күн бұрын
I mean, I know there are big British names in AI, but the companies and legal jurisdictions in the sector are mostly in the US. When the British government set up that summit, I could only sort of laugh and assume this would happen, at least as far as the US side was concerned. The best case scenario in my mind was simply showing that top governments and businesses are openly discussing this and that we should pay attention. However, I wouldn't think for a second that any US company would give another country first crack at looking under the hood of its tech. In fact, I wouldn't be surprised if the US government reached out to tech execs and discouraged any further interaction behind the scenes.
@keneola
@keneola 15 күн бұрын
Glad you're still alive 😊
@aiexplained-official
@aiexplained-official 15 күн бұрын
Haha, thanks Ken
@ElijahTheProfit1
@ElijahTheProfit1 15 күн бұрын
Another amazing video. Thanks Philip. Sincerely, Elijah
@aiexplained-official
@aiexplained-official 15 күн бұрын
Thanks Elijah!
@stephenrodwell
@stephenrodwell 15 күн бұрын
Thanks! Brilliant content, as always. 🙏🏼
@aiexplained-official
@aiexplained-official 15 күн бұрын
Thanks Stephen for your unrelenting support
@Dannnneh
@Dannnneh 15 күн бұрын
Ending on an uplifting note ^^ Patiently anticipating the impact of Nvidia's Blackwell.
@AllisterVinris
@AllisterVinris 14 күн бұрын
I really hope that new OpenAI model is indeed a small open-source one. Being able to run a model locally is always a plus.
@jessthnthree
@jessthnthree 15 күн бұрын
god bless you, one of the few good AI youtubers who doesn't try to LARP as Captain Picard
@micbab-vg2mu
@micbab-vg2mu 15 күн бұрын
Great update:)
@connerbrown7569
@connerbrown7569 15 күн бұрын
Your videos continue to be the most useful thing I watch all week. Thank you for everything you do.
@aiexplained-official
@aiexplained-official 15 күн бұрын
Thanks connor, too kind
@nicholasgerwitzdp
@nicholasgerwitzdp 14 күн бұрын
Once again, the best AI channel out there!
@9785633425657
@9785633425657 13 күн бұрын
Thank you for the great content!
@sudiptasen7841
@sudiptasen7841 12 күн бұрын
Do you plan to make a video on the AlphaLLM paper from Tencent AI? Would be glad to hear an explanation from you.
@jonp3674
@jonp3674 15 күн бұрын
Great video as always. I fully agree that “when to launch an autonomous system that can save lives” is the most interesting version of the trolley problem. If self-driving cars save 20k lives and cost 5k lives, can any one company take responsibility for such mass casualties?
@Gerlaffy
@Gerlaffy 15 күн бұрын
Only the same way that car manufacturers do today. If the car is the problem, the company is at fault. If the circumstances were the issue, the manufacturer can't be blamed... To put it simply.
@skierpage
@skierpage 14 күн бұрын
​@@Gerlaffy that doesn't work. Every time the self-driving car makes a mistake the car company could be facing a $million legal judgment. The five times a day the fleet of self-driving cars avoid an accident during the trial, the car company gets nothing. So we don't get the life-saving technology until it's 100×+ safer than deeply flawed human drivers. In theory Cruise and Waymo can save on insurance compared with operating a taxi service full of crappy human drivers... I wonder if they do.
@scrawnymcknucklehead
@scrawnymcknucklehead 15 күн бұрын
It's a long way to go but I love to see what these models can potentially do in medicine
@1princep
@1princep 15 күн бұрын
Thank you.. Very well explained
@aiexplained-official
@aiexplained-official 15 күн бұрын
:)
@PolyMatias
@PolyMatias 14 күн бұрын
With Med-Gemini they lost the opportunity to call it Dr. (Smart)House. Great content as always!
@Blacky372
@Blacky372 15 күн бұрын
I'm starting to like Sam Altman again. Excited for the new modes and to use them to make me more productive.
@alansmithee419
@alansmithee419 15 күн бұрын
10:15 From what you said here, it almost sounds as if the largest models can do worse on the old tests because they're partially relying on the fact that the question was in their training and so can fail to 'recall' it correctly, while they do better on the new ones because they've never seen them before and so are relying entirely on their ability to reason - which because they're so large they have been able to learn to do better than simply recalling. Slightly more concisely: a possible conjecture is that very large LLMs are better at reasoning than recalling training data for certain problems, so can do worse on questions from their training set since they partially use recall to answer them, which they are worse at than they are at pure reasoning.
@bournechupacabra
@bournechupacabra 13 күн бұрын
I think it's probably good to implement AI to assist doctors, but I'm still skeptical of these "better than expert" performance claims. We've been hearing that about radiology for a decade now and it hasn't yet materialized.
@randomuser5237
@randomuser5237 15 күн бұрын
gpt2-chatbot is in no way GPT-4.5. But many people showed it passes reasoning tests none of the other models could. Also, you probably know that prompts you put in the LMSYS Chatbot Arena are public data that anyone can download? You may want to replace those 8 questions with new evals, since they will be on the public internet shortly.
@XalphYT
@XalphYT 14 күн бұрын
All right, you win. You now have a new YT subscriber and a new email subscriber. Thanks for the excellent video.
@aiexplained-official
@aiexplained-official 14 күн бұрын
Yay! Thanks Xalph!
@infn
@infn 15 күн бұрын
GPT2's responses to my zero-shot, general prompts were more considered and detailed than GPT4-turbo's. I always preferred GPT2. The highlight for me was it being able to design a sample loudspeaker crossover with component values and rustle up a diagram for it too. GPT4-turbo miniaturised? A modified GPT-2 trained on output from GPT4-turbo? I guess we'll have to wait and see.
@tbird81
@tbird81 15 күн бұрын
Google giving early access to the government makes me respect them even less.
@thehari75
@thehari75 15 күн бұрын
More interviews if possible. Guest recommendation: Pietro Schirano
@jsalsman
@jsalsman 13 күн бұрын
They should try the surgery kibitzing on a low risk operation. Something like a subcutaneous cyst removal where there is no possibility of disaster.
@Words-.
@Words-. 15 күн бұрын
Let’s go, u posted!
@anangelsdiaries
@anangelsdiaries 15 күн бұрын
Your content is amazing man, thanks a lot. You have become one of a handful of AI-related channels I follow, and my main source for AI news (besides twitter, but that's something else). Thanks a lot!
@aiexplained-official
@aiexplained-official 15 күн бұрын
Thank you angel
@alexc659
@alexc659 15 күн бұрын
What I appreciate about your channel is that you seem to maintain and respect the integrity of what you share. I hope you continue, and don't get caught up in the sensationalism that so many other sources get swayed into!
@aiexplained-official
@aiexplained-official 15 күн бұрын
Thanks alex, I will always endeavour to do so and you are here to keep me in check!
@DaveEtchells
@DaveEtchells 15 күн бұрын
Good job actually testing gpt-2, vs just frothing 👍
@jamiesonblain2968
@jamiesonblain2968 15 күн бұрын
You have to do an all-caps "AGI HAS ARRIVED" video when it's here
@aiexplained-official
@aiexplained-official 15 күн бұрын
Will do
@TheImbame
@TheImbame 15 күн бұрын
Refreshing for new videos Daily
@alertbri
@alertbri 15 күн бұрын
Woohoo! Ready for this 😀
@middle-agedmacdonald2965
@middle-agedmacdonald2965 15 күн бұрын
You need to go on the chat with the rest of the guys. AI community. You belong there.
@wck
@wck 15 күн бұрын
4:15 In your opinion, does this Sam Altman comment imply that the free tier will upgrade to GPT 4?
@santosic
@santosic 15 күн бұрын
It's likely that will be the case once we do have the next model, whether that's GPT-4.5, GPT-5 or something else entirely. Plus users would then have access to that, and Free users would likely have access to the “dumber” model, which would then be GPT-4 Turbo.
@GodbornNoven
@GodbornNoven 15 күн бұрын
Unlikely. GPT4 is much more expensive than GPT3.5, even taking Turbo into consideration, which is faster and cheaper than the normal model; it would still be FAR too expensive. Instead they should make a smaller model that can match GPT4. That's the way to go. GPT4 has around 1-2 trillion parameters. They need to make a smaller model and make it better than GPT4. Sounds hard but really isn't, considering the improvements that have been happening.
@Yipper64
@Yipper64 15 күн бұрын
​@santosic I find most of the value of the subscription imo doesn't come from the model but it's capabilities. As in the ability gpt 4 has to run its own coding environment, make images, take in pretty much any file format, etc etc. The model itself is one of the best on the market sure but not so much better that I think the subscription would be worth it without those features.
@lamsmiley1944
@lamsmiley1944 15 күн бұрын
You can use GPT4 free now with co-pilot
@PasDeMD
@PasDeMD 15 күн бұрын
There's a great human analogy that any physician can give you regarding reasoning tests vs real world applicability--we've all worked with the occasional colleague who crushed tests but struggled to translate all of that knowledge (and PATTERN RECOGNITION) to actual real-world clinical reasoning, which doesn't just always feed you keywords.
@user-bd8jb7ln5g
@user-bd8jb7ln5g 15 күн бұрын
Seems logic and reasoning is the stuff in between the training tokens, so to speak. Or outside them.
@harrisondorn7091
@harrisondorn7091 15 күн бұрын
Your videos are so refreshingly hype-free, that I know when you finally drop that "AGI HAS ARRIVED" vid in a few years I'm going to shit myself lol.
@aiexplained-official
@aiexplained-official 15 күн бұрын
Haha
@zalzalahbuttsaab
@zalzalahbuttsaab 13 күн бұрын
3:46 Based on Google's performance historically, I have sometimes wondered if it is the modern-day Xerox PARC.
@giucamkde
@giucamkde 15 күн бұрын
I solved one question in GSM1k just for fun, and I don't agree with the answer given: “Bernie is a street performer who plays guitar. On average, he breaks three guitar strings a week, and each guitar string costs $3 to replace. How much does he spend on guitar strings over the course of a year?” (12:26). The answer given is 468, that is 3 * 3 * 52. But that's not the correct answer in my opinion: a year is not exactly 52 weeks. The answer should be 3/7 * 3 * 365 ~= 469.28. Maybe some models also gave that answer, and maybe there are other questions like this, which would explain the lower-than-expected score.
@aiexplained-official
@aiexplained-official 15 күн бұрын
Really interesting, and I found another question with ambiguous wording. I suspect that is not the primary issue but could explain 1-2%
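For reference, the two readings of that question work out as follows; a quick sketch using only the figures quoted in the comment above, not anything from the GSM1k paper itself:

```python
# Bernie's guitar-string cost, annualized two ways (figures from the comment above).
strings_per_week = 3
cost_per_string = 3  # dollars

per_week = strings_per_week * cost_per_string  # $9
by_52_weeks = per_week * 52                    # 468, the benchmark's answer
by_calendar_year = per_week * 365 / 7          # ~469.29, the commenter's reading

print(per_week, by_52_weeks, round(by_calendar_year, 2))  # 9 468 469.29
```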
@vazox3
@vazox3 15 күн бұрын
To me the constant talk of gradual/iterative releases is a spin on the fact that there might be a technology plateau. Don't forget Sam is a marketing mastermind.
@aiexplained-official
@aiexplained-official 15 күн бұрын
Could be, for sure
@juandesalgado
@juandesalgado 14 күн бұрын
I wrote an OpenAI Eval ("solve-for-variables", # 613) for a subset of school arithmetic - namely, the ability to solve an expression for a variable. I don't know if they use these evals for training, but at the very least they should be using them as internal benchmarks. (And I wish they published these results.)
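For anyone curious, a hypothetical item of that kind might be checked like this (a sketch only; the equation, the sympy-based grading and the pass criterion are my own illustration, not the actual "solve-for-variables" eval):

```python
# Hypothetical check for a "solve for the variable" item, graded symbolically.
import sympy as sp

x = sp.symbols("x")
equation = sp.Eq(2 * x + 3, 11)        # prompt: "Solve 2x + 3 = 11 for x."
expected = sp.solve(equation, x)[0]    # 4

model_answer = "4"                     # whatever the model under test returned
passed = sp.simplify(sp.sympify(model_answer) - expected) == 0
print(passed)                          # True -> the item is scored as correct
```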
@KitcloudkickerJr
@KitcloudkickerJr 15 күн бұрын
Perfect midday break. I'm watching til the end
@weltlos
@weltlos 15 күн бұрын
It is a bit depressing that even the most advanced models we have access to today fail at some of these elementary-level tasks. Reliability is key for real-world deployment. I hope this will be ironed out at the end of this month... or year. Great video as always.
@aiexplained-official
@aiexplained-official 15 күн бұрын
Thanks welt!
@Xilefx7
@Xilefx7 15 күн бұрын
Good video as always
@dreamphoenix
@dreamphoenix 15 күн бұрын
Thank you.
@cupotko
@cupotko 15 күн бұрын
I can only hope that the release cycle of newer/bigger/smarter models won't be affected by longer training times. I think that the main news in the coming months should be not new models, but new datacenters with record compute performance.
@TheLegendaryHacker
@TheLegendaryHacker 15 күн бұрын
12:33 To me the answer to this is pretty simple: Opus simply isn't big enough. It's known that transformers learn specialized algorithms for different scenarios (arXiv 2305.14699), and judging by the generalization paper you mentioned in the video, my guess is that those algorithms "merge" as a bigger model gets trained for longer. In this case, all you need to do is scale and reasoning will improve.
@JumpDiffusion
@JumpDiffusion 15 күн бұрын
14:15 it is not based on raw outputs/logits. It looks at N reasoning paths/CoTs, and then calculates the entropy of the overall answer distribution (as produced by the N solutions/paths). E.g. if the possible answers are {A, B, C}, and N = 10 reasoning paths result in the distribution {3/10, 3/10, 4/10}, then the entropy of this discrete distribution is looked at to decide if it is above a given/fixed threshold. If so, it does uncertainty-guided search.
@aiexplained-official
@aiexplained-official 15 күн бұрын
Thank you for the correction. I defaulted to a standard explanation but yet entropy was explicitly mentioned in the paper, so no excuse!
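For concreteness, the entropy check in that example can be sketched like this (the 0.8-bit threshold is an arbitrary assumption for illustration, not the paper's value):

```python
import math
from collections import Counter

def answer_entropy(answers):
    """Shannon entropy (in bits) of the answer distribution over N reasoning paths."""
    counts = Counter(answers)
    n = len(answers)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

paths = ["A"] * 3 + ["B"] * 3 + ["C"] * 4   # the {3/10, 3/10, 4/10} example above
h = answer_entropy(paths)
print(round(h, 3))                           # ~1.571 bits

THRESHOLD = 0.8                              # illustrative value only
if h > THRESHOLD:
    print("High uncertainty -> trigger uncertainty-guided search")
else:
    print("Confident -> return majority answer:", Counter(paths).most_common(1)[0][0])
```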
@AustinThomasPhD
@AustinThomasPhD 15 күн бұрын
I am perplexed by how many errors there are in benchmarks. This has been a problem from the very beginning and, in some ways, it seems to only be getting worse.
@biosecurePM
@biosecurePM 13 күн бұрын
Because of the AIDPA (AI decay-promoting agents), haha !
@AustinThomasPhD
@AustinThomasPhD 13 күн бұрын
@@biosecurePM I doubt it is anything nefarious. I am pretty sure it is just lazy 'tech-bros'. The nefarious AI stuff comes from the usual suspects, like the fact that the Artificial Intelligence Safety and Security Board contains only CEOs and execs, including several oil execs.
@roykent2316
@roykent2316 15 күн бұрын
Yaaay finally a new video 🥰
@danberm1755
@danberm1755 14 күн бұрын
It seems like we need a way to inject a "truth" into a model, not just "train" the model on text. For example, "Street address numbers must not be negative". We need code we can physically look at as proof for that statement.
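One common way to get an inspectable rule like that is to keep it outside the model's weights, as an explicit validator applied to the model's outputs. A minimal sketch of that idea follows; the regex, the helper name, and the example outputs are illustrative assumptions, not an established API, and it is only one reading of what "injecting a truth" could mean:

```python
import re

# The "truth" lives here as readable code, not inside the model's weights.
RULE = "Street address numbers must not be negative"

def address_numbers_ok(model_output: str) -> bool:
    """Reject any generated text containing a negative house number."""
    numbers = re.findall(r"(-?\d+)\s+[A-Z][a-z]+\s+(?:St|Ave|Rd|Blvd)", model_output)
    return all(int(n) >= 0 for n in numbers)

print(address_numbers_ok("Deliver to 42 Baker St"))   # True
print(address_numbers_ok("Deliver to -7 Baker St"))   # False, violates RULE
```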
@DreckbobBratpfanne
@DreckbobBratpfanne 15 күн бұрын
Also got lucky access to gpt2. It seems to be able to learn from examples better within context (when given code with a certain custom class, it used it from an example code snippet without knowing what it was, while any gpt-4(-turbo) variant always changed it to something else). Maaaybe it's slightly less censored too, but I got rate-limited before that was clear. One thing that was clear, however, is that this is not a gpt4.5. It had trouble with attention to certain things in a longer context at the exact same point as gpt-4-turbo. So all in all it's probably a slight improvement, but nothing crazy (unless it truly is some sort of gpt-2-sized model with verify-step-by-step and longer inference time or something). If this were 4.5, then expectations for gpt5 would be significantly lowered on my part.
@JJ-rx5oi
@JJ-rx5oi 14 күн бұрын
Great video as always, but I will say your section on the GPT2 chatbot was quite underwhelming. I've seen so much information on its reasoning, math and coding capabilities. Many people, including expert coders, were talking about just how much better it was than the current SOTA models at solving coding problems. I think this is very significant. I appreciate you coming up with new test questions, but it didn't seem like there was enough data there to draw any real conclusions on the model's performance. We are still unsure if this model is a large-parameter model or something more akin to Llama 3 70b. If this is the case, GPT2 chatbot will be revolutionary: that level of reasoning and generalisation fitted into a smaller parameter size would mean some sort of combined model system such as Q* plus LLM, etc. My theory is that it is a test bed for Q* and is very incomplete atm; my guess is they will be releasing a series of different-sized models similar to Meta, but each model will be utilizing Q*, and GPT2 chatbot will be one of the smaller models in that series. The slow speed can be explained by the inference speed allowed on the website and could also be a deliberate mechanic of these new models. Noam Brown spoke about allowing models to think for longer, and how that can increase the quality of their output; this could explain the rather slow inference and output rate. He is currently a lead researcher at OpenAI and he is working on reasoning and self-play on OpenAI's latest models.
@marc_frank
@marc_frank 15 күн бұрын
I got to try gpt2-chatbot, too. Its answers were mighty impressive (assuming it is more compute thrown at gpt2, not a new model like gpt4.5). I can't help but wonder what would happen if the same thing was done to gpt4 or Opus.
@marc_frank
@marc_frank 15 күн бұрын
It's good that Matthew Berman posts so quickly, or else I might have missed it. But AI Explained goes more in depth. The mix of both is awesome!
@resistme1
@resistme1 14 күн бұрын
Again, amazing video. Had read the Palm2 paper with lots of interest for my own, very different field of study. What I don't understand as somebody from the EU with no medical background: MedQA (USMLE) is based on "step 1" of the USMLE? Or does it also cover the other steps? You state that the pass rate is around 60%. Is that about step 1 as well? It would be more interesting to see what the average score is of people that pass, I would think? Can somebody elaborate? Also wondering about the CoT pipeline used. Would they also use a RAG framework like LangChain or LlamaIndex?
@aiexplained-official
@aiexplained-official 14 күн бұрын
Interesting details to investigate, for sure. Thank you RM.
@DreckbobBratpfanne
@DreckbobBratpfanne 15 күн бұрын
I wonder if the fact that these models get the simple questions wrong is either just basic random hallucination, or connected to the finding that sophisticated prompts lead to better answers compared to simple ones, because the next-token predictor has a higher likelihood of a high-quality answer to a high-quality question than to amateur-level questions.
@dereklenzen2330
@dereklenzen2330 15 күн бұрын
I love this channel because it only releases content whenever there is something truly interesting to hear about. That is why I click whenever I see a video drop. Probably the best YouTube channel for AI content imho. 🙌👏
@aiexplained-official
@aiexplained-official 15 күн бұрын
Thank you derek
@dereklenzen2330
@dereklenzen2330 15 күн бұрын
@@aiexplained-official yw. :)