Claude 3.5 struggles too?! The $1 million challenge

14,311 views

days ago

The million-dollar ARC-AGI challenge
Get a free HubSpot report on how to do an AI data analysis project: clickhubspot.com/d30
🔗 Links
- Follow me on Twitter: / jasonzhou1993
- Join my AI email list: www.ai-jason.com/
- My Discord: / discord
- ARC Prize 2024: arcprize.org/
- Kaggle: www.kaggle.com/competitions/a...
- ARC-AGI 50% solution overview: www.lesswrong.com/posts/Rdwui...
⏱️ Timestamps
0:00 Intro
2:25 ARC overview
7:11 ARC dataset overview & how to participate
8:50 Tutorial - data prep
10:55 Method 1 - Direct LLM prompt
12:26 Method 2 - LLM chains
13:50 Method 3 - Multi agent
17:25 Method 4 - Search + Prompt
20:00 Method 5 - Active inference
22:51 Example of how to submit
👋🏻 About Me
My name is Jason Zhou. I'm a product designer who shares interesting AI experiments and products. Email me if you need help building AI apps! ask@ai-jason.com
#claude #claude3 #gpt4 #gpt4o #arcagi #agi #gpt5 #autogen #autogpt #ai #artificialintelligence #tutorial #stepbystep #openai #llm #chatgpt #largelanguagemodels #largelanguagemodel #bestaiagent #agentgpt #agent #babyagi

Comments: 38

@TheSchingsching007 27 days ago

I love your content, thanks!!!

@sanchaythalnerkar9577 27 days ago

Awesome and something unique as always!

@jamesyoungerdds7901 27 days ago

Great video, thank you! Longtime AI follower and huge fan of your channel, so thank you and please keep going :). One of the exciting things about AI is what it forces us to learn about ourselves, about humanity, and about what it means to be 'uniquely human'. This video made me think of "pattern matching" as a human characteristic: generalizing from experiences, finding similarities, and extrapolating to thoughts and actions based on new information. And also, just a genius strategy of using layers of agents and running code to succeed 🙌

@user-iy9fr5td2f 27 days ago

The title is confusing; I wouldn't have clicked if I weren't familiar with your content.

@AIJasonZ 26 days ago

Thanks for the feedback, will try a new one!

@Blooper1980 25 days ago

Dude! I love your videos!!!

@nabswai 27 days ago

Nice! Is there a reason you don't try using LangGraph for multi-agent frameworks?

@TheSchingsching007 27 days ago

Also what tool are you using to highlight your mouse?

@janniksco 24 days ago

Thanks!

@mosca204 18 days ago

Could you share your code?

@Dron008 26 days ago

Thank you for sharing such things and your code. That is really interesting. There's a math contest that is ending soon; it would be interesting to know its results.

@EDMJUNIORBR 26 days ago

What about liquid neural networks?

@amulyaparmar3346 27 days ago

Is Jason using Google Colab or a custom Jupyter notebook? It looks really useful, but I've never seen a notebook that looks like it.

@ilianos 26 days ago

That's Kaggle, a platform that (similar to Colab) lets you use certain resources for your Jupyter notebooks. It's browser-based, of course.

@ilianos 26 days ago

The link is in the video description.

@FloridaMeng 27 days ago

Why not do a brain scan on someone solving this problem and then make a shorthand representation of that reaction in code? Apologies if this is an unhinged idea; I just woke up.

@alexanderrosulek159 27 days ago

Essentially neuralink

@harshnigam3385 27 days ago

It's a very good idea, but incredibly hard to implement due to noisy EEG signals.

@chasebrower7816 26 days ago

Scanning the brain with enough resolution to translate it to digital systems is extremely difficult, mostly because neurons are so small. There have been attempts to recreate some basic representations (images, text) with fMRI, but they tend to be very abstract and low-fidelity.

@Dron008 26 days ago

Unfortunately, we still know too little about how our brains work.

@zerorusher 26 days ago

Transformers are fundamentally different from brains. It's like a combustion car versus an electric car: both achieve the same goal and share traits, but go a bit deeper and it becomes clear that they work very differently.

@quebono100 27 days ago

The ARC-AGI challenge is a slap in the face to all these AI YouTubers who hyped the s*it out of AI news, and to all those liars like Elon who predicted AGI by next year.

@nikhilmaddirala 25 days ago

Instead of asking the agent to write code for a rule-based program, can't you ask the agent to write code to train a neural network using the training data?

@Cygx 27 days ago

I think you need a translation layer into a higher-level, abstract representation of the differences, so the model can reason over them and then combine the requirements back into code. Yeah, then we would have solved AGI…

@user-wd5mp2sm8c 27 days ago

But they don't count childhood adaptation to the world, studying, and learning those patterns, which takes several years.

@jeffsteyn7174 16 days ago

So it struggles with things we struggle with😂

@jtreg 27 days ago

WOT

@Joe-bp5mo 27 days ago

Gonna give it a try. I compared Claude 3.5 vs. GPT-4o: Claude 3.5 is clearly better, though it still struggles.

@MarkoTManninen 27 days ago

I can confirm that: 21% against 8% for Claude. GitHub / markomanninen / ARC-AGI / test

@MyrLin8 26 days ago

This is simply because the input mechanisms are so limited. A computer has one, count 'em, one, while humans have at least five, all at once. It's not just the pattern; it's the noise as well. It took 'us animals' billions of years of combining and discriminating between input-type patterns and recognition, across multiple input streams.

@jessedbrown1980 25 days ago

Good video. Sounds just like humans. If a human was never taught colors, how would they know what colors are? It's also comparable to humans with color blindness: even though the data is there, a human who has never had access to it cannot solve the problem. We just need to expose the AI system to simulators and engines, and let it play with the simulator to gain the understanding it needs to do this every time.

@notgate2624 27 days ago

Generating 5,000 times and picking the best 12 is a bit far from AGI lmao

@sebby007 26 days ago

How many thoughts is it OK to have before you come up with something like the theory of relativity? I get what you mean, but I'm not sure how, or to what, we should compare this.