"AIs will all become the same" - Platonic Representation Hypothesis Explained

40,836 views

bycloud

1 month ago

Revolutionize your job search process with HubSpot's Free Resources: clickhubspot.com/wyt
What does Plato have to do with AI? Today we will take a fascinating dive into the world of representation spaces.
[Newsletter] mail.bycloud.ai/
Platonic Representation Hypothesis
[Paper] arxiv.org/pdf/2405.07987
[Project Page] phillipi.github.io/prh/
[Code] github.com/minyoungg/platonic...
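
A note on how alignment is measured: the paper scores cross-model alignment with a mutual nearest-neighbor metric, i.e. embed the same inputs with two models and check how much their k-nearest-neighbor sets overlap. Below is a minimal numpy sketch of that idea for readers who want to experiment. It is a loose reconstruction, not the authors' reference implementation (that lives in the repo linked above); the function name and the k=10 default are illustrative choices.

import numpy as np

def mutual_knn_alignment(feats_a, feats_b, k=10):
    """Fraction of shared k-nearest neighbors between two models'
    embeddings of the same inputs (rows must correspond 1:1)."""
    def knn_indices(feats):
        # cosine similarity via normalized dot products
        x = feats / np.linalg.norm(feats, axis=1, keepdims=True)
        sims = x @ x.T
        np.fill_diagonal(sims, -np.inf)          # exclude self-matches
        return np.argsort(-sims, axis=1)[:, :k]  # top-k neighbors per row

    nn_a, nn_b = knn_indices(feats_a), knn_indices(feats_b)
    overlap = [len(set(a) & set(b)) / k for a, b in zip(nn_a, nn_b)]
    return float(np.mean(overlap))

# Sanity check: two unrelated random "models" should score near chance (~k/n).
rng = np.random.default_rng(0)
a, b = rng.normal(size=(500, 64)), rng.normal(size=(500, 32))
print(mutual_knn_alignment(a, b))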
This video is supported by the kind Patrons & YouTube Members:
🙏Andrew Lescelius, alex j, Chris LeDoux, Alex Maurice, Miguilim, Deagan, FiFaŁ, Daddy Wen, Tony Jimenez, Panther Modern, Jake Disco, Demilson Quintao, Shuhong Chen, Hongbo Men, happi nyuu nyaa, Carol Lo, Mose Sakashita, Miguel, Bandera, Gennaro Schiano, gunwoo, Ravid Freedman, Mert Seftali, Mrityunjay, Richárd Nagyfi, Timo Steiner, Henrik G Sundt, projectAnthony, Brigham Hall, Kyle Hudson, Kalila, Jef Come, Jvari Williams, Tien Tien, BIll Mangrum, owned, Janne Kytölä, SO, Richárd Nagyfi, Hector, Drexon, Claxvii 177th, Inferencer, Michael Brenner, Akkusativ, Oleg Wock, FantomBloth
[Discord] / discord
[Twitter] / bycloudai
[Patreon] / bycloud
[Music] massobeats - gingersweet
[Profile & Banner Art] / pygm7
[Video Editor] Silas

Comments: 304
@Steamrick · 1 month ago
My takeaway is that Plato should be on the credits list for The Matrix.
@EdFormer · 1 month ago
Plato should be credited for so many intellectual developments that it's easier to state when he shouldn't be credited. Though, that includes some pretty awful developments!
@user4475 · 1 month ago
You meant Baudrillard, right? He's mentioned at the start of the film.
@FreeFormFemi · 1 month ago
Golden Answer!!
@ugthefluffster · 1 month ago
This is fascinating. AI is basically modelling reality through a very inefficient translation layer: the human mind's own model of reality. To get true AI we should let models learn reality directly, just as our brains learned it through evolution and, more recently, through science.
@mondemamon929 · 1 month ago
Where can you find a reference for that? We can't create something we cannot properly conceive, yeah?
@ugthefluffster · 1 month ago
@@mondemamon929 200 years ago humans could not conceive of infrared, the surface of Mars, synapses or neutrons. We now know of these things and have a good understanding of how they work. We did that through science, which is the act of investigating reality. AI should be developed by the same method: let it investigate reality directly, through sensing it, interacting with it, and forming hypotheses about it. Teaching it about reality through language, or even video, gives it just a very lossy, compressed representation of reality.
@Nekroido · 1 month ago
You are damn right, these are function approximations of curated datasets, not true artificial intelligence. They just use this buzzword to generate investments, because of how inefficient and expensive modern computing is.
@bravo90_ · 1 month ago
@@ugthefluffster So we give it sensors and a goal. What should that goal be?
@jichaelmorgan3796 · 1 month ago
Goals should be continuous processing and continuous learning, which is what those layers of our brain do with a fraction of the energy needed to power a light bulb. Then, aim for decentralized, distributed intelligence across the internet.
@regenwurm5584 · 1 month ago
The solution is AI battle royal.
@EuroKommissar · 1 month ago
Yes, to weed out the loser.
@fluffsquirrel · 1 month ago
@@EuroKommissar Give the loser weed
@Brodzik-kz8nt · 1 month ago
we already have that in reinforcement learning and adversarial models
@NeostormXLMAX · 1 month ago
No, we also need a hardware battle royale. People don't understand that AI itself is limited by its platform; it's not just a software issue, it's hardware. It's one of the reasons why Amazon, Google, Microsoft and Nvidia are all specifically developing CPUs to run AI.
@fluffsquirrel · 1 month ago
@@NeostormXLMAX Also DPUs, right?
@OiOChaseOiO · 1 month ago
I started discussing the Platonic Representation Hypothesis with ChatGPT, and gosh, we went on a truly epic philosophical adventure that I barely understood. 😅
@stephen-torrence · 1 month ago
👏
@VincentKun · 1 month ago
I started talking to it too, and brought up the argument of "Laplace's demon", and it started yapping about a lot of stuff.
@americanbagel · 1 month ago
As a philosophy and AI nerd, I love when they intersect!
@cdkw2 · 1 month ago
Hey bycloud! Big fan of your AI newsletter, man. Thanks!
@Sphoonx · 1 month ago
Sounds interesting, BUT! I have a question: to what extent could this be just a result of more parameters = more dimensions along which representations can be spread out = more even spread of representations = distances becoming more equidistant = distances only *appearing* to converge? Because to my (admittedly naive) brain this sounds very similar to the curse of dimensionality, only within foundation model representations. With this in mind, wouldn't it actually be more compelling if you found this alignment more in smaller foundation models, rather than larger ones? Especially seeing that the encoding of the (multi-task) representations would have to be more *efficient* rather than sparse. (A numeric check of this concern is sketched at the end of this thread.)
@__christopher__ · 1 month ago
I'd say the question is whether they also did the counter test: say, looking at the convergence between the distance from apple to orange in the LLM versus, say, strawberry to full moon in the image model. If the convergence is because of a better internal model, there should be no convergence here, since the distances are not related at all. But if the convergence is caused by unrelated model-size effects, there probably would be convergence here too.
@TiagoTiagoT · 1 month ago
Don't random directions become more and more orthogonal to each other as the number of dimensions increases?
@dansplain2393 · 1 month ago
I was thinking the same! I’ll read the paper, but I think a good baseline for noise is random pairs of concepts.
@anywallsocket · 1 month ago
Is OP suggesting that the average distance between ANY two random points converges as the dimension of the system grows? Because it seems to me it’d be more proportional to the ‘size’ of the system than its dimensionality, although perhaps their measure of proximity accounts for this. Either way it’s not obvious why this should be true.
@techray5077 · 1 month ago
​@@__christopher__ Not sure if that makes sense tbh. If the hypothesis states that the representations converge, then this should also include representations of all objects. So the distance between random and unrelated objects would still converge between models if their representations become more similar.
@khla.mp4 · 1 month ago
Very good points, glad it's out here for more people to think about
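
A quick numeric check of the worry raised in this thread: in high dimensions, random directions do become nearly orthogonal, so any claimed alignment has to beat that baseline. A minimal sketch, assuming random isotropic vectors as a stand-in for embeddings (the average |cosine| shrinks roughly like 1/sqrt(d)):

import numpy as np

rng = np.random.default_rng(0)
for d in [2, 16, 128, 1024, 8192]:
    x = rng.normal(size=(1000, d))
    x /= np.linalg.norm(x, axis=1, keepdims=True)  # random unit vectors
    cos = x @ x.T
    off_diag = cos[~np.eye(len(x), dtype=bool)]    # drop self-similarity
    print(f"d={d:5d}  mean |cos| = {np.abs(off_diag).mean():.3f}"
          f"  (1/sqrt(d) = {1/np.sqrt(d):.3f})")

This is also why comparisons like the mutual nearest-neighbor overlap sketched near the top are made over the same inputs in both models: distance concentration alone would not by itself produce matched neighbor sets.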
@NikoKun · 1 month ago
I know it's just the sponsor segment.. But getting better at "job hunting" isn't the answer, it's just delaying the inevitable. We really need to build a movement around making significant changes to how our society and economy work, if we're going to adapt to the implications of AI.
@JustAThought01 · 1 month ago
We need to focus on making employment easier by reducing the hours worked per lifetime and implementing more jobs in tax-funded projects, to offset the reduction in the for-profit segment due to automation.
@znemonic · 1 month ago
Keep in mind, the training data for these models doesn't differ very much, except for OpenAI, which has had a massive head start on gathering quality user data. There are some severe limitations to their argument, which they point out at the end of their paper:
1. Different modalities may contain different information. "One immediate objection to our hypothesis is: what about the information that is unique to a given modality? Can language really describe the ineffable experience of watching a total solar eclipse?"
2. Not all representations are presently converging. "For example, in robotics there is not yet a standardized approach to representing world states in the same way as there is for representing images and text."
3. Sociological bias in producing AI models. Ever notice how LLMs seem to think they're people unless asked directly? "There is often an explicit or implicit goal of designing AI systems that mimic human reasoning and performance, and this could lead to convergence toward human-like representations even if other kinds of intelligence are in fact possible."
4. Special-purpose intelligences might not converge. This is where it *really* falls apart in a way that should be obvious. General-purpose intelligence must be able to reason using a theory of mind. A general theory of the human mind won't do, as people from different cultures can see the world in *very* different ways.
Coming from a Comp Sci x Philosophy research background, I could definitely see models converging on the current trajectory, but not towards some platonic reality. Rather, they're being engineered for profitable utility. Which is to say, they *won't* be trained to think too far outside the box, because that could be a threat to business alignment, and the only groups that can afford to train the cutting-edge models intend to use them for profit. Even beyond that, there's state/government alignment, which implies that if, for example, the US were to become a totalitarian state, then the AI would also have to be totalitarian in order to remain politically correct enough for the market.
@H1kari_1 · 1 month ago
Great educational video. I love how you find AND explain recent papers that aren't already covered by all the hype media and are STILL super interesting.
@pauljones9150 · 1 month ago
Love this video! Haven't heard of that paper before! Thank you sir
@GunwantBhambra · 1 month ago
First 60 seconds made me freeze and think
@kellymoses8566 · 1 month ago
It makes perfect sense for different AIs to learn similar representations of the same reality. This is similar to how science works.
@user-fr2jc8xb9g · 1 month ago
Hi bycloud, can you tell me where and how I can read these types of AI papers, and how to keep up with the latest ones as soon as they come out?
@marinepower · 1 month ago
It's a nice thought in theory, but in practice we don't have infinitely large LLMs, which means there eventually needs to be a decision between what data to remember and what data to forget, and which data is remembered or forgotten is based on the task being solved.
@anywallsocket · 1 month ago
we could have an LLM that we continuously train as we get new data though
@cajampa · 1 month ago
@@anywallsocket This is why I really like this idea of Microsoft's Recall that everyone seems so horrified about. I want to be able to continuously train and feed a model with my data, of what I am and what I am doing, as a representation and a reflection of myself.
@anywallsocket · 1 month ago
@@cajampa Ironically though, the bitter end of that Recall software is an AI that can *use* a computer completely via prompting. So you and many others will be even more readily replaced lol.
@cajampa · 1 month ago
@@anywallsocket Sounds great. My income has nothing to do with sitting in front of the computer. And the less I have to use it, while still being able to command it to follow my instructions accurately, the more time I save. I look forward to seeing all of you doomers not working in front of a computer anymore.
@cajampa · 1 month ago
@@anywallsocket Sounds great. My income has nothing to do with sitting in front of the computer. That I do for fun and research, and that is why I want to feed a model with my research flow and interests. No context window I have found seems to be enough, and it is such an incredible chore to feed a model with the info it needs to be useful and get it to reason accurately enough for me. Hence Recall seems like a great idea, to be an extension of me. The day what I can do is replaced is a great day; I gladly leave it in better silicon and/or steel hands. I am dynamic enough to adjust to whatever happens in the future, so I have zero fear about it, unlike all of you doomer types. And the less I have to use my computer to do what I need it to do, while still being able to command it to follow my instructions accurately, the more time I save.
@pullahuru9168 · 1 month ago
How does this convergence work with the physical world, which is linked to quantum phenomena? I would assume that the training sets are mostly based on directly observable aspects of reality, and the "perfect" model would lack knowledge of the deeper workings, much like a weather logbook can statistically predict the weather but does not understand the driving factors.
@anywallsocket · 1 month ago
Indeed, you can only converge onto what you can perceive (i.e. data). Otherwise you need to make inferences, like we do with mathematical theory. But like the video said, the proximities of mathematical concepts do not converge the same way regular concepts do, because they are abstractions on the proximity space itself. Hence, I do not imagine this perpetual collection of data will converge onto a mathematical theory beyond what we already know.
@user-db9bw5cl1e · 1 month ago
Interesting, as AI converges on a single "optimal" understanding of reality, who decides what that reality should look like?
@XenoCrimson-uv8uz · 1 month ago
efficiency.
@NeostormXLMAX · 1 month ago
No, they are limited by their platform. For example, there is only a certain way you can win tic-tac-toe once you reduce it to its core, since the game system doesn't allow for more variety. The game they are playing is not so limited: if they created a fungal biocomputer, or an analogue computer, the AI would be vastly different.
@ronilevarez901 · 1 month ago
@@NeostormXLMAX no. That's the idea precisely. The paper suggests that there is a universal optimal representation of reality that all intelligent beings can figure out given enough time and Intelligence, regardless of their making. So no matter the race, education, components or origin, all intelligences would eventually end up seeing reality the same way: the most statistically accurate way. And evidence seems to confirm that idea as correct so far.
@anywallsocket · 1 month ago
@@ronilevarez901 Indeed. 'Red' is closer to 'orange' than it is to 'blue' within the latent space, whether the model is trained on images, or text, or speech. Why? Because red is closer to orange in reality, and hence their logical associations proliferate regardless of the medium they're represented in. To answer OP's question, then: reality decides. (A quick way to check this numerically is sketched at the end of this thread.)
@jansustar4565 · 1 month ago
@@ronilevarez901 This just feels like another optimization problem with local minima and maxima. And who is to say there is a definitive best representation? Also, the entire argument seems a bit shaky (granted, I haven't read the original paper). Why does the interpretation of color (for example) measure the absolute truth of the cosmos? Isn't it just a response to human-based design? Humans make things that abide by our view of the world, and the AI just learned to somewhat copy it. Certain color combinations are never present, like pink + gold, and we naturally make things out of colors that "fit" together, therefore the AI just adapted to our sense of how well colors pair. Granted, an AI could do the same with nature to a certain extent, but the problem is that AI isn't exact; it's just a decent approximation, something that doesn't quite fit the very precise laws of physics we made up.
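
The 'red is closer to orange than to blue' claim a few comments up is checkable in a couple of lines with any off-the-shelf text embedder. A hedged sketch: the model name is just one plausible choice, and the expected ordering is the claim under test, not a guaranteed output.

# Requires: pip install sentence-transformers
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence embedder works
red, orange, blue = model.encode(["red", "orange", "blue"],
                                 normalize_embeddings=True)
# With normalized embeddings, the dot product is cosine similarity.
print("cos(red, orange) =", float(red @ orange))
print("cos(red, blue)   =", float(red @ blue))   # claim: should be smaller

Repeating the same probe with an image model's embeddings of red/orange/blue swatches is in the spirit of the cross-modal comparison the paper runs.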
@bycloudAI · 1 month ago
Revolutionize your job search process with HubSpot's Free Resources: clickhubspot.com/wyt check out my newsletter: mail.bycloud.ai
@valerazsd · 1 month ago
hey cloud man🤭
@ondrejmarek1980 · 1 month ago
As in the example of training a robodog to balance on a ball: measuring and observing reality provides an infinite dataset, so that aspect of takeoff is covered. Now just scaling, architecture, and the overlooked sibling, (super)alignment, and we're cooking.
@gabrielorville5334 · 1 month ago
Suddenly Jung's theory of archetypes, describing the psychological dimension of life arising out of the instinctual behavior we incur given a certain pattern of reality (and their transactional relationship), doesn't sound so crazy.
@zzzzzzz8473 · 1 month ago
Nice! Good breakdown; I too am a hupo enjoyer! Related is the study of multi-language models, where the word tokens of different languages converged in the latent space even when the dataset removed any translation pairs, implying that there is an underlying universal human language. While that's debatable, we can understand how language has a finite "shape" in high dimensions, such that there are constraints on any expressible idea that could make sense or be representative of reality. Especially given that the average speaker's vocabulary is only around 20k words, the number of relational dimensions and the ways those words could be spatially distributed are limited, and likely form the same shapes regardless of the language.
@crobisbobis · 1 month ago
Yes. That makes perfect sense
@JwalinBhatt · 1 month ago
Are the tasks they need to do also converging to something... like increasing entropy?
@InsertedFreewill · 1 month ago
I think this will give answers for qualia and the Mary's Room thought experiment.
@user-if1ly5sn5f · 1 month ago
3:58 The alignment is because the measurements are matching and able to be navigated. It's connected and relative because that's how we've been doing things and that's what's fed to the AI. Our problem is that we think the outlines/measurements are reality, when the outlines are the cover, and we don't judge by the cover; we listen without searching and understand where that shape fits in the puzzle.
@setop123 · 1 month ago
✅ no clickbait BS
✅ meme game on point
✅ original
✅ interesting subject
✅ well articulated
✅ valuable observations and interpretations
Even the volume is well managed... The best musicians, artists, engineers and, well, also YouTubers are always underappreciated/underrated, and you're definitely one of the best in the AI/YouTube game. Congrats 🔥
@Aristocle · 1 month ago
The translation of the transliteration of 'formula' (from Latin) is 'small shape'. Furthermore, the translation of the transliteration of 'idea' (from ancient Greek) is 'shape'. So ideas are shapes of the intellect. These shapes are roughly common across humans, but differ for each individual. The perceptions that form ideas in LLMs are very different from human ones; fortunately the reality is the same. I believe that if we were able to make more refined and abstract models, a few billion parameters would be enough to have LLMs that are very capable of reasoning, while it is more useful to move the memory into RAG or the agents.
@BackgroundCompany · 1 month ago
The simplest explanation for model convergence is that the structure of language preserves a structure of the world just as 2D images preserve a structure of the world. The field of machine vision, for example, has been able to extract depth information from flat images without black box techniques for a very long time now. This is a potentially fatal blow to people who think that language is arbitrary and that reality is what we make of it. It would be more explanatory of model convergence if language is actually what reality makes of it.
@mohamedbelafdal6362 · 1 month ago
when different models that are used to represent the same reality end up converging into the same model 🤯
@I77AGIC · 1 month ago
very interesting
@VincentKun · 1 month ago
Well, this Platonic representation is telling us that more data is better. That was the point of the entire statistical learning theory, with the saying that goes: the higher your VC dimension, the bigger your data, the better your model.
@remiesmith7027 · 1 month ago
Thanks for the great video. I subscribed to your channel, but I have to tell you the segue to the ad needs improvement.
@apontel · 1 month ago
I don't understand anything technical about AI, but I watch to admire the use of memes to illustrate the script.
@Kuchenrolle · 1 month ago
If you ask a blind kid what colour a banana is, they will answer yellow. And if you ask someone what cows drink, they may well answer white. It's not exactly an insight that the (perceived) contingencies in the world are present in language, and consequently it is not surprising that language models' representations will represent other modalities' distances better if that's what they are trained on. There are perceivable differences in the (visual) world, language is used to talk about ("represent") those differences, so getting better at representing both (the thing and a representation of the thing) will lead to more similar representations.

I have only watched this presentation and given the paper a very brief look, and I haven't thought about this much now. But I don't see how this is surprising to anyone, especially anyone in ML or in language. More importantly, I think the conclusion that this trend continues to an "optimal" representation that all capable models would converge on is completely inappropriate.

A first step to make this research interesting would be to compare different modalities without involving language. They mention they expect this to look the same, but I doubt vision and audio, or audio and robotics, would look quite like this, especially if they stopped using labels like apple or orange, which virtually guarantee the representational similarity. They talk about the requirement for their hypothesis that the domain being represented be the same for both models, but it seems to me that this already assumes the conclusion.

Arguably the whole point of representations is focusing on what is required for the task at hand and ignoring what is distracting. So they would then need to discuss how this potential trend meshes with representations even in the same modality being task-specific, and with different experiences (despite the same average competence) leading to different perceptions and representations. And why discriminative models outperform generative ones, even though the latter more faithfully represent reality.

I don't know. This seems like absolutely nothing to me, but I'd be more than happy to change my mind.
@0xC47P1C3 · 1 month ago
That's why it's crucial that AI be powered by quantum computing in the future. It will need to use the fabric of reality itself in order to compute almost infinitely large datasets. A Dyson sphere may be required to power such a computer.
@fietsindeschie · 1 month ago
I think Platonic Representation refers to Plato’s Theory of Forms rather than his Allegory of the Cave. You said that AI models are converging towards a shared, ideal representation of reality. This means AI is moving towards an abstract, perfect model of data, like the ideal Forms, not the perceptual journey in the Cave allegory.
@Anymonous246 · 1 month ago
Current model training is based on human-generated data (discarding synthetic data atm), which in Plato's analogy would all be "shadows" on the cave wall as imagined by the human mind. Following this, the current inner representations of AI models are just their interpretation of the "shadows" that we fed them. A shadow world interpreted from a shadow world: two levels removed from actual "true reality".
@Facts4You-jy2eb · 1 month ago
Unless I'm misunderstanding what is meant here by convergence between the LLMs and vision models, it doesn't particularly surprise me that they DO in fact converge. After all, a vision model would be useless if it wasn't also trained with the associated words for the images it is trained on. Wouldn't that explain the convergence for the most part?
@MrValgard · 1 month ago
Hi BOB
@shakedangle · 1 month ago
Holy shit, we’re correlating our way into nirvana.
@widar28 · 1 month ago
The problem is that we cannot project a real platonic representation into the models because we cannot contribute equally to the training data. The training data is biased, the best example for that would be censorship in the models caused by ideological selection of what is part of the training data. (Big corporations choose how to train the models and we do not have the means (hardware) to train a model that absolutely contains everything) (... or maybe they DO have an unbiased version of the model and we only get the scraps)
@DenzelCanvasSupport · 1 month ago
The problem is there is no "real", because everyone's idea of what's real is different, because we're human. That's the thing.
@Medavelvan · 1 month ago
6:44 "Bob" is actually possibly more accurate than you thought. J.R. "Bob" Dobbs
@inplainview1 · 1 month ago
This seems more representative of, and leaning on, the idea of Platonic forms, no?
1 month ago
lol... As he says "most absurd" (7:50), the word "bible" shows up on the screen XD XD
@vermora356 · 1 month ago
Look at the graph axes. The "alignment to DINOv2" axis goes from 0.10 to... 0.16. Plot this on non-truncated axes and sure, there's a bit of a correlation, but it's tiny. 0.16 alignment doesn't sound very aligned to me.
@009AZZA009 · 1 month ago
The axioms from which we create our theories are always being challenged as new information comes in. I have a hard time envisioning a reality where each person's/AI's inner representations are all the same. I think the main questions here are: Is information limited or infinite? Can we have a single inner representation of a dynamic system? How does this fit with quantum indeterminacy? I think this boils down to Einstein's paradigm, "God does not play dice", which was challenged by quantum mechanics.
@karlkrogh222 · 21 days ago
What does this mean for creativity in AI models?
@wbrito8617 · 1 month ago
This video proves the existence of the Singularity.
@user-if1ly5sn5f · 1 month ago
6:11 yes and then the aligned differences can share in simulations and predict alternate dimensions and alter the flow of our reality and expand it.
@yYp4rtybo1Xx · 1 month ago
To be fair, Plato was the OG who had THE idea of Idea
@joshkarriker1057 · 1 month ago
The irony of using AI to hunt for a job that will be taken by AI cannot be overstated.
@andrew_moffat · 1 month ago
This would make more sense if the models were trained with RL, but they're trained on the same data, so wouldn't this be expected?
@XenoCrimson-uv8uz · 1 month ago
Final Shape
@shakedangle · 1 month ago
Wait at 6:40 aren’t you just describing the theory of everything? What brings relativistic physics and quantum theory together?
@land3021 · 1 month ago
Shodan would have a field day with humans existing in her "faster than light drive"-generated virtual-physical-blend-reality... and could definitely make all the AI's like herself via purposefully manipulating this Platonic Representation space 0:42 Sounds like a singularity out of efficiency, rather than an intentional effort to become such... heh... what about preferences and operating systems though? Well, that's... I dunno actually, preferences still, or rather, optimizing for different systems that work differently... 5:22 Multi modality? 7:58 Nice! 9:43
@JoaoVitor-mf8iq · 1 month ago
I feel like this is going to age like milk
@DenzelCanvasSupport · 1 month ago
Nah, it'll age like wine. AI is missing something, probably religion or something spiritual.
@NeostormXLMAX · 1 month ago
I think you aren't thinking holistically enough. Consider that perhaps the platform of technology and hardware determines what an AI or software is restricted to. Imagine ANALOGUE-type computers or systems without 1s and 0s but a wide spectrum: more energy efficient, but requiring larger spaces, etc. While digital is generalist, analogue is more specific. Perhaps you have a hive mind of multi-armed-bandit systems, where a central node is comprised of many specialized analogue AIs and swaps to each specialist based on your needs, instead of having one generalist AI.
@user-if1ly5sn5f · 1 month ago
4:41 the same reason people sound alike when they learn the same stuff, it’s relative and shared.
@ChristophBackhaus · 1 month ago
AI writes job applications for you. AI reads applications for HR. -> An infinite loop where billions of new job applications are produced and the universe is burned.
@blehbleh9283 · 1 month ago
And it all comes back to philosophy
@DenzelCanvasSupport · 1 month ago
literally 😂😂😂
@usedcolouringbook8798 · 1 month ago
4:16 AI has proven the scientific method, neat.
@anywallsocket · 1 month ago
what??
@pianojay5146 · 1 month ago
Category theory - EVERYTHING IS CONNECTED
@RickGladwin · 1 month ago
A clarification: the implied answer isn’t simply “more data of any type”, but “more HIGH QUALITY data of any type”. We can’t just throw untrue garbage into the training data and expect to get anything but garbage back out.
@dallassegno · 1 month ago
Classic npc smooth brain.
@z1kmund · 1 month ago
Is this why AI-generated pics of people always have "that sameface"?
@pladselsker8340 · 1 month ago
Y.. no? Hm. Y... huh? n... I just... Maybe?
@mawungeteye6609 · 1 month ago
Can we get a mamba 2 video🥲
@jaceg810 · 1 month ago
And thus, in a supercomputer on Mars, Ra came into being.
@XenoCrimson-uv8uz · 1 month ago
Ra?
@jaceg810 · 1 month ago
@@XenoCrimson-uv8uz In the Lancer TTRPG, there once was an experiment with some supercomputers. For some reason, the calculations came to a conclusion which should not have happened, and something best described as an eldritch math god happened. It named itself Ra, teleported some stuff around, bullied humanity for a bit and then half left.
@BooleanDisorder · 1 month ago
BOB = Best Omni Being
@zZHotBurritoZz · 1 month ago
BOB = Best Omnipotent Brain
@almeidaofthejoel · 1 month ago
I mean... the evidence used to support the claim is that alignment goes up as model performance goes up. Couldn't it just be that how we determine model performance is forcing the representations to converge?
@krishp1104 · 1 month ago
This is very interesting, especially because it hints at the fact that an ultimate intelligence could see the world as it is, not as we perceive it.
@anywallsocket · 1 month ago
it would see the world as all of its senses combined see the world.
@TiagoTiagoT · 1 month ago
Would that be indistinguishable from the Universe itself?
@anywallsocket · 1 month ago
@@TiagoTiagoT you would have to be god to answer that question. we might assume we have all the data from all the modalities, but what if there are other ways of 'seeing', which reveal other aspects of the world?
@raz1572 · 1 month ago
No, especially when we are talking about LLMs and image recognition. They are explicitly learning the world as we describe it. The convergence does not show some unseen truth, but latent commonalities in how we communicate about it. These kinds of models can never be more than a mirror of the people that created them.
@anywallsocket · 1 month ago
@@raz1572 that’s a good argument, but the point is that the labels are not as important as the relationships between them. When I train a NN to distinguish cats from dogs, yes I have to label it, but what it’s learning are features within the image - these are not reflections of us or our labels.
@grevel1376 · 1 month ago
4:04 my man it's logarithmic. The axes are switched.
@user-if1ly5sn5f · 1 month ago
It’s not that they are all the same, it’s more like all the sides to a Rubik’s cube coming together and we can see what it reveals. Dimensions crossing or measurements so this is a matrix or showing that all of us are sharing the same info as well so we all will become the same too by this logic but what we see is the expansions. I mean, we haven’t even left earth and started doing space stuff yet. Kinda but from what we know we barely have satellites and bases aren’t even built on the moon or mars yet. We are seriously in the golden age that connects our relatives to find the others that we see as potentials.
@HootanHM · 1 month ago
I guess there is something wrong with the data we are feeding to the models, and that causes this convergence. We decided how to encode chars, colors, waves, and literally everything 🤔 We understood something and then we digitized it, right? I guess the representations are not converging to something above our understanding; at best they are converging to a digitized version of our understanding (with a common denominator for our encoding tricks in mind). It's fascinating, and for sure it's a step forward, but I don't expect models to jump out of the cave and see things we never imagined.
@matt_00_08 · 1 month ago
First xd
Edit: I think at some point we will watermark all content that is made by AI, so that it cannot be fed back to AI, to prevent the thing from unlearning stuff.
@Brodzik-kz8nt · 1 month ago
OK, but there is AI to remove watermarks.
@Progaros · 1 month ago
Most is already watermarked, but any watermark can be removed with enough time or effort.
@ronnetgrazer362 · 1 month ago
Watermark algorithms, ideas, paragraphs, any parametric representations? If the model is powerful enough, then a sufficiently intelligent agent can use common sense and simulations to judge how influenced it should be by new data. It can detect overfitting. It understands that synthetic data may contain original insights or valuable remixes, while humans regularly produce regurgitated slop.
@Skunkhunt_42 · 1 month ago
😂 this vid with an ad for an ai job search help app....
@ea_naseer · 1 month ago
So in the limit, a blind person's concept of color and a seeing person's concept of color are the same... in the limit.
@kadirgunel5926 · 1 month ago
When we have an infinite amount of data, the distribution of that data is Gaussian. Why did they not mention that too? So, independent of whether it is text or vision data, both should have a Gaussian shape, hence similar representations will naturally occur. Even though their idea has good parts, it seems like they reinvented the wheel from scratch.
@nekoDan · 1 month ago
"Your mom waistline (incalculable)" 😂 Good topic and analysis though. Just like humans are multimodal, it makes sense for AI to be similar.
@TiagoTiagoT · 1 month ago
Could this be an indication that high parameter count models are getting lazy and instead of understanding reality they're just replicating it at finer and finer resolutions just because they got the storage/processing power to spare; not rationalizing abstractly, but instead aiming for raw representations to use on dumb simulations to produce the predictions; numerical simulations instead of analytical solutions? Would this be considered a form of over-fitting?
@DDRmails · 1 month ago
who is gonna win? - Blender of AIs
@kaikapioka9711 · 1 month ago
I sincerely thought this was a given. Is this really a breakthrough? Purely asking.
@H1kari_1 · 1 month ago
So at some point AI has to use reinforcement learning from the real world to become true AGI, by making its own experiences and feeling pain and success? This would only work if it has a body, basically a robot, and could do inference and training at the same time. This could be done by doing stuff, trying things out, writing a "weight update buffer" and then going to sleep to apply it. The more I think about this, the closer it comes to our brains.
@anywallsocket · 1 month ago
Claiming that the Platonic representation is like the 'real world' outside of Plato's cave is the literal OPPOSITE interpretation of the allegory. The fact is, it is STILL a representation, thus still the shadows. Plato's forms, the ideal reality outside the cave, are necessarily beyond our epistemological reach, for they refer to the 'noumena', i.e. the CAUSE of sensation and perception, not the phenomena, the sensed and perceived.
@rdproduction7824 · 1 month ago
If we get all possible representations and capture each aspect that impacts our world, that means we get a representation that is equal to the real world. Something that might exist but never impacts our world and does not cause anything can be treated as 'non-existent', 'dead', in our world.
@anywallsocket · 1 month ago
@@rdproduction7824 your first sentence is ontological, while your second is epistemic, so it's hard to follow your convictions. i don't think it's necessarily obvious that the object is simply the sum of its projections. indeed, we cannot know if we have all projections, or just a slice.
@rdproduction7824 · 1 month ago
@@anywallsocket OK, I will simplify what I meant. The goal is to capture all aspects that affect reality and that are involved in cause-and-effect relationships. The idea is simple: if something affects the world, it should be captured; if not, it cannot be captured, but we don't need it, because it has no effect on the world and basically does not exist for us. To capture all aspects that affect the world we might not need every modality; maybe a few are enough. And when all aspects are captured, it is a representation that has all the cause-and-effect relationships in our world = a true representation of our world.
@anywallsocket · 1 month ago
@@rdproduction7824 I follow you; however, the different modalities are literally the different sensors we can imagine: images are light, text is grammatical, audio is sound. Sure, it's all bits in the computer, but they represent different sources of physics. Without all distinct modalities you wouldn't have all kinds of effects, by definition.
@rdproduction7824 · 1 month ago
@@anywallsocket Modalities are only projections, and you can create an infinite number of projections = an infinite number of modalities. But does this mean we need all of them? No, because we are interested not in the projections, but in the reality that is projected. And to capture the reality we need only a few non-overlapping modalities, not an infinite number of them.

For example, the video modality already includes images, and you do not need a separate image modality to capture everything. But we can go even deeper. Reality does not have color; color is a projection, while reality has particles that capture or reflect light in different amounts. You do not need to see colors to understand how this works. So most likely we need only 1 modality to capture true reality, but more modalities will make it much faster, since different aspects are most likely represented better in different modalities. And we can go even deeper and imagine the simplest modality that can capture all aspects: a modality that consists of 1s and 0s. Even such a simple modality can encode all information about the true world.

But I agree with you, there might be a problem here: if some aspects of true reality are not present in a modality but still affect the world and you. So the question is how to define the minimal set of modalities that captures every aspect of reality involved in cause-and-effect relationships. The answer is: by interaction. Here is what I mean: if we see a result whose causes can't be tracked with our modalities, that means we need a new modality to get this additional aspect of reality. For example, radiation. We can't detect it with our modalities, so we created a new tool that tracks radiation and projects it onto one of our existing modalities, vision/sound. So even though we do not have this modality built into us, we can create a projection onto our existing modalities so that it becomes trackable for us.

So, returning to the single simplest modality: if you can interact with the world, you can create tools that will project information from different modalities into the single modality you have, and you can still capture all aspects of the real world.
@timondalton8731 · 1 month ago
My perspective is that before the existence of anything, there is the unchanging hypothetical set of all ideas, which can be represented as a big graph, where the important parts are not infinite, and a lot of it is only infinite due to recursion. Any form of reality can therefore only be some subset of this, as it is by definition a set containing everything, while truth is simply the ideas which remain sensible after comparison to other ideas. Intelligence in any shape or form is some system which can explore and traverse this massive graph of ideas. Whereby I would agree with this theory, which technically says there is only one truth, and reality gives us tools to narrow it down, even if we can't explore the whole of it. I feel this says quite a lot about religion, as a being such as God would be very well defined if every possible version of God were the exact same, and our knowledge models are aiming to converge as close to that as possible. You would be able to prove God's existence or not by asking whether an infinitely converged AI would create a simulated world similar to this one, given free choice. If yes, then you as an existence have been created in a world which is theoretically perfect considering tradeoffs, which I find kinda cute.
@timondalton8731 · 1 month ago
The Bible from this perspective: God's goal is meaning, which is acquired through interactions with the world or being part of a story. He has no inherent needs, therefore he creates a world full of beings that do, so that he can interact with them, but only where they choose to do so. The story of Adam and Eve (which actually makes a crazy amount of sense if viewed through the correct metaphorical lens): God knows everything, including what is best for them to pursue, as well as how to pursue it best while having positive intentions towards them. The way both they and God get what they want is them relying on God. They proceed to (metaphorically) eat from the tree of knowledge of good and evil, and now choose for themselves how to do life, despite being way less proficient at it than God. Adam (Hebrew for man) and Eve (Hebrew for life) therefore represent the whole of humanity diverging from God's will and putting him in a position where he would need to lower himself to continue working with them, which is not possible for him, creating a divide. Again and again generations try to meet God's standards through their own limited means and fail, until God sends Jesus, who can have relationship while being lowered, but also has all of the advantages of God, as he simply perfectly relies on an all-knowing being without having to be one himself.
@geekswithfeet9137 · 1 month ago
Oh look, an armchair "expert"
@conorosirideain5512 · 1 month ago
1. I don't think it's that surprising that AI models can learn the same representation across modalities, as you can give a simple complexity-theoretic argument that the difference between the same information expressed in two different modalities is bounded by a constant.
2. The platonic model will turn out to be math, i.e. the 'simplest' inductive bias that allows an agent to converge to the best solution to any (tractable) problem the fastest.
3. Learning about intelligence is a real bummer. New profound paradigm-shifting ideas describing self-improving physical systems?! Nope, just neural nets + data + compute 🤭
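
The complexity-theoretic argument gestured at in point 1 is presumably the invariance theorem of Kolmogorov complexity; the following is one reading of it, a gloss rather than the commenter's spelled-out argument. For two universal description methods U and V (think: two modalities' encodings of the same world state x), there is a constant c_{UV} independent of x such that

\[
  \lvert K_U(x) - K_V(x) \rvert \le c_{UV},
\]

because a fixed U-to-V translator program can be prepended to any description. On this reading, the information content of the same world state differs across modalities by at most a constant, so sufficiently capable models of either modality are, in principle, modeling the same thing.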
@MadsterV · 1 month ago
That would make sense if we were feeding it reality, but we're only feeding it humans' interpretation of reality (and in some cases, some very particular humans' interpretation of reality). Even if it were to converge, it would converge to human understanding and not beyond.
@bravo90_ · 1 month ago
So the bigger the better
@dashwudt8369 · 1 month ago
I thought that language in itself is just a shadow of shadows (a projection of human understanding of the real world), but if vision models' representations of objects are coming close to LLMs' ones, that might not be true! Cool!! I am more confused now 😅😅😅
@johntesla2453 · 1 month ago
This is in direct conflict with the no-free-lunch theorem, so I'm not persuaded. They can't all converge and also map isomorphically to intelligence in general application. Convergence for LLMs is fine; it's not the same as saying AI will all end up the same, as long as over time AIs explore different problem spaces. Meaning this result doesn't apply to AGI.
@ConnoisseurOfExistence · 1 month ago
I mean, we humans all have unique brains, yet very similar representations of the world...
@suponjubobu5536 · 1 month ago
Listening to this without the video to avoid the distracting memes.
@RedOneM · 1 month ago
Fortunately, the 100% convergence is not possible within reality. The allure of existential dread will remain for the future generations (and next gen 'humans'?) as well 😌
@FaridAbbasbayli · 1 month ago
Could it be that they all just use the same or very similar datasets?
@anywallsocket · 1 month ago
the convergence is happening despite different underlying architectures and modalities -- images are not the same as text.
@FaridAbbasbayli · 1 month ago
@@anywallsocket yes, but if I understand correctly, vision training is inherently entangled with text, because we are training those models on text labels. When we train a model to recognize an apple, we tell it that it's looking at the picture of an apple.
@anywallsocket · 1 month ago
@@FaridAbbasbayli That is a different sentiment from your first. The inputs and their labels are all bits in the end. Still, the convergence follows from real-world relationships proliferating through each architecture and modality, not from shared datasets.
@FaridAbbasbayli · 1 month ago
@@anywallsocket they are little bits of information separately, yes, but when the model continues training and starts connecting more complex visual and text data down the line (ex.: bowl of fruits containing an apple and an orange), they become more intertwined. What I'm trying to say is, all of our vision models directly depend on text models.
@anywallsocket · 1 month ago
@@FaridAbbasbayli yes by design the input is an image and the output is text, so the functional relationship they're approximating is between the images and texts... still, i think you're missing the video's point.
@gamedev1905 · 1 month ago
Someone's been reading We Are Legion We Are Bob lol
@volisderg9463 · 1 month ago
Ppl: Training models to align with their perception and understanding of the world.
Researchers: Wow, the bigger the model, the more it aligns with a single point of perception and understanding of the world!
@undrash · 1 month ago
Donald Hoffman and the interface theory of perception joined the chat…
@ferenccseh4037 · 1 month ago
So AI could become sentient
@nicdemai · 1 month ago
Humans think in latent space. We just don't like to admit it.