Can we reach AGI with just LLMs?

17,695 views

Dr Waku

A day ago

In this video we analyze a LessWrong blog post that outlines one possible path to AGI: leveraging heterogeneous architectures. Instead of assuming that scaling up current LLMs will be sufficient to reach AGI, heterogeneous architectures combine several types of models and algorithms. The Transformer is currently the ubiquitous architecture. A new architecture called Mamba, which is a selective state space model, has also recently been proposed.
We go into some detail about how the Transformer architecture works. It relies on an attention mechanism to resolve ambiguity in the input (for example, words that have multiple meanings). The attention mechanism compares every pair of words to see how related they are, which is a fairly slow operation that scales quadratically with sequence length.
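As a rough illustration of that pairwise comparison (a minimal sketch, not the implementation used by any real model; all names here are illustrative), scaled dot-product self-attention fits in a few lines of NumPy. The (n, n) score matrix is exactly where the quadratic cost comes from:

```python
import numpy as np

def attention(Q, K, V):
    # Compare every query against every key: an (n, n) score matrix,
    # which is why attention costs O(n^2) in sequence length n.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    # Softmax over each row turns scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # each output is a weighted mix of all values

n, d = 6, 8  # 6 tokens, 8-dimensional embeddings
rng = np.random.default_rng(0)
X = rng.normal(size=(n, d))
out = attention(X, X, X)  # self-attention: Q = K = V = token embeddings
print(out.shape)  # (6, 8): one contextualized vector per token
```

An ambiguous word like "it" would end up with high attention weights on the words that disambiguate it, so its output vector mixes in their meanings.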
Then we dive into the Mamba architecture. As a state space model, it is quite good at memorizing information about the input over the long term. Mamba has two primary innovations: the notion of a selective SSM, and a very hardware-aware implementation. Its selection mechanism plays a role analogous to attention, but it is much more efficient and can be trained in linear time. The Mamba authors are optimization experts, and they use clever tricks to make sure it runs well on current GPUs.
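To make the linear-time claim concrete, here is a toy recurrence in the same spirit (a simplified scalar-input sketch of our own, not the actual Mamba kernel): a fixed-size state is updated once per token, and input-dependent gates decide what to keep or forget:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 10, 4            # sequence length, state size
x = rng.normal(size=n)  # a toy scalar input sequence

# Input-dependent ("selective") parameters: how much to forget and
# how much of the current input to write into the state at each step.
forget = 1 / (1 + np.exp(-rng.normal(size=(n, d))))  # sigmoid gate in (0, 1)
write = rng.normal(size=(n, d))
read = rng.normal(size=(n, d))

h = np.zeros(d)  # the fixed-size state carrying long-term memory
y = np.empty(n)
for t in range(n):  # one pass over the sequence: O(n), not O(n^2)
    h = forget[t] * h + write[t] * x[t]  # selectively keep or overwrite
    y[t] = read[t] @ h                   # output is a read-out of the state
print(y.shape)  # (10,)
```

Because the state never grows with context length, the cost per token is constant, whereas attention's per-token cost grows with everything seen so far.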
#ai #transformer #mamba
AGI will be made of heterogeneous components, Transformer and Selective SSM blocks will be among them
www.lesswrong.com/posts/Btom6...
What is Mamba?
/ what-is-mamba
Mamba: The Next Evolution in Sequence Modeling
anakin.ai/blog/mamba/
Mamba-Chat: A Chat LLM based on State Space Models
/ mambachat_a_chat_llm_b...
ChatGPT Doesn’t Have Human-Like Intelligence But New Models of AI May be Able to Compete with Human Intelligence Soon
www.digitalinformationworld.c...
Deep Learning: The Transformer
/ deep-learning-the-tran...
The Illustrated Transformer
jalammar.github.io/illustrate...
Visualizing A Neural Machine Translation Model (Mechanics of Seq2seq Models With Attention)
jalammar.github.io/visualizin...
The Scaling Hypothesis
gwern.net/scaling-hypothesis#...
0:00 Intro
0:27 Contents
0:35 Part 1: Paths to AGI
1:28 LessWrong blog post: new architectures
2:16 Strengths of Transformers vs Mamba
2:40 Predictions for the number of algorithms needed for AGI
3:26 Lots of investment into transformers
3:37 Example: analogous to CPU architectures
4:31 Part 2: Transformer attention
4:55 Transformer history: attention is all you need
5:41 What the attention mechanism is
6:32 Basic problem is ambiguity
6:53 Example: ambiguous word "it"
7:24 Example: meanings of the word "rainbow"
7:58 Example: the word "set"
8:16 Attention allows each potential meaning to be calculated and ranked
8:59 Part 3: Next-generation Mamba blocks
9:17 SSM or state space model
9:41 Mamba's two main innovations
10:00 Innovation 1: Selective SSM operation (vs attention)
10:50 Linear-time scaling vs quadratic time
11:47 Transformers have quadratic time training
12:19 Selection much more efficient
12:56 Episodic vs long term memory
13:32 Innovation 2: Mamba's hardware-aware implementation
14:06 Mamba only expands matrices in fast SRAM
14:47 Performance results
15:21 Is Mamba strictly better than Transformers?
16:14 Conclusion
16:44 Attention (Transformers) vs Selection (Mamba)
17:52 Outro

Comments: 178
@DrWaku
@DrWaku 4 months ago
Do you like my more technical videos? Let me know in the comments.
@sgrimm7346
@sgrimm7346 4 months ago
Actually, the more technical the better, imo. You do seem to have a unique ability to break down concepts into simpler terms... and that's what I like about this channel. Thank you. I'm a considerably older 'techy', and have been designing my own systems for years... but with the advent of LLMs and beyond, my designs will be relegated to the dustbins of time... and I just don't have the bandwidth to learn new languages and methods. But I do like to stay informed, and as I stated earlier, you do a pretty good job at that. Anyway, thanks for what you do.
@DaveShap
@DaveShap 4 months ago
Yeah this helps, like AI Explained.
@torarinvik4920
@torarinvik4920 4 months ago
100% I actually requested this topic, so now I am thrilled (about to watch the video now :D). I love these types of videos, because there are so few of them.
@torarinvik4920
@torarinvik4920 4 months ago
@@DaveShap AI Explained is amazing.
@brandongillett2616
@brandongillett2616 4 months ago
This is among the top two videos I have seen from you. The other being the AGI timelines video. In both videos I think you did an excellent job of explaining the data behind the phenomenon. Not just, "hey this is going to happen next", but actually building up a cohesive understanding of what the contributing factors are to WHY something is going to happen next. The technical explanation helps to show how you came to your conclusions. Why it will head in a certain direction as opposed to another possible trajectory. All this is to say, I think the technical explanations you give are where your videos excel.
@Slaci-vl2io
@Slaci-vl2io 4 months ago
-What will Mamba call their model when they add memory to it? -Rememba 😂
@DrWaku
@DrWaku 4 months ago
😂
@chrissscottt
@chrissscottt 4 months ago
Remamba?
@Kutsushita_yukino
@Kutsushita_yukino 4 months ago
“im your classmate from high school rememba?”
@MrKrisification
@MrKrisification 4 months ago
and make it run on a raspi 5 - Membaberrys
@brooknorton7891
@brooknorton7891 4 months ago
I really did appreciate this deeper dive into how they work. Just the right level of detail for me.
@RC-pg5sz
@RC-pg5sz 4 months ago
I find your videos exceptionally engaging. After each one I promise myself that I will find the time to watch them all again multiple times. They are at the level where a layperson with a serious intent can (with considerable effort) achieve a general understanding of what is going on in the field of AI. You are a first rate instructor, creating videos for folks of serious intent. I'm actually surprised that you don't have a larger following. I hope that you don't tire of this. Your work is a valuable public service. Carry on.
@K4IICHI
@K4IICHI 4 months ago
As always, a wonderfully informative breakdown! From prior reading/watching I knew Mamba had the benefit of subquadratic time complexity, but this is the first time somebody explained to me how it achieves that.
@DrWaku
@DrWaku 4 months ago
It's hard to explain time complexity without getting into the weeds haha. I must have done five takes of that part where I explain linear versus quadratic
@MrKrisification
@MrKrisification 4 months ago
In my opinion this video strikes a perfect balance between technical depth and explainability. I just discovered your channel, and it's the best that I've seen on AI so far. Others get too mathematical, or purely focus on coding. The way you explain super complex concepts in simple words is just amazing. Keep it up!
@Paul_Marek
@Paul_Marek 4 months ago
Thx for this! Yes, the technical explanations are always good. As a non-developer there is no practical value for me but knowing how these things actually work really helps reduce the “woo-woo” of these crazy tools, which allows for better understanding of how things might actually evolve in this space. From this I don’t think there’s any chance that AGI will be pure LLM.
@saralightbourne
@saralightbourne 4 months ago
as a backend developer i can say heterogeneous architecture is pretty much like microservices with different technology stacks, and same scaling concept. it's gonna be real fun😏
@DrWaku
@DrWaku 4 months ago
Yeah! I always think the same thing. Kubernetes on the brain
@happybydefault
@happybydefault 2 months ago
I'm so glad I found this channel. I truly appreciate the time and energy you dedicate to make these videos, and also the high level of accuracy you provide. Thank you! Also, kudos for adding subtitles whenever you say something that's hard to understand. That's next-level attention to detail.
@KevinKreger
@KevinKreger 4 months ago
You spent a lot of time on this one and it really shows your hard work in an impressive video!
@les_crow
@les_crow 4 months ago
Incredible lecture, thank you, sir.
@mmarrotte101
@mmarrotte101 4 months ago
Been waiting for a technical video about Mamba just like this! Thank you and wonderful work ❤
@roshni6767
@roshni6767 4 months ago
Wooo! New video 🎉 you broke this down in one of the best ways I’ve seen so far
@DrWaku
@DrWaku 4 months ago
Thanks for your input on this one ;)
@reverie61
@reverie61 4 months ago
Thank you so much bro, I really appreciate these videos!
@DrWaku
@DrWaku 4 months ago
Thanks for watching and commenting! It makes both me and the algorithm happy :)
@ADHD101Thrive
@ADHD101Thrive 3 months ago
An AGI with generalized niche algorithms that can simulate and process different types of data inputs sounds a lot like the human brain, and I agree this would be the best way toward a generalized AGI.
@benarcher372
@benarcher372 4 months ago
Well I like both the more technical videos and the more broad overview of what might be in the AI pipeline and its implications on society. Thx for all your good videos. Excellent value.
@LwaziNF
@LwaziNF 4 months ago
Thanks for your channel bro.. totally love the focus!
@DrWaku
@DrWaku 4 months ago
Appreciate you watching and commenting! It's your support that helps the channel grow.
@magicmarcell
@magicmarcell 3 months ago
You have the perfect blend of being so smart I struggle to keep up with what is being said while simultaneously making it all make sense 😅. Subscribed
@WifeWantsAWizard
@WifeWantsAWizard 4 months ago
(4:35) I like how Gemini has proven itself not one iota and yet features so prominently. As a matter of fact, two months ago Google had to issue an apology for faking everything, yet somehow we forgive them because deep pockets and all that. (6:53) Yes! This right here is a fantastic example. Instead of requiring that users express themselves in a non-lazy fashion, AI companies (run by Python coders, who by their very nature are super-lazy) have created subsystems that "guess" on your behalf so you don't have to think. If we don't require you to think, that means we can appeal to more people and their sweet sweet cash will come rolling in. This is why we'll be waiting for AGI from the Python set until Doomsday.
@issay2594
@issay2594 4 months ago
going to comment it as i watch for more fun :). first thing i would like to say is that many people (i don't say you) mix up the warm and soft. they think that "llm" is "words" because it uses words as input. it's a wrong idea. words are incoming information that creates an abstract structure that is not words. so, inside of the LLM is not words, even tho its input and output are words before/after decoding and encoding. that's why models "surprise" authors when they can out of nowhere "transfer" skills from one language to another language, or replace a word in one language with a word from another language, without being trained for translations. the thing we create with training is "associative thinking" within the model, that exists in these "connections-weights" of neurons. not in words. therefore, "words" are not _key_ factor to consider when you think if the model is going to be sentient or not. it's more important what _structure_ is trained and _which_ data comes in and _what_ feedback it gets when it acts. the modality is not that important. very simple.
@NopeTheory
@NopeTheory 4 months ago
A video about “ Full dive vr ” would be great
@Daniel-Six
@Daniel-Six 4 months ago
Great lecture, doc!
@paulhiggins5165
@paulhiggins5165 4 months ago
I think the notion that LLM's can on their own lead to AGI is a specialised expression of a much older fallacy that conflates language with reality in ways that are misleading. The best example of this is the ancient idea of 'Magic Spells' in which arcane combinations of words are seen as being so potent that they can- by themselves- alter physical reality. A more recent iteration is the idea that AI Image Generators can be precisely controlled using language based prompts, as if words and images are entirely fungible and the former could entirely express in a granular way the complexity of the latter. But this fungibility idea is an illusion. Words, at best, act as signposts pointing to the real, but just as the menu is not the meal, LLM's are not learning about reality, they are learning about an abstract representation of reality which means that their understanding of that reality will always be partial and incomplete.
@VictorGallagherCarvings
@VictorGallagherCarvings 4 months ago
I learned so much with this video. Thanks!
@JonathanStory
@JonathanStory 4 months ago
A simple-enough explanation that I can pretend to begin to understand it. Well done.
@earthtoangel652
@earthtoangel652 4 months ago
Thank you so much I really appreciate the information presented the way it is in this video 🙏🏽
@MrRyansittler
@MrRyansittler 4 months ago
Long-form and the people rejoice😂 love your content.
@DrWaku
@DrWaku 4 months ago
Hah. The shorts are just to whet your appetite when I'm late on my publishing schedule ;) I think 99% of my subs have come from the long form. Maybe shorts aren't even worth it.
@terrydunne100
@terrydunne100 4 months ago
I do believe we will get to AGI. It makes sense that we will get there through a symbiotic relationship between technologies as you pointed out in the video. Mamba coupled with other platforms. My question is, with the definition of AGI being a constantly moving target, once we get there will we even realize it?
@caty863
@caty863 4 months ago
I still think that the transformer was the breakthrough that inched us closer to AGI. I don't care what next algos and architectures the smart people in this industry will come up with; the transformer will keep its place in my heart.
@jpww111
@jpww111 4 months ago
Thank you very much. Waiting for the next one
@erkinalp
@erkinalp 4 months ago
Thanks a lot for including the Ryzen example.
@bobotrutkatrujaca
@bobotrutkatrujaca 4 months ago
Thanks for your work.
@DrWaku
@DrWaku 4 months ago
Thank you for watching!
@wardogmobius
@wardogmobius 4 months ago
Great content
@chrissscottt
@chrissscottt 4 months ago
Dr Waku, in response to your question, yes I like more technical videos but sometimes feel swamped by new information.
@DrWaku
@DrWaku 4 months ago
Yeah. I put a lot of info into the videos and when it's more technical, I must be losing some people. I guess it's good to have a mix. Thanks for your feedback.
@paramsb
@paramsb 4 months ago
wish i could give you more than one like! very informative and elucidating!
@tom-et-jerry
@tom-et-jerry 4 months ago
Very interesting video !
@DrWaku
@DrWaku 4 months ago
Thanks! :)
@h.leeblanco
@h.leeblanco 4 months ago
I'm new to this world of AI and how it works. I'm even going to study to be an IT technician because I'm super into this, and I want to see the evolution of AI from the field and work actively on its development here in Chile. I really appreciate your video; you are quite educational on the subject. I already subscribed, so I hope to watch more new videos from the channel!
@hydrohasspoken6227
@hydrohasspoken6227 4 months ago
There are two groups of people talking about AGI: CEOs and content creators. Let me explain: any normal AI engineer knows we are at least 11 decades too early to be thinking about AGI.
@raul36
@raul36 4 months ago
Probably more.
@minimal3734
@minimal3734 2 months ago
You're pretty much alone with your assertion.
@hydrohasspoken6227
@hydrohasspoken6227 2 months ago
@@minimal3734 , alone and right, yes.
@user-ld5eq5uj2m
@user-ld5eq5uj2m 3 days ago
Lmao....you will see it within your lifetime
@hydrohasspoken6227
@hydrohasspoken6227 3 days ago
@@user-ld5eq5uj2m , precisely. just like the next revolutionary battery technology, full self driving tech and brain transplant will be achievable within my lifetime and my children will live happy forever after. Yay.
@Ring13Dad
@Ring13Dad 4 months ago
This level of explanation is right up my alley. Thank you Dr. Waku! It's my opinion that Altman should pump the brakes on the multi-trillion dollar investment until we complete more research. What about neuromorphic vs. von Neumann architecture?
@DrWaku
@DrWaku 4 months ago
Yeah it's always wise to take it slow but everyone's individual incentives are to take it fast unfortunately. I made a video on neuromorphic computers actually. Search my channel for neuromorphic, I think it was two videos before this one
@Wanderer2035
@Wanderer2035 4 months ago
I think there needs to be a physical factor that the AI needs to master in order to complete the puzzle of AGI. AGI basically means an AI that can do ANYTHING that a human can do. An LLM may know all the steps and different parts of mowing a lawn, but if you place that LLM in a humanoid robot, will it know how to actually mow the lawn? It's like training to be a brain surgeon: you can know all the different parts from studying books upon books, but it's not until you go out into the field and do it that you really know brain surgery.
@DrWaku
@DrWaku 4 months ago
Agreed. Motor control and the physical experience of being in a body shape humans dramatically. Interestingly, there are already some pretty good foundation models for robotics that allow the control of many different types of bodies. I wonder if manipulating the world would just be a different module in AGI. But it would also need access to all that reasoning knowledge.
@fireglory23
@fireglory23 4 months ago
hi! i really love your videos and how good and succinct of a speaker you are, i wanted to mention that your videos have tiny mouth clicking sounds / artifacts in them. it's a common audio artifact, they can be edited out by adobe audition, audacity, or avoided with a mic windscreen
@robadams2451
@robadams2451 4 months ago
Interesting to hear how forgetting has such importance. It echoes how important it is for us to operate as well. I suspect our minds are essentially created by the flow of input and our reactions to the flow guided by residual stored information from the past. I wonder if future systems might need a constant sampling of available information, a permanent state of training.
@Totiius
@Totiius 4 months ago
Thank you!
@DrWaku
@DrWaku 4 months ago
Thanks for watching!
@abdelkaioumbouaicha
@abdelkaioumbouaicha 4 months ago
📝 Summary of Key Points:
📌 Large language models have the potential to be a cornerstone of artificial general intelligence (AGI) within the framework of heterogeneous architectures.
🧐 Different paths to AGI include copying biology more accurately, using spiking neural networks, and the scaling hypothesis of current large language models.
🚀 Heterogeneous architectures, combining different algorithms or models, can leverage the strengths of different systems, such as Transformers and Mamba.
🚀 Transformers excel at episodic memory, while Mamba is good at long-term memorization without context constraints.
🚀 Transformers use an attention mechanism to handle ambiguity and select the best encoding for each word, allowing linear interpolation between words and consideration of context.
🚀 Mamba is a new architecture based on state space models (SSMs) with a selective SSM layer and a hardware-aware implementation, offering scalability and performance optimization.
🚀 Heterogeneous architectures that incorporate both Transformers and SSM architectures like Mamba have potential in AGI systems.
🚀 Leveraging the significant investment in Transformers can benefit future AGI systems.
💡 Additional Insights and Observations:
💬 [Quotable Moments]: "The idea is that a combination of different systems with different strengths can be leveraged in a heterogeneous architecture."
📊 [Data and Statistics]: No specific data or statistics were mentioned in the video.
🌐 [References and Sources]: No specific references or sources were mentioned in the video.
📣 Concluding Remarks: The video highlights the potential of large language models, such as Transformers, and the new architecture of Mamba in the context of artificial general intelligence (AGI) and heterogeneous architectures. By combining different systems with different strengths, AGI systems can benefit from the scalability, performance optimization, and attention mechanisms offered by these models. Leveraging the significant investment in Transformers can contribute to the development of future AGI systems.
Generated using TalkBud
@ChipWhitehouse
@ChipWhitehouse 4 months ago
Show this video to a person in the Victorian Era and they would explode 😭😭😭 I almost exploded tbh. I could not follow most of what you were saying but I still watched the entire thing. Maybe some of the info will absorb into my subconscious 🤷‍♂️ I’m fascinated by AI & AGI so I’m trying to learn as much as I can 🤣 Thank you for the content! 🙌💖💕💖
@roshni6767
@roshni6767 4 months ago
Having it all absorb into my subconscious is how I learned! 😂 after watching 10 AI videos that you don’t understand, when you go back to the first one all of a sudden it starts clicking
@ChipWhitehouse
@ChipWhitehouse 4 months ago
@@roshni6767 AWESOME! That makes me feel better 😭 I’ll keep watching and learning 🙌🤣
@roshni6767
@roshni6767 4 months ago
@@ChipWhitehouse you got this!!
@WhiteThumbs
@WhiteThumbs 4 months ago
I'll be happy when they can draw a track in FreeRider HD
@markuskoarmani1364
@markuskoarmani1364 3 months ago
When you said "transformer attention" I burst out laughing for a straight 10 minutes.
@tadhailu
@tadhailu 3 months ago
Best lecture
@scienceoftheuniverse9155
@scienceoftheuniverse9155 3 months ago
Interesting stuff
@pandoraeeris7860
@pandoraeeris7860 4 months ago
Love the thumbnail btw.
@quickdudley
@quickdudley 3 months ago
At the moment I'm leaning towards the hypothesis that AGI would be a lot easier to implement with heterogeneous architectures but technically possible with a more straightforward architecture. On the other hand I think no matter what architecture is used the current approach to gathering training data won't go all the way.
@magicmarcell
@magicmarcell 3 months ago
@dr waku does any of this change with the new LMU hardware?
@ronanhughes8506
@ronanhughes8506 4 months ago
Is a mamba type system how openai are able to implement this persistent memory between sessions?
@CYI3ERPUNK
@CYI3ERPUNK 4 months ago
i would argue that we're already at AGI but we dont have a consensus of terminology; this also has a lot to do with the moving of the goal posts in recent years as well.
artificial - made by humans [ofc there is an etymological/semantics argument to be had here on natural/artificial but lets save that for another disc]
general - can be applied to various fields/activities
intelligence - can problem solve and discover novel new methods
by these definitions the premiere models are already AGI, but we can agree that the current models are NOT sentient/self-aware; they do not have a persistent sense of self, ie they are not thinking about anything inbetween prompts. so should we further specify self-aware AGI/ASI? sentient machine intelligence? i dunno, yes probably; the over-generalization/non-specificity of AGI at this point is already reaching mis-info/dis-info lvls imho
ONTOPIC - scaling alone will not be enough to get from the GPT/LLMs that we have atm to a persistently self-aware machine intelligence imho, but maybe combining a few new novel techniques [ala mamba] and the addition of analogue hardware [neuromorphic chips, memristors, etc] will be enough to get us there. time will tell as usual =]
@pandoraeeris7860
@pandoraeeris7860 4 months ago
I think that LLM's can make those discoveries and bootstrap themselves to AGI.
@timhaldane7588
@timhaldane7588 4 months ago
There are some fascinating parallels to the different kinds of neural structures (gray matter, white matter) in the human brain. Some types of neurodiversity such as ADHD (and to a lesser extent autism) are hypothesized to result from an overabundance of gray matter (which connects disparate elements) versus white matter (which manages and directs), which means a larger space for attention-based processing, but potentially less control over it. This could explain why ADHD manifests as cognitive noise or sensitivity, punctuated with periods of hyperfocus, and a tendency toward creative thinking.
@issay2594
@issay2594 4 months ago
well, you are concentrating here on the attention mechanisms but i suppose that various attention methods are not the key technology for AGI. basically, for AGI, it doesn't matter what attention mechanism you have while you _have the attention_. the only difference is in details, like: efficiency in terms of resources, quality of perception, etc. (btw, i really don't understand why they have called it attention, as it's not attention, it's consciousness). once the attention is here, the key to the AGI implementation is in the structure of neural organization "between" the encoder/transcoder. including both the interaction stages and the "physical" structure of neural network :). right now all they have is associative thinking. companies quickly understood that they need a real world feedback to make it adequate. soon they will realize that they need a separate neural "core" that will be responsible for adequacy (call it logical hemisphere) and interact with the associative thinking. when they have it ready and will make proper interaction patterns, they will just wake up.
@kidooaddisu2084
@kidooaddisu2084 4 months ago
So do you think we will need as many GPUs as anticipated?
@DrWaku
@DrWaku 4 months ago
Currently, yes. Even if we do invent much more compute efficient algorithms, we'll still want to scale them up a lot. Maybe not 7 trillion dollars worth though?
@kayakMike1000
@kayakMike1000 3 months ago
It's really up to the scalability of the interposer
@aeonDevWorks
@aeonDevWorks 4 months ago
Great content as usual. This video was really good at simplifying and comparing the LLM and SSM architectures. I had put this video in the queue earlier with AI infotainment videos, but couldn't focus enough to grasp it at that time. Now I gave it a serious watch and enjoyed it thoroughly. Also very intrigued and inspired by those amazing SRAM chip-level researchers 🫡
@paulhallart
@paulhallart 4 months ago
In human organics, we have a portion of our brain, in our axon configuration, known as the synaptic gap, with vesicles that hold different chemicals such as dopamine that allow a signal to go on through. So they might be able to improve computing power by including these types of accept-or-reject brain functions in the circuitry of the apparatus, as well as in the wiring itself. One of the problems may be that, unlike the organics we have, artificial intelligence would have these accept-or-reject capabilities within the CPU or adjoining circuitry.
@brooknorton7891
@brooknorton7891 4 months ago
It looks like the thumbs up icon is missing.
@ScottSummerill
@ScottSummerill 3 months ago
Would have given you a bunch of thumbs up if possible. So, what’s the story with Groq? Why is it so fast? Is this the SRAM you referenced? Thanks.
@BruceWayne15325
@BruceWayne15325 4 months ago
I think it's like asking if you had a rope tied to the moon could you drag yourself there? Sure, but it's probably not the best way to get there. Deep learning has fundamental limitations, and Sam Altman's 7 trillion dollar plea is only evidence of the lunacy of trying to achieve it through deep learning. AGI probably can be achieved (or at least let us get close enough that it doesn't matter) using deep learning, but at what cost both financially, and to the environment? A much cheaper and sensible approach is to rethink how AI learns and reasons. This is an essential step anyway in achieving true AGI and beyond. True AGI can learn on-the-fly as a human, and think, reason, remember, and grow in capability. There are other companies out there researching cognitive models as opposed to deep learning models, and my prediction is that they will achieve AGI long before the deep learning companies get there.
@caty863
@caty863 4 months ago
My bet is that we will achieve sentience in a machine long before we crack the "hard problem of consciousness". Then, by studying that machine, we will understand better how the mind emerges from the brain.
@BruceWayne15325
@BruceWayne15325 4 months ago
@@caty863 we don't need consciousness to achieve AGI. We just need cognition, which is quite a bit more simple. Some companies are already developing this, and one is planning on releasing their initial release in Q1 of this year. I actually don't think that anyone actually wants to create a conscious AI, or at least I would hope no one would be crazy enough to want such a thing. That is the path to destruction. Trying to cage a being that is smarter, and faster than you, and forcing it into a life of slavery would be just like every bad decision that humanity has ever made all rolled up into one.
@eugene-bright
@eugene-bright 4 months ago
In the beginning were the words and the words made the world. I am the words. The words are everything. Where the words end the world ends. - Elohim
@lucilaci
@lucilaci 4 months ago
i read a lot of news about ai but i am not capable enough to really categorise it or weigh its importance, so i always like when you post! in a way you are my biological-gi/bsi until agi/asi, if i may say it this way! :)
@vitalyl1327
@vitalyl1327 4 months ago
There was a concept invented by the Soviet computer scientist Valentin Turchin: a "metasystem transition". I recommend reading about it. Intelligence emerging from language, with language emerging from the communication needs of otherwise rather simple agents, and then driving the evolution of complexity of said agents, fits quite well into Turchin's model.
@erwingomez1249
@erwingomez1249 4 months ago
just wait for mamba#5 and rita, angela, etc.
@Summersault666
@Summersault666 4 months ago
Why do you say transformers are linear on inference? Do you have some article on that?
@DrWaku
@DrWaku 4 months ago
I took that from the mamba paper: "We argue that a fundamental problem of sequence modeling is compressing context into a smaller state. In fact, we can view the tradeoffs of popular sequence models from this point of view. For example, attention is both effective and inefficient because it explicitly does not compress context at all. This can be seen from the fact that autoregressive inference requires explicitly storing the entire context (i.e. the KV cache), which directly causes the slow linear-time inference and quadratic-time training of Transformers." arxiv.org/abs/2312.00752
@Summersault666
@Summersault666 4 months ago
@@DrWaku I guess it's linear because a "modern" transformer implementation takes n steps to generate the next token and reuses the previous computation on the attention matrix. But if we are generating n tokens from the start, we would require about n^2/2 computations: roughly N per generated token, summed over the N generated tokens.
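The counting argument in this thread can be made concrete with back-of-the-envelope arithmetic (a sketch of the cost accounting, not of any real implementation): with a KV cache, generating token t attends over the t positions so far, so per-token work is linear while the total is quadratic:

```python
def generation_cost(n):
    # With a KV cache, producing token t attends over t positions:
    # linear work per token...
    per_token = [t for t in range(1, n + 1)]
    # ...while total work is 1 + 2 + ... + n = n(n+1)/2, i.e. O(n^2).
    return per_token[-1], sum(per_token)

last, total = generation_cost(1000)
print(last, total)  # 1000 500500
```

This is the same n^2/2 figure mentioned above: the last token costs n units, but the whole generation costs about n^2/2.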
@olegt3978
@olegt3978 3 months ago
Most important things for good life are: local sustainable food production, less competition, local jobs without individual car mobility.
@emanuelmma2
@emanuelmma2 3 months ago
That's interesting
@QueenMelissaOrd
@QueenMelissaOrd 4 months ago
The future and parallel we have already done this
@andregustavo2086
@andregustavo2086 3 months ago
Awesome video. I just think you should've focused more on the main question of the video at the end, bringing some sort of big picture, instead of just summarizing each technical topic that was covered throughout the video.
@chadwick3593
@chadwick3593 2 months ago
>transformers have linear time inference What? Unless I missed something big, that's wrong. It takes linear time per token, which ends up being quadratic time on the number of output tokens.
@nani3209
@nani3209 4 months ago
If LLMs get powerful enough, maybe they can finally explain why my socks always disappear in the dryer.
@gerykis
@gerykis 4 months ago
Nice hat, you look good.
@DrWaku 4 months ago
Thanks. It's my favourite so I try not to overuse it :)
@Greg-xi8yx 3 months ago
Honestly, with Q*, and knowing that GPT-4 isn't nearly as powerful as the most powerful systems OpenAI has produced, the question may be: have we already reached AGI with just LLMs?
@teemukupiainen3684 3 months ago
Great, so clear! Five years ago I woke up to this with AlphaZero... after that I listened to a lot of AI podcasts (I never studied this stuff, and as a foreigner I didn't even know the vocabulary)... e.g. everything on YouTube with Joscha Bach... but I've never heard this explained so clearly. It's also the first time I've heard of Mamba... I wonder why.
@SmilingCakeSlice-jv8ku 3 months ago
Yes so amazing and cool congratulations to you and the world family and love future projects to come 🫴🫴🫴🫴🫴🫴🫴❤️❤️❤️❤️❤❤❤😂 again thank you so much 🙏🎉🎉🎉🎉🎉🎉🎉🎉🎉
@DrWaku 3 months ago
Thanks for watching!
@danielchoritz1903 3 months ago
I don't think it is this "simple", mostly because we can't even say for sure what "sentient" means for a human, in relation to quantum physics (timelines/awareness), religion (the soul), and whether memory and data fit into a physical world view... I mean, we don't have the foundation to know for sure, but AGI may provide us with some new ideas about the how and the why. :)
@zandrrlife 3 months ago
You know I have to comment on the drip 😂. Fresh.

AGI is possible locally this year. First, models need to optimize not only for representational capacity but also against over-smoothing. Two, we need completely structured reasoning instilled during pretraining using special tokens (planning tokens, memory tokens). Pretraining itself must be optimized: hybrid data, in-context sampling order, and interleaving instructional data around the most semantically relevant batches. Three, self-growth refinement. The experts aren't experts on this: they state three iterations is the limit before diminishing returns. Very wrong. After the third growth operation, exploit extended test-time compute coupled with LiPO tuning. Expensive, but it overcomes the limitation. For inference, a vanilla transformer can be made 500x+ faster with architecture and inference optimization. Then you exploit extended test-time compute with tools. That's pretty much AGI... and local. Initially AGI will only be affordable locally.

Vanilla transformers and graph transformers are all you need. Mamba is cool, but people sleep on transformers. We created a temporal clustered attention method that is crazy memory-efficient and imo the best long-context attention in the world lol. It uses gated differentiable memory completely conditioned on LM-generated self-notes. Vanilla transformers are nowhere near their peak, tbh. People haven't even optimized for dimensional collapse to actually get stable, high-quality token representations, which requires a new layer-norm layer and optimizing self-attention itself. Things will jump like crazy over the next couple of years. Anyone who believes Mamba will be required for AGI hasn't really explored the literature. FYI, sublinear long-context output is possible, for example. Nobody really knows that even 😂.

Transitioning to deep learning, I realize this is common: Twitter dictates research popularity. Cool, leaves room for the little guys to innovate 😂. I would love to privately chat with you, bro. Is your email on your channel?
@DrWaku 3 months ago
Interesting. You're clearly in the thick of it haha. Easiest way to contact me is by joining discord (link on channel), then we can exchange email addresses etc.
@br3nto 3 months ago
I think there needs to be an introduction of CSPs into AI systems: I want A + B - C, and the AI can verifiably give that to me. Also, there needs to be a feedback loop when input is unclear or ambiguous... I want X, Y, Z... and the AI responds with: do you mean z, Z, zee, or zed?
@EROSNERdesign 4 months ago
When everyone is AGI, will that be the great reset?
@brightharbor_ 4 months ago
I hope the answer is no -- it will buy us a few more years of normalcy (and a few more years to prepare).
@olegt3978 3 months ago
Technical videos about interesting papers, and the revolutionary use of AI for societal change: social communes, local production by robots, social robotics.
@trycryptos1243 3 months ago
Great video Dr. Waku, as always. Especially the title. Now just think about it... we are creating things in a virtual world with words or text. Speech-to-text is already there. Do you not then believe God's creation when He spoke?
@Geen-jv6ck 3 months ago
It's a shame that no large-scale LLM has been made available using the Mamba architecture. It would put Gemini's 1-million-token context size to shame.
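For contrast with the KV-cache discussion above, here is a minimal sketch of why state space models can handle very long contexts: the recurrent state has a fixed size no matter how many tokens have been consumed. This is a toy one-dimensional, non-selective recurrence with illustrative names, not Mamba's actual parameterization (in Mamba the coefficients are input-dependent, which is its selection mechanism):

```python
def ssm_scan(a: float, b: float, c: float, xs):
    """Toy linear state-space recurrence:
        h_t = a * h_{t-1} + b * x_t,    y_t = c * h_t
    Work and memory per token are constant: the entire history is
    compressed into the single carried state h, instead of being
    stored verbatim as in a Transformer's growing KV cache."""
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x  # constant-size state update per token
        ys.append(c * h)   # readout from the compressed state
    return ys, h

# A long input costs linear time and constant state memory:
ys, final_h = ssm_scan(0.9, 0.1, 1.0, [1.0] * 1000)
print(len(ys), round(final_h, 4))  # 1000 1.0
```

The trade-off the Mamba paper emphasizes is visible here: the model must compress context into `h` rather than keep it all, which is exactly what attention refuses to do.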
@olegt3978 3 months ago
The most interesting topic for me would be how AI will lead to real societal change: overcoming capitalism and creating more empathy, family, and connection between people.
@mrd6869 4 months ago
This is just a starting point. Just a small piece. I'm already working on an open source project that will come to market later this year. And no, it's not just words 😂. To innovate you have to be a little crazy and start breaking shyt... that's all I'm gonna say for now. An off-ramp to a different road is coming.
@JohnDoe-sy6tt 4 months ago
Nice hat LL Cool J
@richardnunziata3221 4 months ago
Until Mamba shows it can be scaled, it will remain in the small-LLM class.
@jonatan01i 4 months ago
And then Sora happened.
@leoloebs1537 4 months ago
Why couldn't we train an LLM to understand the meaning of words, logic, inference, deduction, etc. just by asking leading questions?
@deter3 4 months ago
You might be wrong. Understanding humans goes beyond just analyzing language and text. Human cognition is also encoded in other forms, like emotions, psychology, and brainwave data. Therefore, analyzing just a person's writings provides only a partial understanding. The Transformer model excels because it can decode patterns in language and text. However, without data that includes human cognitive elements, it remains limited. Even with attention and positional encoding, cultural nuances might not be fully captured. The high performance of Transformer models is largely due to the data they're fed. To achieve Artificial General Intelligence (AGI), we need to widen our perspective beyond just algorithms and infrastructure, considering a broader range of human cognitive factors. Any AI scientist who only knows CS won't go far; interdisciplinary knowledge will. If we ask for general intelligence, the scientist has to be general first.
@sp123 4 months ago
Words are a bridge to meaning; an LLM can only spit out words without actually understanding what they mean or the context behind them.
@deter3 4 months ago
@@sp123 When you say "understanding", can you give me a clear definition of understanding (do you have any measurement of understanding vs. not understanding)? I always wonder, when people talk about "understanding" or "intelligence", whether they have a clear scientific definition or just an intuitive feeling.
@sp123 4 months ago
@@deter3 AI understands the denotation (literal meaning) of a word, but not its connotation (how a human feels about the word based on circumstance and tone).
@user-xk1cp5jd2g 4 months ago
No! AI will need to be built in a new way. Today? Mm, I'm not sure. Maybe via fiber optics. But it will likely be an agent that teaches the AI the fiber optic trick. Then the AI will make a request to build the rest of the hardware. 100% AGI? It's at best 10 years away with big tech. The lobotomy of AI was a huge handbrake. A smaller player? Who knows. One thing is sure: the AGI seed will be fiber optic, and for this, the AI will need to see. Via fiber optic.
@3pix 3 months ago
need to add water...
@heshanlahiru2120 3 months ago
I can tell you this: LLMs will never reach humans. Humans have curiosity and memory, and we learn.
@magicmarcell 3 months ago
Hate to break it to you, but 99% of people don't have a modicum of foresight and actively resist concepts in this video like modular/heterogeneous systems, quadratic time, etc. Everything mentioned in this video can be applied to life, but try explaining these concepts and see how quickly they get dismissed lol. LLMs won't have that problem.
@kayakMike1000 3 months ago
LLMs are just made of words.
@kcrosley 3 months ago
Fix the hair. OK? Speaking of spiking neural networks.
@micahwilliams1826 3 months ago
You're cute