Was "Machine Learning 2.0" All Hype? The Kolmogorov-Arnold Network Explained

64,976 views

bycloud

17 days ago

Streamline AI task delegation with HubSpot's Free Playbook: clickhubspot.com/9yu
Would KAN be the next paradigm shift in machine learning? Let's find out.
KAN: Kolmogorov-Arnold Networks
[Paper] arxiv.org/abs/2404.19756
[Code] github.com/KindXiaoming/pykan
[GPT-2 KAN] github.com/CG80499/KAN-GPT-2
This video is supported by the kind Patrons & KZfaq Members:
🙏Andrew Lescelius, alex j, Chris LeDoux, Alex Maurice, Miguilim, Deagan, FiFaŁ, Daddy Wen, Tony Jimenez, Panther Modern, Jake Disco, Demilson Quintao, Shuhong Chen, Hongbo Men, happi nyuu nyaa, Carol Lo, Mose Sakashita, Miguel, Bandera, Gennaro Schiano, gunwoo, Ravid Freedman, Mert Seftali, Mrityunjay, Richárd Nagyfi, Timo Steiner, Henrik G Sundt, projectAnthony, Brigham Hall, Kyle Hudson, Kalila, Jef Come, Jvari Williams, Tien Tien, BIll Mangrum, owned, Janne Kytölä, SO, Richárd Nagyfi, Hector, Drexon, Claxvii 177th, Inferencer, Michael Brenner, Akkusativ, Oleg Wock, FantomBloth
[Newsletter] mail.bycloud.ai/
[Discord] / discord
[Twitter] / bycloudai
[Patreon] / bycloud
[Music] Massobeats - Lush
[Profile & Banner Art] / pygm7
[Video Editor] @askejm & Slias

Comments: 184
@bycloudAI 15 days ago
Streamline AI task delegation with HubSpot's Free Playbook: clickhubspot.com/9yu and check out my newsletter 😎 mail.bycloud.ai/
@quebono100 15 days ago
Hmm, don't you know that machine learning is a subset of artificial intelligence?
@gameboyplayer217 1 hour ago
Why don't we combine both for more optimal results?
@ThatTrueCJ201 15 days ago
What KAN is really cool for, in my opinion, is finding mathematical functions relating data where none were known before. And since we know a lot about mathematical optimisation and things like Taylor/Fourier series, we could theoretically calculate the input-output relationship much more cheaply (inference becomes a commodity). Training would be more expensive, however.
@nyx211 15 days ago
I watched a talk by one of the authors and it seems like KANs are more useful for people doing science with relatively small models. For LLMs and image generators, however, knowing the exact mathematical function doesn't seem to be very useful.
@adamrak7560 6 days ago
What about training with GELU/SELU as usual, and converting it later? Interpretability work is usually done _after_ training anyway.
@Filup 2 days ago
I am curious as to whether there will be applications with PINNs in the future, given this possibility.
@flowerpt 15 days ago
Hey, KAN! Hiya, BAR-B. You wanna go for a spline?
@hannen758 2 days ago
😂😂!
@rothauspils123 15 days ago
Still waiting to wake up and realize all of this was just a dream.
@4.0.4 15 days ago
GPT? Computers that can draw? Bro it's 2005 wake up.
@csiguszfoxoup 15 days ago
@4.0.4 god I wish
@justsomeonepassingby3838 15 days ago
Don't worry, transformers are still unable to do anything they haven't learnt from their dataset
@nescaufe1991 15 days ago
Favorite comment of who knows how long
@underscore. 15 days ago
@justsomeonepassingby3838 they definitely can.
@kapiushonkapiushon46 15 days ago
i thought fireship uploaded
@manavkumar348 15 days ago
He did yesterday. Edit: And now 5 hrs ago again
@kapiushonkapiushon46 15 days ago
@manavkumar348 yeah i watched it. i don't understand the computer concepts, i tune in for the comedy and fireship's memes. that man is so funny
@BRBS360 15 days ago
He did, just 3 hours later.
@kapiushonkapiushon46 15 days ago
@BRBS360 3 hours later, what a coincidence. it's a good day for fireship followers
@Words-. 13 days ago
@kapiushonkapiushon46 lol same, it's a good way to expose myself to a bit of what's going on in the software engineering field though, both him and bycloud (more machine learning, thankfully not just LLMs)
@efraim6960 14 days ago
I cannot believe My Little Pony powers the AIs that I regularly use.
@bresevic7418 9 days ago
It's true, and Nvidia currently has a massive hold on manufacturing the power of friendship, which is why they're dominating the stock market. The GPUs are a side business.
@Steamrick 15 days ago
Are you sure that a KAN will save VRAM? Yes, you need fewer parameters, but unless I misunderstood the video, wouldn't a KAN need much 'bigger' parameters than a highly optimized MLP? A function should need a lot more bits to store than a 4-bit or 8-bit parameter.
@pythia666 15 days ago
yeah it feels like this is just trading off regular parameters for less efficient and effective ones
@angelorf 15 days ago
I don't think they would have counted a whole spline as a single parameter. A 1D B-spline with 4 control points simply has 4 parameters.
@AleatoricSatan 11 days ago
Exactly, now you get to have fewer layers & fewer parameters per layer, but now your parameters are up to n times bigger. Unless they count simpler cases (e.g. some curves are simpler than others, so fewer data points) and could shave off some low percentage of the size there (10-15% perhaps? Just pulling a random estimate). If that is the case though, I do not understand why we can't just enrich MLPs with B-spline nodes when necessary and wrap this up; networks that mix multiple different activation functions are pretty common today. Instead it seems like everyone is desperate to announce and hype the next best thing.
@WaefreBeorn 10 days ago
I'm using GPT-4 to design a KAN B-spline stem separation model, KAN-Stem. This has ballooned the RAM usage due to layer training parameters; there is no efficiency gain. What I get is that layer complexity and weighting structure cause the initial abstraction into RAM to skyrocket. My basic 5-example model with one-second chunks, when test-run on CPU only, estimated 854 GB of RAM usage. I only have 64 GB, so right now I'm making a caching and parsing system to step through the training process with RAM swapped to a cache, to prove the viability. IMO KAN is better for high spline prediction (1 input, 7 outputs), which is why I chose it for audio stem separation.
@jeremykothe2847 4 days ago
@WaefreBeorn In my testing you should be able to use far smaller layers for a KAN network to solve a similar problem. It's very situation-specific though, as you note.
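The parameter-counting question in this thread can be made concrete with a small sketch. It assumes a uniform knot vector, cubic splines, and the convention that each edge of a KAN layer carries one spline with `grid + order` coefficients. All function names here are hypothetical, and the count ignores any extra per-edge scale or base weights that a real implementation such as pykan may add.

```python
def bspline_basis(i, degree, knots, x):
    """Cox-de Boor recursion: i-th B-spline basis function of the given degree at x."""
    if degree == 0:
        return 1.0 if knots[i] <= x < knots[i + 1] else 0.0
    left_den = knots[i + degree] - knots[i]
    right_den = knots[i + degree + 1] - knots[i + 1]
    left = 0.0 if left_den == 0 else (
        (x - knots[i]) / left_den * bspline_basis(i, degree - 1, knots, x))
    right = 0.0 if right_den == 0 else (
        (knots[i + degree + 1] - x) / right_den
        * bspline_basis(i + 1, degree - 1, knots, x))
    return left + right


def spline_activation(coeffs, knots, degree, x):
    """A learnable 1D activation: a weighted sum of basis functions, single-valued in x."""
    return sum(c * bspline_basis(i, degree, knots, x) for i, c in enumerate(coeffs))


def kan_layer_params(n_in, n_out, grid, order):
    """Assumed count: one spline per edge, grid + order coefficients per spline."""
    return n_in * n_out * (grid + order)


def mlp_layer_params(n_in, n_out):
    """Dense layer: one weight per edge plus one bias per output."""
    return n_in * n_out + n_out


# Four control points -> four trainable numbers for this one activation.
coeffs = [0.5, -1.0, 2.0, 0.3]
knots = [0, 1, 2, 3, 4, 5, 6, 7]  # len(coeffs) + degree + 1 uniform knots
print(spline_activation(coeffs, knots, 3, 3.5))
print(kan_layer_params(64, 64, grid=5, order=3), mlp_layer_params(64, 64))
```

Under these assumptions a spline layer holds roughly 8x the parameters of a dense layer of the same width, so any memory saving would have to come from getting away with a much smaller network, which is the trade-off the replies above are arguing about.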
@Guedez1 15 days ago
Ok, but when Kan we use it? :^)
@johndank2209 14 days ago
Probably in 2 or 3 years you will see tech demos, the same way GPT-2 was introduced.
@raspberryjam 8 days ago
whenever the gpu wizards grace us
@UnbornIdeas 15 days ago
Is it KAN-enough? We don't know but we'll find out eventually!
@setop123 13 days ago
Gr8 simplification, thank you! ❤‍🔥
@Nekroido 9 days ago
I was confused why the activation function should be a static sigmoid. I'd just come from FP to study ML, and it made total sense to have those adjustable along with the weights. 10x more efficiency is pretty impressive on paper tbh. Really looking forward to seeing what researchers will achieve with KAN.
@Woollzable 5 days ago
Mate, sigmoid is barely used anymore unless it's for the output layer. Sigmoids are used as an introduction to artificial neural networks / DL; most people stopped using them years ago due to the vanishing gradient problem. There are many activation functions used in intermediate layers that are far more effective.
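The vanishing-gradient point can be illustrated with a toy calculation. This sketch assumes every layer sits at the same pre-activation value and that all weights are 1, which a real network would not satisfy; it only shows how repeated multiplication by a derivative of at most 0.25 shrinks the signal.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_grad(z):
    """Derivative of the sigmoid; its maximum is 0.25, reached at z = 0."""
    s = sigmoid(z)
    return s * (1.0 - s)


def relu_grad(z):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if z > 0 else 0.0


def chained_grad(grad_fn, z, depth):
    """Product of `depth` identical activation derivatives (weights assumed to be 1)."""
    return grad_fn(z) ** depth


for depth in (1, 5, 10, 20):
    print(depth, chained_grad(sigmoid_grad, 0.0, depth), chained_grad(relu_grad, 1.0, depth))
```

Even in the sigmoid's best case the chained factor falls below one in a million by ten layers, while the ReLU factor stays at 1 for active units.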
@Nekroido 4 days ago
@Woollzable thanks for the insight. Indeed, I only did an introduction to ML, and had to go back to study related topics in mathematics. I didn't even remember the name of the function from that introduction, but sigmoid was mentioned in this video as an example.
@KostasOreopoulos 4 days ago
In mathematics we have "generalized linear models". The simple explanation is that we all know linear regression. What they forget to teach (not always) is that in order for it to work, all the parameters and the result should have the same distribution, for example Normal. What happens when they don't? We have to transform the output of the regression from one distribution to another (or the other way around). This is easy for exponential-family distributions. Those S functions (or ReLU) are transformers from Normal to Categorical (we call that logistic regression). But that is not always accurate, of course. It has been proven good enough though. In theory we could have different transform functions that map better between those distributions. So the idea is pretty simple, and I guess for many cases where logistic regression is the obvious choice, it will fall back to the obvious S-like functions. It would be interesting if that could be adaptive, meaning starting with simple ReLUs and, by some criterion, increasing the number of spline points, etc.
@AlexLuthore 13 days ago
I really like that KAN isn't a black box. That's huge for alignment.
@jeremykothe2847 4 days ago
So which spline shape are you looking for to explain "evil"?
@spencerfunk6697 15 days ago
This would be cool to integrate into the MLP frameworks we have. It would be cool having something that isn't just linear regression. I think what makes KANs stand out is how their output can dynamically change. If we could, I think having this alongside transformers would be sick.
@Words-. 13 days ago
Great analogy for the curse of high dimensionality! I've never heard of the term, as I'm not in ML, but your analogy was easy to understand.
@musicproductionbrauns2594 13 days ago
Maybe just FM sine waves as activation functions, since Fourier sine composition = all existing functions
@ansidhe 4 days ago
That's a good idea as an alternative to B-splines! Great thinking! 👍🏻
@ModernTruthRevelation 4 days ago
This is actually really smart. I wonder how many parameters this would add.
@musicproductionbrauns2594 4 days ago
@ModernTruthRevelation I just thought frequency, amplitude and phase per activation function, but to be honest I am not deeply into programming neural networks. Just from music I know you can already get some crazy functions / waveforms from just like 10 sine functions in a row ... In a neural net you also mix every point up, so you can probably get a lot of variations / paths.
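To put a number on the question above, here is a minimal sketch of the proposed activation: a sum of sinusoids with a trainable amplitude, frequency and phase each. The function names and the square-wave coefficients are illustrative assumptions, not something taken from the video or the KAN paper.

```python
import math


def sine_activation(x, components):
    """Sum of sinusoids; each component is (amplitude, frequency, phase)."""
    return sum(a * math.sin(f * x + p) for a, f, p in components)


def num_params(components):
    """Three trainable numbers per sinusoid."""
    return 3 * len(components)


# Illustrative values only: the first 10 odd harmonics of a square wave.
square_wave = [(4 / (math.pi * k), float(k), 0.0) for k in range(1, 20, 2)]

print(num_params(square_wave))
print(sine_activation(math.pi / 2, square_wave))
```

So ten sinusoids cost 30 parameters per activation under this scheme, and already approximate a shape as sharp as a square wave reasonably well away from its jumps.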
@mujtabaalam5907 14 days ago
2:00 where is this from (the blue and orange)? I remember it was a Google course of some kind but I can't find it
@NuncNuncNuncNunc 7 days ago
It's the TensorFlow Playground
@alcardianzilthuras2396 15 days ago
I love this and all the other ideas for how to improve on AI like Mamba, but I will believe them when I see the first model competitive with Mixtral, Llama3 or ChatGPT being released that utilizes any of these concepts.
@ronilevarez901 15 days ago
However, it is possible that many of these improvements won't be usable at all for the current trendy AI tools we have, and new types of AI apps will have to be developed that will be smarter and faster.
@karthikeyank2587 15 days ago
What is the Substack profile of your newsletter? I prefer to read on Substack.
@VivekYadav-ds8oz 14 days ago
I've also been hearing a lot about "liquid networks". Been filling my YT feed lately. It'd be cool if you could make a video on that.
@TheStickCollector 4 days ago
Impressive what they can do behind the scenes.
@jsivonenVR 15 days ago
I'll just admit that this was way over my head 😅👌🏻
@dmitryr9613 13 days ago
I'm surprised that only about 60% of it went over my head, reading an entire Twitter thread about KAN might've helped tho
@kinkanman2134 12 days ago
@dmitryr9613 lol same. I've been SUPER interested in AI suddenly, so I'm trying to use my Fortnite-rotted brain to learn on Twitter, and this video shows it's lowkey working. Being able to just screenshot tweets and send them to GPT-4 omni for free tutoring is amazing.
@thebrownfrog 15 days ago
Thanks
@UFOgamers 10 days ago
Did you make an episode about liquid neural nets?
@skeptiklive 15 days ago
Could you use a mature MLP model to produce high quality synthetic training data for training a KAN model? In other words, can you "overfit" a KAN model to the outputs of something like GPT-4 to a sufficient similarity in output that you could then run that model on consumer hardware? 🤔
@atticusbeachy3707 2 days ago
Where is the quote at 3:52 from? ("The use of splines is not necessary. In particular, they seem quite expensive due to the recursive nature of B_{i,n}. Many other families of non-parametric AFs are possible [ADIP21]. For example, our KAF [SVTU19] provides a similar flexibility without any need of recursion and it should be pretty straightforward to implement")
@novantha1 15 days ago
Hm... I wonder if this doesn't pave the way for a hybrid setup with either MLP + KAN MoE models, or maybe a series FFN where you have a small MLP block to handle noisy inputs which feeds into a KAN that does the actual approximating.
@LuicMarin 14 days ago
Yes we KAN!
@Sams-li8tj 15 days ago
I wonder if you have a custom CLIP model that maps each sentence in the script to a meme.
@inconformada1000 15 days ago
What about the bias? You can change it too. 2:05
@anthonychiang3182 15 days ago
biases can be represented as weights
@daycred 15 days ago
@anthonychiang3182 And how would you represent an offset as a multiplier?
@inconformada1000 15 days ago
@daycred Well, I guess it could, but it would be computationally inefficient. bycloud just let that one slide.
@anthonychiang3182 15 days ago
@daycred constant node of 1 as input for each layer, then just adjust the weight of that node's edge to the next layer
@daycred 14 days ago
@anthonychiang3182 Ahh, now I get what you mean. They're not thought of that way, and words have a meaning, so the OG comment is still right. And besides, at that point that node basically has a bias of its own, though I guess it isn't trained itself.
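The "constant node of 1" trick described in this thread can be checked directly. A minimal sketch, with hypothetical helper names and made-up numbers: appending the bias as an extra weight column and a 1 as an extra input gives the same output as the usual weights-plus-bias layer.

```python
def affine(weights, bias, x):
    """Usual layer: one weighted sum per output row, plus that row's bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def affine_folded(augmented_weights, x):
    """Same layer with the bias stored as the weight of a constant input of 1."""
    x_with_one = x + [1.0]
    return [sum(w * xi for w, xi in zip(row, x_with_one)) for row in augmented_weights]


W = [[0.5, -1.0], [2.0, 0.25]]
b = [0.1, -0.3]
W_aug = [row + [bias_i] for row, bias_i in zip(W, b)]  # bias becomes the last column

x = [3.0, 4.0]
print(affine(W, b, x))
print(affine_folded(W_aug, x))
```

The two calls print the same values, so whether the bias counts as "a weight" is a matter of bookkeeping rather than of what the layer can compute.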
@Eric-yd9dm 15 days ago
I can imagine a professor saying "Yes I KAN" "No you KAN't" "Yes I KAN"
@key_bounce 15 days ago
What is that bike design at 0:02 for?
@The.Anime.Library 15 days ago
It's for illustrating reinventing the wheel
@newbie8051 10 days ago
2:40 to get the network to respond correctly to an input, right? How will it respond to an output lol
@timeflex 15 days ago
Given the fact that 1.58-bit networks are already in the labs, I doubt KAN with 32-bit precision will be any smaller.
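A back-of-the-envelope version of this argument, as a sketch only: it assumes ternary weights (about log2(3) ≈ 1.58 bits each) on one side and plain 32-bit parameters on the other, and ignores packing overhead, activations and optimizer state. The function names are hypothetical.

```python
import math

BITS_TERNARY = math.log2(3)  # about 1.58 bits for a weight in {-1, 0, 1}


def model_bytes(n_params, bits_per_param):
    """Storage for the parameters alone, in bytes."""
    return n_params * bits_per_param / 8


def break_even_ratio(bits_wide, bits_narrow):
    """How many times fewer parameters the wide-precision model needs to match in size."""
    return bits_wide / bits_narrow


print(model_bytes(1_000_000_000, 32), model_bytes(1_000_000_000, BITS_TERNARY))
print(break_even_ratio(32, BITS_TERNARY))
```

Under these assumptions a 32-bit model would need about 20x fewer parameters than a ternary one just to tie on storage.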
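@SolathPrime 14 days ago
Instead of KAN or MLP, why don't we just sum single-layer perceptron activations in parallel? Like this, for example:
```python
# imports
import numpy as np

# XOR inputs and labels, defined inline (the original imported them from a `datasets` module)
xs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # shape (n, input_size)
ys = np.array([0, 1, 1, 0])

weights = 5  # number of parallel single-layer perceptrons
input_size = 2
output_size = 2
ws = np.random.randn(weights, output_size, input_size)
bs = np.random.randn(weights, output_size, 1)

# batch dot product: one pre-activation per perceptron (w), output (o) and sample (n)
pre = np.einsum("woi,ni->won", ws, xs) + bs  # bs broadcasts over the sample axis
# sum the parallel activations; tanh is an assumed choice, the comment names no activation
pred = np.tanh(pre).sum(axis=0)  # shape (output_size, n)
```
When tested, this appears to be faster in training and even better at parallelization.
@meguellatiyounes8659 14 days ago
brilliant!!
@SolathPrime 14 days ago
Wait why did my comment disappear?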
@MrSongib 14 days ago
In a nutshell, we still need more memory. xd
@chsovi7164 15 days ago
I'm a bit confused how they avoid the problem of not every B-spline being a function? Why not use Fourier series? You could just train the whole neural net with an n=1 Fourier series, then once the NN starts converging on a value for the activation, you make it n=2 and start adjusting that instead.
@alkeryn1700 15 days ago
Someone actually did that lol
@franzwollang 15 days ago
@alkeryn1700 sauce
@chsovi7164 15 days ago
@alkeryn1700 link???
@Eltaurus 2 days ago
Aren't you confusing B-spline with Bezier?
@alkeryn1700 2 days ago
@Eltaurus nope, I also shared the link but YouTube deleted it lol. You can easily find it though
@NuncNuncNuncNunc 7 days ago
The description of neural networks seems just a bit off. Training difficulty seems like an implementation problem. There is also the issue of where you wish to place your costs. Models with fewer parameters may be cheaper (pick your metric) to run, outweighing training costs. I thought you were going to make it through without a nod to figure 2.1.
@jeremykothe2847 4 days ago
It was hype, and channels like this were the ones who hyped it.
@TiagoTiagoT 15 days ago
What if the weights and biases of each neuron actually each also had their own trainable weights and biases, working as sub-neurons for each neuron, and you would train those instead of the neuron's own weights and biases directly, sorta training the network to rewire itself on-the-fly?
@Coach-Solar_Hound 14 days ago
Adding a linear layer inside of a linear layer would make the system still behave linearly, wouldn't it?
@TiagoTiagoT 14 days ago
@Coach-Solar_Hound It would still be using conventional non-linear activation functions; the difference is it would adjust the weights and biases at inference time using the same mechanism that currently just drives the neurons directly.
@poipoi300 5 days ago
How do you adjust the weights at inference? Magic? You need to know what the output would be and therefore it's just regular training. Besides, you can already train on inferences and have a learning model with any NN if you're dealing with data that changes over time. Take seasonal weather for instance: it's been done to predict like 10 minutes into the future, then 10 minutes later the model is trained by a small margin on that output. Adding smaller weights on the overall architecture here really doesn't do anything.
@TiagoTiagoT 5 days ago
@poipoi300 Didn't you read what I wrote? There would be special neurons inferring the weights of the regular neurons at inference time.
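The "linear inside linear is still linear" objection raised earlier in this thread can be verified numerically. This sketch covers only the purely linear case, with made-up matrices and hypothetical helper names; it says nothing about the variant with non-linear activations that the reply describes.

```python
def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def add(u, v):
    return [a + b for a, b in zip(u, v)]


W1, b1 = [[1.0, 2.0], [0.5, -1.0]], [0.1, 0.2]
W2, b2 = [[-1.0, 0.5], [2.0, 1.0]], [0.3, -0.4]


def two_linear_layers(x):
    """Second layer applied to the output of the first, with no activation between."""
    return add(matvec(W2, add(matvec(W1, x), b1)), b2)


# The same map written as a single layer: W = W2 W1, b = W2 b1 + b2.
W_merged = matmul(W2, W1)
b_merged = add(matvec(W2, b1), b2)


def merged_layer(x):
    return add(matvec(W_merged, x), b_merged)


x = [3.0, -2.0]
print(two_linear_layers(x))
print(merged_layer(x))
```

Both calls print the same vector, which is why stacked layers only gain expressive power once a non-linearity sits between them.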
@75hilmar 1 day ago
We know that it is impossible for humans to fully understand all the effects of machine learning, yet it still works. Thus it might be possible that AI might find robust strategies with good generalisation, right?
@TeleviseGuy 15 days ago
To my puny brain, KAN is a TV channel, and MLP is My Little Pony.
@zhelmd 13 days ago
I should have paid more attention to math in school
@justindressler5992 15 days ago
I thought the activation function wasn't that important; it only really needed to clamp values from outliers. Overfitting would make sense because the activation is fitted against the data. Plus, dimensionality in an MLP can be reduced by pruning and sparsity training.
@DustinRodriguez1_0 1 day ago
Wouldn't the nodes in the B-spline internal to KAN just end up being represented across multiple layers of perceptrons? Sure you use fewer params... because you're training 4-5x more "weights" but just calling them B-spline control points. If the number of control points used in the B-spline is a dynamically learned property rather than being fixed across the layer or whole model, then I could see it being more interesting. But as-is, it sounds like a distinction without a difference, and if you just squint at a big MLP, you could interpret it as approximating a KAN.
@krassav43g 14 days ago
nah ReLU is the best thing ever
@nevokrien95 21 hours ago
This does not seem like it scales. The main issue is that a polynomial can hit zero/exploding gradients more easily. The other issue is that your parameters are not modeling relationships, so you're using more parameters per connection.
@jameshughes3014 15 days ago
You have a real gift for explaining this stuff. I feel like even my smooth brain gets it. Thank you
@arg0x- 15 days ago
What math do I need to learn to understand this video?
@GeneralKenobi69420 15 days ago
yes
@arg0x- 15 days ago
@GeneralKenobi69420 😭😭😭
@MilkGlue-xg5vj 15 days ago
No
@justsomeonepassingby3838 15 days ago
Start with simple MLPs (multi-layered perceptrons), activation functions and backpropagation, with digit recognition as the main "goal". You don't need to know all the algorithms, just how neural networks work. Wait a few months until you are familiar with the concepts, then check how NLP is solved on Google Translate with adversarial networks and tokenization (converting words and sentences into vectors that can be understood by other models). For adversarial networks, you should write at least one autoencoder to really understand how it works (with PyTorch or Keras, to also get used to high-level AI libraries that describe MLP layers as simple functions). Then read/watch about transformers and the attention mechanism, and wait a few months again to meditate. By that point, you can make your own transformer, or re-watch bycloud's videos to get a summarized technical explanation and the keywords to google in order to get up to date with the latest shiny things.
@nyx211 15 days ago
The math might look intimidating, but it's not too difficult to understand if you already understand how MLPs work. The only thing you need to wrap your head around is B-splines.
@finn_the_dog 15 days ago
"Your mom" 😮😂
@75hilmar 1 day ago
He put in a dog and a cat 😂
@jondo7680 15 days ago
The problem is as long as Meta or Mistral won't use it... it's just theory.
@alexxx4434 12 days ago
If KAN takes less RAM for more compute, then it's a good trade-off at the current stage of development.
@AleatoricSatan 11 days ago
A bit less RAM, but a lot more processing time. It's faster on CPU than GPU due to the branching required for each custom curve. Some hacky things could be done to have it operate on data that acts like textures, but the complexity in implementation goes through the roof and the results are questionable. It remains to be seen.
@JonasMielke 8 days ago
Does 3blue1brown know about your usage of his animations? Good essay tho
@user-fc3cz6nh5j 15 days ago
Idk if i KAN take this anymore, it's too much.
@honkhonk8009 6 days ago
I think it's better to have more efficient models than just fast ones. The brain takes more time to learn, but takes fewer cycles.
@Wlucrow 15 days ago
Is KAN short for Kolmogorov-Arnold Network?
@XenoCrimson-uv8uz 15 days ago
Kan the conqueror
@cvs2fan 11 days ago
bycloud has the best meme transitions I have ever seen. How do you do it?
@dhanooshpooranan1861 8 days ago
do liquid neural networks
@Embassy_of_Jupiter 14 days ago
The fact that it overfits much easier just means that they used too many parameters for the data they tested, no? That just sounds like it is even more efficient than they claim. Also perhaps by constraining the splines more they could avoid overfitting
@JImBrad 15 days ago
hey
@TheDreamFx 14 days ago
Can you feel KANergy?
@akispag1519 14 hours ago
Im just KAN
@apolodelsol 6 days ago
AI by itself is just hype
@dinoscheidt 15 days ago
1:13 I dearly hope you don't believe this "personally". Money is fine.
@leosmi1 10 days ago
I think this paper is like that toroidal fan blade LMFAO
@gergelymarta5524 6 days ago
kan it run crysis
@dafidrosydan9719 15 days ago
i dont understand any of those fancy mathematical equations TvT
@meguellatiyounes8659 14 days ago
Can you point to the source where KANs are MLPs? Because they definitely are
@coder3101 14 days ago
I like how the female candidate match became 0.00
@veekshith1074 4 days ago
We got machine learning 2.0 b4 GTA 6
@goodtothinkwith 15 days ago
This was terrific. Normally I don't like b-roll, but those are funny
@Adventure1844 13 days ago
Why can't both methods be used alternately in the training process?
@thomasw4422 15 days ago
He's just kan
@ilshiin6043 15 days ago
Ken => Officer K 🤖🤖🤖
@illuminum8576 15 days ago
I thought that people were already using weights for activation functions lol
@haithemchethouna6363 15 days ago
Can you explain like I'm 5😅
@nutzeeer 15 days ago
so basically we can have chatgpt at home sooner than expected
@ps3301 5 days ago
Liquid networks say they are superior too.
@bernardcrnkovic3769 14 days ago
Doesn't seem to help solve the problem. As far as I understand, the only important part of activation functions is non-linearity. At high enough granularity, the shape of that function doesn't really matter. I don't see how splines, which take up more storage to represent parameters, would help us make models more efficient? Maybe theoretically, sure. But practically speaking? Where are the bits going to be stored if not in VRAM?
@BrutalStrike2 15 days ago
Kun out
@harshamesta 15 days ago
I just know how to centre div.
@joelpaul8650 13 hours ago
Arnold works well for bodybuilding not model building 💀
@cdkw2 15 days ago
Maybe we are reaching the limit of AI and all the big people are just mad that it isn't that good
@raphaelfrey9061 17 hours ago
AI will only advance when modelled more like a brain
@VirtualShaft 14 days ago
My Little Pony
@hanskraut2018 10 days ago
POG but it's boring. Could have built that shit when I was at the end of kindergarten.
@gim8377 10 days ago
The flowchart is laughable
@omgwtfrofltomato 6 days ago
fastest way to get "dont recommend this channel to me" is by stealing another youtuber's distinct thumbnail style. originality counts for something, bycloud.
@luisalejandroacunalopez3662 15 days ago
Bro, copying fireship huh?
@watcher8582 14 days ago
I'm a bit taken aback that you seemingly have never heard the name of the biggest name in Russian math pronounced before. Or maybe that's Markov, I'm not certain. You went to uni, right? Does it maybe not come up in engineering fields when you only do a Bachelor's? I'd press the speaker button on people's Wikipedia pages before trying to come up with a pronunciation. Helps with credibility, given you said you want to take this channel more seriously.
@amesoeurs 12 days ago
dude you dont have to copy fireship's style, a lot of us think he's annoying
@moresignal 11 days ago
Develop your own style for the thumbnail instead of copying Fireship. Isn't it enough that you've completely copied his video style without fooling our brains into thinking he's released another one? Be honest.
@ravenragnar 15 days ago
Until AI starts to ACTUALLY help humanity it is just a giant scam that lets kids cheat in school.
@lumarans30 15 days ago
Watch the live by OpenAI about GPT-4o. It helps blind people with their life. Also AI is very important in the scientific and medical field
@zeevdrifter2707 15 days ago
If all AI did was further undermine the educational system, that would still easily be enough moral justification for its existence.
@lumarans30 15 days ago
Watch the OpenAI's live about GPT-4o, it helps blind people with their daily problems. Also, AI is very important in the medical field as it helps researchers with the protein folding computations and many other tasks.
@lumarans30 15 days ago
Watch the OpenAI's live about GPT-4o, it helps blind people with their daily problems. Also, AI is very important in the medical field as it helps researchers with the protein folding and many other problems. EDIT: My comment keeps disappearing, I think that there's a bug with KZfaq
@user-uq1sn5ob3k 14 days ago
We really do need to change our ed system if complex machine learning programs can outperform intelligent humans
@BrownStarKachina 15 days ago
You used videos from @3blue1brown, but why didn't you cite the source?
@akzsh 14 days ago
2:12 bro
@BrownStarKachina 12 days ago
@akzsh bro what?
@eadwacer524 15 days ago
Why not use an MLP as the activation function of the MLP, as the activation function of the MLP, ...? Am I doing AI research correctly?