#51 FRANCOIS CHOLLET - Intelligence and Generalisation

67,090 views

Machine Learning Street Talk

A day ago

In today's show we are joined by Francois Chollet. I have been inspired by Francois ever since I read his Deep Learning with Python book and started using the Keras library, which he created many years ago. Francois has a clarity of thought that I've never seen in any other human being! He has extremely interesting views on intelligence as generalisation, abstraction and an information conversion ratio. He wrote "On the Measure of Intelligence" at the end of 2019 and it had a huge impact on my thinking. He thinks that NNs can only model continuous problems, which have a smooth learnable manifold, and that many "type 2" problems which involve reasoning and/or planning are not suitable for NNs. Many problems have type 1 and type 2 enmeshed together. He thinks that the future of AI must include program synthesis, to allow us to generalise broadly from a few examples, though the search could be guided by neural networks because the search space is interpolative to some extent.
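To make the interpolation claim concrete, here is a minimal sketch (mine, not from the show; it assumes TensorFlow/Keras is installed). A small ReLU MLP fit on the scalar identity function over [0, 1] is accurate inside its training range but typically far off outside it:

import numpy as np
from tensorflow import keras

# Train a small ReLU MLP on y = x, sampled only from [0, 1]
x = np.random.uniform(0.0, 1.0, size=(1024, 1))
model = keras.Sequential([
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(x, x, epochs=200, verbose=0)

print(model.predict(np.array([[0.5]]), verbose=0))  # interpolation: close to 0.5
print(model.predict(np.array([[5.0]]), verbose=0))  # extrapolation: usually nowhere near 5.0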
Panel: me, Yannic and Keith
Tim Intro [00:00:00]
Manifold hypothesis and interpolation [00:06:15]
Yann LeCun skit [00:07:58]
Discrete vs continuous [00:11:12]
NNs are not Turing machines [00:14:18]
Main show kick-off [00:16:19]
DNN models are locality-sensitive hash tables and only efficiently encode some kinds of data [00:18:17]
Why do natural data have manifolds? [00:22:11]
Finite NNs are not "Turing complete" [00:25:44]
The dichotomy of continuous vs discrete problems, and abusing DL to perform the former [00:27:07]
Reality really annoys a lot of people, and ...GPT-3 [00:35:55]
There are type 1 problems and type 2 problems, but... they are enmeshed [00:39:14]
Chollet's definition of intelligence and how to construct analogy [00:41:45]
How are we going to combine type 1 and type 2 programs? [00:47:28]
Will topological analogies be robust and escape the curse of brittleness? [00:52:04]
Are type 1 and type 2 two different physical systems? Is there a continuum? [00:54:26]
Building blocks and the ARC Challenge [00:59:05]
Solve ARC == intelligent? [01:01:31]
Measure of intelligence formalism -- it's a white-box method [01:03:50]
Generalization difficulty [01:10:04]
Let's create a marketplace of generated intelligent ARC agents! [01:11:54]
Mapping ARC to psychometrics [01:16:01]
Keras [01:16:45]
New backends for Keras? JAX? [01:20:38]
Intelligence Explosion [01:25:07]
Bottlenecks in large organizations [01:34:29]
Summing up the intelligence explosion [01:36:11]
Post-show debrief [01:40:45]
Pod version: anchor.fm/machinelearningstre...
Tim's Whimsical notes: whimsical.com/chollet-show-QQ...
NeurIPS workshop on reasoning and abstraction: slideslive.com/38935790/abstr...
Rob Lange's article on the measure of intelligence (shown in 3d in intro): roberttlange.github.io/posts/...
Things Francois cited in the show:
LSTM digits multiplication code example: keras.io/examples/nlp/additio...
ARC-related psychology paper from NYU: cims.nyu.edu/~brenden/papers/...
This is the AAAI symposium Francois mentioned, that he co-organized; there were 2 presentations of psychology research on ARC (including an earlier version of the preprint above): aaai.org/Symposia/Fall/fss20s...
fchollet.com/
/ fchollet
#deeplearning #machinelearning #artificialintelligence

Comments: 178
@ChaiTimeDataScience 3 years ago
I feel MLST is like a Netflix special of the world of Machine Learning. The quality of the podcast & production just gets better exponentially with every episode!
@MachineLearningStreetTalk 3 years ago
Means a lot coming from you, thanks!!!
@qadr_ 2 years ago
This channel is a treasure. What a great conversation that is full of ideas and insight and experience. I've finally found my passion on KZfaq.
@stacksmasherninja7266 2 years ago
This has to be my favourite video so far. I keep coming back to this talk whenever I feel like ML is hitting a wall
@Ceelvain 3 years ago
1:54:50 The idea that consciousness is at the center of intelligence is very much what consciousness wants us to think. We believe we're in control. When in fact, we're mostly not. The consciousness can query and command other parts of the brain, but those operate on their own.
@teslanewstonight 2 years ago
I like how you simplified this complex topic. 🤖🧡
@RobertWeikel 3 years ago
This was a great talk. Thank you.
@animebaka2010 3 years ago
Wait... I wasn't prepared for this! What content!
@adityakane5669 3 years ago
Progressive disclosure of complexity. Spot on!
@BROHAMMER_OK 3 years ago
Damn son, you made it happen
@DataTranslator A month ago
This is incredible 😮 I’m delighted I found your channel 🎉
@ta6847 3 years ago
IT'S FINALLY HERE!!
@benibachmann9274 3 years ago
Thank you for yet another fantastic episode. Incredible!
@AICoffeeBreak 3 years ago
What a lovely surprise, the long-awaited episode is out! 😊 I will come back very soon when I have more time to watch and enjoy it -- I think this episode deserves a proper mind-set. 💪
@drhilm 3 years ago
This is one of those talks that will be relevant for many years. You should go back to it 3 years from now and review it again... when ARC challenge solutions start to come out...
@ZergD 3 years ago
Pure gold. I'm in awe of the production quality/level and of course the content! Thank you so much!
@dginev 3 years ago
A very very eagerly awaited conversation, thanks to everyone involved!
@shyama5612 A year ago
First-time listener of MLST. I have to say the host is refreshingly authentic. Enjoyed the whole pod - though some parts went over my head - exactly what you expect from listening to people more knowledgeable than you. Thanks. Keep up the great work!
@muhammadfahim8978 3 years ago
Thank you for such an awesome talk.
@sedenions 3 years ago
Excellent. You inspired me to pick up this same book by Chollet. Like I said before, I'm from neuroscience, but the amount of potential in this field is amazing. Thank you MLST.
@miguelangelquicenohincapie2768 3 years ago
Wow, this is really one of the best talks about DL and AGI I've ever seen. Thanks for this, you just won a new subscriber.
@LiaAnggraini1 3 years ago
Yay, I learned a lot from his book when I started to learn deep learning. Thank you. Hopefully you can bring on more people like him in future episodes.
@_tnk_ 3 years ago
10/10 episode, and that debrief was super good. Really interesting ideas all around.
@jeff_holmes 3 years ago
My favorite quote: "Intelligence is about being able to face an unknown future, given your past experience."
@muzzletov 3 years ago
How about a whale outliving you for about 60+ years?
@martinbalage9225 3 years ago
How about an organism just born: no past = no intelligence? Or do we then extend the past to the past of the process of physics - where DNA, chemistry, particles will lead you? To a big-bang start of the entropy? Sounds like god again. So either you externalize the intelligence, or internalize it, or integrate it via a whole other question.
@EricFontenelle 3 years ago
@@martinbalage9225 What the fuck are you even saying?! lmao You are interpreting "unknown future" without the proper context. Look up "Known Knowns, Known UnKnowns, Unknown Knowns, AND Unknown Unknowns" -- you should get it then.
@martinbalage9225 3 years ago
@@EricFontenelle I apologize for creating a confusing experience for you. I was writing up something for a disproportionate amount of time, but that is not a way. Feel free to ponder on my cryptic reply, and if you find anything that you can shape into a more specific question than "the fuck you sayin", then feel free to ask. Thank you for your effort with the unknowns, but that is unfortunately quite misguided at the moment, and I rather give the benefit of doubt, and ignore the little evidence on your behalf as inconsequential so far. Again, if you have any substance feel free to follow that.
@DavenH 3 years ago
Your editing on this one is stunning. Fitting for such a guest!
@AICoffeeBreak 3 years ago
One can see the passion in the care that has gone into editing, right? 😍
@teslanewstonight 2 years ago
@@AICoffeeBreak I believe so. Close-up shots can be revealing when noticing adult faces attempting to hide excitement and glee. I love the AI/AGI community. 🤖🧡
@mobiusinversion 2 years ago
MLST is awesome. I love this cadence, fun, humor and synthesis of so many good ideas.
@angelomenezes12 3 years ago
What an awesome episode Tim!! Your editing skills are getting great! 💪
@Mutual_Information 3 years ago
This is a great listen! Makes me think... ML experts are so attuned to its problems - huge data for only local generalization, extrapolation is super hard, challenges in translating information between domains (e.g. images vs audio vs text) - whereas the rest of the world thinks sentient robots are around the corner.
@anotherlevelofselfawareness 3 months ago
Brilliant talk
@rock_sheep4241 3 years ago
The most awaited episode
@mahdinasiri6848 3 years ago
The quality of the content is lit! Nice job
@machinelearningdojowithtim2898 3 years ago
Oh my god.... here we go!!!! ❤🤞😃😃😃
@ratsukutsi 3 years ago
What a gem ladies and gentlemen!
@behrad9712 3 years ago
Scientific analysis in combination with beautiful animations! Great job!
@jamieshelley6079 3 years ago
I'm so glad the ideal of generalisation is becoming more popular, and hopefully the flaws in deep learning's ability to acquire this pattern will be realised.
@diatribes A year ago
This is such a brilliant episode. Can't believe I'm just finding out about this channel.
@bntagkas 3 years ago
I define intelligence as a function of being helpful to yourself and being helpful to others. I believe this to be the correct definition, and problems become unsolvable once you are using a wrong one or none at all.
@martinschulze5399 3 years ago
Great work!
@abby5493 3 years ago
Wow best video you have ever made 😍
@matt.jordan 3 years ago
absolute legend
@CristianGarcia 3 years ago
Hype! Thanks for this :)
@videowatching9576 A year ago
Amazing show; I really appreciate hearing you talk about the nuances of AI, and how it could connect to applications now, in the near future, or beyond. I would suggest a playlist that identifies the especially 'applied' episodes - for example, talking through media generation models and how those get used, or LLMs and business cases - while also being tightly connected to the AI work going on, including specifically what is enabled now, what the constraints are, and what obstacles to overcome to get to enabling what kinds of capabilities, etc. For example, what's between point A and point B to get to a place where a given creator can make an especially interesting / useful / entertaining video? For instance, including various AIs: humor generation / assessment, special effects, editing, story suggestion / modification, etc. There is already certainly a lot that creators can do - but presumably way more to be unlocked. For instance, text-to-image generation allows for some pretty remarkable expression - and an open question is what text-to-video enables, or text-to-editing, etc. And then there's the question of compounding those creator capabilities, as well as AIs enabling high-quality recommendations of content, etc.
@pani3610 3 years ago
goldmine❤️
@davidbayonchen 3 years ago
Awesome podcast. I subscribed right away. I like how you all listen and not talk over one another. Keep it up!
@teslanewstonight 2 years ago
I love this channel, amazingly inspirational interviews like these, and this awesome community. Love & prosperity to you all. 🤖🧡 #AI #AGI #Robotics
@coder8i 3 years ago
Looking forward to this one. Goes with a healthy lunch.
@dr.mikeybee 3 years ago
Thanks for the advice. I just got a copy of the book.
@bluel1ng 3 years ago
We have to clarify where DL starts and where it ends (when discussed in the context of general-capability AI/AGI & generalization). E.g. is a system like MuZero, which combines neural networks with MCTS in an RL setup, still deep learning, or is just the neural network it employs internally - to learn a world model, value function and policy - the deep learning part? The same question applies to neural-guided program synthesis. I would even argue that the deep-learning 'interpolation' part in most complex, interesting AI systems is only one ingredient; take a voice assistant, a self-driving car or any robotics application as examples.

Regarding inter/extrapolation - I think you (Tim) are right that there is no more than tiny out-of-training-set extrapolation for MLPs, ConvNets etc. (simply as a by-product of relatively smooth functions for MLPs, while ConvNets have an architectural form of generalization via extreme weight-sharing, e.g. translated inputs result in the same outputs translated on the feature maps). But when memory comes into play, e.g. RNNs with attention or transformers, I am no longer 100% sure that this restriction holds. Memory allows a system to load/save, or select and pass, arbitrary source information and map it. E.g. if a system (take transformers, or a Neural Turing Machine) learns to select inputs only by position and pass and project whatever value it finds in a value slot, an algorithm for arbitrary inputs is created (potentially generalizing to inputs never seen during training)... we could discuss to what extent this form of generalization is found by gradient descent, and how to 'motivate' a system to find compact generalizations instead of 'cheap' memoization. To me this is part of the magic that surrounds GPT.

Regarding glitchy models and symbolic processing: this seems not to be a big issue in NLP... with byte-pair encoding and a reasonable sampling strategy, a transformer model like GPT-* has a very acceptable 'glitch level' - as a human I struggle more to abide by syntactic and grammatical rules than GPT does, e.g. closing the right number of brackets or tags in transformer-generated source, or closing strings, comments etc.
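A toy sketch of what "neural-guided" could mean in the program synthesis setting mentioned above (mine, not the commenter's; the guide function is a hypothetical stub - in a real system such as DreamCoder it would be a neural network scoring candidate programs):

import itertools

# Tiny DSL of unary integer operations
DSL = {
    "inc": lambda x: x + 1,
    "dbl": lambda x: 2 * x,
    "sq":  lambda x: x * x,
}

def guide(program):
    # Stub for a learned prior over programs; here it just prefers
    # shorter programs, where a neural guide would rank candidates.
    return len(program)

def synthesize(examples, max_len=4):
    # Discrete (type 2) search over programs, with candidate ordering
    # delegated to the (type 1) guide.
    candidates = [p for n in range(1, max_len + 1)
                  for p in itertools.product(DSL, repeat=n)]
    for prog in sorted(candidates, key=guide):
        def run(v, prog=prog):
            for op in prog:
                v = DSL[op](v)
            return v
        if all(run(a) == b for a, b in examples):
            return prog
    return None

print(synthesize([(1, 4), (2, 9), (3, 16)]))  # ('inc', 'sq')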
@sabawalid 3 years ago
Excellent question @Yannic: is there a continuum between the two types of problem solving (discrete and continuous) - because they are in the same substrate, working together on solving problems co-operatively (presumably)? Excellent point/question.
@ShayanBanerji 2 years ago
KZfaq and such HQ material. Kudos to MLST.
@vikidprinciples 11 months ago
Excellent
@shailendraacharya 2 years ago
Why was it hidden from me for so long? It's a pure gem. Thank you so much 😍🎉🎉
@MrjbushM 3 years ago
Excellent video, very informative; I love the ideas shared here. I do not have a master's degree or Ph.D. like you guys, I am only an average Java developer with DL as a hobby, but I agree with Chollet that we need a different approach to artificial general intelligence. For type 1 stuff DL is well suited, for the reasons discussed in the video; for type 2 we need other approaches, like the DreamCoder paper explained by Yannic on his channel. Another idea that fascinated me is the "Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding" paper; the neurosymbolic approach I think is valid also. In the end, those approaches I think will not solve AGI, but they are a good baby step, the next baby step towards that goal. And yes, I also think neural networks alone are not the right path to AGI; for now, while we figure that out, we need to experiment with approaches like the DreamCoder paper and the ideas Chollet shared in the video.
@Larrythebassman A year ago
Well, that was a wonderful video. I think I learned three brand-new phrases in relation to artificial intelligence. Thank you very much!
@marilysedevoyault465 2 years ago
You all do amazing work, thanks for sharing. This is probably of no use, but just in case... About abstraction/generalisation... I wrote this to Mr. Hawkins yesterday before seeing Mr. Chollet's video, and it might relate: "I'm sorry if it is annoying, and sorry for the mistakes, because I'm French speaking, and maybe it isn't of any use at all, cause I'm no specialist, only an artist I guess. But I'm sharing this little hypothesis: Let's say all the mini columns in an area all learn the same thing, sequences of events in a chronological order. All a human went through or learned related to this area (let's say visual memory) is there in every mini column: all the sequences respecting the chronology, like if it is absolutely small layers of events stored in each mini column. Obviously there is some forgetting, but there is a lot there. Now let's talk about the predictions or creativity. When making a prediction or creating a mental image, could different mini columns jump at different layers of the chronology (different moments of life), seeking identified sequences of the same object, all this for predictions? The intelligence part would be to melt all these similar sequences from different moments of life into one single prediction? Let's say I saw a cat falling when I was ten years old, and I saw many cats falling on television, and many cats falling on Facebook. Some mini columns would bring back the cat at ten years old, other mini columns some cat on Facebook, and other mini columns a falling cat on television, and melting all these sequences together, I could predict or hope that my own cat would fall on its feet while falling down. Is that what you mean when you say they vote?"
@GameDevNerd 11 months ago
We really value and love this content, and I am working on applying the latest machine-learning and AI theories, models, tools, etc to game and simulation development, real-time 3D, devops and other areas ❤‍🔥
@VijayEranti 3 years ago
Really great session. IMHO: an intelligent, learnt inference loop (which may use gradient descent with continuous feedback on the results of interpolation or extrapolation), like manual TTA (test-time augmentation), is an example of a manual baby step towards program synthesis of discrete components (Bengio's RIM cells are a learnt, rather than manual, form of TTA). Hopefully a more powerful inference loop (a program learnt recursively) may be the direction to go.
@mfpears 2 years ago
7:35 99% of software today will not be deep learning because it's algorithmic. 7:55 Neural networks can't represent the scalar identity function 11:07 Image models struggle drawing straight lines 11:38 Predicting the digits of pi, finding prime numbers, sorting a list 12:45 Human reasoning is guided by intuition - interpolative. Abstraction is key to generalization, and performed differently in discrete vs continuous 14:10 GPT-3 failed his ARC tasks 14:20 Neural networks are not Turing-complete. Nth digit of pi requires unbounded memory 15:03 You can train a neural network to multiply integers of fixed width together, but it will always have errors. 15:30 But you can augment them with unbounded memory and iteration... 22:30 is the data we give deep learning special?
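As a counterpoint to the 14:20/15:30 notes above, here is a sketch of why classical programs escape the fixed-memory limit (my example, not from the show): Gibbons' unbounded spigot algorithm streams digits of pi by relying on integers that grow without bound - exactly the resource a fixed-size network lacks.

def pi_digits(n):
    # Gibbons' streaming spigot algorithm for pi; q, r, t grow without
    # bound, courtesy of Python's arbitrary-precision integers.
    q, r, t, k, m, x = 1, 0, 1, 1, 3, 3
    digits = []
    while len(digits) < n:
        if 4 * q + r - t < m * t:
            digits.append(m)
            q, r, m = 10 * q, 10 * (r - m * t), (10 * (3 * q + r)) // t - 10 * m
        else:
            q, r, t, k, m, x = (q * k, (2 * q + r) * x, t * x, k + 1,
                                (q * (7 * k + 2) + r * x) // (t * x), x + 2)
    return digits

print(pi_digits(10))  # [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]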
@PeterOtt 3 years ago
It's finally here! You've been teasing us with this for so long!
@ZandreAiken A year ago
Thanks!
@MachineLearningStreetTalk A year ago
Thank you!
@oncedidactic 3 years ago
So first off thank you guys for an always excellent conversation, and congrats on meeting your heroes ;) There is really nothing to disagree with at all in the convo, which is marvelous, sugary crystallized insights as usual. Two things- 1 The interpolation hypothesis needs substantial interrogation, though it’s admittedly catchy and powerful. Not to prove/disprove, but because it will teach us more about hard-to-reason-about things. E.g. can you contrive training data that gives a good approximation of extrapolation, artificially? If so, is this learnable? Etc. 2 While human brain is the obvious lighthouse for AGI, this convo seemed particularly anthropocentric. Which to me is a quiet warning bell that there is far more to be plumbed before setting foundations. As in, chasing AGI via generality via abstraction is making an engineering project out of a philosophical venture. If you are asking your model to be general, you are asking it to understand the universe. Undoubtedly there is practical insight in assessing applicability of learning and search methods, and ditching hype to do better science. But heed Keith’s mention of duality. For now I think we can only and correctly proceed in an epistemic mode (make better software) and we have a lot of room to run with modern computing. *How do you get knowledge?*. But the true game is ontological. *What is the nature of knowledge?*. And when you start asking to catalogue priors, you might as well be illuminating an encyclopedia with Plato’s Forms. (No page would be truly accurate nor would you ever finish.) For a concrete example, talking about appleness, plainly something in the putative capsule-NN-DSL-NN vein could capture familiar important qualities. (Red, round.). But we would have no sense whatsoever of the completeness of representations, just their usefulness i.e. by asking our bot to pick us the tastiest apple or find one that will maximize range from our spud launcher. But what is our sense of apples for comparison? Perceptively, sight, 400-700nm, scent, midrange mass spec, touch haptics fairly crude but sensitive to important characteristics like bruising. Should we consider ecology in deep time? One apple is a cc email of a thought a forest is having. (Or whatever.) Point being, it quickly becomes hard to assess whether our discrete problem solver with latent “apple knowledge” is either bug free or has good embeddings, because WHAT IS AN APPLE? And WHAT IS IT GOOD FOR? You and the tree might disagree. Nevertheless, our best measure of appleness is from an anthropocentric POV, which means for practical purposes we can agree what is an apple and what it’s good for (cider), until further notice. Hence, I see Chollet’s most valuable insight is embodiment, because this frees you of the bottomless pit of ontology, and forgives sloppy epistemology, since pragmatism dominates when you have to live in the world. This also happens to align with us. All that said, I love and prefer your guys’ sticking with near-context relevance and actionable ML roadmap discussion, minding real life utility and constraints. It sets you apart, really is unique afaik, giving credence to AGI talk being grounded in sota ML practitioner commentary, and is available at a disgustingly low price (ha). I’m interested, if you read all this, if you are spending any neurons on anthropocentrism as relates to models/priors, not from the philosophizing POV but out of scientific necessity.
@DavenH 3 years ago
These are excellent points.
@yasserdahou5308 3 years ago
This is just amazing, fascinating. When are you getting Ian Goodfellow? That would be so interesting too.
@Hexanitrobenzene 2 years ago
Lex Fridman did a good interview with him: kzfaq.info/get/bejne/kJyiq6l_sq3InmQ.html
@JousefM 3 years ago
Nice one! What program do you use for your intro animations?
@brunoopermanis5449 2 years ago
Great episode :) Regarding NN != Turing machines - you can construct an RNN (carefully choosing the weights, not training it) that uses its hidden state as infinite memory, since the hidden state consists of continuous numbers (you can encode any integer or infinite bit string into a real number). In other words, hidden state == infinite memory. So in theory RNNs can be Turing machines, although not in practice :) Once I came across a paper where an RNN was constructed so that it worked as a universal Turing machine; I can't find the paper anymore.
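The construction being recalled is most likely Siegelmann & Sontag's result that RNNs with rational weights are Turing complete. A tiny sketch of the trick (a hedged illustration, not their exact encoding): a whole binary stack lives in a single "activation", and push/pop are affine maps of the kind an RNN cell could implement - the catch is the unbounded precision, hence Fraction rather than float:

from fractions import Fraction

def push(s, bit):
    # prepend a bit to the base-4 stack code
    return Fraction(2 * bit + 1, 4) + s / 4

def top(s):
    # read the most recently pushed bit
    return int(4 * s >= 2)

def pop(s):
    # strip the most recently pushed bit
    return 4 * s - (2 * top(s) + 1)

s = Fraction(0)
for b in [1, 0, 1, 1]:
    s = push(s, b)

out = []
while s > 0:
    out.append(top(s))
    s = pop(s)
print(out)  # [1, 1, 0, 1] -- last-in, first-out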
@jjemine 2 years ago
good content
@dr.mikeybee 3 years ago
I wonder if there are low dimensional manifolds that alone or in combinations can create AGI? Just as deep neural networks find correlative combinations of weighted features, I wonder if complex flexible programs can emerge from mining the many manifolds of the computational universe.
@lusherenren4222 3 years ago
I’d like to see Marcus hutter on this show. Thanku
@DavenH 3 years ago
Please this.
@MachineLearningStreetTalk 2 years ago
Absolutely! Legg and Hutter are on our hit list -- we will invite them. We really hope they want to come on, it would be amazing
@fast_harmonic_psychedelic 3 years ago
Hands off mah deep lernin
@opiido 3 years ago
This is amazing - thank you so much. I could do without the background music during the intro (a bit distracting) - but overall AMAZING
@mo_daboara 3 years ago
Hi, Chollet categorized the universal problem space into continuous and discrete, somehow overlapping, regions. I think if we want to get a true AGI, discrete problems will be something like morphed/superpositioned spaces that are blended into a similar stack of continuous spaces. Instead of thinking of type 1 and type 2 thinking, I will argue that (at least in biology) what is happening is a mechanism to kinda separate some solution graph out of the continuum manifold. This way those (virtual graph segments) can be reused when facing new unknown problems.
@PhucLe-qs7nx 3 years ago
I think the disagreement between Yann and Francois regarding interpolation/extrapolation is that they are referring to different definitions. As Yann said, a new image is unlikely to be a "linear combination" of seen images, so it's extrapolation. Francois's interpolation is a bit more mainstream: any new image with values inside the range of seen values is interpolation. I tend to agree with Yann's view, because essentially, to interpolate there is no assumption other than linearity and smoothness; all other priors are for extrapolation. As you said in the video, per Francois's interpolation, there is nothing to learn to extrapolate - it's the unknown unknown. The only prior to extrapolate in this case is the meta-learning prior: learning to learn to interpolate.
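The "linear combination" reading can be made precise: call a sample interpolation iff it lies in the convex hull of the training set (the formalization Yann's camp appeals to). A quick sketch with scipy - and note that in high dimensions virtually every new sample fails this test, which is why under this definition almost everything counts as extrapolation:

import numpy as np
from scipy.spatial import Delaunay

rng = np.random.default_rng(0)
train = rng.normal(size=(500, 2))  # a 2-D stand-in for a training set
hull = Delaunay(train)             # triangulates the convex hull interior

def is_interpolation(points):
    # find_simplex returns -1 for points outside the triangulation,
    # i.e. outside the convex hull of the training data
    return hull.find_simplex(points) >= 0

print(is_interpolation([[0.0, 0.0]]))    # [ True] -- near the data
print(is_interpolation([[10.0, 10.0]]))  # [False] -- outside the hull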
@NelsLindahl 3 years ago
Oh I just kept watching... then I needed more coffee...
@dipamchakraborty 3 years ago
In my opinion combining type 1 and type 2 is all about the user interface. Like how Yannic mentioned people kind of sort of learn programming but not properly - that's only because the user interface allows them to do things in an easy way. Also how abstraction allows better productivity in ideas for a larger audience, and creates a smoother gradient for learning.
@DanielCardenas1 3 years ago
Would appreciate a link to an explanation of manifolds.
@EsotericAI 3 years ago
For a system to be intelligent I think it needs to be able to act in and have an impact on its environment. Connecting actions to the loss between the predicted future and the actual experienced future will perhaps prepare it better for unknown and novel situations. The result might be that it learns to take actions to avoid situations where predicting the future is difficult, and this "avoidance" might actually turn out to work in the same way as simplification or abstraction would.
@mjeedalharby9755 3 years ago
Yes
@DavenH 3 years ago
It's fairly easy to say what extrapolation isn't, but what can you say it IS positively? In my current view, extrapolation is enabled by, and nearly always requires, a conjugation of 3 things: a dynamics model, a state, and a simulator, within whose sandbox the dynamics shall be iteratively applied on the state.

Let's take the case of an algorithm running on a computer. The dynamics are the primitive language operations, the state is the set of arguments to the algorithm (+the state of global vars if applicable), and the simulator is your computer. Everything is crisp and deterministic here. You can do a similar mapping with mathematics. The legal operations following from your axioms are the dynamics (e.g. the way a contradiction propagates back to invalidate a theorem is in the math dynamics model, the laws of logic), the state is the starting (sub)set of theorems and axioms, the simulator is usually the minds of mathematicians.

But AlphaGo falls within this definition too: its dynamics model is the rules of Go (capturing, winning conditions, turn-taking) + the known interactions of higher level structures, the state is the empty board (or opponent's first move), and the simulator is their deep RL + MCTS algorithm which must have implicitly encoded the dynamics. Slightly less crisp and deterministic, but still able to generate completely new knowledge within the scope of playing Go. The more stochastic the dynamics (poker or scrabble say), the less deeply the simulator can concretely extrapolate -- it can only output distributions, usually with ever-higher variance with extrapolation depth, as that variance would grow exponentially as it iteratively compounds; past a point the var would become so great that any output distribution would tend to an uninformative uniform.

I'm seeing this tripartite dynamics/state/simulator pattern everywhere now. So where does GPT-3 fall... GPT-3 seems to have baked in some system dynamics, and can in theory be performing limited simulation, bounded in extent by the sequential processing that 96 transformer layers can accomplish. So it does seem to extrapolate within domain. At least, in some cases it's not obviously regurgitation.
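A minimal sketch of that tripartite framing (my stub, not the commenter's code): extrapolation as a simulator iteratively applying a dynamics model to a state - exact when the dynamics are deterministic, variance-compounding when they are stochastic:

import random

def simulate(dynamics, state, steps):
    # the simulator: iteratively apply the dynamics model to the state
    for _ in range(steps):
        state = dynamics(state)
    return state

# Crisp, deterministic dynamics: extrapolation stays exact at any depth.
print(simulate(lambda x: 2 * x, 1, 10))  # 1024

# Stochastic dynamics: each step's noise is re-amplified by later steps,
# so the output distribution widens with extrapolation depth.
runs = [simulate(lambda x: 2 * x + random.gauss(0, 1), 1.0, 10)
        for _ in range(1000)]
print(sum(runs) / len(runs))  # mean near 1024, but with a large spread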
@vtrandal 2 years ago
Many good things happening, including the 2nd edition of "Deep Learning with Python" by Francois Chollet via MEAP (Manning Early Access Program).
@arnokhachatourian8928 3 years ago
I think Chollet and Walid Saba argue for much of the same thing: a need for type 2 thinking or understanding combined with the type 1 signal processing power of neural nets. Interesting that they both see graphs and/or structure as part of the solution to type 2 thinking as well.
@jon0o0o0 3 years ago
When he is talking about the two types of thinking, it reminds me of Daniel Kahneman and his theory of two types of thinking in "Thinking, Fast and Slow" :D
@jon0o0o0 3 years ago
I wonder if he was inspired by Kahneman's theory, as they are pretty similar. Fast thinking meaning intuitive thinking, e.g. stories you make up from past memory, and slow thinking meaning extrapolating, reasoning about things.
@CristianGarcia 3 years ago
Having watched the DreamCoder video from Yannic this week paid off 😁 Amazing content! I have an open question I wish I could ask Chollet: "Do you believe that you can generally solve the ARC with a system that only trains on the ARC or does it require a system that (like us humans) trains on a much larger domain and then "fine tunes" on the ARC?".
@MrjbushM 3 years ago
Interesting question
@TimScarfe 3 years ago
Yes, very interesting question. Chollet is fine with human knowledge priors in the algorithm, I think.
@badhumanus 3 years ago
I don't think any formal test is needed for AGI. If a robot can walk into a generic kitchen and make a ham and cheese sandwich or a cup of coffee, it has GI. Just saying.
@DavenH 3 years ago
@@badhumanus I doubt that's a sufficient test either. A narrow set of skills would suffice.
@aldousd666 A year ago
The debate about interpolation - I think it's not actually a problem. The formula we're approximating is derived from interpolation of the training data. If the training data is representative, then we can just take the formula and extrapolate. It won't be 100% accurate. It just has to beat a coin flip to be an advantage. And that's purchase on new territory. A seed for the next experiment.
@sabawalid 3 years ago
Excellent observation about trying to write a discrete algorithm to work on MNIST digits... it is sort of the opposite of trying DNNs on discrete problems. I have tried the former: it might do a decent job, but it is not the right approach. Excellent point.
@mfpears 2 years ago
45:00 Can you point to any mechanism in the brain that would support System 2 type thinking? I thought there were pretty much just neurons.
@dr.mikeybee 3 years ago
Wherever I go in my mind, I meet Plato coming back. -- Scott Buchanan
@ThichMauXanh 3 years ago
So how do you explain the human brain doing discrete reasoning while being simply a bunch of neurons wired together?
@DavenH 3 years ago
Again, wonderful and thought-provoking episode. Playing devil's advocate as usual, here are some more thoughts -- I find much to be skeptical about with regard to the interpolation / manifold hypothesis as I understand it, as it's not hard to make logical mappings between what DNNs are capable of (and indeed GPT-3 is likely doing) and programs (with limited memory) which nobody would agree are interpolating training data. I think there's a creeping mismatch of conceptions somewhere which is leading some to simplistic conclusions and will force them to eat crow many times over (kind of like "perceptrons can't even solve XOR, NNs suck!") - IMO where the misconception may lie is the idea that NNs can only manipulate topological volumes connected by nice, densely sampled bridges of data points. Or at least, that all incoming data maps to such a singular well-connected manifold. If true, all this strongly limited-by-interpolation stuff would make sense to me. However, if you consider that NNs can manipulate many many disjoint topological islands and bring them together on certain dimensions, separate them again, successively over 100s of layers, this starts to look a lot more like the work of classical computation. If classical computation is also roped into the interpolation idea, then I'm not sure what its implied limitations are. A couple of remarks on that subject. There is clearly a spectrum of expressive power with limited computation and limited memory rather than a binary on/off (Turing-Complete or not), and since nothing physical is TC including supercomputers and human brains, this is not an appropriate argument against DNNs. There was a point where it was brought up by one of your guests so fair play, but it seems that this argument is a bit of a distraction now. It is not to say that comparisons with extant computing systems are unhelpful; they lie elsewhere on the spectrum, and certainly mechanisms that introduce a large sandbox of memory for NNs to store and access representations make a lot of sense. But, when thinking about memory, consider that large models in the 100s of billions of params have a huuuge amount of stateful "memory" to use -- the values of the activations themselves. Yes it's ephemeral, with our present architectures, as these values are only available as the forward pass progresses. In that way it's analogous to stack space. Heap space is still kind of lacking outside of NTMs. The point is that DNNs do possess a logical workspace for successive calculations to happen, albeit ephemeral and bounded, and that opens the door IMO to some flavour of non-interpolative computation happening. Final thought, on the no-free-lunch theorem. This does not apply generally. It applies _only_ when comparing optimal solutions in the solution space. When a system is non-optimal on all measurement axes, by definition there must be a system that can dominate it. Likewise, it needs to be optimal on only one measurement axis to be indominable. That curve that defines optimal tradeoffs between conserved quantities is known as the Pareto Optimal Curve (or Frontier). One notable example is the momentum/location precision tradeoff governed by the Uncertainty Constant.
My point is that, particularly for messy optimization tasks, optimality on any axis is in practice impossible to prove, and none of the known neural architectures or cobbled systems like NARS or OpenCog are going to be actually on the POC, and so the NFLT is going to be technically inapplicable -- though in practice it is probably still an okay guide. With this in mind, we should not dismiss the possibility of an AGI that is more competent than any of our fine-tuned -- yet still suboptimal -- systems. Like, before Cooley/Tukey invented FFT we thought multiplying huge numbers had to take O(n²) time and space, but through some genius tricks it now takes O(n log n) on both. In general, I'd be careful making arguments which rely on asymptotic properties; the conclusions tend to degenerate when the relevant extreme (like optimality) is relaxed. I think it's also worth noting (and not to suggest anyone is arguing against this) that while an AGI system must sacrifice optimality in all but one task -- and very likely all -- that does not preclude non-optimal yet still superhuman competence on all the measurement axes we care about. To me, that's sufficiently general. And then, what's to prevent a robustly general purpose, but completely not-optimal-at-anything-specific meta-process from slowly implementing task-optimized tools at will, much like we do? Okay, that certainly broke my hyphenation budget! Now gimme that free lunch.
@nomenec 3 years ago
DavenH, thank you for your detailed and thoughtful questions. I'd like to clarify that I'm not arguing the interpolation/extrapolation divide, if there is one, stems from computational class; I don't (yet) know what the computational complexity of "extrapolation" is. My focus in the "Turing-Complete" debate is, in part, to communicate what you expressed yourself: "It is not to say that comparisons with extant computing systems are unhelpful; they lie elsewhere on the spectrum, and certainly mechanisms that introduce a large sandbox of memory for NNs to store and access representations make a lot of sense. [NNs are] analogous to stack space. Heap space is still kind of lacking outside of NTMs." Moving from bounded to unbounded space/time computation models results in qualitatively different algorithms. This confers practical differences upon algorithms designed for Turing complete systems even when running on practically bounded systems, because the algorithms are fundamentally different. Here is a quote from you that hits the key difference w.r.t. NNs: "Yes [NN memory is] ephemeral, with our present architectures, as [the activation] values are only available as the forward pass progresses." An intelligent system learning algorithms for a Turing complete model can find fundamentally different optimal algorithms than one learning algorithms for a Finite State Machine (NNs have a fixed (unrolled) node count, hence their "stacks" are bounded, ergo they are Finite State Machines). If more researchers would simply accept, if not embrace, that math fact, we might direct more time and effort towards researching NNs augmented with unbounded (in a computational model sense) read/write memory and iterations (computational time steps). That would be equivalent to a Turing Machine where the FSM part of the TM (i.e. the "transition function") was an NN. The longer we continue to obfuscate the fact that NNs are not Turing Complete (by sneaking in things like infinite precision floating point registers), the longer we delay progress on next generation Turing complete computational models and practical systems that approximate them (with expandable memory and unbounded running time). Regarding the no-free-lunch theorem, let's first recall how Chollet employs it with regard to the measure of intelligence: "To this list, we could, theoretically, add one more entry: "universality", which would extend "generality" ... to any task that could be practically tackled within our universe. [Considering the No Free Lunch theorem] we do not consider universality to be a reasonable goal for AI. ... The central message of the No Free Lunch theorem is that to learn from data, one must make assumptions about it - the nature and structure of the innate assumptions made by the human mind are precisely what confers to it its powerful learning abilities." In my opinion, he is invoking the NFLT for three purposes: 1) universality should not be a requirement of intelligence; 2) intelligence measures should be task specific; 3) optimality for a task requires task-specific knowledge. I don't recall him (or any of the hosts) arguing that the NFLT implies that an AGI cannot exceed human intelligence on all tasks. If so, I don't agree with that. I think it is entirely possible that an AGI can radically exceed human intelligence on all tasks. That said, I do not think intelligence is "all powerful" either. In other words, I'm not worried that an embodied AGI can twinkle its red robot eyes in just the right way as to crash my brain. Such power is fantasy speculation at this point.
@machinelearningdojowithtim2898 3 years ago
Hello Daven, really appreciate your engagement and thoughtful commentary as always my friend. Keith commented eloquently on the latter part of your question re: computability. Remember that TC just means that a computational system could run any program which a Turing machine could run. Clearly NNs are not Turing complete, and, say, JavaScript is. It might take JavaScript an awful lot of time to compute your arbitrary digit of pi, but an NN never could. On the matter of "bridging topological islands", what a delicious thought! The first intuition I have is that islands is the right way to think about it. NNs sparsely code data onto many different disconnected manifolds (think typical tSNE projection). I don't think there is any bridging between them; the data point falls on one of the manifolds. What happens to the output when you do a linear combination in the input space between points from two different manifolds in the latent space? Does it end up in "no man's land" or does it get projected to the nearest manifold? You hinted that there might be some kind of hierarchy of manifolds; I don't think that is the case - certainly there is an entangled hierarchy of transformations to get each point to its respective manifold mapping, and some of them might be shared. Will think more on this and add more later on. Thanks for the great comment.
@DavenH 3 years ago
@Machine Learning Dojo with Tim Scarfe @Keith Duggar Thank you Tim and Keith very much. The point is well made, and quite clear, that NNs don't do much of what computers do. The strongest position I'm advocating is that gradient-optimized NNs can still approximate what small programs running on limited stack space can do. That proposition is especially vulnerable to what Keith says about the qualitative difference in algorithms each can produce. I'm curious about this. The empirical differences are clear, at least most of the time... GPT did open my mind though. Not that it was producing compact algorithms to generate accurate digits of pi, but that it was using some kind of messy logic or computation for which we don't have a good measure of the boundaries. You guys have evidently done a lot more reading on the subject than I, so it's quite possible that my intuitions are not mature yet.
@rohankashyap2252 3 years ago
The most Turing complete episode on MLST.
@nomenec 3 years ago
Hilarious comment, Rohan! I honestly LoL'd in real life.
@hideyoshi9716 3 years ago
Could you please set up auto-translation? 😂😂😂 The most interesting session. Thanks!! 😃😃😃
@ViktorGrandgeorg 2 years ago
"Could you train a NN to predict the nth digit of π? - No, you couldn't." But you could train a NN to write a program which can predict the nth digit of π! When you think about an egg don't forget the chicken.
@jeff_holmes 3 years ago
I was thinking about what Tim said in terms of separating intelligence and consciousness. I have always thought the same, I suppose. However, Yannic's comments about conscious introspection made me wonder if a truly intelligent being must always be "on" - or conscious. Currently, we create "intelligent" programs or algorithms and then train them or ask them to reason about something. But otherwise, they are inactive ("unconscious"). There is no idle thinking or pondering that occurs. Are we missing something?
@DavenH 3 years ago
Introspection and self-attention do not need anything qualitative to function, so there is no requirement of consciousness.
@DistortedV12 3 years ago
One thing that troubles me is "a perceptive DSL". The whole point of deep learning is to learn these perceiving functions, yet these functions or "core knowledge priors" are supposed to be composable in code? Has anyone turned an object detection algorithm into raw composable code?
@nomenec 3 years ago
I received a private channel comment from a listener: "Kahneman's System 1 and System 2: en.wikipedia.org/wiki/Thinking,_Fast_and_Slow#Two_systems are really the same as Chollet's Type 1 and Type 2 thinking. The fact that biology already arrived at such a solution may indicate that we will always have both types of computation." I thought it would be worth posting here along with my response: Indeed, they are close if not the same. For whatever reason, I tend to fall back on reasoning about computational systems (both model and actual). Given the response speed vs neuron timings, it seems clear that Kahneman's System 1 is the result of a single feed-forward pass through a biological network. As such, it is directly analogous to an artificial neural network, which according to Chollet is Type 1 "interpolation". On the other hand, the much slower execution of System 2 permits iterative computation analogous to a Turing Machine, which can implement any effective computation, which Chollet claims is the heart of Type 2 reasoning. I would also argue that System 2 is what we directly experience via "consciousness", and that experience directly informs us that "thinking" has the very characteristics of computation we design into our computing machines and languages. In short, I think I agree with you. That said, there are interesting open questions. For example, one I ponder often is this. Does the real nature of biological neurons, specifically the fact that they directly utilize the continuous properties of matter (waves and fields), afford them any aspects of hypercomputation? After all, the so-called "real computer" (1), which is the idealized computational model of an analog computer, supports hypercomputation. The answer to the above is not the kneejerk "of course, that's quantum computing" one so often sees. All our models of quantum computation do not support hypercomputation. Indeed, quantum complexity theory tells us that quantum compute systems sit well within PSPACE (2), so while they can execute some tasks "faster" than classical compute, they do not expand beyond classical computability itself one iota. If biological systems execute some aspect of hypercomputation, it is by some as yet undiscovered or unproven mechanism a la Penrose and Hameroff's Orchestrated Objective Reduction (3). (1) en.wikipedia.org/wiki/Real_computation (2) en.wikipedia.org/wiki/Quantum_complexity_theory (3) en.wikipedia.org/wiki/Orchestrated_objective_reduction
@JKjr328 3 years ago
I think the rough identification between these different framings of the two basic reasoning modes is at least intuition-building, regardless of whether it turns out to be a true dichotomy in all (read: human-centric) biological and artificial computation systems.
@dr.mikeybee 3 years ago
In conversational AI, do we have examples of responses that are accepted or denied? If denied, is the response coming back broken into parts of speech, rearranged, fed through other models that choose actions, query graph databases, run arithmetic routines, or run various other algorithms, then checked again for acceptance? Rinse and repeat? Accepted responses can be added to supervised training sets. I bet Google and Amazon are doing this with their vast resources. Personally, I believe we users are going to need to share model access on ports, so that agents can query those models. We have plenty of compute as a society, but we don't share it. If I run one model on my GPU and you run another and we share, we each have two available models for an agent to access. We are going to need hundreds of available models to create AGI until we can afford to create models with trillions of parameters. I'm hoping to set up a web site soon that allows people to register their shared model IP addresses and port numbers.
@zhangcx93 3 years ago
I think why DL cannot do general discrete learning well is fundamentally because: 1. the activations they use are continuous; 2. they're synced systems where all "neurons" fire at the same time step. DL chose this way because backpropagation works only with continuous values and parallel computation works in sync. Our brain, in contrast: 1. uses binary activations, in a discrete value space; 2. all neurons fire asynchronously, in continuous time. At the same time, the world we're interacting with is continuous in time, which our brain's learning algorithm relies on heavily.
@badhumanus 3 years ago
Interesting. Deep learning or any kind of function optimizing system is a deadly poison to any attempt at generalization. Interpolation is not generalization, not even close. I think there are two types of generalization. The first is the most important; it is the ability to instantly perceive an object, pattern or scene that one has not seen before. One can perceive its shape, contours, color, position, distance, etc. The second type of generalization is the ability to instantly notice one or more similarities between an object one has seen before (and committed to memory) with another that one is seeing for the first time. In my opinion, the main key to generalization is the timing of sensory signals: they can be either concurrent or sequential. The brain uses spikes for a very good reason. Good luck.
@vslaykovsky 3 years ago
Could it be that topological (type 2) thinking somehow emerges from geometric (type 1) thinking, in a similar way to how complex pattern recognition emerges from the seemingly simple concept of interconnected neurons?
@nomenec 3 years ago
In my opinion, that is possible if not likely. That said, the emergent discrete/topological behavior remains qualitatively different. For example, consider that the "square waves" typical in CPUs are of course not precisely square. At the finest scale they are noisy continuous signals composed of electron/hole quantum waves. However, the digital operation of the CPU at a higher scale is best modeled mathematically as an abstracted "discrete" system. It's this ancient wave-particle or discrete-continuous duality we find everywhere in the material and conceptual worlds.
@arnokhachatourian8928 3 years ago
I think so, but the interesting question is how? If it is just a matter of scale, we're doing just fine; if not, we need some other advancement to attain intelligent systems.
@XOPOIIIO 3 years ago
It's hard to argue for one side or the other, considering how little evidence there is. But personally I've become more inclined to believe that DL can extrapolate successfully after watching what DALL-E is doing; it's basically GPT-3, even weaker, but the results are more demonstrative.
@TimScarfe 3 years ago
Natural data sits on an interpolatable manifold
@fredt3217 3 years ago
I only watched the first 20 minutes or so cause I got to get ready for work to pay 100% of my salary in taxes... which makes me wonder why I even bother... but as far as inference goes, it is simply the process of when electrical patterns are inputted into the association processes and then the perceived state. If the pattern is similar it will assume it is similar until a negative confliction occurs. For example, if you classified cats, once it has a good video of one cat to draw on it can find them all. It will also classify lions as cats until you tell it the big ones are different. Thus now a negative association to classifying, or inferring, the same. So it will now split them. Inference and generalization simply allow us to grab a similar electrical pattern and assume it is the same until a negative conflict occurs. Such as something from our past... like somebody tells us big cats are lions, or we try something and we then know it does not work. Like petting a lion and a cat while thinking they are the same house pet. The problem you have is you have no ability to find the conflicts. Thus no ability to infer or generalize, since it will think all cats are the same. Or where you have to create a program to do all this like the association processes, and our mind naturally does it through the perceived state. There are numerous types of inference and generalization, but at the core the process is always the same. Find a similar pattern, infer it is the same until a conflict occurs. But we can predict these conflicts through the rest of the mind's processes. So that is where you will get stuck. And just like you always do, because you want to build complex processes that happen in the mind without building anything close to something that resembles the mind. Which means you are stuck doing basic tasks.
@dougb70 3 years ago
1:46:41 - you guys are overthinking this. Step backwards. "Simulate" a cortical column for narrow intelligence. Map Markov blankets to the system of intelligence for general intelligence.
@DistortedV12 3 years ago
Bro I just binge watched the whole thing. Are we all nerds?
@nomenec 3 years ago
Yes we are! And that is a wonderful thing ;-)
@rahul-qo3fi 3 years ago
Am I watching a technical podcast or a Netflix series?!