Discovering Symbolic Models from Deep Learning with Inductive Biases (Paper Explained)

46,018 views

Yannic Kilcher


Neural networks are very good at predicting systems' numerical outputs, but not very good at deriving the discrete symbolic equations that govern many physical systems. This paper combines Graph Networks with symbolic regression and shows that the strong inductive biases of these models can be used to derive accurate symbolic equations from observation data.
OUTLINE:
0:00 - Intro & Outline
1:10 - Problem Statement
4:25 - Symbolic Regression
6:40 - Graph Neural Networks
12:05 - Inductive Biases for Physics
15:15 - How Graph Networks compute outputs
23:10 - Loss Backpropagation
24:30 - Graph Network Recap
26:10 - Analogies of GN to Newtonian Mechanics
28:40 - From Graph Network to Equation
33:50 - L1 Regularization of Edge Messages
40:10 - Newtonian Dynamics Example
43:10 - Cosmology Example
44:45 - Conclusions & Appendix
Paper: arxiv.org/abs/2006.11287
Code: github.com/MilesCranmer/symbo...
Abstract:
We develop a general approach to distill symbolic representations of a learned deep model by introducing strong inductive biases. We focus on Graph Neural Networks (GNNs). The technique works as follows: we first encourage sparse latent representations when we train a GNN in a supervised setting, then we apply symbolic regression to components of the learned model to extract explicit physical relations. We find that the correct known equations, including force laws and Hamiltonians, can be extracted from the neural network. We then apply our method to a non-trivial cosmology example (a detailed dark matter simulation) and discover a new analytic formula which can predict the concentration of dark matter from the mass distribution of nearby cosmic structures. The symbolic expressions extracted from the GNN using our technique also generalized to out-of-distribution data better than the GNN itself. Our approach offers alternative directions for interpreting neural networks and discovering novel physical principles from the representations they learn.
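For readers who want a concrete picture of the recipe in the abstract, here is a minimal sketch in PyTorch. All shapes, layer sizes, the random data, and the single training step are my own illustrative assumptions, not the authors' code: a shared edge MLP produces messages, messages are summed per receiving particle, a node MLP predicts accelerations, and an L1 penalty keeps the messages sparse so that symbolic regression can later be fit to the few surviving components.

import torch
import torch.nn as nn

MSG_DIM = 8   # message size; the paper encourages most components to go unused

edge_mlp = nn.Sequential(nn.Linear(2 * 4, 64), nn.ReLU(), nn.Linear(64, MSG_DIM))
node_mlp = nn.Sequential(nn.Linear(4 + MSG_DIM, 64), nn.ReLU(), nn.Linear(64, 2))

def forward(x):
    # x: (n, 4) node features (e.g. 2D position, mass, charge); returns (n, 2) accelerations
    n = x.shape[0]
    pairs = [(s, r) for s in range(n) for r in range(n) if s != r]   # directed edges
    senders = torch.tensor([s for s, _ in pairs])
    receivers = torch.tensor([r for _, r in pairs])
    msgs = edge_mlp(torch.cat([x[senders], x[receivers]], dim=-1))   # same MLP on every edge
    agg = torch.zeros(n, MSG_DIM).index_add(0, receivers, msgs)      # sum messages per receiver
    return node_mlp(torch.cat([x, agg], dim=-1)), msgs

x = torch.randn(5, 4)            # toy snapshot of 5 particles
target_acc = torch.randn(5, 2)   # stand-in targets from a simulator
params = list(edge_mlp.parameters()) + list(node_mlp.parameters())
opt = torch.optim.Adam(params, lr=1e-3)

pred, msgs = forward(x)
loss = ((pred - target_acc) ** 2).mean() + 1e-2 * msgs.abs().mean()  # supervised loss + L1 on messages
loss.backward()
opt.step()
# After training, the paper fits symbolic regression (they use eureqa) to the most
# significant message components as a function of the two connected particles' features.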
Authors: Miles Cranmer, Alvaro Sanchez-Gonzalez, Peter Battaglia, Rui Xu, Kyle Cranmer, David Spergel, Shirley Ho
Links:
KZfaq: / yannickilcher
Twitter: / ykilcher
Discord: / discord
BitChute: www.bitchute.com/channel/yann...
Minds: www.minds.com/ykilcher

Comments: 85
@IproCoGo 4 years ago
Yannic, thank you for presenting these papers. Your efforts are helpful, to understate the matter. With the number of videos you produce, this project must be time consuming. Know that it is worthwhile.
@YannicKilcher 4 years ago
Thanks, I'm enjoying it so far :)
@sorontar1 3 years ago
I found your channel by accident and it blew my mind. Thank you for the effort!
@bautistapeirone6345 4 years ago
Thanks for your hard work and excellent explanation. It really clears things up to see them graphically. Keep it up!
@welcomeaioverlords 4 years ago
Very clear explanation of a fascinating application area. Thanks Yannic!
@annestaples7503 4 years ago
This was a great walk-through of the paper. Thank you for taking the time to do this. I wish all papers had similar walk-throughs!
@freemind.d2714 3 years ago
I hope future ML will do this automatically.
@KyleCranmer 4 years ago
Wow, what a surprise... this is great... thank you!
@jasdeepsinghgrover2470 4 years ago
Amazing paper... Did you guys try it on physical systems like the double pendulum?
@smnt 4 years ago
Great work Kyle, I followed you when I worked on the LHC. Always loved your work. I almost applied to do a post-doc with you but was too scared, haha.
@zh4842 4 years ago
Good job Kyle! Where can I find the git repos to do some experiments? Thanks in advance.
@carllammy8020 4 years ago
Thanks for walking me through differing disciplines.
@sahandajami3171 3 years ago
Many thanks Yannic. Your explanations are really helpful.
@DistortedV12 4 years ago
oooo I was just going to ask you about this!! Thanks Yannic!
@mau_lopez 4 years ago
Very interesting paper and an excellent explanation. Thanks a lot for sharing!
@phillipreay 4 years ago
This is so awesome! I’m very curious to see how the latent vector space derived by neural graphs is shaped and this really is first light in that direction for me. Thx!
@MyU2beCall 3 years ago
Thx Yannic. Yet another great vid!
@sun1908 2 years ago
Thank you Yannic. Nicely explained.
@niraj5582 3 years ago
Interesting paper indeed. You explained it really well. Thanks a lot.
@cmtro 4 years ago
Excellent ! Good explanations.
@crimythebold 4 years ago
Very interesting. Very nice summary of the paper
@user-kg7ex2dm8g 4 years ago
Great Explanation👍
@Basant5911 10 months ago
Amazing explanation bro. Enjoyed it more than Netflix.
@ifernandez08 4 years ago
Thank you!
@spsharan2000 4 years ago
Really good video! Something new every time :) Which device and app are these recorded on, btw? 🤔
@YannicKilcher 4 years ago
I do the recording with OBS.
@freemind.d2714 3 years ago
very good paper!!! and very good video as well!!!
@elsayedmohamed9706 4 years ago
Thank you ❤
@edeneden97 2 years ago
If we have 2 vertices and an edge between them, a force (in 2 dimensions) is not enough to update both vertices, since the force direction is opposite for each vertex. Therefore we either need more than 2 hidden dimensions to represent the force, or we use 2 directed edges per pair, each of which only applies to its destination vertex, for example.
@__Somebody__Else__ 4 years ago
Hey Yannic. I am a big fan of your channel, thank you very much. You really hit the mark with your content: current research papers reviewed with a focus on the core principles of the contributions and with some context added to the respective strand of literature. This is perfect for practitioners like me - for staying up to date and getting inspiration on where else you can apply deep/machine learning. A question regarding the current paper: I don't get why the notion of a graph (network) is important here. They just seem to stack and combine neural networks and other computation in an intelligent way. To me the graph seems only to be a nice illustration of the independence of the forces, but I don't get why graph principles are relevant here. Probably I am missing a point. Do you have an explanation?
@YannicKilcher 4 years ago
You're right, ultimately it's just a weight-sharing scheme and a computation flow, but that's conveniently described as a graph, so people call it a graph network.
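As a follow-up to that answer, here is a tiny sketch of the same point. The message function and toy positions below are stand-ins I made up, not the paper's learned model: the "graph" is nothing more than an edge list plus one shared function applied to every (sender, receiver) pair, with the results summed at the receiver.

import numpy as np

def shared_message(sender, receiver):
    # stand-in for the learned edge MLP; the same parameters are reused for every edge
    return receiver - sender

nodes = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])          # 3 particles, 2D positions
edges = [(s, r) for s in range(3) for r in range(3) if s != r]   # fully connected, directed

agg = np.zeros_like(nodes)
for s, r in edges:          # the "graph" is just this loop over index pairs
    agg[r] += shared_message(nodes[s], nodes[r])
print(agg)                  # per-node sum of incoming messages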
@youngjin8300 4 years ago
You’re something 👏
@jabowery 4 years ago
Discovery of the planet Neptune was a latent identity (not to be confused with the article's "latent vector") imputed from gravitational perturbations. This, of course, would require regressing the topology of the GNN itself.
@ScriptureFirst 2 years ago
QUESTION: Does the graph somehow make the solution more examinable? Are graphs potentially an answer to black-box NNs? Cross-reference: the paper that LMs are graphs. Thank you for considering.
@n.lu.x 4 years ago
Thanks! Any chance of going through "Memory transformers" soon?
@ScriptureFirst 2 years ago
41:30 It looks like the basic structure of this equation was determined in the NN setup and the coefficients were the NN output. Gold.
@drdca8263 4 years ago
I’m a bit confused about how the directionality of the information on the edges works. If the information produced for an edge is the force, then, when adding it on one object or the other, the force vector should be in opposite directions, right? How does this system account for that? Does it subtract the edge’s value in one of the sums and add it in the other?
@rcpinto 4 years ago
Each bidirectional edge is actually 2 unidirectional edges, so there is no ambiguity.
@drdca8263 4 years ago
Rafael Pinto thanks!
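To make that answer concrete, here is a tiny sketch. The message function is a hand-picked stand-in, not a learned one: because each pair is stored as two directed edges and a message only updates its receiver, the network is free to output opposite vectors for i→j and j→i, which is how something like Newton's third law can be represented.

import numpy as np

pos = [np.array([0.0, 0.0]), np.array([1.0, 1.0])]   # two particles
edges = [(0, 1), (1, 0)]                              # one pair stored as two directed edges

def message(s, r):
    # stand-in for a learned message; here antisymmetric in (sender, receiver)
    return pos[r] - pos[s]

force = [np.zeros(2), np.zeros(2)]
for s, r in edges:
    force[r] += message(s, r)   # each message only updates its receiver
print(force)                    # the two particles end up with opposite vectors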
@dreznik 4 years ago
What tool do you use to draw next to the paper? And how do you capture your screen?
@YannicKilcher 4 years ago
I use OneNote and OBS
@eladwarshawsky7587 3 months ago
I was thinking that we could try to learn a graph neural net that relates X, Y coordinates to the corresponding RGB values. If we could capture the relationship between the two as a single simple equation, that could make for huge compression gains. Please tell me why it would or wouldn't work.
@frun 3 months ago
Physicists could do this for QM if the wavefunction did not represent an ensemble of similarly prepared systems.
@AaronKarper 4 years ago
That still sounds like symbolic regression with extra assumptions. Is the neural network actually necessary? It seems that fitting the symbolic equation against the NN could just as well be symbolic regression against the data directly. Or am I missing something?
@marcovirgolin9901 4 years ago
I am wondering the same, but I'd say it's smart to use gradient descent to have NNs encode the edges, so as to provide info on how symbolic regression should find the inner formulas. This work also reminds me of Schmidt and Lipson's Science paper "Distilling free-form natural laws from experimental data". Still gotta read this paper myself though.
@malse1234 4 years ago
Thanks, this is a great question. It comes down to the fact that symbolic regression (with current techniques) scales very badly with an increasing number of features, so breaking the problem down with a NN makes it tractable. I explain this in more detail in the reddit thread: www.reddit.com/r/MachineLearning/comments/hfmqnx/d_paper_explained_discovering_symbolic_models/. Cheers! Miles
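A toy illustration of that scaling point follows. The data, operator set, and crude random search below are my own stand-ins (the paper uses Eureqa's genetic programming): once the target is a low-dimensional message function of a few edge features, even a brute-force search over small expression trees is feasible, whereas searching for one formula over every particle's coordinates at once blows up combinatorially.

import numpy as np

np.seterr(all="ignore")   # the random search will hit overflows/divisions; ignore the warnings
rng = np.random.default_rng(0)

# Synthetic "edge message" data: the x-component of a 1/r^2 force between two particles.
N = 1000
dx, dy = rng.uniform(-2, 2, N), rng.uniform(-2, 2, N)
m1, m2 = rng.uniform(0.5, 2, N), rng.uniform(0.5, 2, N)
r = np.sqrt(dx**2 + dy**2) + 1e-6
target = m1 * m2 * dx / r**3        # only a handful of candidate input variables

VARS = {"dx": dx, "dy": dy, "m1": m1, "m2": m2, "r": r}
UNARY = {"neg": lambda a: -a, "inv": lambda a: 1.0 / (a + 1e-9), "sq": lambda a: a * a}
BINARY = {"add": np.add, "mul": np.multiply}

def random_expr(depth):
    # sample a random expression tree; return (formula string, evaluated array)
    if depth == 0 or rng.random() < 0.3:
        name = rng.choice(list(VARS))
        return name, VARS[name]
    if rng.random() < 0.5:
        op = rng.choice(list(UNARY))
        s, v = random_expr(depth - 1)
        return f"{op}({s})", UNARY[op](v)
    op = rng.choice(list(BINARY))
    s1, v1 = random_expr(depth - 1)
    s2, v2 = random_expr(depth - 1)
    return f"{op}({s1},{s2})", BINARY[op](v1, v2)

best_mse, best_formula = np.inf, None
for _ in range(20000):              # crude random search; real systems use genetic programming
    s, v = random_expr(depth=3)
    mse = float(np.mean((v - target) ** 2))
    if np.isfinite(mse) and mse < best_mse:
        best_mse, best_formula = mse, s
print(best_formula, best_mse)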
@malse1234 4 years ago
​@@marcovirgolin9901 Thanks for the question. We actually use Schmidt and Lipson's algorithm "eureqa" for our symbolic regression. One way of thinking of this work is as an extension of their algorithm to high-dimensional spaces.
@marcovirgolin9901 4 years ago
@@malse1234 thank you for your answer Miles, and congrats for this beautiful work.
@malse1234 4 years ago
@@marcovirgolin9901 Thank you!
@vishalpachpande5921 1 year ago
Sir, where can I find these types of papers?
@sarvagyagupta1744 4 years ago
Hey, great video as always. I have a question though: the datasets they are using come from a simulator that uses known formulas to simulate the particles. And, in the end, the neural networks output the same formulas that are already being used in these simulators, which is also what we calculate the loss function against. So we are not doing anything new here; it's more of a reinventing-the-wheel problem, right?
@EyeIn_The_Sky 4 years ago
I believe he said that the loss function was from actual observed data in the real world rather than simulations by other neural networks or some other technology.
@YannicKilcher 4 years ago
I think the simulations are run with equations other than the ones that come out.
@sarvagyagupta1744 4 years ago
@@EyeIn_The_Sky did he? I think it's the simulation
@sarvagyagupta1744 4 years ago
@@YannicKilcher really? Two different equations lead to the same simulation? It could be very much possible but that's interesting. So do you think that with different initial settings, we might be able to get different results from the GNN?
@cameron4814 4 years ago
Damn, that's some crazy shit.
@uyenhoang1780 3 years ago
Sorry, but I find the most important problem is that it's not yet clear what the components in the L1 are, and the details of applying the standard deviation are a bit confusing and not clearly described. Can you explain that part?
@cw9249 1 year ago
this is insane
@teslaonly2136 4 years ago
Please review this paper: Locally Masked Convolution for Autoregressive Models
@jonathansum9084 2 years ago
At 42:05, I think (r-1) cannot become (1-1/r).
@Tehom1 4 years ago
Is co-author David Spergel the astronomer David Spergel? I thought it must be two guys with the same name until the topic of dark matter came up (astronomer Spergel's specialty).
@YannicKilcher 4 years ago
I can imagine, but I don't know
@IamLupo 4 years ago
I use Eureqa; this software was made a long time ago and uses evolutionary search to find formulas from data. en.wikipedia.org/wiki/Eureqa
@jabowery 4 years ago
They actually used Eureqa: "We score complexity by counting the number of occurrences of each operator, constant, and input variable. We weight ^, exp, log, IF(·, ·, ·) as three times the other operators, since these are more complex operations. eureqa outputs the best equation at each complexity level, denoted by c."
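The complexity score described in that quote is easy to compute yourself. Here is a minimal sketch, assuming a flat token representation of the expression (my own choice, not Eureqa's internal format):

HEAVY = {"^", "exp", "log", "IF"}   # weighted 3x, per the passage quoted above

def complexity(tokens):
    # tokens: flat list of operators, constants and variable names
    return sum(3 if t in HEAVY else 1 for t in tokens)

# e.g. a * exp(b * r) + c  ->  operators {*, exp, *, +} and leaves {a, b, r, c}
print(complexity(["*", "a", "exp", "*", "b", "r", "+", "c"]))   # -> 10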
@herp_derpingson 4 years ago
The "From Graph to Equation" part of this paper is a bit disappointing; I was expecting some differentiable method. Also, I doubt much of this work can be generalized to non-particle-system problems. Isn't reducing a neural network to an equation an extreme form of network pruning and distillation? Symbolic regressions are more than just equations: technically, computer code is also a symbolic regression from user input to computer output. If we can build an automation to automate all automatons, that would be AI-complete. It would also make every human unemployable.
@malse1234 4 years ago
These are good questions, thanks. "[symbolic] differentiable method" I wish there was a differentiable method of symbolic regression as well. Currently it seems like embedding discrete search spaces in a differentiable function is difficult. For now it seems best to learn subcomponents of the model using a NN, then approximate those subcomponents with traditional genetic programming. "[doubt generalization] to non-particle system problems" The Dark Matter simulation is not particle-based, it is a grid of densities. We look for "blobs" of dark matter (dark matter halos) and consider those to be the nodes in the GNN where the integrated density of that blob is the mass. More generally (which our work in the near future will show), the symbolic regression strategy can be applied to NNs other than graph networks, so long as you have a separable internal structure. We try to explain this in a bit more detail in the paper. "network pruning" The symbolic form gives you a few advantages: (i) interpretability, (ii) generalization, (iii) compactness. I think pruning could arguably only give you (ii)? Though using ReLU activations, you still only have linear extrapolation. We do have L1 regularization in our GNN, yet the symbolic form still generalizes better. It's very curious how simple symbolic equations generalize so well. Let me know if you have any other questions. Thanks, Miles
@herp_derpingson 4 years ago
@@malse1234 It is rare for paper authors to visit. Thanks for clarifying.
@malse1234 4 years ago
@@herp_derpingson No problem, feel free to email me if you have any other questions. Cheers!
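To make the L1 / sparsity point from the thread above concrete, here is a minimal sketch of the inspection step as I understand it from the paper (the message matrix below is synthetic): with the L1 penalty most message components end up with nearly zero variance over the dataset, so you rank them by standard deviation and hand only the top few to symbolic regression.

import numpy as np

rng = np.random.default_rng(0)
n_edges, msg_dim = 10000, 8
messages = rng.normal(0.0, 0.01, size=(n_edges, msg_dim))   # most components collapse near zero
messages[:, 2] = rng.normal(0.0, 1.0, n_edges)              # ...while a couple stay "live"
messages[:, 5] = rng.normal(0.0, 0.8, n_edges)

stds = messages.std(axis=0)
keep = np.argsort(stds)[::-1][:2]   # rank components by standard deviation, keep the top few
print(np.round(stds, 3), keep)      # only the kept components are fed to symbolic regression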
@jasdeepsinghgrover2470 4 years ago
Did anyone try this on a double pendulum?
@herp_derpingson 4 years ago
It won't work on a double pendulum, as the acceleration components are not independent.
@jasdeepsinghgrover2470 4 years ago
@@herp_derpingson Right... But maybe the NN learns some approximation instead, since normal reactions at joints are also forces and free-body diagrams account for them. Maybe it learns some combined force instead.
@bluesky3149 4 years ago
Who helps you to read these papers?
@YannicKilcher 4 years ago
I have a bunch of undergrads in the basement and they get a cookie for each video-script they produce
@bluesky3149 4 years ago
@@YannicKilcher Haha, I meant: are some professors involved who can give you more context or fill in some knowledge gaps?
@jryanconnelly 4 years ago
Simply awesome, thank you. Tangential thought.... Evolution of ideas in humanity is an ordered heuristic dialogue (yeah I just made that up) that is sort of Bayesian in nature...? I dunno, seems like a way to frame Hegel's dialectic in a mathematical way sorta...
@herp_derpingson 4 years ago
Are you a GPT?
@Phobos11 4 years ago
Herp Derpingson exactly 🤣
@jryanconnelly 4 years ago
@@herp_derpingson GPT?
@herp_derpingson 4 years ago
​@@jryanconnelly Generative Pre-trained Transformer
@AbgezocktXD 4 years ago
Your Deltas are an abomination xD
@aBigBadWolf 4 years ago
Just before the NeurIPS blind-review process starts, the authors of this paper go to great lengths to publicize their work on social media with pretty pictures, an interactive demo, a nice blog post, and lots of vitamin-B people retweeting or sharing their work. Miles Cranmer, Alvaro Sanchez-Gonzalez, Peter Battaglia, Rui Xu, Kyle Cranmer, David Spergel, and Shirley Ho, you are trying to sway and influence the reviewers and are doing this shamelessly with full intention. This is unscientific and I will remember your names.
@malse1234 4 years ago
It is very true that double-blind review and preprint servers seem to be opposites, despite it being such a common practice to post to arxiv after submission. I wouldn't say preprint posting by researchers is generally meant to sway reviewers, but just to publicize work earlier. It's important to note that with only ~20% of papers accepted to big ML conferences, annual submission deadlines, and the very fast pace of research, work might be out of date if one waits until it is finally published. I think this is why posting to arxiv before acceptance is so common in ML research. And in posting we would like many people to read it, hence the blog/etc. I'm not sure if there are solutions to the preprint trend in ML given the slow publication process contrasted against the fast research pace, but I'm curious if there are options.
@YannicKilcher 4 years ago
Just my personal opinion: Double-blind review is half broken and I would like to see it completely broken and move to a new world where research happens in the open. So I'm more than happy when authors publicize their work and I appreciate them for sharing it as soon as it's ready, rather than six months in the future after three random people looked at it for 5 minutes on the toilet and decided on "weak accept"
@BlakeEdwards333 4 years ago
Thank you!