Fourier Neural Operator for Parametric Partial Differential Equations (Paper Explained)

63,413 views

Yannic Kilcher

3 years ago

#ai #research #engineering
Numerical solvers for Partial Differential Equations are notoriously slow. They need to evolve their state by tiny steps in order to stay accurate, and they need to repeat this for each new problem. Fourier Neural Operators, the architecture proposed in this paper, can evolve a PDE in time in a single forward pass, and do so for an entire family of PDEs, as long as the training set covers them well. By performing crucial operations only in Fourier space, this new architecture is also independent of the discretization or sampling of the underlying signal and has the potential to speed up many scientific applications.
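To make that concrete, here is a minimal sketch of a single Fourier layer, assuming a recent PyTorch with complex-tensor support; it follows the paper's recipe (FFT, multiply the lowest retained modes by learned complex weights, inverse FFT), but the names and initialization are illustrative rather than the authors' exact code:

```python
import torch
import torch.nn as nn

class SpectralConv2d(nn.Module):
    """One Fourier layer: FFT -> per-mode linear transform -> inverse FFT."""

    def __init__(self, in_channels, out_channels, modes1, modes2):
        super().__init__()
        self.modes1, self.modes2 = modes1, modes2
        scale = 1.0 / (in_channels * out_channels)
        # Learned complex weights exist only for the retained low-frequency modes.
        self.w_pos = nn.Parameter(scale * torch.randn(
            in_channels, out_channels, modes1, modes2, dtype=torch.cfloat))
        self.w_neg = nn.Parameter(scale * torch.randn(
            in_channels, out_channels, modes1, modes2, dtype=torch.cfloat))

    def forward(self, x):  # x: (batch, in_channels, height, width)
        b, _, h, w = x.shape
        x_ft = torch.fft.rfft2(x)  # complex spectrum, (batch, in_channels, h, w//2 + 1)
        out_ft = torch.zeros(b, self.w_pos.shape[1], h, w // 2 + 1,
                             dtype=torch.cfloat, device=x.device)
        # Mix channels mode-by-mode; all higher frequencies are simply dropped.
        out_ft[:, :, :self.modes1, :self.modes2] = torch.einsum(
            "bixy,ioxy->boxy", x_ft[:, :, :self.modes1, :self.modes2], self.w_pos)
        out_ft[:, :, -self.modes1:, :self.modes2] = torch.einsum(
            "bixy,ioxy->boxy", x_ft[:, :, -self.modes1:, :self.modes2], self.w_neg)
        return torch.fft.irfft2(out_ft, s=(h, w))  # back onto the spatial grid
```

Because the weights live on Fourier modes rather than on grid points, the same layer can be evaluated on a finer or coarser grid than it was trained on, which is where the discretization independence comes from.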
OUTLINE:
0:00 - Intro & Overview
6:15 - Navier Stokes Problem Statement
11:00 - Formal Problem Definition
15:00 - Neural Operator
31:30 - Fourier Neural Operator
48:15 - Experimental Examples
50:35 - Code Walkthrough
1:01:00 - Summary & Conclusion
Paper: arxiv.org/abs/2010.08895
Blog: zongyi-li.github.io/blog/2020/fourier-pde/
Code: github.com/zongyi-li/fourier_neural_operator/blob/master/fourier_3d.py
MIT Technology Review: www.technologyreview.com/2020/10/30/1011435/ai-fourier-neural-network-cracks-navier-stokes-and-partial-differential-equations/
Abstract:
The classical development of neural networks has primarily focused on learning mappings between finite-dimensional Euclidean spaces. Recently, this has been generalized to neural operators that learn mappings between function spaces. For partial differential equations (PDEs), neural operators directly learn the mapping from any functional parametric dependence to the solution. Thus, they learn an entire family of PDEs, in contrast to classical methods which solve one instance of the equation. In this work, we formulate a new neural operator by parameterizing the integral kernel directly in Fourier space, allowing for an expressive and efficient architecture. We perform experiments on Burgers' equation, Darcy flow, and the Navier-Stokes equation (including the turbulent regime). Our Fourier neural operator shows state-of-the-art performance compared to existing neural network methodologies and it is up to three orders of magnitude faster compared to traditional PDE solvers.
Authors: Zongyi Li, Nikola Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhattacharya, Andrew Stuart, Anima Anandkumar
Links:
KZfaq: kzfaq.info
Twitter: ykilcher
Discord: discord.gg/4H8xxDF
BitChute: www.bitchute.com/channel/yannic-kilcher
Minds: www.minds.com/ykilcher
Parler: parler.com/profile/YannicKilcher
LinkedIn: www.linkedin.com/in/yannic-kilcher-488534136/
If you want to support me, the best thing to do is to share out the content :)
If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: www.subscribestar.com/yannickilcher
Patreon: www.patreon.com/yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

Comments: 135
@DavenH · 3 years ago
The intro is cracking me up, had to like.
@AE-cc1yl · 3 years ago
Navier-Stonks equations 📈
@RalphDratman · 1 year ago
"Linearized ways of describing how a system evolves over one timestep" is BRILLIANT! I have never heard PDEs described in such a beautiful, comprehensible way. Thank you, Yannic Kilcher.
@dominicisthe1 · 3 years ago
Cool to see a paper like this pop up on my YouTube. I did my MSc thesis on the first reference, solving ill-posed inverse problems using iterative deep neural networks.
@errorlooo8124 · 3 years ago
So basically what they did is kind of like taking a regular neural network layer, adding JPEG compression before it and JPEG decompression after it, then building a network and training it on Navier-Stokes images to predict the next images. The reason I say JPEG is that the heart of JPEG is transforming an image into the frequency domain using a Fourier-like function; the extra processing JPEG does is mostly non-destructive (duh, you want your compressed version to be as close to the original as possible), plus a neural network would probably not be impeded by the extra processing, and their method throws away some of the modes of the Fourier transform too.
@errorlooo8124 · 3 years ago
@Pedro Abreu Yeah, the DCT is derived from the DFT, which is basically the Fourier transform, but it works on sampled data instead of needing a continuous function. (The DCT is just the real component of the DFT, with a bit of offsetting (it uses n+1/2) and less rotation (it uses pi instead of 2pi).)
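That relationship is easy to check numerically; a quick sketch, assuming NumPy and SciPy are available (this verifies the unnormalized DCT-II definition against scipy):

```python
import numpy as np
from scipy.fft import dct

x = np.random.rand(8)
N = len(x)
n = np.arange(N)
# DCT-II definition with the (n + 1/2) offset and pi (not 2*pi) rotation:
direct = np.array([2.0 * np.sum(x * np.cos(np.pi / N * (n + 0.5) * k))
                   for k in range(N)])
print(np.allclose(direct, dct(x, type=2)))  # True (scipy's unnormalized DCT-II)
```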
@user-fl8ql8fe8w · 1 year ago
This is an excellently clear description. Thanks for the help.
@PatatjesDora · 3 years ago
Going over the code is really nice!
@channuchola1153 · 3 years ago
Wow, simply awesome. Fourier and PDEs, good to see them together.
@diegoandrade3912 · 1 year ago
Fabulous, thank you for the explanation and the time taken to create this video. Keep it coming.
@soudaminipanda · 7 months ago
Fabulous explanation. Crystal clear.
@taylanyurtsever · 3 years ago
Vorticity is the cross product of the nabla operator and the velocity vector field, which can be thought of as the rotational flow in that region (blue clockwise and red counter-clockwise).
@judgeomega · 3 years ago
or more simply: twisting
@CharlesVanNoland · 2 years ago
AKA "curl" en.wikipedia.org/wiki/Curl_(mathematics)
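A minimal numerical version of that definition, assuming NumPy and a uniform grid (the velocity field here is just a toy swirling example):

```python
import numpy as np

xs = np.linspace(0.0, 2.0 * np.pi, 64)
y, x = np.meshgrid(xs, xs, indexing="ij")
u = -np.cos(x) * np.sin(y)   # x-component of velocity
v = np.sin(x) * np.cos(y)    # y-component of velocity

h = xs[1] - xs[0]
# 2D vorticity is the scalar curl: w = dv/dx - du/dy.
vorticity = np.gradient(v, h, axis=1) - np.gradient(u, h, axis=0)
# Positive values: counter-clockwise rotation; negative: clockwise.
```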
@shansiddiqui8673 · 3 years ago
Fourier Neural Operators aren't limited to periodic boundary conditions; the linear transform W works as a bias term which keeps track of non-periodic BCs.
@kazz811 · 3 years ago
Cool video as usual. Quick comment: vorticity is simply the curl of the velocity field and doesn't have much to do with "stickiness". Speaking of which, viscosity (which measures forces within the fluid) is not actually related to "stickiness", a property that is measured by surface tension (how the fluid interacts with an external solid surface). You can have highly viscous fluids which don't stick at all.
@lucidraisin · 3 years ago
Woohoo! New video!
@clima3993 · 2 years ago
Yannic always gives me the illusion that I understand things that I actually don't. Anyway, a good starting point, and thank you so much!
@herp_derpingson · 3 years ago
36:30 I like the idea of throwing away high FFT modes as regularization. I wish more papers did that.
37:35 IDK if throwing out the little jiggles is a good idea, because Navier-Stokes is a chaotic system and those little jiggles were possibly contributing chaotically. However, perhaps the residual connection corrects for that.
46:10 XD I wish the authors had ablated the point-wise convolution and shown how much it helps, and the same for throwing away modes. Also, I wish the authors had shown an error-accumulation-over-time graph.
I really liked the code walkthrough. Do it for other papers too if possible.
@pradyumnareddy5415 · 3 years ago
I like it when Yannic throws shade.
@kristiantorres1080 · 3 years ago
Thank you! I was just reading this paper and somewhere around page 5 I started to fall asleep. Your video will help me understand this paper better.
@Andresc93 · 1 year ago
Thank you, you just saved me a bunch of time.
@antman7673 · 3 years ago
"Vorticity" is derived from "vortex". The triangle pointing down is the nabla operator; it was pointing to the lowest value.
@simoncorbeil4081 · 5 months ago
Great video; however, I would like to correct a few facts. If the Navier-Stokes equations need the development of new and efficient methods like neural networks, it's essentially because they are strongly nonlinear, especially at high Reynolds number (low viscosity, as with air and water, the typical fluids we meet daily), where turbulence is triggered. Also, I want to rectify something: the Navier-Stokes system shown in the paper is in the incompressible regime, and the second equation is the divergence of the velocity, which is the mass conservation equation, nothing related to vorticity (it's more the opposite: vorticity would be the cross product of the nabla operator with the velocity field).
@idiosinkrazijske.rutine · 3 years ago
Looks similar to what is done in so-called "spectral methods" for the simulation of fluids. I'm sure this is where they drew their inspiration from.
@boffo25 · 3 years ago
Nice explanation
@tedonk03 · 2 years ago
Thank you for the awesome explanation, really clear and helpful. Can you do one for PINNs (Physics-Informed Neural Networks)?
@DavenH · 3 years ago
I hope this is going to lead to much more thorough climate simulations. Typically these require vast amounts of supercomputer time and are run just once a year or so, but it sounds like just a small amount of cloud compute would run them on this model. Managing memory would then be the challenge, however, because I don't know how you could afford to discretize the fluid dynamics of the atmosphere into isolated cells, where each part affects and flows into the other parts. It's almost like you need to do it all at once.
@PaulanerStudios · 3 years ago
Well, from what I have seen, climate simulations are at the moment also discretized into grids for memory management... at least the ones where I have looked at the code... I guess it's more of a challenge to enforce boundary conditions in this model such that neighbouring cells don't diverge at their shared boundaries... I guess traditional methods for dealing with this would suffice though... you'd still have to blend the boundaries occasionally, so the timesteps can't be arbitrarily large.
@DavenH · 3 years ago
@@PaulanerStudios Hmm. Maybe take a page from CNNs and calculate 3x3 grid cells, so you get a centre cell with boundaries intact, then stride one cell and do another 3x3 calculation; hopefully the interaction falloff is steep enough to then stitch the centre cells together without discontinuities. Or maybe you need to do 5x5 cells, throwing away all but the centres. Another thing: I thought the intra-cell calculations in these climate simulations were hand-made heuristics, not actually Navier-Stokes. Could be wrong, but if not, even replacing those heuristics with "real" simulations would be a good improvement.
@PaulanerStudios · 3 years ago
@Mustache Merlin The thing with every compute job is the von Neumann bottleneck... running massively parallel compute jobs on CPU or GPU, the limiting factor is always memory bandwidth... since neural networks are, in the most basic sense, matrix multiplications interspersed with nonlinearities, VRAM is the limiting factor for how large a given multiplication/network, and thus network input, can be... there is really no sense in streaming anything from a drive, no matter how fast, because performance will tank by orders of magnitude for backprop and such if the network (and computation graph) can't be held in graphics memory at once... If you're arguing the case for regular simulations, well, supercomputers already have terabytes or petabytes of RAM... the issue is swapping the data used for computation in and out of cache and subsequently registers... Optane drives will not solve the memory bottleneck there either... the only thing they can solve is maybe memory price, which really is not a limiting factor in HPC (most of the time).
@dawidlaszuk · 3 years ago
Coming from signal processing and getting my head into the Deep™ world, I'm happy to see Fourier showing up. Great paper and a good start, but I agree about the overhype. For example, throwing away modes is the same as masking with a rectangular function, which in signal space is like convolving with a sinc function. That's a highly "ripply" function. Navier-Stokes in general is chaotic, and small perturbations will change the output significantly over time. I'm guessing that they don't see/show these effects because of their data composition. But it is a good start and maybe an idea for others, for example replacing the Fourier kernel with a Laplace kernel and using proper filtering techniques.
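The rectangular-mask ripple is easy to demonstrate; a small sketch, assuming NumPy, showing the Gibbs-style overshoot that mode truncation produces at sharp edges:

```python
import numpy as np

x = np.zeros(256)
x[96:160] = 1.0                  # a sharp square pulse
X = np.fft.fft(x)
X[17:-16] = 0.0                  # rectangular mask: keep only modes |k| <= 16
x_lp = np.fft.ifft(X).real
print(x_lp.min(), x_lp.max())    # ripples undershoot below 0 and overshoot above 1
```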
@DavenH · 3 years ago
Hey Dawid, do you produce any YT content? I'm also from DSP and doing deep learning; curious what you're working on.
@billykotsos4642 · 3 years ago
Damn, the opening title blew my mind.
@beginning_parenting · 3 years ago
On line 87 of the code in FNO3D, it is mentioned that the input is a 5D tensor (batch, x, y, t, in_channels). What does in_channels represent? Does that mean that each point in (x, y, t) is a vector containing 13 channels?
@raunaquepatra3966 · 3 years ago
I wish the authors had shown the effects of throwing away modes in some nice graphs 😔. Also, I wish they had shown the divergence of this method from the ground truth (using the simulator) when used in an RNN fashion (i.e. feeding the final output of this method back into itself to generate time steps, possibly to infinity), and at what point it starts diverging significantly.
@markh.876 · 3 years ago
This is going to be lit when it comes to Quantum Chemistry
@Mordenor · 3 years ago
Normal broader impact statement: "This may have negative applications on society and military applications."
This paper: "I AM THE MILITARY."
@MaheshKumar-iw4mv · 1 year ago
Can an FNO be used to train on data from reaction-diffusion dynamics with no-flux boundary conditions?
@Newtube_Channel · 3 years ago
That's the second derivative of the flow _w_ with respect to position in the NS equation. Solving PDEs usually requires a niche skill set; there's no single way to approach these problems. You always need an additional nugget of information, and in many, if not all, cases there's inspiration and guesswork involved. We see this in scattering methods for nonlinear PDEs too (the guess is the transform itself). In FEM you're looking for current values using values from the last handful of time steps. It's inherently deterministic and reasonably accurate insofar as modeling the real world is concerned. It's conceivable that you could apply such a NN at regular intervals; that means retraining it every few steps. Considering the initial guess you have to make using the initial condition (or subsequent current value), I'm dubious that it's any more economical than FEM.
@DamianReloaded · 3 years ago
47:00 If they wanted to predict longer sequences, they could use the solver for the first tensor they input and just feed the last 11 steps of the latest prediction back in, right? I wonder after how many steps it would begin to diverge if they used the maximum possible resolution of the data.
@YannicKilcher · 3 years ago
True, but as you say, the problems would pile up
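Such a feedback loop is simple to write down; a sketch, assuming a hypothetical trained `model` that maps the last 11 frames (stacked as channels) to the next frame (the shapes and API here are illustrative, not the repo's exact interface):

```python
import torch

def rollout(model, history, n_steps):
    # history: list of (h, w) tensors, at least 11 of them (e.g. from the solver).
    frames = list(history)
    for _ in range(n_steps):
        inp = torch.stack(frames[-11:], dim=0).unsqueeze(0)  # (1, 11, h, w)
        with torch.no_grad():
            nxt = model(inp).squeeze(0).squeeze(0)           # assumed (h, w) output
        frames.append(nxt)  # from here on, inputs are partly the model's own output
    return frames
```

As the thread notes, each appended frame carries prediction error back into the input, so errors compound over long rollouts.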
@reinerwilhelms-tricarico344 · 3 years ago
I found this article quite abstract (which may explain why it's interesting ;-). I could sort of get it after first reading an article by the same authors where they explain neural operators for PDEs in general (Neural Operator: Graph Kernel Network for Partial Differential Equations, 2020). There they show that the kernel they learn is similar to learning the Green's function of the PDE.
@kristiantorres1080 · 3 years ago
It is abstract and there are some things that I don't understand. Is this the paper you are referring to? arxiv.org/abs/2003.03485
@reinerwilhelms-tricarico344 · 3 years ago
@@kristiantorres1080 Yes. I read that paper and it somehow helped me understand the paper presented here.
@digambarkilledar003 · 3 months ago
What are the numbers of input channels and output channels?
@davenovo69 · 3 years ago
Great channel! What app do you use to annotate PDFs?
@YannicKilcher · 3 years ago
OneNote
@mohsensadr2719 · 2 years ago
Very nice work explaining the paper. I was wondering if you have any comments on the following:
- Fourier works well if you have equidistant grid points. I think if the initial data points are random in space (or on an unstructured grid), one has to include more and more terms in the Fourier expansion, given the irregularity of the mesh.
- The FNO has to be coupled with an exact solver, since one has to give the solution of the first several time steps as input.
- I think it is not possible to train an FNO on a small solution domain and then use it for larger ones.
Any comments on those?
@CoughSyrup · 1 year ago
This is really huge. I see no reason this couldn't be extended to solve the magnetohydrodynamic behavior of plasma and made to work for the 3D equations, which currently require supercomputers to model. Imagine making it run on a desktop PC. This means modeling of plasma instabilities inside fusion reactors. Maybe with fast or real-time modeling, humanity can finally figure out an arrangement of magnets in 3D for plasma that is stable and robust to excursions.
@southfox2012 · 3 years ago
great
@meshoverflow2150 · 3 years ago
Would there be any advantage to doing convolution in frequency space with a conventional CNN for, say, image classification? On the surface it seems like it could be faster (given that an FFT is very fast) than regular convolution, but I assume there's a good reason why it isn't common practice.
@nx6803 · 3 years ago
Octave convolutions are sort of based on the same intuition, yet don't actually use the FFT.
@andrewcutler4599 · 3 years ago
Convolution preserves spatial relationships, which makes it useful for images: neighboring pixels are often related to one another. A CNN in FFT world would operate on frequency, and it's not clear that there is a window where only nearby frequencies should be added together to form feature maps.
@meshoverflow2150 · 3 years ago
@@andrewcutler4599 The CNN wouldn't operate on frequencies though. Multiplication in frequency space IS convolution, so a feed-forward network in frequency space should do the exact same thing as a conventional CNN. I feel like the feed-forward network should be smaller than the equivalent CNN, hence the question.
@DavenH · 3 years ago
@@meshoverflow2150 Interesting observation.
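That equivalence is the convolution theorem, and it can be checked in a few lines, assuming NumPy; note the frequency-domain product corresponds to *circular* convolution on the grid:

```python
import numpy as np

x = np.random.rand(16)
k = np.random.rand(16)
# Pointwise multiplication of spectra ...
via_fft = np.fft.ifft(np.fft.fft(x) * np.fft.fft(k)).real
# ... equals circular convolution in signal space.
direct = np.array([sum(x[m] * k[(n - m) % 16] for m in range(16))
                   for n in range(16)])
print(np.allclose(via_fft, direct))  # True
```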
@esti445 · 3 months ago
8:30 It is the Laplacian operator: the second derivative with respect to space.
@lestroarmonico · 3 years ago
6:26 Vorticity is a derivation of viscosity? No, it is not. Viscosity is a property of the fluid; vorticity is ∇×V (the curl of the velocity). Edit: And at 8:18, that is not the vorticity equation; that is the continuity equation, which is about conservation of mass. A very helpful video, as I am currently studying this very paper myself, but there are a few mistakes that need correction :)
@konghong3885 · 3 years ago
Jokes aside, as a physics student I wonder: is it possible to apply periodic boundary conditions in the FNO? And how do you actually estimate the error of the solver? For MCMC the error can be estimated probabilistically, but not in the ML case.
@artyinticus7149 · 3 years ago
Highly unlikely
@dominicisthe1 · 3 years ago
I think it is the non-periodic boundary conditions you are worried about.
@andyfeng6 · 3 years ago
The triangle means the Laplace operator.
@sujithkumar824 · 3 years ago
Download this video to save it personally, because it could be taken down due to pressure from the author, for stupid reasons.
@herp_derpingson · 3 years ago
Why?
@judgeomega · 3 years ago
@@herp_derpingson I think the author can neither confirm nor deny any reasoning for a takedown.
@sujithkumar824 · 3 years ago
@@judgeomega Yes, I'm glad Yannic didn't even respond publicly to her; this is exactly the treatment every attention seeker should get.
@matthewtang1489 · 3 years ago
What?? The paper author or the article author? Is there a fiasco about this?
@amarilloatacama4997 · 3 years ago
??
@antman7673 · 3 years ago
So this is kind of like an approximation of the development of the fluid with pixels, instead of the infinite-resolution "vector graphic" provided by the equation.
@airealguy · 3 years ago
So I think this approach has some flaws and has been hyped too much. The crux of the problem is the use of FFTs, which impose some severe constraints on CFD problems. First, consider complex geometries (i.e. those that are not rectangular). How does one take an FFT of something that is not rectangular? You can map the geometry to a rectangular coordinate system using a spatial transform, but then the learned parameters are specific to that transform and thus to that geometry. Secondly, there are no good ways to do FFTs efficiently at large scales (i.e. scales above the memory space of one processor). Even the best algorithms, such as heFFTe, which can achieve 90% of the theoretical max performance, are quite poor in comparison to the algorithmic performance of standard PDE solvers; heFFTe only achieves an algorithmic performance of 0.05% of peak on Summit. So while this is fast on small-scale problems, it will likely suffer major performance problems at large scales and will be difficult, if not impossible, to apply to complex non-rectangular geometries. The neural operator concept is probably a good one, but the basis function makes this difficult to apply to general-purpose problems. We need a basis function which is expanded in perception but not global like an FFT. Even chopping the FFT off can have issues. If you want to compute a N
@crypticparadigm2180 · 3 years ago
Great points... On the topic of memory consumption and allocation for neural networks: what are your thoughts on Neural Ordinary Differential Equations?
@yusunliu4858 · 3 years ago
The process Fourier transformation -> multiplication -> inverse Fourier transformation seems like a low-pass filter. If that is so, why not apply a low-pass filter to the input A'? Maybe I didn't get the idea correctly.
@YannicKilcher · 3 years ago
I think one of the steps is actually explicitly a low pass filter, so you're right
@weishkysiliy4420 · 2 years ago
@@YannicKilcher Training at a lower resolution but evaluating directly at a higher resolution: I don't understand how it can do that?
@YannicKilcher · 2 years ago
@@weishkysiliy4420 The architecture is somewhat agnostic to the resolution, unlike traditional image classifier models.
@weishkysiliy4420 · 2 years ago
@@YannicKilcher After training at a small size (64x64) and loading the model directly, can I change the input dimensions to 256x256? Can I understand it this way?
@weishkysiliy4420 · 2 years ago
@@YannicKilcher I really like your song. Nice prelude
@sohrabsamimi4353 · 3 years ago
Thank you so much for this video! Can you explain how we learn the matrix R at 32:36?
@pedromoya9127 · 2 years ago
Typically by backpropagation, updating its weights according to the loss.
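In code, the idea is that R is just another (complex-valued) parameter the optimizer updates; a minimal sketch, assuming a recent PyTorch with complex autograd, where the names, sizes, and dummy loss are illustrative:

```python
import torch

# Hypothetical per-mode complex weights R (one scalar per retained mode).
R = torch.randn(16, dtype=torch.cfloat, requires_grad=True)

x = torch.randn(8, 32)                    # batch of 1D signals
x_ft = torch.fft.rfft(x)[:, :16]          # keep only the lowest 16 modes
y = torch.fft.irfft(x_ft * R, n=32)       # multiply in Fourier space, invert
loss = (y - torch.randn_like(y)).pow(2).mean()  # dummy target for illustration
loss.backward()                           # R.grad now holds the gradient for R
```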
@JurekOK · 3 years ago
So... they have taken an expensive function (which is itself already an approximation of an even more expensive function) and trained up an approximate function. Then there is no comparison of predictions with any experiment (let alone a rigorous one), only with that original "reference" approximated function. Is this a big deal? I was doing that during the 2nd year of my undergrad in mechanical engineering, 18 years ago. Come on. How about the long-term stability of their predictor? How does it deal with singularities at corners? Moving or deforming objects? Convergence rate? Is the damping spectrally correct? My point is that this demo is really unimpressive to a person who actually uses fluid dynamics for product design. It might be visually impressive for the entertainment industry. Hyped titles galore.
@JM-ty6uq · 3 years ago
24:40 I suppose it's worth mentioning that you can make a cake with 0.5 eggs or 2 eggs.
@sui-chan.wa.kyou.mo.chiisai · 3 years ago
8:30 The triangle is the Laplace operator?
@sui-chan.wa.kyou.mo.chiisai · 3 years ago
www.wikiwand.com/en/Laplace_operator
@machinelearningdojowithtim2898 · 3 years ago
😀😀😀😀 pwned
@finite-element · 3 years ago
Also, Navier-Stokes should be nonlinear, not linear (around the same timestamp).
@JM-ty6uq · 3 years ago
that is the dorito operator
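For reference, the "triangle" (the Laplacian) is just the sum of second spatial derivatives; a minimal discrete version, assuming NumPy, a uniform grid with spacing 1, and periodic boundaries:

```python
import numpy as np

f = np.random.rand(64, 64)
# 5-point finite-difference stencil: d2f/dx2 + d2f/dy2.
lap = (np.roll(f, 1, axis=0) + np.roll(f, -1, axis=0)
       + np.roll(f, 1, axis=1) + np.roll(f, -1, axis=1) - 4.0 * f)
```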
@jean-pierrecoffe6666 · 3 years ago
Hahahahaha, excellent intro
@konghong3885 · 3 years ago
Behold, the new title format for the ML community.
@perlindholm4129 · 3 years ago
Idea: scale down the ground-truth video. Then train a model on a small 4x4 matrix part of the frame and learn the expansion to the 16x16 submatrix of the original frame. This way you can train two models, each on a different aspect of the calculation: one for scaled-down time learning and one for scaled-up learning.
@sinitarium · 5 months ago
Amazing! This must be how Nvidia DLSS works!?
@cedricvillani8502 · 2 years ago
Should update your video
@acharyavivek51 · 2 years ago
Very scary how AI is progressing.
@Neomadra · 3 years ago
I don't quite get why you said (if I understood you correctly) that the prediction cannot be made arbitrarily far into the future. Couldn't you just use the output of the forward propagation as new input for the next round of forward propagation? So you apply a chain of forward propagations until you reach the time you want. If memory is a problem, then you can simply clear the memory of the previous outputs.
@seamusoblainn4603 · 3 years ago
Perhaps because the network is making predictions, as opposed to the ground-truth sim, which is using physics. In the latter there only is what its rules generate, while in the former you are feeding predictions forward, which must by necessity diverge, and at a fine degree of granularity probably does from the beginning.
@YannicKilcher · 3 years ago
It's true, but then you regress to the problem you have when running classic simulations.
@Beingtanaka · 9 months ago
Here for MC Hammer
@RalphDratman · 1 year ago
All those little bumps could be creating the digital environment in which the upper layers of GPTx are doing their magic.
@kesav1985 · 3 years ago
So much fuss about curve fitting! Curve fitting is not a numerical scheme for solving PDEs. :-)
@artyinticus7149 · 3 years ago
Imagine using the intro to politicize the paper.
@artyinticus7149 · 3 years ago
@adam smith Imagine using the military for non-political purposes.