Episode 3: From PyTorch to PyTorch Lightning

Views: 38,709

Day ago

Check out our latest Educational Offerings, Deep Learning Fundamentals with Sebastian Raschka. Unit 1 Playlist: • Deep Learning Fundamen...
This video covers the magic of PyTorch Lightning! We convert the pure PyTorch classification model we created in the previous episode to PyTorch Lightning, which makes all the latest AI best practices trivial. We go over training on single and multi GPUs, logging and saving models, and many more!
Alfredo Canziani is a Computer Science professor at NYU (check out his deep learning class -kzfaq.info?list...)
William Falcon is an AI Ph.D. researcher at NYU, and creator and founder of PyTorch Lightning
Join our Discord to participate in the discussion: / discord
Chapters:
00:00 Introduction to PyTorch Lightning
00:38 Install PyTorch Lightning
01:03 5 main components of a Lightning Module
01:47 Defining a model
04:05 Optimizer
05:20 The Training Loop
07:26 Loading and preparing data
09:10 Running training experiments
16:04 Training on a GPU
17:35 Logging and saving models
23:02 Validation loop
31:48 Multi GPU training

Comments: 92

@paulmathew1214 1 year ago

Great tutorial! One thing I noticed for anyone working through this in 2022: the accuracy won't show up on the progress bar using the method in the tutorial. To get it to work, you need to remove the progress bar pbar variable from the return statement and instead insert "self.log("accuracy", acc, prog_bar=True)" into the training_step function.

@didiruhyadi4798 2 years ago

Straightforward, great, thanks.

@mayankbhaskar007 3 years ago

Awesome, it's so easy to implement distributed training across nodes along with custom hooks!! 😉

@alfcnz 3 years ago

Yup, easy peasy! 😊

@anniezhi 3 years ago

Nice video! Just in case I misunderstood, when using multi-GPU, do I still need to specify the number of GPUs and nodes in the code after specifying in the SLURM script? Which specification will pl choose when the two are different?

@0x80O0oOverfl0w 3 years ago

I struggled a bit with this to get it working on my current setup. Looking through the API, I figured out that setup() also requires you to pass in stage. Might be good to add an overlay or something to the video pointing that out? Really looking forward to trying this out on a multi-gpu setup once I get my cooling situation under control.

@adityassrana 3 years ago

@20:15 is my favorite part of the video. Alfredo is so freaking honest at that moment, love it.

@alfcnz 3 years ago

😅😅😅

@timothydell4709 3 years ago

this was very helpful to reorganise my pytorch-lightning 0.7 or 0.8 code into the latest version. thanks guys waiting for more.

@alfcnz 3 years ago

Anytime! 😊

@wucf20 6 months ago

It seems that, by tracking the release dates of the video and GitHub repo, the Colab notebook in this video was run with pytorch-lightning==0.8.5. (Just in case that this might be useful for future viewers)

@johngrabner 3 years ago

My vote for the order of feature coolness: #1- trivial multi GPU, #2- flexible tensorboard (I'm logging a bunch of metrics), #3- accumulate_grad_batches, #4- resume_from_checkpoint, #5- Hparams logging in tensorboard (especially useful when I keep tweaking parameters in the middle of a day-long run, then resume), #6- warmup learning rate with optimizer_step.

@mayankbhaskar007 3 years ago

True dat!

@alfcnz 3 years ago

It's awesome, isn't it?
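
For readers unfamiliar with item #6 in the list above, here is a rough sketch of a linear learning-rate warmup schedule in plain Python. The function name `warmup_lr` and the numbers are hypothetical. In Lightning, a scale like this would typically be applied to the optimizer's parameter groups inside an overridden `optimizer_step` hook, whose exact signature has changed between versions, so only the schedule itself is shown.

```python
def warmup_lr(step: int, base_lr: float, warmup_steps: int) -> float:
    """Linearly ramp the learning rate up to base_lr over warmup_steps steps."""
    # The scale grows from 1 / warmup_steps to 1.0, then stays at 1.0.
    scale = min(1.0, (step + 1) / warmup_steps)
    return base_lr * scale


# Learning rate for the first six optimizer steps with a 4-step warmup.
schedule = [warmup_lr(step, base_lr=0.1, warmup_steps=4) for step in range(6)]
```

After the warmup window, the function returns `base_lr` unchanged, so it can be combined with any other scheduler that takes over from there.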

3 years ago

Very interesting, thanks!

@alfcnz 3 years ago

🥳

@sanjaydanyamraju3928 3 years ago

brilliant. thanks for sharing

@alfcnz 3 years ago

You're welcome 😉

@0x80O0oOverfl0w 3 years ago

I would like to see more explanations on why certain functions inside the model are chosen and the implications of the numbers chosen for the functions, i.e. why use linear vs conv2d. Also, I don't quite understand the second linear transformation, which goes from 64 to 64. In most tutorials, usually the output is greater than the input? Thanks for making these videos. I'm new to machine learning and trying to apply these concepts to unstructured binary data using PyTorch.

@parthchokhra7298 3 years ago

Great video.

@alfcnz 3 years ago

Thank you ❤️

@emmanuelkoupoh7979 3 years ago

Hi, really cool. Have you thought about automatic mini-batch processing? I mean the user passes a big batch and you automatically split it into mini-batches of optimal size for computation on the GPU, to avoid ResourceExhaustedError and to try experiments with a lot of data in a batch.

@buoyrina9669 2 years ago

Thanks !

@-mwolf 1 year ago

Nice!

@asiffaisal269 3 years ago

When will you publish the next video? This is amazing

@alfcnz 3 years ago

Just recorded two of them this week. Next week they'll be up! 😎

@KSK986 3 years ago

Thanks for this video. If you can cover callbacks, that will be interesting learning. The progress bar always overwrites the previous metrics. If you can cover printing metrics for each epoch separately, it will be of great help.

@alfcnz 3 years ago

Callbacks are coming out in 3 videos' time. We're currently reviewing the edits, but they are basically ready! Metrics are logged to a text file, if you add them to the logger. Then you can visualise everything with whatever experiment manager you desire.

@alfcnz 3 years ago

@nivesh gadipudi we've recorded the callback yesterday! Hopefully this coming week the video is out! 🙂

@alfcnz 3 years ago

@nivesh gadipudi please, check out the last video. It's about the callbacks!

@jonathansum9084 3 years ago

Wonderful video and I will start using it. Will the next episode do a VAE, cycle GAN, and hook at least? 😃 Feel free to ignore this part if you think it is too much. I hope we will do world model, pixel-level classifiers, cycle-GAN, Transformer, LSTM, Heatmap, Hook, Upsampling, GPT, Bert, music generation, and more because these are the basics today. Colab doesn't run the world model (truck backing one) in anime. I am not sure we have something that makes Colab run it. We should do more on self-supervised learning and Energy-Based models.

@PyTorchLightning 3 years ago

way ahead of you. Here's converting a VAE to lightning: kzfaq.info/get/bejne/h66nqpR7rZ2tdIk.html And here's a VAE implementation in lightning (along with simclr, byol, cpc and more advanced things): github.com/PyTorchLightning/PyTorch-Lightning-Bolts/blob/master/pl_bolts/models/autoencoders/basic_vae/basic_vae_module.py#L15-L264

@ZobeirRaisi 3 years ago

Amazing, I have to migrate to Lightning right now :)

@alfcnz 3 years ago

🥳

@nicolasmandel2392 3 years ago

Hey guys, this is a great video, and I am really looking forward to simplifying my PyTorch pipeline with some of this code. There are just two issues I am running into: 1. When using acc = accuracy(logits, y), Lightning complains about non-normalized predictions. What would you propose for this specific task? A lot of people just use a softmax layer at the end and add a log-likelihood loss. 2. When I define my train and val dataset split in my train_dataloader function by assigning self.train and self.val, and then just use a DataLoader on self.val in my val_dataloader, I receive an error saying that my object has no attribute val, so I assume the call order is different? Great introduction apart from these minor things though, keep up the good work. Cheers, Nico

@nicolasmandel2392 3 years ago

For anyone coming across this error: github.com/PyTorchLightning/pytorch-lightning/issues/7050 self.train is a reserved keyword, just call it anything else.
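
On the first issue raised in this thread (non-normalized predictions), one possible workaround is to convert logits to class indices or probabilities before computing accuracy. The pure-Python sketch below, with hypothetical helper names and made-up numbers, shows why either works: softmax preserves the ordering of the scores, so the predicted class is the same with or without it. With tensors, the equivalents would be `logits.argmax(dim=1)` and `torch.softmax(logits, dim=1)`.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Index of the largest entry."""
    return max(range(len(values)), key=values.__getitem__)


def accuracy_from_logits(batch_logits, targets):
    """Fraction of rows whose highest-scoring class equals the target."""
    correct = sum(argmax(row) == t for row, t in zip(batch_logits, targets))
    return correct / len(targets)


# Illustrative data: three samples, three classes.
batch_logits = [[2.0, -1.0, 0.5], [0.1, 0.2, 3.0], [-0.3, 1.2, 0.4]]
targets = [0, 2, 2]
acc = accuracy_from_logits(batch_logits, targets)
```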

@pierrebedu7760 2 years ago

Nice video! All the "progress bar" stuff is deprecated. Do you have any tips to replace that in a simple manner? In fact i just want to plot my validation loss and accuracy for each epoch. Thanks !

@chrisoman87 2 years ago

In self.log there's a keyword prog_bar=True, e.g. self.log('val_acc_step', accuracy, prog_bar=True)

@stonemannerie 3 years ago

Hi. Nice video. I was trying to implement a similar basic model myself. But during the validation loop I tried working with EvalResults instead of dictionaries, since I thought that would be the recommended way (and I also liked the automatic reduction). But early stopping did not want to accept that. I always got an error that 'val_loss' is not present. Did you ever try? Or can you link to a repo which successfully combined EvalResults with early stopping?

@PyTorchLightning 3 years ago

did you add the arguments: EvalResult(checkpoint_on=X, early_stop_on=Y)?

@stonemannerie 3 years ago

@@PyTorchLightning yes. I found the problem. I was confused, when specifying early_stop_on during validation_step, pytorch lightning applies mean for you. I will add it to the documentation, since I think it's nice to have, but should be transparent.

@rameshprakash3028 2 years ago

Got an error: MisconfigurationException: No `train_dataloader()` method defined. Lightning `Trainer` expects as minimum a `training_step()`, `train_dataloader()` and `configure_optimizers()` to be defined. Any idea why this error occurs?

@pratik6447 2 years ago

Do we have the link for file in colab?

@vijayrahulvenugopal2735 3 years ago

Could you guys make a video of how transfer learning works with lightning, also may be with a complex dataset, coz it helps more in the real world scenario right? @Alfredo Canziani

@PyTorchLightning 3 years ago

Yes! in the meantime, here's a paper that used lightning for self-supervised learning and transfer learning. towardsdatascience.com/a-framework-for-contrastive-self-supervised-learning-and-designing-a-new-approach-3caab5d29619

@alfcnz 3 years ago

Next week we'll record the transfer learning video you've requested. Stay tuned! 😎

@ulugbekdjuraev3833 3 years ago

I wish you guys finished that "train_loss/val_loss" array setup for plotting later. Love the videos!

@michelangelo1749 2 years ago

Both of these guys have something that confuses them with their back wall, amazing

@g8a9 3 years ago

Hi William, Alfredo, thank you for this introductory tutorial! I just wanted to point out something. I followed along with my version of the code and I noticed that calling the training portion of the data "train" may cause some issues (you instantiate self.train and self.val in the setup hook): the LightningModule invokes self.train() at a certain point, which in your example has instead become a Subset :)

@alfcnz 3 years ago

Oops 😅
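
The name clash described in this thread can be reproduced without any deep learning library. In the sketch below, `FakeModule` is a hypothetical stand-in for `torch.nn.Module`, which defines a `train()` method that switches the model to training mode. Assigning a dataset to `self.train` creates an instance attribute that shadows that method, so a later call to `train()` fails. The replacement attribute name `train_dataset` is just one possible choice.

```python
class FakeModule:
    """Hypothetical stand-in for torch.nn.Module, which defines train()."""

    def __init__(self):
        self.training = False

    def train(self, mode=True):
        # Switch to training mode, as nn.Module.train() does.
        self.training = mode
        return self


class BrokenModel(FakeModule):
    def setup(self, stage=None):
        # The instance attribute now shadows the inherited train() method.
        self.train = [1, 2, 3]


class FixedModel(FakeModule):
    def setup(self, stage=None):
        # A different attribute name leaves train() intact.
        self.train_dataset = [1, 2, 3]


broken = BrokenModel()
broken.setup()
try:
    broken.train()  # calls the list, not the method
    collision = False
except TypeError:
    collision = True

fixed = FixedModel()
fixed.setup()
fixed.train()  # still the inherited method
```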

@DucPham-dq2mx 3 years ago

You guys should cover how to pass hyperparameters in the model's __init__() next! I think I just messed that up :D

@alfcnz 3 years ago

Okay, noted! Thanks for the request!

@lazypunk794 3 years ago

I think self.train is a reserved keyword again, gave me an error, changed them to self.train_dataset and self.val_dataset.

@talha_anwar 3 years ago

How to do custom data loading

@ramisketcher2069 3 years ago

Great job, you both! Your 'setup' method has a typo, it should be: "train_data = datasets.MNIST('da ..." instead of: "datasets = datasets.MNIST('da ..." But it gives me an error: 'TypeError: setup() takes 1 positional argument but 2 were given'

@alfcnz 3 years ago

Yeah, torchvision.datasets.MNIST() requires you to specify the root folder where the data set is found on your drive. If this does not exist, then it expects you to tell it to download it. So, the first time you're using it, it needs root='data_path' and download=True.

@soujanai 3 years ago

The code works with the following changes: 1. Change `def setup(self)` to `def setup(self, step)` 2. Move `prepare_data`, `setup`, `train_dataloader`, `val_dataloader` into `class MNISTDataModule(pl.LightningDataModule)` 3. Add `data_module = MNISTDataModule()` 4. Change `trainer.fit(model)` to `trainer.fit(model, data_module)`

@Flyforward226 2 years ago

I am following the code, but when I ran trainer.fit(model), it raised the error "'ResNet' object has no attribute 'val'". Also, there have been many changes to PyTorch Lightning. The accuracy should now come from torchmetrics. The pbar is different now; based on the documentation, it should be set as "self.log("avg_val_acc", avg_val_acc)".

@shikharsaxena9989 1 year ago

Correct

@osamansr5281 5 months ago

the return dict of the training_step [21:00] , unfortunately the docs don't provide a lot of info about this point

@xinqifan8192 3 years ago

Is this colab available somewhere? Thx

@alfcnz 3 years ago

No, you need to program alongside, in order to gain better understanding. This is a class. The objective is for you to learn the topics we're covering.

@mastafafoufa5121 3 years ago

Had plenty of trouble with the additional functions setup() and prepare_data(), and couldn't figure out the origin of the problem from the error message, which I think is confusing most of the time.

@robosergTV 2 years ago

learn to debug the code....

@mastafafoufa5121 2 years ago

@@robosergTV I did figure it out - what about others? The point of the video is to bring clarity to lightning, not confusion.

@MateuszModrzejewski 2 years ago

Accuracy as demonstrated in the video is deprecated as of now. I think now you have to use `torchmetrics` and `self.log(prog_bar=True)` to obtain the effect demonstrated in the vid. Correct me if I'm wrong?

@MateuszModrzejewski 2 years ago

Also, the DataLoader stuff would be factored out into a LightningDataModule subclass

@Nextswordsayf 3 years ago

I'm learning PyTorch Lightning for my Bachelor thesis, so I'm a newbie on this topic. I didn't understand what should be in forward, and what is the main difference between forward and training step?

@haohuynhnhat3881 3 years ago

wrong choice my man, you should use Pytorch instead, all of the things Pytorch requires you to do are essential and should not be abstracted

@Nextswordsayf 3 years ago

@@haohuynhnhat3881 thank you man but my supervisor want me to use lightning

@rickymort135 3 months ago

@@haohuynhnhat3881 this comment aged badly, the word "should" is doing a lot of heavy lifting there. I agree for learning, but then just learn pytorch, do a project in it and then start with lightning. Then you know what's happening underneath the abstraction but don't have to have reams of boilerplate and can make use of clusters easily

@michaelscheinfeild9768 10 months ago

My Colab didn't show the accuracy.

@arjunp2014 3 years ago

every talk is a Lightning talk

@alfcnz 3 years ago

😎

@johngrabner 3 years ago

Wow 2 GPU was easy. What if I have nested custom nn.Module?

@williamalejandrofalcon3452 3 years ago

it still works!

@alfcnz 3 years ago

@@williamalejandrofalcon3452 we can show that in the next episode too right? Let's note it down 🧐📝

@rahuldeora1120 3 years ago

You should do more advanced tutorials to really show off the features

@alfcnz 3 years ago

Yes, everything will come out at the right time. We don't want to scare you away right from the beginning.

@michaelscheinfeild9768 10 months ago

a colab share will be useful

@nmpai 3 years ago

self.train and self.val won't work, as there are already methods with that name. I solved this by renaming the variables. Thanks for the video.

@rockythere81 3 years ago

pytorch lightning for pytorch is like keras for tensorflow

@alfcnz 3 years ago

Not quite. Lightning is just a way to organise your PyTorch code. You still have to write PyTorch code. You don't have to write non-PyTorch code, such as loops, logs, job schedulers, distributed training, resuming interrupted training, and all non-ML stuff.

@rockythere81 3 years ago

@@alfcnz ooh thanks for clarifying that... I am new to pytorch that's why I misinterpreted that

@re1konn 3 years ago

training on a cluster.... Hmm.....GPT-50?😶😱

@alfcnz 3 years ago

😱😱😱

@Prasad-MachineLearningInTelugu 3 years ago

Colab notebook pls

@alfcnz 3 years ago

Code along with us! Typing things out will reinforce the concepts covered in this episode. If you have any doubts, then please, let us know, and we'll clear them all out.