Tutorial: CUDA programming in Python with numba and cupy

Views: 73,810

nickcorn93

1 day ago

Using the GPU can substantially speed up all kinds of numerical problems. Conventional wisdom dictates that for fast numerics you need to be a C/C++ whiz. It turns out that you can get quite far with only Python. In this video, I explain how you can use cupy together with numba to perform calculations on NVIDIA GPUs. Production quality is not the best, but I hope you find it useful.
00:00 Introduction: GPU programming in python, why?
06:52 Cupy intro
08:39 Cupy demonstration in Google colab
19:54 Cupy summary
20:21 Numba.cuda and kernels intro
25:07 Grids, blocks and threads
27:12 Matrix multiplication kernel
29:20 Tiled matrix multiplication kernel and shared memory
34:31 Numba.cuda demonstration in Google colab
44:25 Final remarks
Edit 3/9/2021: the notebook I use for the demonstration can be found here: colab.research.google.com/dri...
Edit 9/9/2021: at 23:56 one of the grid elements should be labeled 1,3 instead of 1,2. Thanks to _______ for pointing this out.
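
The "Grids, blocks and threads" chapter comes down to one indexing rule: each thread works out which element it owns from its block index, the block size, and its index within the block. The sketch below is a pure-Python emulation of that rule and not GPU code; the function names (`blocks_per_grid`, `global_thread_index`, `emulate_add_one`) are hypothetical and are not taken from the video or from numba.

```python
def blocks_per_grid(n_elements: int, threads_per_block: int) -> int:
    """Smallest number of blocks whose threads cover n_elements (ceiling division)."""
    return (n_elements + threads_per_block - 1) // threads_per_block


def global_thread_index(block_idx: int, block_dim: int, thread_idx: int) -> int:
    """1D global index of a thread, using the usual CUDA indexing formula."""
    return block_idx * block_dim + thread_idx


def emulate_add_one(data, threads_per_block=4):
    """CPU emulation of a 1D launch in which every thread increments one element."""
    out = list(data)
    n_blocks = blocks_per_grid(len(out), threads_per_block)
    for block_idx in range(n_blocks):              # on a GPU, blocks run in parallel
        for thread_idx in range(threads_per_block):  # and so do the threads in a block
            i = global_thread_index(block_idx, threads_per_block, thread_idx)
            if i < len(out):  # bounds guard: the last block may overhang the data
                out[i] += 1
    return out


result = emulate_add_one([10, 20, 30, 40, 50, 60])
print(result)
```

In a numba.cuda kernel the same index is typically obtained with `cuda.grid(1)`, and the block count and block size are passed at launch time as `kernel[blocks, threads](...)`. The bounds guard matters for the same reason as here: six elements with four threads per block need two blocks, so two threads have nothing to do.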

Comments: 73
@ErolErten 1 year ago
I have been looking into GPU programming using numba and Python for a while, and this seems to be the best tutorial I have been able to find so far. Thank you.
@prietjepruck 1 year ago
Really great introduction to GPU programming. I hope you make a new one soon.
@Omgtired 1 year ago
Thank you so much. Probably the best introduction to CUDA with Python. The example you use, while very basic, touches on usage of blocks, which is usually omitted in other introduction-level tutorials. Great stuff! Hope you return with some more videos. I have subscribed!
@kayakMike1000 1 year ago
Cuda is bullshit closed source. Just wait for Tenstorrent, it's gonna be HUGE.
@taj-ulislam6902 2 months ago
Definitely a lot of new material not seen elsewhere - not a run-of-the-mill video. Great job on originality.
@vallurirajesh 2 years ago
Thank you so very much. This is the exact kind of material I was looking for on this very specific subject. Kudos.
@jakob3267 2 years ago
Really nice video, thank you for sharing!
@thousandTabs 1 year ago
this was such an excellent video, thank you so much!
@ouaililydia3835 1 year ago
Thank you so much, it is the best explanation I found. Please keep going and give us more information and examples on that.
@kineticraft6977 11 months ago
This reminds me a lot of the mindset you need to program in assembly.
@sciencewolf963 2 years ago
Excellent explanation, keep going with this content man ;)
@andrjo 2 years ago
wanted to comment that the information in this presentation is very well structured and the flow is excellent.
@nickcorn93 2 years ago
Thanks man!
@Zysperro 2 years ago
Just what I needed! Thanks!
@shaheeng8034 3 months ago
Thanks a lot! Still the best guide I could find.
@leaodev 2 years ago
Great video, nick!
@PhoenixReflex 7 months ago
Thank you so much. Keep up the hard work. Just hoping that more and more libraries in python will support GPU computations soon.
@terriplays1726 2 years ago
Thanks for the video, I found the first half and the wrap up really excellent.
@LoneXeaglE 1 year ago
Thank you so much sir, you are an amazing human being !
@silkworm6861 2 years ago
This is a great video!
@therealbatman664 2 years ago
Thanks a lot, this really got me started.
@______373 2 years ago
Wait, I thought this was made by some popular channel, it's done pretty well, and then I saw: 29 subscribers.
@nickcorn93 2 years ago
You would be surprised what PowerPoint can do. To be honest, I don't enjoy making videos that much: it's a lot of work, it always turns out kind of shit (especially audio and webcam quality), and I get nothing in return. But when I encounter a really niche topic that I struggled with myself and I can't find many resources for it, I figure I'll make one myself in the hope that it may be useful to someone else.
@______373 2 years ago
@nickcorn93 "you would be surprised what powerpoint can do." not only powerpoint))))))
@mfatihaydogdu7 1 year ago
Very helpful, thank you.
@duongkstn 1 year ago
great tut ! thanks
@tooniatoonia2830 1 year ago
Really learnt a lot here, thanks!💪
@localhost_mds 1 year ago
thank you. good video!!! it was very helpful
@dfrank5157 2 years ago
This is really helpful for my computing. Thank you.
@Shoz_ 1 year ago
Thank you, this is gold
@srepmub 2 years ago
fantastic video.
@ArijitBhattacharya971 2 years ago
Would love to see a video on a few CUDA programming challenges.
@user-tx1we1hw8b 1 year ago
thank you! super helpful
@Khaled_Elsadani 7 months ago
Thanks for sharing INFO
@lfmtube 2 years ago
Perfect video! It was revealing for me and helped me understand how it works. Thank you! I am a new subscriber of your channel. Regards from Buenos Aires, Argentina
@rezidwipradana495 2 years ago
Thank you very much
@timharris72 2 years ago
This was really good. Thanks for posting this!
@nucspartan321 1 year ago
Great video
@mattiskardell 5 months ago
Thank you so much
@plumberski8854 1 year ago
Great intro for me. Waiting for my new GPU (likely 4060 Ti) for me to dig deeper into Python, CUDA, deep learning ...
@1Eagler 1 year ago
Very educational. One thing I missed: is the function matmul running on the PC or the GPU?
@AngeloHafner 7 months ago
Very good...
@Julian-tf8nj 2 years ago
VERY helpful, thank you!!!!
@garywilliams4214 9 months ago
Great tutorial, Nick! One minor critique: your pronunciation of ‘array’ was confusing…a more standard pronunciation is “uh-RAY”.
@zaharkohut7881 1 year ago
Thank you for this tutorial, it has been very helpful! But since it is only an introduction could anyone tell me what I should watch or read next on this topic? Thanks in advance for the advice!
@user-um9sl1kj6u 11 months ago
What about if you want to develop a library for neural networks? A highly specialized library.
@glenneric1 2 years ago
You say ARRay, I say arRAY. Let's call the whole thing off. But seriously, good stuff.
@Julian-tf8nj 2 years ago
I kept thinking, "huh? what is he talking about?? Oh, he meant an ARRay!" lol Other than that, awesome vid!
@nickcorn93 2 years ago
Interesting, so I've basically been pronouncing array incorrectly my whole life. Will try to watch out for that in the future.
@glenneric1 1 year ago
@nickcorn93 I've heard other people saying it your way too.
@rweaver6 1 year ago
@nickcorn93 It was very distracting. Work on it: google it and use the pronunciation feature. Otherwise an outstanding and very useful tutorial.
@HectorHernandez-ws3el 2 years ago
Thanks for the video, there isn't much information about this. Sorry for my English.
@0Clappy 1 year ago
Can you do a tutorial series on how to accelerate things using cuda python?
@nickcorn93 1 year ago
I've thought about it but it's a lot of work to make and edit a silly video like this, and at the moment I really don't have the time. I don't get anything for making these videos.
@richardbennett4365 1 year ago
Wait. At 12:10, the narrator says the timeit magic function reports a duration of 5 ms, but the number is 0.01 ms from 6 ms. The number is much further from 5 than from 6. It should be 6 ms if he's rounding, not 5 ms. He's truncating the decimals to arrive at an integer.
@nickcorn93 1 year ago
Congratulations, you have invalidated the entire video by spotting this massive mistake ;) !
@richardbennett4365 1 year ago
@nickcorn93 🆗.
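
The difference the thread above is pointing at is truncation versus rounding. A minimal sketch, assuming the reported timing was 5.99 ms (a value inferred from "0.01 ms from 6 ms", not read off the video):

```python
import math

reported_ms = 5.99  # assumed value, inferred from the comment above

truncated = math.trunc(reported_ms)  # drops the decimals
rounded = round(reported_ms)         # rounds to the nearest integer

print(f"truncated: {truncated} ms, rounded: {rounded} ms")
```

Truncation gives 5 and rounding gives 6, which is the discrepancy the commenter describes.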
@billyblackburn864 2 years ago
hi, I have a program that I want to translate to numba. could you help me?
@nickcorn93 2 years ago
- what should the program do?
- who is the program for?
- what is it currently written in?
@niffoxichere8394 2 years ago
is it only me or the cooling fan going brrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr.
@gauravdeshpande4298 1 year ago
I am unable to install cupyx from pip, any help?
@jakubkahoun8383 2 years ago
Hi, I'm trying this on my local computer, but I cannot install CuPy. I have an NVIDIA GeForce RTX 3060. EDIT: Installed the CUDA 11.6 toolkit and it works now.
@nickcorn93 2 years ago
What is your OS? You may be having issues if you are using windows and pip. Easiest to install cupy in a conda virtual environment, as it will also install the cuda toolkit.
@jakubkahoun8383 2 years ago
@nickcorn93 Sorry to bother you, the problem was that I had not installed the CUDA Toolkit. Seriously, I hate people who don't watch the full video closely and ask stupid questions... and now I'm one of them :D. Thanks a lot for this tutorial. In 2 months I will try to write my own GPU operator for my program; it will be interesting to see whether it is faster than the CPU. (By the way, I'm using plain Visual Studio Code in a Python 3.10 env on Win 11, so far so good, although I have some code output delay problem when using OpenCV for some strange reason.)
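
Before debugging an installation like the one in this thread, it can help to check which packages Python can actually locate. A minimal sketch using only the standard library; the helper name `is_installed` is hypothetical, and finding a package says nothing about whether a working CUDA toolkit or driver is present.

```python
import importlib.util


def is_installed(package: str) -> bool:
    """Return True if a top-level package can be located, without importing it."""
    return importlib.util.find_spec(package) is not None


for name in ("cupy", "numba"):
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If both packages are found but GPU code still fails, the missing piece is more likely the CUDA toolkit or driver, as it was for the commenter above.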
@wrcz 19 days ago
all these tutorials using light mode while I learn at night... I'm gonna go blind :X
@kayakMike1000 1 year ago
GPUs aren't general purpose... sigh... They are really good at one specific thing: executing the same operation on many data banks. It just happens that graphics and machine learning have a similar type of need.
@nickcorn93 1 year ago
Isn't that what I say in this video? Did you even watch it?
@jesusmtz29 4 months ago
Approximate arbitrary function? There are caveats.
@nigmaxus 1 year ago
Cupy does not install well through the use of pip
@nickcorn93 1 year ago
typically it is easier via conda yes.
@TheAIEpiphany 11 months ago
Something is seriously off with your fast matmul implementation, it's 3 orders of magnitude slower than the built-in method (12.5 ms vs 8.82 us)? You probably have some host-device copying going on?
@nickcorn93 11 months ago
The matmul example shown is the example from the numba documentation so I don't think it's wrong. It's (relatively) slow because matrix multiplication is something that is so common, it is insanely optimized in available implementations. You won't write a matrix multiplication implementation with numba that's faster than cupy. But if you have something custom you need to do, a custom kernel can be faster than a combination of cupy operations.
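
The tiling idea behind the "fast matmul" discussed above can be shown without a GPU. The sketch below is a pure-Python emulation, not the numba kernel from the video: it computes the product tile by tile, which is the access pattern a tiled CUDA kernel uses when it stages sub-blocks of the inputs in shared memory. The function names and the `tile` parameter are hypothetical.

```python
def matmul_naive(a, b):
    """Reference product: every output element is one full dot product."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]


def matmul_tiled(a, b, tile=2):
    """Same product, accumulated one tile of the shared dimension at a time."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    for i0 in range(0, n, tile):          # tile row of C (a row of blocks on a GPU)
        for j0 in range(0, m, tile):      # tile column of C
            for p0 in range(0, k, tile):  # march along the shared dimension
                # A CUDA kernel would load these two sub-blocks into shared
                # memory once and let all threads of the block reuse them.
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = c[i][j]
                        for p in range(p0, min(p0 + tile, k)):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc
    return c


a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8], [9, 10], [11, 12]]
print(matmul_tiled(a, b))
```

Both functions return the same matrix; tiling only changes the order in which the partial products are accumulated. On a GPU that reordering cuts global-memory traffic, but as the reply above notes, a hand-written kernel of this kind is still unlikely to beat a heavily optimized library routine.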
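@snapo1750 1 year ago
There is a Python OpenCL package (pyopencl):

    import numpy
    import pyopencl
    import pyopencl.array
    from pyopencl.reduction import ReductionKernel

    # Assumed setup (the context and queue were not shown in the comment)
    ctx = pyopencl.create_some_context()
    queue = pyopencl.CommandQueue(ctx)

    a = pyopencl.array.arange(queue, 400, dtype=numpy.float32)
    b = pyopencl.array.arange(queue, 400, dtype=numpy.float32)

    # Dot product: map each pair to x[i]*y[i], then reduce with a+b
    krnl = ReductionKernel(ctx, numpy.float32, neutral="0", reduce_expr="a+b", map_expr="x[i]*y[i]", arguments="__global float *x, __global float *y")

    my_dot_prod = krnl(a, b).get()

🙂 The benefit is that it works on ALL GPUs, not only NVIDIA (it also works on Intel integrated GPUs and on AMD GPUs).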