Nvidia CUDA in 100 Seconds

1,125,194 views

Fireship

3 months ago

What is CUDA? And how does parallel computing on the GPU enable developers to unlock the full potential of AI? Learn the basics of Nvidia CUDA programming in this quick tutorial.
Sponsor Disclaimer: I was not paid to make this video, but Nvidia did hook me up with an RTX 4090.
#programming #gpu #100secondsofcode
💬 Chat with Me on Discord
/ discord
🔗 Resources
CUDA nvda.ws/3SF2OCU
GTC nvda.ws/3uDuKzj
CPU vs GPU • CPU vs GPU vs TPU vs D...
🔖 Topics Covered
- How does CUDA work?
- CUDA basics tutorial in C++
- Who invented CUDA?
- Difference between CPU and GPU
- CUDA quickstart
- How deep neural networks compute in parallel
- AI programming concepts
- How does a GPU work?
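
As a rough illustration of the "How does CUDA work?" and "CUDA basics" topics listed above, here is a minimal CPU-only sketch in Python of the indexing scheme a CUDA kernel relies on. It does not touch a GPU: `launch` and `add_kernel` are hypothetical names that emulate, one step at a time, what a real grid of blocks and threads would do in parallel.

```python
# CPU-only emulation of CUDA-style indexing; no GPU or CUDA toolkit is involved.

def add_kernel(block_idx, block_dim, thread_idx, a, b, c):
    """Body that every emulated 'thread' runs once (hypothetical name)."""
    i = block_idx * block_dim + thread_idx  # global index of this thread
    if i < len(c):                          # guard: the grid may overshoot the data
        c[i] = a[i] + b[i]


def launch(kernel, num_blocks, block_dim, *args):
    """Stand-in for a kernel launch: visit every (block, thread) pair.

    A real GPU runs these in parallel; here they run one after another.
    """
    for block_idx in range(num_blocks):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)


n = 10
a = list(range(n))
b = [2 * x for x in a]
c = [0] * n

block_dim = 4
num_blocks = (n + block_dim - 1) // block_dim  # round up so all 10 elements are covered
launch(add_kernel, num_blocks, block_dim, a, b, c)
print(c)  # [0, 3, 6, 9, 12, 15, 18, 21, 24, 27]
```

Three blocks of four threads give twelve emulated threads for ten elements, which is why the kernel body checks `i < len(c)` before writing.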

Comments: 1,300
@Fireship 3 months ago
Shoutout to Nvidia for hooking me up with an RTX 4090 to run the code in this video. Get the CUDA toolkit here: nvda.ws/3SF2OCU
@universaltoons 3 months ago
🥇
@light-gray 3 months ago
ZLUDA be like:
@TuxikCE 3 months ago
yes mom, I need a 4090 to run CUDA.
@u_j134s 3 months ago
Damn you really put that rtx4090 through hell
@HolyRamanRajya 3 months ago
So this is sponsored?
@tigerseye1202 3 months ago
Little-known fact: CUDA is actually so fast that it can bend spacetime and make 100 seconds last 3 minutes and 12 seconds. Truly revolutionary.
@killerdroid99 3 months ago
Underrated comment
@JJGlyph 3 months ago
He ran the seconds in parallel with CUDA.
@sarimsalman2698 3 months ago
Serious question, why are these videos never 100 seconds?
@_Nonines 3 months ago
Because it's just the name of the series. A catchy title, really. I don't think anyone cares if they're exactly 100s.
@Clarity-808 3 months ago
To be fair, he explained it in 90 seconds, the rest is building an app.
@mrgalaxy396 3 months ago
I've done a bit of CUDA in uni for a class on parallelism. Let me tell you, writing truly parallel code is a pain in the ass. Ain't no way all those scientists are writing CUDA code; it's probably some Python abstraction that uses C++ and CUDA underneath.
@acoupleofschoes 3 months ago
Like PyTorch and TensorFlow
@Imperial_Squid 3 months ago
"model.to("cuda:0")" is the only CUDA you need to know unless you're developing new algorithms or doing something truly wacky
@MaeLSTRoM1997 3 months ago
some (x) mostly (o)
@oksowhat 3 months ago
Yeah, that's why PyTorch and TensorFlow exist. I have parallelism and HPC both this sem, writing OpenMP and MPI code, truly a pita
@CraftingCake 3 months ago
There are a few geniuses who write libraries and then there are thousands of devs who build products out of them....
@mjiii 3 months ago
The #1 computing platform for vendor lock-in
@PRIMARYATIAS 3 months ago
And so is Apple.
@AchwaqKhalid 3 months ago
Dell in the server space too
@turolretar 3 months ago
Cisco as well
@anonymouscommentator 3 months ago
yall forgetting about aws? 😂
@ps3guy22 3 months ago
No, Nvidia is an open computing platform dedicated to the development of democratized development and open standa--- Pfff 🤣🤣🤣 hahdahha!!
@meh3lp 3 months ago
0:36 this just taught me matrix multiplication, thanks
@ulz_glc 3 months ago
fr, this 3-second animation was better at explaining it than most other explanations, and he didn't even really talk about it.
@alvinbontuyan8083 3 months ago
The best thing that has ever happened to me was figuring out what matrices actually represent (a linear transformation), and I've been able to do matrix multiplication without any memorizing simply because it's just intuitive now. Try this too, because schooling has failed us
@_rshiva 3 months ago
I think that is taken from @3blue1brown, @Fireship ??
@goddamnit 3 months ago
@alvinbontuyan8083 can you give a quick example of what you mean by this? I'm not that smart, thanks!
@AiSponge2 3 months ago
lmao fr, those 3 seconds are extremely helpful
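
On the question above about matrices as linear transformations, one way to see it: the columns of a matrix say where the basis vectors (1, 0) and (0, 1) end up, and multiplying two matrices chains the two transformations. A small sketch, with `apply`, `matmul`, `rotate90` and `stretch_x` as made-up names for illustration:

```python
def apply(m, v):
    """Apply a 2x2 matrix (given as a list of rows) to a 2D vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]


def matmul(a, b):
    """Product a @ b: column j of the result is `a` applied to column j of `b`."""
    cols = [apply(a, [b[0][j], b[1][j]]) for j in range(2)]
    return [[cols[0][0], cols[1][0]],
            [cols[0][1], cols[1][1]]]


rotate90 = [[0, -1],
            [1, 0]]    # sends (1, 0) to (0, 1) and (0, 1) to (-1, 0)
stretch_x = [[2, 0],
             [0, 1]]   # doubles the x coordinate, leaves y alone

v = [1, 1]
step_by_step = apply(rotate90, apply(stretch_x, v))  # stretch first, then rotate
combined = apply(matmul(rotate90, stretch_x), v)     # one matrix that does both
print(step_by_step, combined)  # [-1, 2] [-1, 2]
```

Each entry of the product is an independent dot product, so every output cell can be computed without waiting for any other, which is the property GPUs exploit.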
@RichardMoore-jg5tl 9 days ago
One question please! Is NVIDIA a safe buy to outperform the market this year? I'm tired of these new buys every week, just to make up some assets with low percentage on my $236k portfolio and try to keep everything around 10%.
@NicoleBarker-he2vp 9 days ago
I've always advised the investors I know to exercise caution when it comes to new buys, especially right now. It's best you tread the market with the guidance of a qualified specialist or reliable counsel if you don't know where to look.
@RossiPopa 9 days ago
I deal with an investment advisor for this reason. I currently have over $800k invested in a diversified portfolio that has grown exponentially and is suitable for all market seasons. Our current project for this year is a more concrete ballpark target.
@FusunTumsavas-cq7tp 9 days ago
How can I participate in this? I sincerely aspire to establish a secure financial future and am eager to participate. Who is the driving force behind your success?
@RossiPopa 9 days ago
Monica Shawn Marti is the licensed advisor I use. Just research the name. You’d find necessary details to work with a correspondence to set up an appointment.
@FusunTumsavas-cq7tp 9 days ago
I looked up her name online and found her page. I emailed and made an appointment to talk with her. Thanks for the tip
@0seele 3 months ago
Seeing "Hi Mom!" continue to be in your videos is such a beautiful thing. Hope you're holding up well
@FengHuang13 3 months ago
Yes, my eyes got wet when I saw that
@forhadrh 3 months ago
Mom be like: I am proud of you, my son
@kamikaze9271 3 months ago
Wait, where?
@forhadrh 3 months ago
@kamikaze9271 Where? What did you watch in this video then, lol. Here: 1:45, 2:53
@depralexcrimson 3 months ago
@kamikaze9271 2:52
@smx75 3 months ago
0:45 IEEE 754 moment
@cloudytheconqueror6180 3 months ago
When you use TFLOPs, is it single precision or double precision? Because I see double precision here.
@adialwaysup8184 3 months ago
Gives me PTSD from my master's thesis. Had to modify 4 flags in clang to get acceptable results. Took me a while to figure out.
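
For anyone wondering what an "IEEE 754 moment" looks like, here is a small sketch, assuming the comment refers to the usual binary rounding artifacts. Python floats are IEEE 754 doubles; `to_float32` is a hypothetical helper that round-trips a value through single precision using the standard `struct` module.

```python
import math
import struct


def to_float32(x):
    """Round a Python float (a 64-bit double) to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]


total = 0.1 + 0.2
print(total)                     # 0.30000000000000004
print(total == 0.3)              # False: neither 0.1 nor 0.2 is exactly representable
print(math.isclose(total, 0.3))  # True: compare with a tolerance instead

# Single precision keeps fewer bits, so the rounding error is much larger.
error_f32 = abs(to_float32(0.1) - 0.1)
error_f64 = abs(total - 0.3)
print(error_f32 > error_f64)     # True
```

The same effect explains why results computed in single precision on a GPU can differ slightly from a double-precision CPU reference.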
@Temari_Virus 3 months ago
@cloudytheconqueror6180 Single precision. Double precision is often much slower, though the RTX 4090 is just able to get into the teraflop range for f64
@WolfPhoenix0 3 months ago
I did some CUDA programming assignments for my college Parallel Computing class. That course was the second hardest CS course I've ever taken (The hardest one is Compilers but that's in its own league). Human brains really weren't designed to think in parallel.
@DK-ox7ze 3 months ago
Which college and course?
@skyhappy 3 months ago
The teacher probably sucked like most academic teachers. If you had fireship it would be a hundred times easier
@duckbuster1572 3 months ago
I hope that was graduate level, cause otherwise that is horrific
@KoaIa200 3 months ago
I would argue that people were not really "designed" to think in any specific way... neuroplasticity for the win... same way that most programmers can think of code. Practise makes perfect.
@KoaIa200 3 months ago
@duckbuster1572 It's common for it to be a course in your last year of undergrad... I don't see why it would be horrific.
@r.y.z. 3 months ago
ngl, I'm really loving how often these videos are being uploaded. It's often, but not so often that I feel overwhelmed and just spaced out enough that I feel a little excited when a new one comes out!
@YOTUBE8848 3 months ago
wait until he drops some existential crisis type content lol
@Julzaa 3 months ago
1:09 still day zero of not mentioning AI
@2099EK 3 months ago
AI is definitely worth mentioning.
@rkvkydqf 3 months ago
@2099EK Please, can we just not? Physics models (for example) are much more interesting (in my opinion) than curve fitting on steroids. (Just a matter of avoiding a cliche and showing a greater range of GPU computing applications)
@thecutepika 3 months ago
@rkvkydqf Why? Fitting such complex curves that reflect reality is indeed worth mentioning
@devrim-oguz 3 months ago
It’s more like zero minutes 😂
@mechadeka 3 months ago
@anon8510 You're literally on a technology channel, you Twitter drone.
@imWaytooRad 3 months ago
Thanks! I was having this discussion with my coworkers the other day about what separates a GPU from a CPU, and this was an excellent explanation!
@johnfrusciantefan90 3 months ago
Wrote CUDA at university... getting the indices, blocks etc. right... that was fun (also since thread count depends on the actual GPU model). For the final project, we were allowed to use libraries such as Thrust, which made my life a ton easier by abstracting away most of the fun stuff.
@KoaIa200 3 months ago
Thread count does not depend on the GPU model (max 1024 threads per block); the total block size and the number of cores depend on the number of SMs and the CUDA compute capability.
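
To make the launch-configuration point concrete, here is a small sketch of the arithmetic behind picking a block count. It assumes the 1024-threads-per-block limit mentioned above; `launch_config` is a hypothetical helper, not part of any CUDA API.

```python
MAX_THREADS_PER_BLOCK = 1024  # per-block limit quoted in the comment above


def launch_config(n, threads_per_block=256):
    """Return (blocks, idle_threads) for n elements.

    Hypothetical helper: one thread per element, whole blocks only.
    """
    if not 1 <= threads_per_block <= MAX_THREADS_PER_BLOCK:
        raise ValueError("threads_per_block must be between 1 and 1024")
    blocks = (n + threads_per_block - 1) // threads_per_block  # round up
    idle_threads = blocks * threads_per_block - n  # threads past the end of the data
    return blocks, idle_threads


print(launch_config(1_000_000))   # (3907, 192)
print(launch_config(1025, 1024))  # (2, 1023)
```

The second call shows why the block size matters: one element over the limit forces a second block in which almost every thread has nothing to do.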
@Brahvim 3 months ago
Sounds like the "fun" was actually "fun boilerplate but it's still just boilerplate". Correct? Or... are you being _purely_ sarcastic?
@johnfrusciantefan90 3 months ago
@Brahvim Both, actually. It was fun in the beginning, but with more complex projects/tasks it became harder to understand how to use it correctly (especially kernel launch configs with the dimensions, etc.). Maybe, with more experience, it would be easier for me today than it was at that time. But don't get me wrong, they also showed how to do the same thing with OpenCL, and the amount of boilerplate code for this to run was way more than with CUDA. And when they allowed using Thrust for the final project, most of the boilerplate code was gone because Thrust abstracts that away. It was more fun to work with an API that offers host and device vectors and a standard library for common tasks. But Thrust also abstracts away the launch configurations for kernels etc., so you lose control (which was fine for me because I struggled with the more advanced concepts). But I guess you will lose some speed/memory efficiency, like with all abstractions.
@johnfrusciantefan90 3 months ago
@KoaIa200 you are right. I am sorry. The more advanced kernel launch configs with block size etc. were quite hard for me, and I haven't used CUDA in years now. But I remember struggling with the concepts after the initial easy tasks
@johnfrusciantefan90 2 months ago
@Brahvim No, it actually was fun, but it is also hard. And if you compare it to OpenCL, it is actually much, much less boilerplate code. In the beginning, the exercises were quite easy, but with more complex tasks it became much harder. For the final project we were allowed to use Thrust, which is a library that makes things much easier. E.g. it provides host and device vectors and it also handles all the boilerplate stuff. However, you will lose control because it is an abstraction, and probably some speed. But today, if I needed to do CUDA again, it would be with Thrust (at least in the beginning)
@ucantSQ 3 months ago
Whoa, my universes are operating in parallel. I just learned about CUDA this morning for the first time, and here's a new fireship video about it.
@petrsehnal7990 3 months ago
Man, you are a genius. I wrote my master's thesis on CUDA and there's no way I would be able to explain this in 100 seconds. Respect! 🎉
@klekaelly 3 months ago
Can I read your master's thesis?
@PappuGongA 3 months ago
@klekaelly same, LMK when you get it
@maymayman0 3 months ago
Could you do it in 192 seconds??
@noanyobiseniss7462 3 months ago
Really, I thought OpenCL would do this just fine. Funny thing is, ALL GPUs are designed to be parallel computers, and AMD is actually more massively parallel than Ngreedia. He didn't describe anything that is CUDA-specific; did you really not get that when writing your thesis?
@petrsehnal7990 3 months ago
@klekaelly thank you, but it was on CUDA version 1.0, which is really outdated from both software and hardware perspectives. Furthermore, it is not in English. But I really appreciate your interest!
@Rohinthas 3 months ago
Not using or planning to use CUDA but man did this just help me make sense of some terms I see being thrown around! Awesome!
@munto7410 3 months ago
Bruh, are you my FBI agent? I just looked CUDA up a few hours ago.
@guinea_horn 3 months ago
Yeah man, he monitored your web traffic, saw that you wanted to learn about cuda, and then made this video as fast as he could since he knew you would watch it.
@MrMudbill 3 months ago
Now I'm scared about tomorrow's video
@bbom9197 3 months ago
I was thinking to learn about CUDA. He is a mind reader
@gosnooky 3 months ago
That's classified.
@soufianenajari8900 3 months ago
literally doing a homework in CUDA rn
@bartlx 3 months ago
Nice to see a video touching C++'s ecosystem for a change. Now make one about SYCL, so even people who don't find free RTX 4090 cards in their mailbox can get into high performance parallel computing using modern ISO C++ instead of custom CUDA syntax.
@vladislavakm386 3 months ago
yeah, Nvidia dominates in parallel computing because software engineers only know CUDA.
@TheRealFFS 3 months ago
@vladislavakm386 You got that backwards, but ok.
@user-go5oe6td3k 11 days ago
SYCL is needlessly low level. Use OpenMP, with GPU targets.
@scapegoat079 3 months ago
Yo I just wanted to say thank you for making this kind of stuff so interesting and digestible. You make these extremely complex, time intensive languages, apis, tools, etc., and make them incredibly approachable. Love your content. Cheers.
@Officialjadenwilliams 3 months ago
Surprised that it took this long to get a CUDA in 100 seconds. 😆
@scapegoat079 2 months ago
I did not expect this... I'm calling Miguel.
@MaxoticsTV 3 months ago
Funny, I had to install NVIDIA CUDA for a thing I'm doing and forgot what CUDA does, searched it, and found this video that was just posted an hour ago! WHAT TIMING!!!
@neuronscale 3 months ago
Great presentation of the topic of CUDA architecture and Nvidia GPUs in such a compact and fast form. As always, brilliant video!
@BattlewarPenguin 3 months ago
Awesome video! Thank you for the heads up in the conference!
@davidf6592c 3 months ago
I'll admit, I tear up a little every time I see the "Hi Mom" in your vids.
@TheHackysack 3 months ago
1:39 Complier :D
@YuriG03042 3 months ago
no, complier
@Sarfarazzamani 3 months ago
Gotcha moment😀
@incognito3678 2 months ago
Marcomplier
@gagd7351 2 months ago
As a programmer I absolutely love your series on programming languages and tools! It couldn't be clearer, and it's full of knowledge. Thank you. These also refresh common knowledge, like the C video!
@h3lpkey 3 months ago
Many thanks for every video on your channel, you're doing very big and cool work
@KorruFreez 3 months ago
Sometimes I regret my career choices
@n.w.4940 3 months ago
Aside from this very informative video ... Heartwarming that you put in that "Hi mom"-message. Probably one of the most concise videos on this topic.
@boredofeducation-sb6kr 3 months ago
I loved the animations and the explanation... I just finished a CUDA course for my master's, so it was mind-blowing to see a whole week's worth of lectures effortlessly compressed into... 100 seconds
@khSoraya01 2 months ago
Can I see the course?
@The472k 3 months ago
Thanks for the video! Easy to understand and that helped me a lot to get a basic understanding of CUDA
@sepro5135 3 months ago
I'm using CUDA for fluid simulation; it's a real game changer in terms of speed
@wywarren 3 months ago
The SDK has already gotten a lot more convenient in the last 5-6 years. Memory used to require manual copying back and forth through the SDK. From what I remember, manual copying is still available, and in my DLI course, when I was trying it out, auto-managed memory was slower than manually moving everything into device memory first and then running the operation. Managed memory improves the developer experience significantly, but on each access, if the memory block hasn't been copied yet, I believe the managed system still needs to move it over on demand. To meet the passing criteria for my CUDA DLI exam, I opted to copy manually in one of the steps. One can only dream of the day we have unified memory architectures, so we don't have to deal with the copies.
@niamhleeson3522 3 months ago
Yeah, you can probably keep on dreaming about that. Memory management is the primary contradiction that you must solve if you want your CUDA program to go fast. Either you need to get all of the data in the register file / shared memory or you have Too Much Data and have to do horrible things and maybe even have some of that data out of core and it will go much slower than it could. There's no cache coherence protocol so if you need it you have to move things around manually and do some synchronization. Fun stuff.
@dfsafsadfsadf 3 months ago
That was a great summary! Thank you!!!
@4RILDIGITAL 3 months ago
Impressive explanation of how we can harness the power of our GPU using Nvidia's CUDA for more than just gaming. The practical demonstration expounded the potential of parallel computing considerably.
@otakuotaku6774 3 months ago
Bro, Can you do more Hardware videos, just like this
@recursion. 3 months ago
Hardware videos 💀
@arinahomuleba4165 3 months ago
You just explained parallel computing in 100s better than my lecturer did in more than 100 days🔥
@noanyobiseniss7462 3 months ago
Yet misses the fact that this is NOT CUDA-specific.
@bakedbeings 3 months ago
Or your lecturer set you up well to follow this very basic, high speed summary. Like a reader of the LOtR series can see meaning in the film series' long, dreary shots.
@klaotische5701 3 months ago
Just as I needed. Simple and quick introduction for it.
@sachethana 3 months ago
CUDA is awesome! I did one of my theses on parallel processing in 2016, using CUDA for super fast blood cell segmentation. Then I used CUDA for mining crypto on the GPU.
@bnaZan6550 3 months ago
You didn't explain what CUDA does, you explained what a GPU does... CUDA just has special optimizations over normal GPU parallelism. Your example will work fine on every GPU and doesn't require CUDA to be parallel. All GPUs calculate the pixels using multithreading and multiple cores.
@Aoredon 3 months ago
I mean, he explained how to get started with it and clarified how it's different from programming on the CPU. Also, I'm pretty sure the <<<...>>> syntax is specific to CUDA, so you wouldn't be able to just run this anywhere. And GPUs in graphics are usually just dealing with essentially a 2D array of pixels rather than 3D like here.
@HoloTheDrunk 3 months ago
@Aoredon AMD's ROCm also uses the <<<...>>> syntax and I kinda agree with OP, this would've been good if it was titled "GPUs in 100 seconds", but as things stand it's hardly anything CUDA-specific
@oghidden 3 months ago
This is a summary channel, not overly detailed.
@noanyobiseniss7462 3 months ago
Correct and well said!
@julesoscar8921 3 months ago
The extension of the file was .cu tho
@StefanoBorini 3 months ago
Interesting little factoid: if you are doing parallel CUDA programming and have to compute on a subset of a large block of memory, it's often faster to operate on the whole block and simply ignore the additional data, without checking for actual boundaries. If conditions kill performance in CUDA kernels, to the point that it often pays off to just compute garbage and discard it at the end rather than prevent it from being computed.
@9SMTM6 3 months ago
If conditions are usually translated to compute-and-discard anyway. But they give a false appearance, and if the condition itself is difficult to compute, that adds to the runtime cost.
@KoaIa200 3 months ago
Warp divergence does not matter if the other threads are doing nothing in the first place... just don't have if/else and you are fine.
@janisir4529 3 months ago
Better add those bounds checks, don't want to crash with access violations...
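
The trade-off discussed in this thread can be sketched on the CPU. This is an emulation only, with hypothetical function names, and it says nothing about actual GPU timings: it just shows that guarding each thread with a bounds check and padding the data to a whole number of blocks give the same result.

```python
def square_guarded(data, block_dim):
    """Every emulated thread checks whether its index is inside the data."""
    n = len(data)
    num_blocks = -(-n // block_dim)          # ceiling division
    out = [0] * n
    for i in range(num_blocks * block_dim):  # one iteration per emulated thread
        if i < n:                            # per-thread bounds check
            out[i] = data[i] * data[i]
    return out


def square_padded(data, block_dim):
    """Pad the input so every thread has valid memory, then drop the extras."""
    n = len(data)
    num_blocks = -(-n // block_dim)
    padded = data + [0] * (num_blocks * block_dim - n)
    out = [x * x for x in padded]            # no branch: every thread does the same work
    return out[:n]                           # discard the results computed on padding


values = [1, 2, 3, 4, 5]
print(square_guarded(values, 4))  # [1, 4, 9, 16, 25]
print(square_padded(values, 4))   # [1, 4, 9, 16, 25]
```

Padding is what keeps the branch-free version from reading out of bounds, so it addresses the access-violation concern as well; whether it is actually faster on a given GPU is something only a benchmark can settle.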
@dheovanixavierdacruz3043 3 months ago
YES! I was waiting for this one
@xbozo. 3 months ago
awesome animations on the video man
@desoroxxx 3 months ago
Next please do OpenCL in 100 Seconds, seriously
@noanyobiseniss7462 3 months ago
He didn't get paid for that.
@whamer100 3 months ago
I'd love to see that
@Sarfarazzamani 3 months ago
@noanyobiseniss7462 Savage comment 😁
@ProjectPhysX 3 months ago
OpenCL for the win! Same performance as CUDA, yet runs on literally every GPU from Nvidia, AMD and Intel.
@user-go5oe6td3k 11 days ago
OpenCL is obsolete and dead. Use OpenACC for these purposes.
@batoczki93 3 months ago
But can CUDA center a div?
@abhishekpawar921 3 months ago
💀💀💀
@drangertornado 3 months ago
Yes when you center a div in CSS, the browser uses your GPU for rendering the pages on your browser
@mulletmate8 2 months ago
center div, exit vim, I use arch btw... hmm yes, very original "I've been programming for two weeks" joke
@NEOchildish 3 months ago
Great video! A ROCm video would be awesome too. It could help me explain to friends my suffering from running CUDA-native apps in a crappy Docker container, with less performance than native Nvidia.
@lucasgasparino6141 3 months ago
Hey, that was nice! I use both CUDA and OpenACC EXTENSIVELY to build CFD applications, and the performance on GPUs is really fantastic... when done well xD. I strongly recommend against managed memory for complex production codes, if only because it seems to disable device-to-device DMA comms when using MPI. For anyone thinking about porting to GPUs, I recommend not half-arsing it: just make all data available to the devices. Host/device exchanges can be brutally costly and will likely eat up all your gains. Finally, it works with C and Fortran as well, for anyone curious about it :) Fireship, it'd be nice to see a Beyond 100 Seconds of this, covering OpenACC and offloaded OpenMP as well 😊
@jaiveersingh5538 3 months ago
Which CFD software has CUDA acceleration? Just Ansys Fluent right now right?
@lucasgasparino6141 3 months ago
@adialwaysup8184 not really, we performed some testing on A100s and H100s and offloaded omp was WAY slower. Sure it's portable, but acc is still getting love. It's also syntactically easier and cleaner in my opinion.
@lucasgasparino6141 3 months ago
@jaiveersingh5538 take a look at research code. Nek5000 uses CUDA, as does NekRS if I remember correctly. Our own code started as CUDA Fortran, but we eventually moved to OpenACC. It's easier to use and to explain to other users. Quite a few libraries behind research software also use CUDA, or even OpenCL. For matrix-free SEM methods, CUDA might be a bit hard to implement, but it's as fast as it gets.
@adialwaysup8184 3 months ago
@lucasgasparino6141 For us, omp was performing 2% slower than acc and 6-8% slower than CUDA. Though the performance was much worse on clang than on nvhpc.
@adialwaysup8184 3 months ago
@lucasgasparino6141 In my experience, there's currently a major discrepancy in how well compilers optimize code for accelerators. This is doubly important when it comes to Nvidia, since the nvptx backend is far from perfect. But when the same tests are done on Nvidia hardware, say with nvhpc, I found an overall 2-3% gap between OpenMP and OpenACC. I do agree with your second point: OpenACC is much cleaner to write and integrates well, but at that point you're backing yourself into a corner with Nvidia's hardware. OpenACC might be an open standard, but no one except Nvidia gives it serious consideration. If you're going all in with Nvidia anyway, why bother with OpenACC instead of just moving to CUDA?
@Ibbysz 3 months ago
Great video, Fireship. However, it's worth noting that writing performant and optimized raw CUDA code is very difficult and often not practical. Usually, you aren't writing your own CUDA code but rather using NVIDIA's highly optimized CUDA libraries, such as cuBLAS, cuFFT, and cuDNN. These libraries implement common primitives such as matrix multiplication, neural net operations, etc.
@yogsothoth00 3 months ago
Yes, but where is the fun in that
@niamhleeson3522 3 months ago
@yogsothoth00 If you think that is fun, you would probably get hired by Nvidia to write more libraries for them
@el_teodoro 3 months ago
He did a 100 seconds video on PyTorch. So he'll probably expand on this too. This video is specifically about CUDA.
@masteraso 3 months ago
Yes, if you can install them and find the right version
@RudolfJvVuuren 3 months ago
So basically: "when writing code one uses libraries." Thank you Capt. Obvious.
@romanino 3 months ago
I didn't understand MOST of it, but still loved it , thanks!
@marcellsimon2129 3 months ago
Love how this video came out 20 minutes after I did intensive google search about CUDA :D
@augustinmichez8874 3 months ago
0:46 truly a masterpiece from our beloved GPU
@augustinmichez8874 3 months ago
@starsandnightvision not a native speaker but ty for pointing it out
@demonfedor3748 3 months ago
Just recently saw the news about Nvidia banning the use of translation layers on CUDA software, like ZLUDA for AMD. This video's right on time.
@noanyobiseniss7462 3 months ago
Which is what he should be making a video on, but you don't get free 4090s for that content.
@demonfedor3748 3 months ago
@noanyobiseniss7462 NVIDIA doesn't wanna let go of that sweet, sweet monopoly-type proprietary stuff.
@noanyobiseniss7462 3 months ago
@demonfedor3748 Pretty anti-competitive company that bleeds users dry. I have no clue why its userbase is so filled with gaslit fanbois. I guess it comes down to the "misery loves company" mantra.
@demonfedor3748 3 months ago
@noanyobiseniss7462 Every big company wants to get as much profit as the next guy. NVIDIA does it through proprietary stuff, AMD does it through open standards to claim the moral high ground. There are pros and cons to each approach, but the goal remains the same. NVIDIA has a lot of fans because they innovate a lot and are trailblazers in multiple areas: real-time hardware ray tracing, DLSS, G-SYNC, frame generation, GPGPU aka CUDA, OptiX, just to name a few. I know most of this stuff is proprietary and/or hardware-locked, but it's still innovation. I don't mean that AMD doesn't innovate. Mantle, which subsequently led to Vulkan, was a big deal, as were chiplet GPU and CPU designs, 3D V-Cache, and SAM. There's no clear winner, but NVIDIA is currently the performance king. Intel has wanted in on the game for over 15 years, but they've got big shoes to fill. It was a big blow when Larrabee failed.
@sn5806 3 months ago
Great timing! Just got a new green GPU to mess around with and this'll help.
@BingleBangleBungle 3 months ago
This is a very slick advert for Nvidia 😅 didn't realize it was an ad until the end.
@noble.reclaimer 3 months ago
I can finally build my own LLM now!
@markosdelaportas3089 3 months ago
Can't wait to install ZLUDA on my linux pc!
@JLSXMK8 3 months ago
Can I mention this video as part of my channel intro? I use NVIDIA CUDA to re-render and upscale all my video clips for YouTube nowadays!! You give a really good explanation of how it all works.
@ace9463 3 months ago
I've used the CUDA Toolkit to implement LSTMs and CNNs for computer vision and sentiment analysis projects, via Python's TensorFlow GPU and scikit-learn libraries running on my laptop's NVIDIA GPU, but writing raw CUDA kernels in C++ is somewhat new to me and seems fascinating.
@historyrevealed01 3 months ago
A: How complex is CUDA? B: Even Fireship doesn't make sense
@lucasgasparino6141 3 months ago
Honestly, it's a rather low-level API, so it CAN get excessively complicated. That being said, you'd mostly use the basics of CUDA, and the complexity would come from parallelizing the algorithm you're trying to implement. Of course, the real magic is that you can optimize the SHIT out of it, i.e. overengineer the kernel 😅 but yeah, trust me when I say he covers only the intro bits of CUDA; this thing is a rabbit hole.
@stefantanuwijaya8598 3 months ago
OpenCL next!
@noanyobiseniss7462 3 months ago
I doubt AMD will pay him a 7900XTX to do it.
@practicalsoftwaremarcus 3 months ago
Nice! I use Thrust to abstract away a bit of the CUDA and apply generic programming. Maybe do a video on OpenCL? 😊
@bramvdnheuvel 3 months ago
I would love to see Elm in 100 seconds soon! It definitely deserves more love.
@aghilannathan8169 3 months ago
Data Scientists don’t use CUDA, they use Python abstractions like TensorFlow or Torch, which parallelize their work using CUDA, assuming an NVIDIA GPU is available.
@el_teodoro 3 months ago
"Data scientists don't use CUDA, they use CUDA" :D
@drpotato5381 3 months ago
@el_teodoro The guy above you doesn't know what the word abstraction means lmao
@HUEHUEUHEPony 3 months ago
@el_teodoro or ROCm? or Vulkan? or Metal?
@zainkhalid3670 3 months ago
Getting CUDA to run on your Windows machine is one of the greatest problems of modern computer science. Edit: "getting CUDA-related libraries in a Python environment to correctly run neural networks"
@eigentensor 3 months ago
lol, holy wow this really is a noob channel
@user-qm4ev6jb7d 3 months ago
Getting it to run the "official" way, from Visual Studio, is not much of a problem. Now, getting CUDA-related libraries in a Python environment to correctly run neural networks - THAT's a challenge. Especially with how much of a bother Conda is.
@MrCmon113
@MrCmon113 3 ай бұрын
Lots of ML stuff doesn't have good support on Windows. Probably a good idea just to run an Ubuntu VM if you plan to do much locally.
@OK-ri8eu
@OK-ri8eu 3 ай бұрын
I worked on a project using the CUDA environment; this brought back some memories, like copying from host to device and vice versa. I'm sure I'll be working on it again in the future.
@somerandomdudemc6201
@somerandomdudemc6201 2 ай бұрын
Hello sir, Today is my High school IT exam. I thank you for giving so much knowledge in these years. Thank you sir
@3lqm89
@3lqm89 3 ай бұрын
hey, that's more than 100 seconds
@Joey-dj4cd
@Joey-dj4cd 3 ай бұрын
Use me as the button "I understood NOTHING"
@AO-ek9qw
@AO-ek9qw 3 ай бұрын
0:36 this matrix multiplication animation is really REALLY good!!!!!
@joshDotJS
@joshDotJS 3 ай бұрын
Thank you for the video!
@bradenhelmer9795
@bradenhelmer9795 3 ай бұрын
I literally just finished an exam on cuda wtf
@acestandard6315
@acestandard6315 3 ай бұрын
What course do you offer
@SalomDunyoIT
@SalomDunyoIT 3 ай бұрын
@@acestandard6315 where do u study?
@bradenhelmer9795
@bradenhelmer9795 3 ай бұрын
@@SalomDunyoIT Nunya University
@gourav7315
@gourav7315 3 ай бұрын
0:25 what is the name of the game?
@pramodgoyal743
@pramodgoyal743 3 ай бұрын
Leaving a dot here for a captain to show up.
@BinaryBlueBull
@BinaryBlueBull 2 ай бұрын
I also would like to know this. Anyone?
@pherd-0884
@pherd-0884 3 ай бұрын
I would really enjoy a follow-up to this, maybe on the other channel, to discuss ROCm.
@tjmarx
@tjmarx 3 ай бұрын
That was a pretty entertaining ad.
@noanyobiseniss7462
@noanyobiseniss7462 3 ай бұрын
CUDA is closed source and therefore a non-starter for anyone that believes in freedom standards.
@Volian0
@Volian0 3 ай бұрын
I wouldn't recommend nvidia to anyone, their CEO is crazy!!
@MrCmon113
@MrCmon113 3 ай бұрын
And the alternative is what? Hospitals, the garbage collection, fire departments, etc aren't open source either, but you're kinda forced to use them. Nvidia has got us all by the balls. Your balls are firmly placed in Nvidia's hands. God speed your efforts to come up with a freedom alternative.
@Volian0
@Volian0 3 ай бұрын
@@MrCmon113 the alternatives exist! In the case of CUDA, OpenCL is the alternative that works on all GPUs. And in the case of gaming, AMD cards perform very well (and their drivers are open source)
@MaybeBlackMesa
@MaybeBlackMesa 3 ай бұрын
Nothing worse than buying an AMD card and being locked out of anything AI (and these days it's a LOT of things). Never again.
@noanyobiseniss7462
@noanyobiseniss7462 3 ай бұрын
You're not too bright, are you?
@montytrollic
@montytrollic 3 ай бұрын
Google ZLUDA my friend ...
@M7ilan
@M7ilan 3 ай бұрын
Valuable video!
@TheVilivan
@TheVilivan 3 ай бұрын
Would love to see some more videos on parallel computing, with more explanation of this kind of code. Maybe a more in-depth video on Beyond Fireship?
@vectoralphaAI
@vectoralphaAI 3 ай бұрын
Game Developers Conference (GDC) is also that week.
@MatheusLB2009
@MatheusLB2009 3 ай бұрын
I honestly recommend the GTC if you're into graphics or just interesting curiosities
@uDubRiceBoy
@uDubRiceBoy 3 ай бұрын
Thanks @fireship, do AMD GPUs enable parallel math processing?
@livelife3051
@livelife3051 3 ай бұрын
Bro, your way of teaching is much faster than my mind..
@vladislavkaras491
@vladislavkaras491 3 ай бұрын
Thanks for the video!
@devrim-oguz
@devrim-oguz 3 ай бұрын
You should do a video on SHMT (simultaneous and heterogeneous multithreading)
@superspies32
@superspies32 2 ай бұрын
I'm working on sequence alignment for NIPT results. Barracuda is the best thing I never heard.
@NoDebut
@NoDebut 3 ай бұрын
This is great! Thank you 👏
@gamemotronixg3965
@gamemotronixg3965 3 ай бұрын
Finally 🎉🎉🎉 I challenge you to do CUDA matrix multiplication using C
@zard0y
@zard0y 3 ай бұрын
This channel should go down in history as the greatest work done by humanity. Absolutely legendary introductions and quality level.
@hyperpug2898
@hyperpug2898 3 ай бұрын
Wow what great timing to mention ZLUDA
@jason_max
@jason_max 3 ай бұрын
great visualization and videos indeed!
@k7ufo819
@k7ufo819 2 ай бұрын
Just subscribed for more "in 100 seconds" videos 👍🏻
@delta-function
@delta-function 3 ай бұрын
Will you make a video about Intel’s oneAPI?
@CoughSyrup
@CoughSyrup 3 ай бұрын
While you are correct for crediting both Buck and Nichols for the prior work leading up to CUDA, I felt like it was important to point out that they did not both contribute equally to the research in question, as most people will agree that one Buck is worth about 20 Nichols.
@Kromface
@Kromface Ай бұрын
Early congrats on the 1M views!
@bonobo3748
@bonobo3748 3 ай бұрын
The video editing must take hours for each upload. Well done, brother
@survivalskillspodcast
@survivalskillspodcast 3 ай бұрын
Fireship is smart. When are you creating the first-ever teleportation machine?
@julendominadas4040
@julendominadas4040 3 ай бұрын
The fun part of your program is that it would take about as long to allocate that memory on the GPU as to compute the sum. Because of CPU pipelines, you would probably make about 4 integer sums per cycle. I don't know if this depends on AVX registers. If someone can give a more extended explanation, I would be so glad!
@judevector
@judevector 3 ай бұрын
This is just mind-blowing 😮
@user-tl8le5mg4l
@user-tl8le5mg4l 3 ай бұрын
1:42, Typo Complier -> Compiler
@ScriptureFirst
@ScriptureFirst 3 ай бұрын
Outstanding visuals: I wish I had those when I first learned. Where get?!