Speed up your Rust code with Rayon

45,443 views

Let's Get Rusty

A day ago

Today we are learning how to easily parallelize your sequential Rust code with Rayon.
FREE Rust Cheat Sheet: letsgetrusty.com/cheatsheet
Chapters:
0:00 Intro
0:31 How to use Rayon
4:03 How Rayon works
6:04 Customization
6:16 Outro

Comments: 63
@letsgetrusty a year ago
📝Get your *FREE Rust cheat sheet* : www.letsgetrusty.com/cheatsheet
@mathijsfrank9268 a year ago
I really like how your first example showed a situation where using multiple threads is actually slower. A lot of people/explanations talk about big Oh notation and performance as '(better big Oh || more threads) == more better'. But what often doesn't get mentioned is that it is highly dependent on the context and the amount of data you are working with. Optimization without benchmarking isn't optimization.
@Rigardoful a year ago
I wonder if the benchmark runs share the same thread pool. That'd explain a lot (like the worst case for the parallel 2-million benchmark being about the same as the non-parallel one).
@CottidaeSEA a year ago
The thing about multithreading is that there is a cost to initializing it. If the cost of that and joining the result is higher than simply doing it in one thread, it will be slower and require more resources.
@ckpioo 9 days ago
it's not "big Oh" it's "O", people say big to communicate that it's the capital version of the letter, but you don't need to say that in text.
@timwhite1783 a year ago
I definitely appreciate you showing a case where it isn't actually faster. Really helps highlight the importance of benchmarking if you intend to actually optimize.
@williamdroz6890 a year ago
Polars could be a good fit for the next video (Lightning-fast DataFrame library). You could even benchmark the same way you did in this video.
@PavitraGolchha a year ago
Polars is great
@PavitraGolchha a year ago
Blazingly Fast
@BigFx a year ago
@tardagardovarg you have to read that in ThePrimeagen's voice :) It is Blazingly Fast.
@LtdJorge a year ago
Thankfully, I already used it for my weekend ray tracer. It's pretty easy and scales perfectly for longer tasks.
@codeshowbr a year ago
great video! At 4:49 I had to check my glasses, then my connection, then I realized what you did :)
@SlexisSlacks a year ago
What's the joke?
@codeshowbr a year ago
@SlexisSlacks which joke?
@SlexisSlacks a year ago
@codeshowbr Why it is blurred. I thought you said "what you did" as in "I see what you did!", as if there was a secret joke 😅
@DavidAlsh 7 months ago
Mate, I fkn love your videos. Every time I am stuck, you have a video there to save the day
@Perspectologist a year ago
The content was great. It was helpful to see that the parallel version can be slower in some situations. The stock footage was a bit distracting, especially the blurry bit (maybe that was an in-joke I didn’t get).
@user-yf3ec9ml1j a year ago
It reminds me of the parallelStream method in Java's Stream API.
@TheLomsor a year ago
Great overview. I heard of rayon and got a good impression back then but this kind of concise insight is much easier on my brain. I would appreciate such a treatment for Elementum once it's out.
@AD-rf4wf a year ago
Rust never ceases to amaze.
@jongeduard a month ago
Rayon is really amazing! It's actually incredibly performant. I have even been comparing it with loop parallelization in Fortran and C, which can be done by a compiler such as GCC, with far fewer guarantees. In my tests Rayon performed faster, even though modern Fortran in particular has some interesting features as well, such as a `do concurrent` loop and so-called array programming features. Apart from that, loop parallelization only works with the proper compiler flags; otherwise it does not. I like the expressive functional programming style of Rust a lot more, where that problem does not exist and Rayon handles it so much smarter, and you can tRust it. It also reminds me of Parallel LINQ in the C# programming language, which is similar, although obviously it cannot compete with Rust's performance at all. In this light, I would also like to mention NDArray, which is a really powerful crate for multidimensional array functionality. With things like these, I totally see a very serious place for Rust in both game development and scientific parallel computing. Actually amazing!
@Direkin a year ago
Another crate to cover would be Polars or DataFusion. Both are DataFrame libraries based on Apache Arrow. Polars's documentation is a bit sketchy for Rust atm, and DataFusion appears to prefer doing everything asynchronously.
@yousifalfaki4389 a year ago
amazing video my friend.
a year ago
I could only calculate the performance benefits of rayon and xargs with benchmarking. Is there a deterministic way to calculate the performance benefits for the specific task beforehand? In my case, I have large chunked files and have like 30 CPU cores in the computation center. Whenever I use rayon and xargs together, the performance somehow drops. Let's say the task is creating a frequency table where each line is the count of a quantity in these large files.
@fsaldan1 7 months ago
I ran a comparison of similar code with and without rayon. The non-parallelized code ran more than 2X faster. But it did not call collect() so it wasn't a perfect comparison. I wasn't able to adapt the rayon code to run without calling collect() first. I was able to change the non-parallel code so it would call collect and then iterate. This was a more meaningful comparison, and the parallelized code was about 10% faster. The task was to add one billion f64 numbers all equal to 1.0.
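The comparison described above hinges on having to collect() the values before the parallel pass. As a rough illustration of how a sum like this can be split across threads without materializing a Vec first, here is a minimal sketch using only the standard library. It is not Rayon and not the commenter's code: the function name `parallel_sum_of_ones` and the `num_threads` parameter are hypothetical, and each thread simply walks its own sub-range. As far as I know, Rayon lets integer ranges be turned into parallel iterators directly (something like `(0..n).into_par_iter().map(|_| 1.0_f64).sum::<f64>()`), which may avoid the collect() step, but check the Rayon docs before relying on that.

```rust
use std::thread;

/// Sums `n` copies of 1.0 across `num_threads` scoped threads without first
/// collecting the values into a `Vec`: each thread walks its own sub-range.
/// (Hypothetical helper for illustration; this is std-only, not Rayon.)
pub fn parallel_sum_of_ones(n: u64, num_threads: u64) -> f64 {
    // Guard against a zero thread count.
    let num_threads = num_threads.max(1);
    // Ceiling division; every range is clamped to `n` below.
    let per_thread = (n + num_threads - 1) / num_threads;

    thread::scope(|scope| {
        // Spawn all workers first so they run concurrently.
        let handles: Vec<_> = (0..num_threads)
            .map(|t| {
                let start = (t * per_thread).min(n);
                let end = (start + per_thread).min(n);
                scope.spawn(move || (start..end).map(|_| 1.0_f64).sum::<f64>())
            })
            .collect();

        // Join every worker and add up the partial sums.
        handles
            .into_iter()
            .map(|handle| handle.join().unwrap())
            .sum::<f64>()
    })
}
```

Timing `parallel_sum_of_ones(1_000_000_000, 8)` against a plain `(0..n).map(|_| 1.0_f64).sum::<f64>()` in a release build would be one way to repeat the experiment without the extra allocation; the outcome will depend on the machine.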
@oxey_ a year ago
I've been making benches with criterion which works fine but for small tests like this I had absolutely no idea this way of benchmarking existed lmao
@AlwinMao a year ago
Curious what it does under the hood. If you wrote the parallelism yourself, for two cores, you would split the 200,000 items into two arrays and assign 1 core to each array, which ought to have minimal overhead. But if it splits the 200,000 items into 200,000 tasks which have to be stolen, that is a lot of overhead per small item. If you rewrote your iteration to be over chunks, and did counting over chunks, adding together at the end, would rayon perform better? You could divide task into 2,4,8,16,32,64,128 chunks and see how much performance degrades. But I bet even 128 chunks, which would spread out well over most CPUs, would have a 1000x better ratio of overhead to benefit than 200,000 individual tasks.
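The hand-written approach described above (split the items into a few large chunks, count each chunk on its own core, add the counts at the end) can be sketched with the standard library alone. This is only an illustration under assumptions: `Person` is a hypothetical stand-in for the struct in the video, `count_adults_chunked` and `num_chunks` are made-up names, and the code uses `std::thread::scope` rather than Rayon, so it says nothing about what Rayon actually does under the hood.

```rust
use std::thread;

/// Hypothetical stand-in for the `Person` type used in the video.
#[derive(Debug, Clone)]
pub struct Person {
    pub age: u32,
}

/// Counts adults by splitting `people` into at most `num_chunks` contiguous
/// slices, counting each slice on its own scoped thread, and summing the results.
pub fn count_adults_chunked(people: &[Person], num_chunks: usize) -> usize {
    if people.is_empty() {
        return 0;
    }
    let num_chunks = num_chunks.max(1);
    // Ceiling division so that every element lands in some chunk.
    let chunk_len = (people.len() + num_chunks - 1) / num_chunks;

    thread::scope(|scope| {
        // Spawn one thread per chunk before joining any of them.
        let handles: Vec<_> = people
            .chunks(chunk_len)
            .map(|chunk| scope.spawn(move || chunk.iter().filter(|p| p.age >= 18).count()))
            .collect();

        // Join every thread and add up the per-chunk counts.
        handles
            .into_iter()
            .map(|handle| handle.join().unwrap())
            .sum::<usize>()
    })
}
```

Benchmarking this with `num_chunks` set to 2, 4, 8, ... 128 on the 200,000-element input would be one way to try the experiment suggested in the comment; note that this sketch spawns fresh threads on every call, which a thread pool would avoid.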
@polimetakrylanmetylu2483 a year ago
Rayon's parallel iterator actually has a chunks(n) method that splits work into chunks of size n. I managed to make my performance be on par with the serial solution for 200_000 elements by doing

    pub fn par_number_of_adults(people: &Vec<Person>) -> usize {
        people.par_iter()
            .chunks(4096)
            .map(|p| p.iter().filter(|&&p| p.age >= 18).count())
            .sum()
    }

Notice that we now have a map method that calculates the number of adults per row, as chunks changes the iterator to operate on rows instead of individual elements. Also, probably because I messed up, the inner filter method now operates on type &&&Person (does it do triple dereferencing?)

As I mentioned, this solution is about on par with the sequential one, and for 2_000_000 elements it's twice as fast, just like in the video.

    running 2 tests
    test tests::bench_number_of_adults ... bench: 45,952 ns/iter (+/- 1,795)
    test tests::bench_par_number_of_adults ... bench: 43,662 ns/iter (+/- 5,494)

    running 2 tests
    test tests::bench_number_of_adults ... bench: 580,440 ns/iter (+/- 65,093)
    test tests::bench_par_number_of_adults ... bench: 278,091 ns/iter (+/- 43,477)

Also, I don't like this solution in general but use it often. I use Rayon for generating images as a hobby, and if you want to generate a pixel array, chunks() requires you to collect() twice and use flatten(). Here's my function that takes an image loaded from a file and calculates the distance between the current pixel and the nearest black pixel. calculate_distance(x, y, &Image, (usize, usize)) takes a coordinate and loops over every pixel in an image.

    let buffer: Vec<_> = image.as_raw()
        .par_iter()
        .enumerate()
        .chunks(128)
        .flat_map(|row| {
            row.iter().map(|(index, value)| {
                let index = *index as u32;
                if **value == 0 { 0 } else { calculate_distance(index % dimensions.0, index / dimensions.0, &image, dimensions) }
            }).collect::<Vec<_>>()
        })
        .collect();

Maybe someone who's smarter than me (not a big requirement) could come up with a better solution.
@Rigardoful a year ago
@polimetakrylanmetylu2483 Got any idea how to check if the benchmark rounds happen on the same thread pool? Then it might not be able to use more than one thread at a time.
@some84884 a year ago
I want some video about Actor-model libraries
@jeffg4686 a year ago
Some day, they'll have chat-gpt3/4 integrated to show you how to fix your compiler errors, and/or fix them for you with a little "fix it" button. Then, Rust will truly be easy for all.
@NostraDavid2 a year ago
ChatGPT now has APIs available, so who knows!
@Metagross31 a year ago
It would be cool to see how to parallelize code across different CPUs (not threads of the same CPU) on the same machine, e.g. on an HPC cluster. In C you would use MPI for that. How would that work in Rust?
@noviriustomeisho6630 a year ago
Does Rayon understand resource limits when running in a container?
@zackrobat a year ago
What's with all the stock footage?
@meetthereqs a year ago
How are your code snippets so swifty? What extension are you using? or are you making edits to the video?
@Schoksen a year ago
You could code it out, delete everything and then just tab it back with Ctrl+y, no add-ons needed
@quicksilver1752 a year ago
How did you learn and get this fluent in Rust?
@HAL-9000- a year ago
make a video about tantivy (full-text search engine)
@isabelkaspriskie7726 a year ago
On the second benchmark example, I think the error bars should have been called out. It compares 300 +/- 30 to 200 +/- 130 milliseconds.
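To make the arithmetic behind the error-bar point concrete: 300 +/- 30 spans 270 to 330, while 200 +/- 130 spans 70 to 330, so the two ranges overlap. The sketch below just performs that check. It assumes the +/- figure can be read as a simple symmetric interval around the reported value, which is an assumption about how to interpret the bench output rather than a statement of how the benchmark harness defines it; the `Reading` type and its methods are hypothetical names.

```rust
/// A benchmark reading: a central value plus or minus a spread, in the same unit.
/// (Hypothetical type for illustration.)
#[derive(Debug, Clone, Copy)]
pub struct Reading {
    pub center: f64,
    pub spread: f64,
}

impl Reading {
    /// Lower and upper ends of the interval `center ± spread`.
    pub fn bounds(self) -> (f64, f64) {
        (self.center - self.spread, self.center + self.spread)
    }

    /// True when the two intervals share at least one point.
    pub fn overlaps(self, other: Reading) -> bool {
        let (lo_a, hi_a) = self.bounds();
        let (lo_b, hi_b) = other.bounds();
        lo_a <= hi_b && lo_b <= hi_a
    }
}
```

With `Reading { center: 300.0, spread: 30.0 }` and `Reading { center: 200.0, spread: 130.0 }`, `overlaps` returns true, which is the reason the comment asks for the error bars to be called out before declaring a winner.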
@minatonamikaze2637 a year ago
Nice one!
@meetthereqs a year ago
How did you know that the filter logic slowed down the benchmark for the par function? What were the tells?
@xphreakyphilx a year ago
The filter ran so fast the overhead of parallelism was greater than the gains of doing it in parallel
@shamaldesilva9533 a year ago
Speed i need more speed 🥳🥳
@thebigVLOG a year ago
If I have an iOS app using a Rust library, and I call the Rust library from Swift in its own thread, does Rayon know not to block the main thread of the iOS app?
@japrogramer a year ago
🙌
@EvertvanBrussel 4 months ago
So, if I understand correctly, in your example code, parallelizing it only made sense when you were processing a vector of at least a certain length. And it would've only made sense with a smaller vector if your filter operation had been more expensive, right? And obviously Rayon can't see how expensive your filter operation is, so Rayon can't make an educated guess about when the vector is long enough to justify parallelizing it. In that case, wouldn't it make sense if Rayon would offer a method like `people.iter_par_if(people.length > 2e6)` ?
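The idea in this comment, going parallel only when the input is long enough to pay for the overhead, can be sketched as a plain length check in front of the two code paths. This is a std-only illustration, not a Rayon feature: `Person`, `count_adults_adaptive` and `parallel_threshold` are hypothetical names, and the parallel branch just splits the slice in two. As far as I know, Rayon's indexed parallel iterators offer a `with_min_len` adaptor that limits how finely work is split, which addresses a related concern, but check its documentation before treating it as equivalent.

```rust
use std::thread;

/// Hypothetical stand-in for the `Person` type used in the video.
pub struct Person {
    pub age: u32,
}

/// Sequential count, used directly for small inputs and per half for large ones.
fn count_adults(people: &[Person]) -> usize {
    people.iter().filter(|p| p.age >= 18).count()
}

/// Counts adults sequentially when `people` is shorter than `parallel_threshold`,
/// and on two threads (one per half of the slice) otherwise.
pub fn count_adults_adaptive(people: &[Person], parallel_threshold: usize) -> usize {
    if people.len() < parallel_threshold {
        return count_adults(people);
    }

    let (left, right) = people.split_at(people.len() / 2);
    thread::scope(|scope| {
        // Count the right half on a scoped worker thread...
        let right_handle = scope.spawn(move || count_adults(right));
        // ...while the current thread counts the left half.
        let left_count = count_adults(left);
        left_count + right_handle.join().unwrap()
    })
}
```

The right value for `parallel_threshold` depends on how expensive the filter is and on the machine, so it would have to come from benchmarking, as the earlier comments point out.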
@albert_rocha a year ago
I used Rayon in a presentation at my job and compared it with a JavaScript implementation. I implemented the same database query and ordered the result with a quicksort and Rayon. The Rayon version was 7x better than the JavaScript code.
@Ir0nman55 a year ago
Using indexes properly and ordering in the query itself is probably like 1000x faster though
@hck1bloodday a year ago
@Ir0nman55 Yes and no. Assuming that proper indexing is already in place, and since scaling horizontally is far easier on the application server than in the database, if the number of elements returned is the same, ordering and other types of calculations are better done in the application, since you release the database early and allow it to perform more work for other queries.
@blueredgame a year ago
I've tried it a few times (20 000 000) and it's always slower using rayon ? :/
@farzadmf a year ago
Please cover nom
@chinoto1 a year ago
I'm always hoping for more, but you only ever scratch the surface. I guess this channel is more for... discovery of topics?
@everyhandletaken a year ago
If only the video was sponsored by Raycon..
@hck1bloodday a year ago
Looks good, but I would prefer parallel_iter rather than par_iter, for readability.
@edhahaz 11 months ago
A multithreaded engine? I mean... why not?
@corinnarust a year ago
chi chi
@pineappleexpress2307 a year ago
The
@TheSkepticSkwerl a year ago
An API is "a set of functions and procedures allowing the creation of applications that access the features or data of an operating system, application, or other service." A library (crate) is code you can use to simplify your programming needs. A method is a function that applies to a specific object type. A function is a block of code. Whenever you call Rust crates (or parts of a crate) an API, it really confuses me. I'm not sure why the Rust community adopted this, but it is wholeheartedly wrong. I wish they would stop.
@JOHNSMITH-ve3rq 9 months ago
Bro the B roll is super dumb and distracting
@tototitui2 11 months ago
Parallelism is not "magical"; this video should talk about SIMD, memory bottlenecks, etc.
@babuOOabc a year ago
Rust browser with the new version of Tor, artic. Rust with Godot. Rust with blockchains and DAGs.