Single Producer Single Consumer Lock-free FIFO From the Ground Up - Charles Frasch

Single Producer Single Consumer Lock-free FIFO From the Ground Up - Charles Frasch - CppCon 2023

Рет қаралды 16,747

4 ай бұрын

cppcon.org/
---
Single Producer Single Consumer Lock-free FIFO From the Ground Up - Charles Frasch - CppCon 2023
github.com/CppCon/CppCon2023
In this session we will construct a Single Producer Single Consumer Lock-free Wait-free Fifo from the ground up. But why write our own if we can get one from reliable sources such as Boost.Lockfree? There are a couple of answers to this question.
* Writing such a fifo is a fairly gentle introduction to lock free programming.
* There are some interesting performance optimizations that can be made.
* There may be some specific requirements that are not met in out-of-the box implementations.
In the presentation we will first develop a simple circular fifo. Next we will make it thread-safe as well as lock-free and wait-free. After that we will address issues associated with cache coherency and false sharing in particular. We will show a few additional optimizations that can be added as needed to meet specific requirements. Finally, the performance of our fifo will be compared with a few out of the box implementations. Along the way we will touch on subjects such as thread safety, data-races, false sharing, object lifetime, and relaxed atomics.
---
Charles Frasch
Charles Frasch is a Senior Core Developer with the IEX Group where he is working to re-platform their core trading infrastructure in modern c++. Charles has worked in the financial technology world for more than twenty years in areas such as high-frequency trading and low-latency, high-performance infrastructure.
__
Videos Filmed & Edited by Bash Films: www.BashFilms.com
KZfaq Channel Managed by Digital Medium Ltd: events.digital-medium.co.uk
---
Registration for CppCon: cppcon.org/registration/
#cppcon #cppprogramming #cpp

Пікірлер: 38

@anon_y_mousse 4 ай бұрын

I've probably given this tip to more people than any other, but if you're optimizing around an array of any variety, use a size that's a power of two. If you need to constrain an index into such an array, a `bitwise and` is the fastest way to do so. Never use division/modulus and oddly sized arrays, even when setting up the backing store for a hash table. Prime number sizes may yield better distribution, but it's always at the expense of performance. Also, with regards to hash tables, always store the unnormalized hash with the keys so that when you need to resize the table you don't need to query the hashing function again to rebuild the entire table.

@phusicus_404 4 ай бұрын

Are there benchmarks which show that power of 2 size is better?

@Dominik-K 4 ай бұрын

I very much agree with this advice. Especially for anything that is performance sensitive, bitwise and is a go-to means to restrict value ranges.

@user-mc9wh3df5v 4 ай бұрын

While I completely agree regarding power of 2 + bitwise and on one minus the size, I would also not be surprised if the hardware didn't recognize this optimization. In my day job I constrain the size to be a power of 2.

@MenaceInc 4 ай бұрын

Seems like he was going to say the same thing at 8:30 before he lost his train of thought

@MenaceInc 4 ай бұрын

Ah, he gets it out by 9:50

@fabiogaluppo2635 4 ай бұрын

Great talk. We need more of this type of content in C++ talks!

@zmweb4828 7 күн бұрын

I’m glad I’m not the only one struggling to understand memory ordering 😂 To understand this talk fully, I read chapter 5 from the book “C++ Concurrency in Action” (2nd edition), which goes into depth on the topic of memory ordering. Had to read it twice to understand everything, but it was worth it. Thank you for this excellent talk!

@nathanieldoromal6904 4 ай бұрын

One of the best explanations for implementing a lock-free SPSC I've come across. Kudos!

@starchsky 4 ай бұрын

So many good things explained so well in one go. Bravo.

@dian-lunlin7349 4 ай бұрын

Great talk! I enjoyed it a lot.

@amaama4140 4 ай бұрын

Wow, what a marvelous presentation! I enjoyed every single second of it. I have always wanted some similar learning material to start with lock-free programming, and this one was just the right one. Thank you very much, Mr. Frasch.

@user-mc9wh3df5v 4 ай бұрын

You are welcome. The references in the slides are also very informative.

@DedmenMiller 4 ай бұрын

41:00 I had that issue too. My solution was to not store objects in the queue, but just store pointers. the new/delete are done outside. I used another fifo of "Available buffers", pre-allocated pointers that can be reused. Instead of new grab one from it, instead of delete push it back to be reused. That way I get rid of the push/pop memcpy's, and ALSO of runtime allocations. Problem is just that you have to handle when the push fails and the queue is full, you need to either throw the pointer back into the available buffers, or hold onto it until it can be pushed.

@yb9737 4 ай бұрын

Beautiful talk

@user-mc9wh3df5v 4 ай бұрын

Thank you.

@user-mc9wh3df5v 7 ай бұрын

I completely misunderstood Roi's question about 1:03. Yes, I agree that if you grab multiple pushers or poppers in the same block they will be destructed in the reverse order in which they were acquired. And, if you do that you must carefully consider that fact. The pusher and popper destructors just advance the cursor. So, it is possible that at the end of the block the cursors are correct. But, my real answer is, if you need to read or write multiple values from and to the FIFO you should extend the API to add functions that do that correctly.

@szirsp 4 ай бұрын

Well, yeah, they just advance the cursor... But because of that only the cursor values will be correct at the end! The FIFO won't be! Because when you are getting multiple pushers in a row they are getting a reference to the same memory, and when one lifetime ends only the cursor position is changes the still existing pushers won't be, so they will overwrite the previous item and skip one item when they go out of scope. Instead of advancing the cursor implicitly in the destructor I would use a commit() method. That way you can get a pusher, commit and then get another pusher in the same scope. I'm not sure if I would actually implement pusher and popper at all, just use some kind of peek methods to get direct access the write and read location and use commit and free or similarly named methods to advance the write and read head.

@youtou252 4 ай бұрын

how is your comment 3 months old when the video was published 2 days ago ?

@Jeanpierrec19 4 ай бұрын

Wonderful talk. Have done a similar implementation but never needed the extra performance over fifo2 so never looked into how it worked. Your talk explained everything very well. Thank you.

@user-mc9wh3df5v 4 ай бұрын

@@youtou252 I'm the presenter; they allowed me to preview the video.

@user-bk4ux9rv3t 6 күн бұрын

Great talk! Just a warning though, it is not a working FIFO, once pushCursor will get to the end of max size_t, it will start overwriting the buffer at [0] and on wards.

@LaserFur 4 ай бұрын

One way to handle a pointer request when near the end of the FiFo wrap around point is to allocate extra space past the main circular buffer. Then the get pointer function can in the rare cases where the buffer wraps around to do a copy into this extra space so that the caller has a linear buffer. and then the release can copy data back into the FiFo buffer. The only problem is when the FiFo can auto expand the buffer. In that case a mutex is needed and all callers can't be holding pointers. Luckily in my case I don't need lock free and the code just grabs a mutex for any operation and also keeps the mutex held between the getting of the pointer and the release of the pointer. I also use RAII in that my get pointer returns a class to release the pointer and the mutex.

@szirsp 4 ай бұрын

It's not clear to me what problem you are describing or trying to solve. But your proposal doesn't seem like a sound solution to me.

@ksvishvajith Ай бұрын

That gentleman looks so much like the cousin of Bjourne, cool presentation

@Justin_Wang 4 ай бұрын

Great talk. I wonder how hardware works and how often do fifo1 with out atomic get error, but after 10,000 times of test in release mode, it still work correctly, do any one have any hint of it?

@Justin_Wang 4 ай бұрын

I find out that x86 is a strongly-ordered system; everything is release-acquire by default. So, whether Fiio1 will pass the test or not is decided by how the program is compiled, not at runtime. Am I right about it?

@user-ms2if2un7f 4 ай бұрын

slide 28, in push function, shouldn't we load popCursor with acquire first, and then load pushCursor with relaxed, correct me if I misunderstand this.

@user-mc9wh3df5v 4 ай бұрын

From a correctness point of view the load order shouldn't make any difference. In the push function and while the cursors are being loaded the fifo can only become emptier. If the cursor comparison shows full but the reader thread popped something off, the function will return full and noop. The next time through a slot will be obtained and the push will succeed. The full condition is already the slow path. From a performance point of view it might be a bit more efficient to start the load acquire first since it is a more expensive operation than load relaxed. OTOH, the processor's instruction scheduler show run the loads concurrently. Regardless, the comparison cannot proceed until both values are available.

@rocknroooollllll 4 ай бұрын

Don't we have to compare/exchange in order to get thread safety here?

@user-mc9wh3df5v 4 ай бұрын

You can certainly use atomic compare and exchange. It is a more complicated and expensive operation that the atomic load and stores I use.

@szirsp 4 ай бұрын

12:07 While this would work for most, this is not correct [when capacity is not 2's power]. It relies on size_t not overflowing. (Which is a safe bet when it's 64 bit since it would take hundreds of years, but it is NOT when it's 32 bit or less, for example when running on a 32 bit or 8 bit microcontroller, where it only takes minutes to overflow.) To demonstrate the problem create a FIFO with 10 capacity and set the cursors to 0xfffffffaUL (or push and pop that many times), then push 10 items into the FIFO (compiled to a 32 bit platform). The write index will be 0,1,2,3,4,5,0,1,2,3. Meaning even though the FIFO was initially empty and had enough space to hold 10 items, it overwrites 4 of those items... and it will try to destruct some items twice. Also I usually don't like using modulo operation for indexing. Because capacity is not a compile time constant modulo division won't be optimized which could be a problem (mostly on processors without hardware division unit) (even when capacity is 2's power). I'd rather limit the cursor to valid index range / capacity to make sure it at all times contains actual valid array index (which makes debugging easier). (of course you would need to differentiate between full and empty in another way) I have written a number of lock free FIFOs for microcontrollers over the decades, so it seemed trivial to me that this is possible (even though the implementation might not be trivial, and may include memory barriers) I also realize that it might not be obvious for "modern"/young developers. I still wondered what could be said about them to fill out a 1 hour long presentation... So I still watched it (on 2x speed) to see if he ever addresses this overflow issue, and the talk did contain some interesting pitfalls. This shows that multi-threading and modern processors are complicated, so nearly everyone should use standard libraries instead of rolling their own FIFO.

@user-mc9wh3df5v 4 ай бұрын

That's right. The presentation serves its purpose as is. It would be far too complicated to to deal with ABA issues; perhaps I should have mentioned that.