How to avoid cascading failures in a distributed system 💣💥🔥

115,375 views

Gaurav Sen

A day ago

In this video we solve the thundering herd problem. It occurs when a server receives far more requests than it can handle and crashes from the overload. One solution is rate limiting, which is a hard problem to solve in a distributed environment.
When designing server-side systems, we often need to estimate the capacity of a server and limit the number of requests it can receive per second. These limits are recorded in the service's QPS (queries per second) document.
The system design discussion covers how to handle requests based on priority, how to deal with cascading failures, and how to handle large numbers of requests, such as those caused by viral posts or videos.
Other scenarios, like job scheduling or cron jobs being fired in batches, are also discussed. We use batch processing and approximations to reduce server load.
The last few approaches include gradual deployments and using caches to store responses to common requests. Both improve performance, and better QPS and performance let us handle more requests, which means more users and more money for the product.
Caching and coupling systems can also help improve performance. However, the design must give them timeouts and appropriate cache eviction policies.
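As a rough illustration of the timeout and eviction point (a minimal sketch, not code from the video; the class name `TTLCache` and its parameters are hypothetical), a cache can pair a per-entry time limit with least-recently-used eviction:

```python
import time
from collections import OrderedDict


class TTLCache:
    """Small LRU cache whose entries also expire after ttl_seconds (hypothetical helper)."""

    def __init__(self, max_entries, ttl_seconds, clock=time.monotonic):
        self.max_entries = max_entries
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = OrderedDict()  # key -> (expires_at, value), oldest use first

    def get(self, key):
        item = self._entries.get(key)
        if item is None:
            return None
        expires_at, value = item
        if self.clock() >= expires_at:
            # Timed out: drop the stale entry and report a miss.
            del self._entries[key]
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key, value):
        self._entries[key] = (self.clock() + self.ttl_seconds, value)
        self._entries.move_to_end(key)
        if len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict the least recently used entry
```

The timeout bounds how stale a cached answer can get, while the size limit keeps the cache itself from becoming a memory problem under load.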
Request throttling is one of the many approaches we discuss, but it is the most important. Dropping requests helps the server recover and operate at a load it can sustain.
We often use message queues for rate limiting requests.
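A minimal sketch of that idea, assuming an in-process queue stands in for the message queue (the class `BoundedRequestQueue` and its method names are made up for this example): the queue has a fixed capacity, and requests that arrive while it is full are dropped rather than left to pile up.

```python
import queue


class BoundedRequestQueue:
    """Accepts requests up to a fixed capacity and sheds the rest (hypothetical helper)."""

    def __init__(self, capacity):
        self._queue = queue.Queue(maxsize=capacity)
        self.dropped = 0  # how many requests were rejected because the queue was full

    def submit(self, request):
        """Return True if the request was queued, False if it was dropped."""
        try:
            self._queue.put_nowait(request)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def next_request(self):
        """Return the oldest waiting request, or None if nothing is waiting."""
        try:
            return self._queue.get_nowait()
        except queue.Empty:
            return None
```

A caller whose `submit` returns False would typically answer the client with an error right away, so the server only works on what it has room for.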
References:
highscalability.com/blog/2012/...
blog.ably.io/how-adopting-a-d...
www.researchgate.net/post/How...
Social Links:
Facebook: / gkcs0
Quora: www.quora.com/profile/Gaurav-...
LinkedIn: / gaurav-sen-56b6a941

Comments: 142
@zeref6437 5 years ago
Simply awesome. Learned new things. These videos are a great help, thank you for making them.
@gkcs 5 years ago
Thanks!
@creative-freedom 5 years ago
I think the title of this video should not say "rate limiting", as it was not discussed in depth!
@anastasianaumko923 a year ago
Amazing job, very clear, thank you so much for creating these educative videos.
@notthatguy1923 5 years ago
Hey Gaurav, the entire system design series is great. I was wondering if you could share how you break down a topic for research or making a video, or even pick a topic (less from a content creator point of view and more from an understanding pov) to begin with. Thanks!
@gkcs 5 years ago
I have the interview prep video where I mention some of the sources :)
@gkcs 5 years ago
Hey guys! I posted this video by mistake, about a week before it should be out. But while we are at it, enjoy :D Do tell me how you like the system design series, and post your suggestions and comments below!
@Aniruddhdwivedi 5 years ago
Hi Gaurav, I really need your help to understand gRPC and protocol buffers. If possible, please make a video on that.
@alokprusty6759 5 years ago
It would be great to add a few functional designs as well:
- How to design the rate limit API
- How to design a distributed rate limiter that each microservice can use
- A discussion of a few simple rate limiting algorithm implementations (token bucket, e.g. as used in the Guava rate limiter)
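For readers who have not met the token bucket mentioned in this comment, here is a minimal single-process sketch. It is not Guava's implementation and not something shown in the video; the class name `TokenBucket` and its parameters are hypothetical. Tokens refill at a steady rate up to a cap, and each request spends one.

```python
import time


class TokenBucket:
    """Single-process token bucket (hypothetical helper, not thread-safe)."""

    def __init__(self, rate_per_second, capacity, clock=time.monotonic):
        self.rate = rate_per_second  # tokens added per second
        self.capacity = capacity     # maximum burst size
        self.clock = clock           # injectable so tests need not sleep
        self.tokens = float(capacity)
        self.last_refill = clock()

    def allow(self, cost=1.0):
        """Spend `cost` tokens if available; return whether the request may proceed."""
        now = self.clock()
        elapsed = now - self.last_refill
        # Refill in proportion to the time elapsed, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

The capacity sets how large a burst is tolerated and the rate sets the long-run throughput. A distributed version would need the token count kept in shared storage, which this sketch does not attempt.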
@NannanAV 3 years ago
Appreciate the hard work @Gaurav Sen.!
@SiddharthKulkarniN 5 years ago
Very eloquent. Thanks for posting.
@gkcs 5 years ago
Thanks!
@ak20k6 5 years ago
I have an interview coming up so I am watching this video again, and I have to say, it's fantastic. There is one minor error at 13:38: the solution to going viral is auto-scale and the solution to a predictable load increase is pre-scale (it can be auto-scale too, actually), but 13:38 has them swapped.
@gkcs 5 years ago
Thanks for pointing it out 😁
@curiousbhartiya8410 3 years ago
I was about to point that out. Thanks.
@ajaykumar-oh2kz 5 years ago
Hi Gaurav, thank you for the awesome videos and for sharing your knowledge. Just one piece of feedback: it would be better if you turned off the autofocus mode of your camera. The video zoomed in and out many times during the recording. Thank you again for the awesome series on System Design :)
@gkcs 5 years ago
Thanks Ajay!
@anishtaneja5665 5 years ago
Hi Gaurav, regarding the queue part for rate limiting, where exactly is the implementation done? Also, is it implemented in the same way as other messaging queues like MSMQ, RabbitMQ, etc.? Thanks
@mukundsridhar4250 5 years ago
Excellent video Gaurav :)
@thedanglingpointer8411 4 years ago
Can we apply a rate limit on the number of times the camera shifts focus? :D Great video!! Thanks.
@8Trails50 4 years ago
Simply put, this is a discussion of what actually happens at scale (or rather, what HAS to happen): consistency has to be relaxed.
@MaoDev 3 years ago
Where did you get your profile picture from?
@8Trails50 3 years ago
@MaoDev idk forgot
@divyeshgaur 5 years ago
Your videos are good. You could also have added more information on rate limiting algorithms. Thank you for sharing.
@gkcs 5 years ago
Thanks!
@thndesmondsaid 4 years ago
6:02 'one small thing to remember here is that the client shouldn't be stupid' hahah
@AkanshaGupta 4 years ago
Amazing video: watching for the 3rd time. Hope KZfaq had this video cached. :)
@gkcs 4 years ago
Hahaha, thanks!
@raj_kundalia a year ago
Thank you for the video!
@SuboptimalEng 4 years ago
These are amazing!
@gauravkumar-ff3bv 5 years ago
At 2:38, can the load balancer be smart enough to redistribute the load of S1 to S3 and S4, or decide on the basis of the computation power of each server?
@tamalmondal8550 4 years ago
As always the video was nice, Gaurav. I noticed that this is not part of your SD playlist. Please check, and maybe you would like to update the playlist, as many like us head over there when it comes to SD interviews. Thanks again for wonderful content like this. :)
@gkcs 4 years ago
Thanks Tamal! I didn't find this detailed enough to be added to the playlist. There are a few which I have floating around outside the list, since I keep only the high quality ones in it 😁
@mayurranjan9185 4 years ago
Hi Gaurav, could you please suggest a book for system design?
@kipkip6712 3 years ago
Title suggestion: "Scale pitfalls and how to avoid them"
@prasadj8676 5 years ago
Not bad Gaurav. This architectural pattern is similar to bulkhead partitioning from the book "Release It!". I highly recommend everyone read it. It has many such problem-solution patterns.
@gkcs 5 years ago
Interesting 😁
@nayanchoudhary4353 5 years ago
Good stuff! Learning something new. Great content, Gaurav!
@gkcs 5 years ago
Thanks!
@SatyamKumar-so5yb 5 years ago
Hey Gaurav, I would love to watch an in-depth architecture of modern Uber Carpool (full system design) with optimisation solutions, if possible. I know there are a lot of online materials out there. About this video: each solution has its own advantages, limits, use cases and wider horizon. Maybe this video just gives us an introduction to the topic.
@gkcs 5 years ago
I'd like to talk about this detail too 😁
@prithviamruthraj 4 years ago
Hi Gaurav, could you please make a video on leaderboard system design, for example "a HackerRank global coding contest"?
@alessandrocamillo4939 5 years ago
This is a good general survey of protection and mitigation techniques. However, each one deserves a deeper analysis. A distributed rate limiter, for instance, would be a good subject for a new video.
@gkcs 5 years ago
That's a good idea Alessandro 😁
@ildar5184 a year ago
What technologies represent or implement the functionality of a rate limiter queue: proxies, load balancers, frameworks?
@vishnugoundhi5072 5 years ago
Hi, I really liked the video and learned something worthwhile today. (There are camera focus issues, I think: the autofocus kicks in frequently. Try to fix this!) Other than that, keep posting more about design patterns.
@gkcs 5 years ago
Thanks Vishnu! Your feedback is noted, thank you 😁
@arunkarepu a year ago
For popular posts, what about using streaming pipelines? With streaming pipelines, the process can be asynchronous and we don't need to update the datastore every time. Instead we can use fixed windows and watermarking (to deal with late events) to save the data.
@bowang1825 4 years ago
Can you add this video to your system design playlist?
@rahulsinghai3033 5 years ago
Hi Gaurav, can you please create a video on building a microservices-based website like ESPN Cricinfo?
@vaibhavsharma1653 3 years ago
Binge watching this playlist like GoT
@nirupama28 3 years ago
The PewDiePie scenario you explained is an example of giving importance to availability over consistency (lag is fine).
@nirupama28 3 years ago
We can also add more explanation of RPC call retries and response deadlines (as a request moves through the layers of a system, stale requests can be rejected once their deadlines expire) and queue lengths for worker threads (the more requests in the queues, the more latency).
@amitagnihotri30 5 years ago
Another excellent video from you. Can you please pre-focus and then lock it? The focus switched back and forth too much during this video.
@gkcs 5 years ago
Yes it was an issue with some videos then. Sorry for that 🙈
@amitagnihotri30 5 years ago
That's just a minor issue, the content in your videos has already been of great help to us. So thanks bro 🙏🙏
@gkcs 5 years ago
@amitagnihotri30 Thanks 😁
@iamkrishn 3 years ago
10:41 You've been hit by a Smooth Criminal! xD
@HusainDalal 2 years ago
Thanks!
@tusharthaker9399 2 years ago
Looks like the lettering on your t-shirt is really doing a number on the poor camera's autofocus feature.
@godsonjoseph 3 years ago
This is a cool video. Not sure why it has so few views.
@GovindaSakhare 4 years ago
I'm terribly late to this video. Could you please recommend the resources you used to prepare, apart from highscalability and design data driven app?
@gokulramanansoundararajan5342 4 years ago
@gaurav sen: I have a doubt. In some of your videos and here, you mentioned that we can separate the auth service from other services, and when a new request comes in, the API gateway will use the auth service to validate the user and forward the request to the actual required service, right? In that case, how will the other service know that a particular request was validated by the gateway? Or is whatever request the gateway forwards always validated?
@gkcs 4 years ago
The latter.
@kishorepola5930 5 years ago
Is there anything related to rate limiter design?
@vishaljain9634 9 months ago
Can we use fault tolerance at the API gateway itself, and caching, to reduce calls to the actual server?
@ShivamSingh-jw8ey 4 years ago
Why is this video not in the System Design playlist you created?
@rahulsharma5030 3 years ago
Add it to the system design playlist. :)
@digisecureagent7679 3 years ago
Very good
@sachindodti6733 5 years ago
Please make a video on Dragger
@kennethcarvalho3684 3 years ago
I never felt so sad for a server as I did for S2
@faizanfareed9076 4 years ago
Keep it up bro
@amitpaliwal3544 4 years ago
I appreciate the effort, but where exactly is the design here? There are discussions of scenarios, but no particular design for rate limiting. How are you designing a rate limiting solution here?
@ammarshareef462 5 years ago
Maaan! You're Awesome :)
@gkcs 5 years ago
Thanks Ammar!
@shauryachamoli1155 5 years ago
Great video Gaurav. I was wondering how companies manage API limits per user. For example, there could be a specific person trying to bombard your server with consecutive hits. Autoscaling or any other such mechanism would not really be a good solution in that case. The server should be able to identify the source and limit the allowed hits by user or IP. Do you have a case related to this?
@gkcs 5 years ago
Most of the rate limiting I have seen deals with mapping the request counts of users in a particular window. Simple rate limiters can rarely do a good job in a distributed environment, but are useful to limit huge spikes from 'bad' users.
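The reply above describes counting each user's requests within a window. A minimal single-server sketch of that idea follows; the class name `FixedWindowLimiter`, the per-user dictionary and the limits are assumptions for illustration, not the design used by any particular company. A distributed deployment would need the counters in shared storage.

```python
import time


class FixedWindowLimiter:
    """Counts requests per user in fixed time windows (hypothetical, single-process)."""

    def __init__(self, limit, window_seconds, clock=time.time):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so tests need not wait for a window to pass
        self._state = {}    # user_id -> (window index, requests counted in that window)

    def allow(self, user_id):
        window = int(self.clock() // self.window_seconds)
        seen_window, count = self._state.get(user_id, (window, 0))
        if seen_window != window:
            count = 0  # a new window has started, so the counter resets
        if count >= self.limit:
            self._state[user_id] = (window, count)
            return False
        self._state[user_id] = (window, count + 1)
        return True
```

Because the counter resets at window boundaries, a user can briefly send up to twice the limit across a boundary. That is the usual trade-off of this simple scheme.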
@songs4enjoy 5 years ago
A few observations:
1. There is some confusion about temporary vs permanent. Permanent does not mean there is an issue with the request the client sent, as the issue is not the client request; rather, it's on the server. Again, I am not so sure what you meant by that. Can you clarify?
2. Also, caching authentication in other services is a bad idea unless the authentication service itself says how long the credentials are valid, using say JWT (again, this has its own problems w.r.t. logouts). So the last option needs to be looked at for cases where the business might not care if the data being served is completely stale.
@shashankojha3452 a year ago
For the 2nd point, the video might sound confusing, but the point is that we can cache the response from the auth service in some other service (like the gateway service). The response that could be cached is {user_id, JWT, TTL (time to live)}, and based on this cached information all the subsequent requests from the user can be authenticated (taking the TTL into consideration) without hitting the auth service every time (reducing network calls).
@puneetkhanna7803 4 years ago
When adding a queue in front of a service, say (book a ride / transfer funds), how is the consumer of the message able to send the response back to the user? I am not sure I got that for the case when you have POST createBooking.
@puneetkhanna7803 4 years ago
Maybe for processing audits or moving the order to another system like logistics management it makes sense, but how do I add a queue in front of a service, say POST createOrder?
@PiyushSingh-vx7bx 4 years ago
🔥🔥🔥
@kanishkumar6176 5 years ago
Nice video
@gkcs 5 years ago
Thanks!
@ramganeshsudhakar8733 a year ago
The most common technique to deal with these is buffering. Let's say all your requests hit Kafka first and are queued up before you read them and assign them to the servers. The backpressure to the source will take care of auto-healing. You never drop requests; rather, you buffer them at the source.
@gkcs a year ago
You need to drop packets in certain cases like a DDoS attack. Adding them to a queue will just eat into your resources here. Also, a request has a typical response time (10 seconds), after which serving it is useless. The client is likely to have retried the request anyway. You can log the requests. But adding them to a dead letter queue doesn't work either, since it will overflow with too many useless requests eventually. Have a look at InterviewReady's distributed rate limiting chapter for more details.
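To make the deadline point in this reply concrete, here is a minimal sketch. The function name `take_next_fresh` and the (enqueued_at, request) tuple layout are hypothetical, and the 10-second figure is simply the example from the reply. The consumer skips any queued request that has already waited longer than the client would.

```python
from collections import deque


def take_next_fresh(pending, max_age_seconds, now):
    """Pop queued (enqueued_at, request) pairs until one is still within its deadline.

    Returns (request, expired): request is None if nothing fresh remains, and
    expired lists the requests discarded along the way (e.g. for logging).
    """
    expired = []
    while pending:
        enqueued_at, request = pending.popleft()
        if now - enqueued_at <= max_age_seconds:
            return request, expired
        # The client has most likely given up or retried, so serving this is wasted work.
        expired.append(request)
    return None, expired


if __name__ == "__main__":
    waiting = deque([(0.0, "old-1"), (2.0, "old-2"), (95.0, "fresh")])
    print(take_next_fresh(waiting, max_age_seconds=10, now=100.0))
```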
@ngneerin 2 years ago
Auto scaling has never worked for any company in the case of a multifold traffic event. It scales slowly and only up to a limit.
@Amritanjali 2 years ago
🔥🔥
@vijaydhanakodi5591 5 years ago
@gaurav sen what are some good resources to learn system design?
@gkcs 5 years ago
Highscalability. KZfaq. Blogs.
@user-ro9yu3hg5w 5 years ago
Can you go deeper into rate limiting? How is it implemented in a distributed system?
@gkcs 5 years ago
Perhaps in a later video
@amanparihar1949 5 years ago
Sir, why didn't you accept my request on Facebook? 🤔 By the way, I love the system design series
@gkcs 5 years ago
Thanks!
@hjklmn9526 3 years ago
Does gradual deployments mean rolling upgrades? :)
@ak20k6 5 years ago
10:40 You could not hold yourself back, could you? :D :D
@gkcs 5 years ago
Subscribe! 😛
@kjangla 4 years ago
This is barely a rate limiting video. Change the title of this video. Also, you never talk about tradeoffs in your videos. Systems can't be designed without thinking about tradeoffs.
@jayeshudhani99 3 years ago
I think you should edit your video title to: How to solve the thundering herd problem?
@gkcs 3 years ago
Yes, updated :)
@sandeepsinghsingh3715 4 years ago
I think you have deviated from the topic of rate limiting in this video. There is a lot to discuss on rate limiting, so you should make one more explanatory video on it.
@gkcs 4 years ago
I have one in my system design course. This is useful to get an idea of rate limiting.
@user-cg8hm3pp9c 6 months ago
😍😍
@AnubhavShrivastava 5 years ago
Suppose the rate limit is 300 requests per 2 seconds. What should the queue size be then? Is the queue size equal to the number of requests per second?
@gkcs 5 years ago
Yes.
@AnubhavShrivastava 5 years ago
@gkcs So suppose the rate limit is 4 requests per second:
request 1: starts at 0.1 sec, ends at 0.3 sec and is then removed from the queue by the consumer
request 2: starts at 0.2 sec, ends at 0.3 sec and is then removed from the queue by the consumer
request 3: starts at 0.3 sec, ends at 0.4 sec and is then removed from the queue by the consumer
request 4: starts at 0.4 sec, ends at 0.5 sec and is then removed from the queue by the consumer
request 5: starts at 0.5 sec
Clearly, when request 5 comes, the queue size will be zero, so how would the rate limiting work?
1. Do you suggest that we create a separate queue for every second?
2. Or do we need a sliding window which is computed every millisecond?
What am I missing?
@AnubhavShrivastava 5 years ago
A guy from Uber has posted this article: medium.com/@saisandeepmopuri/system-design-rate-limiter-and-data-modelling-9304b0d18250
@gkcs 5 years ago
Yes, I have read it. The ideas there are good :)
@AnubhavShrivastava 5 years ago
@gkcs Thank you very much. I have an interview this week and your series is like everything in one place :)
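The sliding-window option raised in this thread can be sketched as follows. This is a minimal single-process illustration, not the design from the linked article or the video; the class name `SlidingWindowLimiter` is hypothetical. Instead of looking at how many requests are still queued, it remembers when each accepted request arrived, so request 5 at 0.5 sec is rejected even though the first four have already finished.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Sliding-window log limiter (hypothetical helper, single-process)."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock        # injectable so tests can replay fixed timestamps
        self._accepted = deque()  # arrival times of accepted requests, oldest first

    def allow(self):
        now = self.clock()
        # Forget arrivals that have slid out of the window.
        while self._accepted and now - self._accepted[0] >= self.window_seconds:
            self._accepted.popleft()
        if len(self._accepted) >= self.limit:
            return False
        self._accepted.append(now)
        return True
```

No per-millisecond recomputation is needed: old timestamps are discarded lazily whenever a new request arrives. The cost is memory proportional to the limit.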
@kaushikreddy2642 5 years ago
Why did you skip the notifications part, when a popular user uploads some data? (I mean, how are subscribers notified? You can't notify a subscriber 10 hours after the video is uploaded.) And how is the changing trend of people (from 0 to 1M) accessing a popular video handled? That is a real-time problem and needs a hard solution.
@gkcs 5 years ago
Have a look at the Instagram design video on the channel. It discusses this in detail.
@doruwyl 5 years ago
Is there a reason why this video is not part of the System Design playlist?
@gkcs 5 years ago
I can make a more detailed video. Hopefully soon 😁
@TheKunalbansal 5 years ago
Why is this video not in your system design playlist? I would have been sad if I had skipped this.
@gkcs 5 years ago
The video quality was an issue, with the camera focus shifting. I thought of shooting a "Part 2" video before adding this to the list. Glad you liked it though :)
@Mike-ci5io a year ago
Nobody retries after 5 minutes in a real-world service, more like in under 5 seconds
@rahulsambyal 5 years ago
I have a few doubts. Can I chat with you in direct messages, please?
@gkcs 5 years ago
I have an FB page 😊
@dionpinto3627 5 years ago
Was your college fcrit?
@gkcs 5 years ago
Frcrce 😋
@dionpinto3627 5 years ago
@gkcs I feel for you 😥😅
@navneetchoudry 5 years ago
How about sharding
@gkcs 5 years ago
You mean sharding? How does that help?
@blasttrash 5 years ago
scaring? You will scare your users away so they won't be able to make request? Genius.... :P
@gkcs 5 years ago
Easy buddy. He had said Scarding to be fair 😂 Nice to see blasttrash's comments 😛
@blasttrash 5 years ago
@gkcs haha I was just kidding. No offense to op. :)
@sumitchakraborty2475 3 years ago
This video is good in general, and many topics are touched on. The title of this video is misleading though. Please correct it. The point on coupling, with the example chosen with respect to authentication, is confusing. In general, I like your videos. Thanks for the hard work.
@khsmurthy 4 years ago
Gaurav: solution 8, please remove it. It will do more damage than good. The example is even worse; it will open huge security holes. In deployments at scale, a replay attack can kill you.
@gkcs 4 years ago
Sorry I forgot, what is solution 8?
@tusharpandey6584 3 years ago
503 internal error, reason: NO 5:24
@ss2445 3 years ago
From the video title, I was really hoping to see how to design a distributed rate limiter, but was disappointed.
@gkcs 3 years ago
This talks about the problems faced in distributed systems related to rate limiting. You can see the solution and implementation in this course: get.interviewready.io/courses/system-design-interview-prep
@RohitKumar-mn9oi 4 years ago
That's where the concept of Docker comes into play! Whenever a server fails, a new server is created automatically.
@ravisoni9610 4 years ago
The video is not even aligned with the title.
@RahulGolwalkar 4 years ago
You go way too far off the core topic. It would have been great if you had focused on rate limiting.
@HusainDalal 2 years ago
Thanks!
@gkcs 2 years ago
You are welcome!