What is CONSISTENT HASHING and Where is it used?

Рет қаралды 770,533

Күн бұрын

Load Balancing is a key concept to system design. One of the popular ways to balance load in a system is to use the concept of consistent hashing. Consistent Hashing allows requests to be mapped into hash buckets while allowing the system to add and remove nodes flexibly so as to maintain a good load factor on each machine.
The standard way to hash objects is to map them to a search space, and then transfer the load to the mapped computer. A system using this policy is likely to suffer when new nodes are added or removed from it.
Consistent Hashing maps servers to the key space and assigns requests(mapped to relevant buckets, called load) to the next clockwise server. Servers can then store relevant request data in them while allowing the system flexibility and scalability.
Some terms you would here in system design interviews are Fault Tolerance, in which case a machine crashes. And Scalability, in which case machines need to be added to process more requests. These two principles are allowed by Consistent Hashing, and hence it is an important building block to a system design architect's toolbox.
Another term used often is request allocation. This means assigning a request to a server. Consistent hashing assigns requests to the servers in a way that the load is balanced are remains close to equal.
Server architecture is a subjective concept, and there are outliers for many cases. Don't think of Consistent Hashing as a silver bullet for fault tolerance and scalability, but a useful concept for request allocation.
Use it to solve software questions in interviews and real life. Best of luck!
Prerequisite: • What is LOAD BALANCING...
Recommended system design video course:
interviewready.io
00:00 Request Hashing
03:00 Request Mapping
06:02 Problems
07:01 Virtual Servers
09:40 Applications
10:18 Thank you!
Along with video lectures, this course has architecture diagrams, capacity planning, API contracts and evaluation tests. It's a complete package.
References:
www.hackerearth.com/practice/...
www.tomkleinpeter.com/2008/03/...
michaelnielsen.org/blog/consis...
• Consistent Hashing - G...
System Design:
highscalability.com/
• What is System Design?
Code:
github.com/coding-parrot/Syst...
#consistent-hashing #system-design #load-balancing

Пікірлер: 659

@pranay020692 4 жыл бұрын

Sitting in the hotel room, watching this 1 hour before my google interview in New York. Thanks Gaurav!

@gkcs 4 жыл бұрын

All the best!

@gkcs 4 жыл бұрын

@@pranay020692 wow, tough stuff. How'd you reckon it went?

@pranay020692 4 жыл бұрын

@@gkcs I believe, It went well. I have watched most of your system design videos, they were quite helpful. I am on the junior side 3 YOE so I think they went easy on me in Sys Design. Also, I was able to complete all coding questions in time. Google is always a long shot though. 🤞🤞

@karthikmucheli7930 4 жыл бұрын

@@pranay020692 hope you got the job

@Leptoszom 3 жыл бұрын

You got the job, Bajpai?

@shreysom2060 3 жыл бұрын

I used to see your "Competitive Programming" videos before getting into a company and now after getting learning things there ,I am watching your "System Design" it feels good to grow with this channel. Thank you so much 😊

@headoverbars8750 3 жыл бұрын

What an outstanding video! No shortage of tutorials on how to code or write algorithms out there buy not enough on Systems design... This is truly outstanding... been writing software 10 years and fringely do I touch these concepts, heck work within them daily yet either forgot or never knew. Thanks so much!!

@SP-db6sh Жыл бұрын

This channel is like System-Design Wala , far far better than most paid courses, simple explanation

@andreimarculescu911 5 жыл бұрын

the best solution is not to use K hash functions, but to generate K replica ids for each server id. Designing K hash functions while maintaining random uniformity and consistency is hard. Generating K replica ids is easy: xxx gives K replicas xxx + '1', xxx + '2', ..., xxx + 'K'. Then you take these replicas and generate K points on the ring with the same hash function and this is what is actually used in practice. Chord algorithm is just an example of this technique to add K replicas for each server id

@gkcs 5 жыл бұрын

That makes sense. K numbers assigned to each server would do the job :)

@pradipacharjee4915 5 жыл бұрын

Hi Andrei, can you just tell me how to choose idle replica count(k) ? for efficiently add or remove servers.

@dudejaa 5 жыл бұрын

The example that you took mentions xxx+1,+2,+3...+k. Correct me if I am wrong but if you assign k consecutive numbers to the same server the load wouldn't distribute (on adding or removing a server) uniformly. That could be one reason to look for different hash functions ?

@charchitpatodi8677 5 жыл бұрын

@@dudejaa Just a thought : he probably not means +1, +2... instead if xxx is id, M is ring capacity and k is number of servers then second position (after hash(xxx) )will be hash(xxx) + (M/k) OR hash(xxx+M/k).. And probably third position will be hash(xxx) + 2*(M/k) and so on till multiple of 'k'

@rishabhmalhotra7058 5 жыл бұрын

@Abhishek Dudeja xxx, xxx+1.. are ids for one server to take a hash on and then reach the respective points on the ring, not the points on the ring itself. And then the hash generated on xxx and on xxx+1.. would be completely different and random, and hence would plot k uniformly random points. @CHARCHIT PATODI I dont think that's the case cause if you think about it , if you add multiple servers each with k different points with that technique -> hash(xxx) + 2*(M/k)..till K, then you're not really randomizing and there would be no difference between adding 1 point or k points per server when it comes to choosing a server for a request. It would be like if you multiplied the ring length into k after choosing one point per server which would not get us what we want.

@timurmukhtarov1319 4 жыл бұрын

This was amazing! Havent seen other videos that talked about provisioning virtual servers/using multiple hash functions! Hooked!

@akshatagrawal3300 3 жыл бұрын

You are simply amazing gaurav, system design concepts couldn't be explained better than this!

@bouzie8000 5 ай бұрын

That virtual server solution blew my mind I'm so sorry. Geniuses have really paved the way for us in computer science.

@user-oy4kf5wr8l 4 жыл бұрын

u r amazing Gaurav! i watched ur video one year ago, i didnt understand then, now i watch again lol ...i understand most of it... thank u !

@Justinkol 6 жыл бұрын

Thanks for making these videos! I was always unsure about load balancing, but this helped explain a lot of my unanswered questions :)

@gkcs 6 жыл бұрын

Glad it helped :)

@AbhishekKumar-ub8co 5 жыл бұрын

I loved the way with ease and simplicity you explained the problem using some pictorial diagram. Good work keep it up!!

@gkcs 5 жыл бұрын

Thank you 😋

@AbhishekChoudhary-tu7ig 3 жыл бұрын

I am a 3rd sem student and I guess I should not be bothering about these things but your explanations are sooooo gooood that I always wanna watch them :D

@nankitable 4 жыл бұрын

With multiple hash being applied, can there be case of collisions, i.e. multiple servers ending up on the same bucket? If not , why? If yes, how is it handled?

@UlfAslak 2 жыл бұрын

Notes to self: * The previous video gives the impression that there is a mapping from ranges of integers to server ids, and that consistent hashing is about to mapping request ids to integers in ranges resulting in more consistent routing of requests to same servers. -> I did realize that this would not work very well over time, as you would end up completely changing the ranges for higher-index servers with the addition of multiple servers. * In this video, requests ids map to an index in a ring with `M` indices. The "trick" then, is the map the server indices to indices in the ring using the same hash function that also hashes request ids. Now, to assign a server to a request, one simply looks clockwise for the nearest server. * To make it less likely that load will be unbalanced due to (what I would call) unlucky hashing, another idea is used: simply have multiple hash functions for the servers, such as to map them to multiple locations in the index ring! (clever). * @Andrei Marculescu points out that better than using multiple hash functions for server ids, it is easier to maintain multiple aliases for each server id ("...xxx gives K replicas xxx + '1', xxx + '2', ..., xxx + 'K'.") and thus map servers to multiple locations in the index ring.

@Luk3Stein 2 жыл бұрын

Thank you!! I was having so much doubts after watching, reading this made it more clear.

@codingfork6708 2 жыл бұрын

How can we determine the value of `M`? Is [0, M-1] the range of the output of the hash function?

@UlfAslak 2 жыл бұрын

@@codingfork6708 Correct. I think there are good heuristics for choosing M (and probably everyone uses the same standard values). Your hash function has to apply modulus M, otherwise you get an index out of range.

@nxpy6684 Жыл бұрын

Thank you! This helped me a lot!

@krishnasandeep4779 5 жыл бұрын

Your Videos are very informative. Thanks for making it Gaurav. Your explanation is crisp and Clear

@jeffruan7701 5 жыл бұрын

Knowledgeable and confident presenter!

@johnleonardo 3 жыл бұрын

your content is insanely good. seriously, the best! you were destined to teach others!

@harshdusane8687 5 жыл бұрын

Awesome explanation. This has truly elevated my understanding of Hashing and Load Balancing in general. Keep up the good work!!!! :)

@baby_adventures 4 жыл бұрын

If we add a new server in this consistent hashing ring then again caching problem will remain same? The requests which was going through s3 before adding new server are now handle by s4.. so, s3 cache for those requests will be useless? Please explain

@consistentthoughts826 3 жыл бұрын

When you said try to think of solution, first thing come to my mind is "change the hash function" Thank god i'm understanding it well and its first time I am studying

@shishirkumar8335 5 жыл бұрын

Great video. One comment is using consistent hashing seems good option for distributed search scenarios (like you pointed for distributed cache, DB search algo) but not for use cases of load balancing where nodes are added to server large numbers of request (like web servers, applications etc). Please comment your view

@SuiMizu 5 жыл бұрын

You are a really good teacher, Gaurav! Please keep up your good work! :)

@gkcs 5 жыл бұрын

Thanks!

@asafmesika Жыл бұрын

Brilliant explanation! I read the Wikipedia article on this and Cassandra docs and your video clicked everything together!

@Reichawalia Жыл бұрын

how did you add users to the ring ? after taking hash value of request ID, did you took remainder of it by number of servers ? which implies if server count changes, user position will change.

@jananiravichandran8370 6 жыл бұрын

Thanks for doing this! Your videos have really helped me understand things better =)

@gkcs 6 жыл бұрын

Thanks Janani!

@NehaKumari-my7cv 4 жыл бұрын

Hi Gaurav, Thanks for sharing such a nice concept.I have one doubt what happen if one server die suppose s1 for 2 hr and then again come back after that so in this case how request are handled.

@abdelrhmansamir1426 2 жыл бұрын

What would happen if there is a collusion when you calculate the virtual servers? I mean if h1(S0) = h2(S1) = 1. So there are 2 servers with the same ID right?

@jrajesh11 3 жыл бұрын

Simply brilliant and clear explanation . Keep doing such awesome work.

@jeyakumar4728 4 жыл бұрын

Hi Gaurov, Wont removing / adding servers to the cluster affects the hash function modulo(%) Example: initially we have 4 servers hash(req for same id) % 4 -> s2 if we remove 1 server :- Hash(req for same id) % 3 -> s1 in this way, still the server 2 have stale cache data right?

@gautamtyagi8846 3 жыл бұрын

many thanks Gaurav for making this concept so clearly explained.

@maitrivasa613 3 жыл бұрын

This question might have been asked before, but how do we choose the value of M for the ring? And do we increase M if the no of requests increase such that one slot in the ring can only contain either one request or one server

@saurabh1203 3 жыл бұрын

What if the 2 different hashing algorithms for 2 different servers produce same result ? Like for S1, H1 gives 19 and for S4, H2 gives 19. Now both of them will be placed at same location in the ring. What is the solution for this ?

@NitishGupta-hs2rj 5 жыл бұрын

Hi Gaurav.. by applying this circular algorithm for consistent hashing problem of local caching will still be there.. bcs every time your hash function changes your server range will also change then how this can be better approach wrt traditional hashing ?

@fiveyearclub6024 5 жыл бұрын

Super helpful, thanks! I never got a CS degree and needed to learn more about sharding.

@GauravSharma-wb9se 2 жыл бұрын

what will be the M for other Hash functions ? it can't be same otherwise we'll get same value...so either M must be changed or input value must be changed so which one we should change ?

@emretapci 2 жыл бұрын

To which server does a request get routed to if two servers map to the same hash value?

@theappareddy8376 3 жыл бұрын

Hi Gaurav how can consistent hashing can solve hot user server load problem. Lets say we are sharding by user id even after consistent hashing hot user server ger the same load

@raghuvamsi8740 4 жыл бұрын

After this video, I downloaded the entire playlist!! More love More support!! Gratitude _/\_

@user-qz2qj8yk5o 19 сағат бұрын

Gaurav I have few questions: 1) We discussed, one of the drawbacks of the simple hashing approach is that when a server is removed then the cached information it stores regarding ReqID or UserID is lost. How is that solved in Consistent hashing? 2) In Consistent hashing, server Id's are also hashed and placed somewhere in the range [0 to M-1] what if two servers end up getting simultaneous hash, example 37, 38. In this case no requests will be send to one, in a clockwise setup.

@pranavp8023 Жыл бұрын

Do you recommend this method to implement custom loadbalancing server for myself over purchasing from other companies like AWS ?

@YoursTrulyAltruist 4 жыл бұрын

what if you hash two servers to the same place on the ring while using k hash functions ?

@ssriharikrishna 3 жыл бұрын

Maybe a noob question. In this design. Doesn't a request coming from a user (unique request Id) always map to the same server? Since in hash(requestId)%M hash and M give predictable outputs to each input. Or is there a randomising component to the hash that I'm missing ?

@makjn10 4 жыл бұрын

What happens if there is a collision between servers (they are mapped to same place on the ring) ?

@krutikpatel906 5 ай бұрын

Can it be used to distribute workload among worker nodes? I see it mainly used in db applications

@KejriwalBhakt 2 жыл бұрын

Just a query, It looks all good with having K replicas of every servers. But for that we need K hash algorithms. As far as I know in Python we have SHA-3 hashing algorithm being used. Don't you think having K different hash function which is consistence in giving results for a particular object/number is same, hard? Please do let me know.

@keshavabhamidipaty3126 4 жыл бұрын

Great video! I was wondering though, with this architecture, do you have to ensure that the hash functions don't ever collide though right? What would happen if an incoming request suddenly mapped to two servers that fell on the same point?

@gkcs 4 жыл бұрын

It's answered in the other comments 🙂

@bhaskartiwari7164 4 жыл бұрын

@Gaurav Sen If we taking request ID exam: h(r) = 12345 how this has value mapped with server ? means request and server mapping , because server having modulo but request not . if we make modulo and add/removed server then it will not get correct response. As I feel if we do not take modulo to server and request and request hashing value goes to more than hashing value to server and hashing function and so On.... can you add your point ?

@arnabthakuria2243 2 жыл бұрын

Learned a lot from the actual implementation in the attached git repo . Thanks

@iammehrabalam 4 жыл бұрын

@Gaurav Sen Can you plz explain or share any resource about Adding a node in the live system where consistent hashing is used and how insert and search query will work?

@aayoushsrivastava1867 3 жыл бұрын

I have doubt that is it possible that the servers can overlap with this consistent basing ? If yes what would happen in that case ?

@smitmandavia5044 5 ай бұрын

Hi Gaurav, Thanks for the video! How about we divide request into M groups and assign each group to a given server. By say keeping a map? If a server goes down, we can remap its ids to other servers randomly. If new server is added, we can take one group from each server? Is having a mapping somewhere makes requests slow??

@vinieth 4 жыл бұрын

What value should i define for the constant M.

@harshmashru Ай бұрын

what if the number of requests is decreased or increased? In that case, M would not be constant right and the value of K would be changing, correct?

@anushak9117 2 жыл бұрын

Gaurav, what if number of requests increase (i.e., m) in that case also server consistent hashing value keeps on changing right??

@perfectlyfantastic 4 жыл бұрын

8:33 it was told that k value should be log(M),Is it just a suggestion or its the value we should definitely consider

@roamwithashutosh 4 жыл бұрын

🙂

@subhabera5775 4 жыл бұрын

Legendary tutorial, specially I really like where you try to prove your logicswith mathematical equations, same goes for one of the video called "finding loop in a linked list". Thanks Gaurav again :)

@dharmiknaik1772 4 жыл бұрын

I'm stuck at the point that for every new request (even for the same client) a new request id will be generated.

@RajeevSoni007 4 жыл бұрын

@Gaurav Sen , this is the Cassandra ring architecture. right?

@jyotir124 2 жыл бұрын

Thanks Gaurav. One question, can consistent hashing be applied on SQL db

@giobaldu 4 жыл бұрын

Great video! Question: where do the requests sit in practice? Is there a node acting as a scheduler dispatching request by request, or the requests are mapped immediately to a server and kept internally in memory? Or both, so that the requests can be rescheduled if the server goes down? (I suppose this would require the scheduler to periodically ping each server, or set a timeout). What happens if the scheduler goes down? Second question: would it be possible to use work-stealing instead do reduce inbalance? Whenever a server is out of work, it would steal a request from the back of the queue of another random server. Or could this skew too much the execution order of the requests?

@gkcs 4 жыл бұрын

Thanks! The load balancer is a service which needs to tell the other services where a request is to be routed. It can either be queried per request (which is very expensive), or a snapshot of the current assignments can be cached by all services. If the snapshot changes at the load balancer, it can notify all interested clients. The service is distributed and backed by a 'reliable' database, so a single failure won't take the system down. Second answer: It sounds complicated and I have never seen it implemented on a large scale system.

@Naton 2 жыл бұрын

as a normal distributed system engineer, is this something you would have to implement in code? or it is a feature most distributed software allows you to configure? eg. kubernetes?

@sivas09 3 жыл бұрын

'n' being the number of servers and 'm' being possible hash values, would spacing out the servers at a value of m/n be a working solution? For ex - with m as 256 and n as 4, first server could be at 64, second be at 128, third at 192 and 4 at 256 - along those lines Understood the possibility of skewed allocations and the need for replicating ids tho. Hooked to your amazing content! kudos

@azeeztaiwo2802 3 жыл бұрын

best explanation of consistent hashing i have seen so far.

@ishaangupta4941 5 жыл бұрын

Why not make a virtual server for every server at its diagonally opposite end? will it be efficient?

@ashishmittal7048 Жыл бұрын

Thanks for the amazing video and describing the ring buffer based design for load balancing. I am wondering how this design will work efficiently when say for an example a subset of users are making too many requests? Because of consistent hashing the requests may land to the same machine , and certain machines might get more work assigned whereas all other machines are starving for the jobs.

@jatinderarora2261 5 жыл бұрын

Thanks Gaurav. Excellent video.

@sridharbalabhadrapatruni1247 3 жыл бұрын

@Gaurav, What would happen to the caching that we talked about in the initial part of the discussion? I understand caching is not going to happen because the requests are too randomized for caching to occur. Is this algorithm so efficient that even without caching it's more efficient than having an algorithm that relies on caching? Also, as a performance engineer, i dealt with load balancing a few times, but never got to see these kinds of algorithms for load balancing. we have implemented algorithms that distributed load across servers based on several parameters such as - geographic proximity of the request to the server, hardware utilization (Server with least CPU, RAM, utilization gets the request), Least connections(Server serving the least active connections gets the new request), etc. Do you have anything to say about those logics, and whether they are related to the hashing algorithm we have seen...

@kaustubhparmar4274 8 ай бұрын

May be late, but I think the caching will happen because the hashes will always return the same output for same input, so if the servers do not change then caching is not affected. But if the number of server changes we need consistent hashing so as to minimise the remapping of the request to the server.

@gauravganna 2 жыл бұрын

K hash functions can have collision. Two hash functions assigning same search space to different servers. How should this be avoided?

@vedantpate4827 6 жыл бұрын

When you use k hash functions for all servers how do you handle the case when two servers hash to the same value. Which server is going to serve the user request in that case or will you just choose M so that such case won't occur?

@gkcs 6 жыл бұрын

It's highly unlikely to happen. M is really huge 😊

@325venkat1 3 жыл бұрын

Gaurav: What is maximum value to use for the ring size i.e. M? And how to design the ring node size efficiently?

@SayHelloMeetPatel 5 жыл бұрын

Very nice explanation. Really liked the video. Thanks. Keep making it. 👍

@gkcs 5 жыл бұрын

Thank you!

@rahulkushwaha7896 2 ай бұрын

why not always devide the server equally on the ring, on any event for addition and removal. ?

@aniket5736 2 жыл бұрын

Will consistent hashing be irrelevant if we use caching server such as redis ??

@divyaprakashjha8389 4 жыл бұрын

How many level of hashing should be done for each server? twice, thrice...?

@rakeshvarma8091 3 жыл бұрын

Gaurav, This video is wonderful Have small doubts Let's assume that request R1 is served by server S1. Now we have added a new server S2. Because of this let's assume the request R1 is now coming to S2. How the above scenario gets handled ? Is it like when a new server S2 is added , we have to move some portion of the data from the existing servers (S1) to the new server S2 based on its position on the ring? If it is the case, how can we do the distribution in real time ?

@gauravluthra7959 4 жыл бұрын

Hi Gaurav, I understood this load balancer. Can you help me with my requirement? Suppose my key is "key1" if its first request is served by "S4" Server, Now if "S4" is down, Then with consistent hashing "key1" will be served by may be "S4" (which is adjacent to "S4"). Now even if "S4" comes up, Then I want "key1" to be always served by "S5" only. (Even when "S4" is now up.). Now if "S5" goes down, Then may be i can go back to its original server "S4" otherwise "key1" shall be served by "S5" only. And I do not want to maintain any state at my Load balancer per key wise. Thanks in advance.

@nafeezahid214 5 жыл бұрын

Excellent video Master. Thanks a lot.

@darkreaper4990 3 ай бұрын

I am not a computer science student so the remainder calc. is kinda not intuitive. why do we use remainder here? sorry for the stupid question but I need to know.

@siddhartheswaramoorthy6413 3 жыл бұрын

Gaurav, In case of a server failure the next server in the ring will serve the load which means the user has to re-login right?, Why don't we use a session store using any in-memory data stores and store the session. So that we don't need to worry about the server going down we will still have the session data in the session store.

@vikramr3295 3 жыл бұрын

Gaurav, how do we do this in the real world? Adding a server to the ring is more complex than it appears to be. The data constantly changes in the real world, so how do we ensure that the hits to the new server are all served appropriately?

@tanvirt16 3 жыл бұрын

Gaurav, thanks so much for your videos! Very informative and easy so follow despite the complexity of the concepts. Just had a couple questions from this video! #1 So one of the original problems in the regular hashing solution was that when you add a new server, you'd have to destroy much of the local caches of the other servers because they become useless, which makes sense. So in this case, less changes occur, but how would you update the local caches to make sure you don't have to clear out the entire cache? Do you need some form of algorithm to determine what cache items should be evicted? #2 Also, how about the algorithm required to determine what the "closest" server is in the ring which will serve the request? Is there a simple mathematical solution for that, or is it somewhat complex? It does seem that the additional complexity in maintaining a consistent hashing system is worth the advantages, just want to understand a bit about how complex it actually is, or if it's simply just a genius solution to a problem.

@soumyajitdas4433 Жыл бұрын

Try looking into Chord Algorithm (en.wikipedia.org/wiki/Chord_(peer-to-peer) for #2 Tl;dr; - every node in the hash ring maintains something called a finger table containing the information around it's predecessor node, successor node and also pointer to nodes (n+2, n+4, n+8 ... n+2^k). This way we can query any node and find the successor node to a particular hash value in O(log n) time.

@rishabhagarwal9871 5 жыл бұрын

A good video. I am really impressed. Thanks a lot.

@ozichukwu Жыл бұрын

One question. I'm new to system design. Please pardon my naivety; Instead of consistent hashing of request and server ids, why can't the load balancer just randomly pick a server id and send the request to this id? Example: I have 5 servers numbered 0-4. Why cant a simple random function pick a say server number 2. What am I missing?

@Gbyrd99 6 жыл бұрын

If you used a load balancer like HAproxy is this concept happening under the hood? Can the load balancing be deterministic to allow for caching on boxes instead of a distributed cache

@gkcs 6 жыл бұрын

We can configure HAProxy to do this. The LB is deterministic. That helps maintain a cache on each server.

@Gbyrd99 6 жыл бұрын

Gaurav Sen this is great info. I would suggest maybe making some ask.fm or a discord server where you can share your wealth of knowledge in system design with a lot of people. And it could become more consumable

@arjundixit5913 3 жыл бұрын

Nice explanation but I have 2 doubts …please help me understand where I’m losing the track? Doubt 1: In the whole process we assumed that the no. of requests is constant (0-M), based on which the whole calculation of uniform distribution is carried on . But is that really the case practically ? I mean how would we know the range of the number of requests we are getting in advance? Doubt 2: If we use k hash functions for a particular server id and divide each by M, isn’t there a possibility of same id of 2 different server …specially if we have less number of numbers?

@jpwengertv 4 жыл бұрын

Gaurav, given K hash functions and a request id, how to decide which function to apply on the request id? Do you just take one of the K function randomly?

@anshulgupta1159 3 жыл бұрын

K hash functions are only for the server IDs. For request IDs, there'll only be 1 hash function mapping the request to a point in the ring.

@premabhisek 5 жыл бұрын

Gaurav, at 8:23, you mentioned that when we have K points (on performing consistent hashing), 'the load on each server is much much lesser'. Could you explain that? 'coz even if we have 12 points, wouldn't the load on each "server" (considering actual 4 servers) be effectively the same?

@gkcs 5 жыл бұрын

The chance of a skew in load is much lower as the number of points on the ring increase.

@premabhisek 5 жыл бұрын

@@gkcs ok. I got the entire concept correctly now. and got the point you tried to explain. As in the hash ring, (on applying the same hash fn) all request ids having hash value greater than S0, are to be served say by S1, and all request ids> hash of S1, are served by S2 and so on. And that, the difference of S0 - S1 - S2 - S3 may not be same. which means a server might end up serving more requests than one other. So when we increase the no. of server hash ids/replicas id to K points, the chances of load on each virtual point (or to say one specific server) is less.

@premabhisek 5 жыл бұрын

and speed of your response to my question was like as fast as searching a key from a hashtable :) thanks for that!

@gkcs 5 жыл бұрын

@@premabhisek Absolutely!

@jayanthmanklu8642 5 жыл бұрын

@@gkcs Hijacking the pinned thread to ask 2 more clarifications - a) How is the number M chosen? Isn't that number dependent on the traffic load - (a website getting a few hundred visits a day v/s Uber's price for a ride API) b) Can you explain intuition behind using log(M) as the number of hash functions?

@ananyamathur5359 2 жыл бұрын

Hey! Thank you for your explanation. Please could you clarify my one doubt:- In a scenario where one server crashes (S4)-> it's respective k point server nodes being removed-> Now load will be distributed to the next upcoming server nodes in the Circle.. which might be say S1,S3 S1 and S2.. My question is that for all these requests blocks we would now need to configure the change of server for upcoming k nodes.. the change will be K Times .. That lands us into the same problem like your previous video. Instead it will be k times. Do let me know if I'm getting confused or this problem will happen ?

@laharibangaru3756 Жыл бұрын

same doubt 🥲could you figure out why ?

@dannywadhwa1759 3 жыл бұрын

After doing all this , suppose there may be a case where some server might have a long request queue and some of servers have no load , can't the request from former server be assigned to next available server node?

@gkcs 3 жыл бұрын

No, we do not have "work stealing" in this model.

@nagarishyendarpanguluri6322 4 жыл бұрын

Hi Gaurav, Great work! Quick question, How do we choose the number "m"?

@romanesterkin 4 жыл бұрын

Gaurav, I have a question: if the hash function h(x) maps values to the range of (0...M-1), why do you need h(Server Number)%M? %M is redundant here, isn't it?

@vipindixit5532 9 ай бұрын

Same question from my side.

@dhvanilvadher1356 4 жыл бұрын

Why can't we make a random generator irrespective of IDs so that load would be evenly distributed why do we have to cache that ids if we can make a decision in O(1)?

@hitmusicworldwide 3 жыл бұрын

Why can't the servers push "ready!" states to the load balancer to initiate requests for tasks, so then the tasks are only sent to servers that have made requests? In addition, if the servers or balancer is able to calculate time to completion they can inform the algorithm as to when each server is predicted to be ready to handle the next task/request. Would this then create a predictive balancing hash that adjusts to the environment?

@SuperSam4y0u 3 жыл бұрын

This works for distributing "tasks" among the servers, bt this is for distributing data among the servers. If servers are enabled to request for data, then when the data is actually to be read, how can you deterministically pick the server that has that data?

@ashutoshmishra2328 4 жыл бұрын

Hey gaurav, Thanks for this great video. i have one question, can we achieve the same results using a stick-table (which will keep user/IP and server mapping) in loadbalancer with some nondeterministic load balancing algorithm like RoundRobin or Least connection. if not then can you explain why.?

@gkcs 4 жыл бұрын

The main objective here to reduce the "rebalancing", the total number of cache loads and evictions. This is useful for load balancing on a cache cluster. The RoundRobin or Least connection algorithms are also useful in different scenarios.

@GunjanAgarwal0811 5 жыл бұрын

Question - There are K hash functions to map servers on ring ? Then how do incoming requests uniformly get assigned to K virtual servers ? Are there K hash function to hash request id's as well ?

@gkcs 5 жыл бұрын

I think I was ambiguous here. No, the request falls on the ring and is picked up by the nearest clockwise server on the ring. This point is among the server points(N*K in total). Each server has K points on the ring. So the request is mapped to just one server.

@RohitSharma-jc5bz Жыл бұрын

Question - how to prevent collisions? What if multiple servers hash to the same value?

@xiuwenzhong7375 4 жыл бұрын

thx a lot, really helpful for people like me has no sense of system design.

@gauravgarg9842 4 жыл бұрын

Gaurav, nice video. However i had one doubt i mind. When a new node is added then how does key invalidation happens in existing servers.

@manavdahra207 5 жыл бұрын

Is it possible that while spitting out multiple hash values for all server, we get collisions ? For eg: h1(s0), h2(s0), ...., hm(s0) + h1(s1), h2(s1), ...., hm(s1) + ... + h1(sn), h2(sn), ...., hm(sn) -> Set of all indexes generated on the ring Is it possible to get duplicate values in that Set ? If yes, then how do we handle that ?

@gkcs 5 жыл бұрын

Realistically, it isn't. The possibility of that happening is less than an asteroid smashing into Earth right now and blowing us to bits. So we have our priorities set ;)