Service discovery and heartbeats in micro-services 👍📈

109,886 views

Gaurav Sen

Days ago

Servers crash for various reasons, such as hardware faults and software bugs. Service discovery and health checks are essential for maintaining a service ecosystem's availability and reliability. We talk about how a heartbeat service can be used to maintain system state and help the load balancer decide where to direct requests. When a server crashes, the heartbeat service can identify the failure and restart the service on that server immediately.
Service discovery is another important part of deploying and maintaining systems: it lets the load balancer adapt request routing. Both features allow the system to report and heal issues efficiently.
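To make the heartbeat idea concrete, here is a minimal Python sketch of a health service that remembers when each server last checked in. The class name `HeartbeatMonitor`, the 5-second timeout, and the injectable `clock` are assumptions made for illustration; the video does not prescribe a specific implementation.

```python
import time


class HeartbeatMonitor:
    """Tracks the last heartbeat per server (hypothetical helper, not from the video)."""

    def __init__(self, timeout_seconds=5.0, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds  # assumed threshold
        self.clock = clock                      # injectable so the logic can be tested
        self.last_seen = {}                     # server id -> time of last heartbeat

    def record_heartbeat(self, server_id):
        """Called whenever a server reports that it is alive."""
        self.last_seen[server_id] = self.clock()

    def alive_servers(self):
        """Servers whose last heartbeat is within the timeout."""
        now = self.clock()
        return sorted(s for s, t in self.last_seen.items()
                      if now - t <= self.timeout_seconds)

    def dead_servers(self):
        """Servers that have been silent for longer than the timeout."""
        now = self.clock()
        return sorted(s for s, t in self.last_seen.items()
                      if now - t > self.timeout_seconds)
```

A load balancer could poll `alive_servers()` to decide where to route, while a restart job could act on `dead_servers()`.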
System Design Video Course:
get.interviewr...
A complete course on how systems are designed. Along with video lectures, the course has architectural diagrams, capacity planning, API contracts and evaluation tests.
Use the coupon code 'earlybird' for a 20% discount!
System Design Playlist: • System Design Playlist
References:
/ patterns-for-resilient...
/ active-active-for-mult...
#SoftwareEngineering #SystemDesign #ServiceDiscovery

Comments: 161
@prafulparashar9849 2 years ago
Dude, the intro 😂😂 I suppose not many people appreciate the creative efforts gone into making the skits. So, here's one. Great opening !!
@anastasianaumko923 1 year ago
It's amazing! 🤣🤣
@SsjRose-pr1hz 11 months ago
Bro i laughed so hard on that one 🤣 gk is so innovative mann
@AtticusFinch65 5 years ago
When you’re a systems architect and have no friends
@gkcs 5 years ago
Hahaha!
@akashsaha7994 5 years ago
lol
@dokwme1211 5 years ago
Kyle Horne that's a system architect talking inside his mind
@ABHIJEETSINGH-gm6te 5 years ago
@@dokwme1211 🤦‍♂️🙈
@user-oy4kf5wr8l 4 years ago
HAHAHAHAHAHAHAHAHAH
@shivujagga 5 years ago
The start was amazing xD
@sankalparora9374 1 year ago
The start of the video was amazing and so was the explanation! Thanks for the great video!
@gkcs 1 year ago
You're welcome!
@vinitp2004 3 years ago
The heartbeat service is also called a "Canary" or "Canaries"... whose main job is to ensure the service/application is running fine all the time. It is a kind of minimal test run against production continuously at fixed intervals, and its failure gives the DevOps team a head start on an upcoming production issue.
@sneakykk 11 months ago
The opening intro was brutal, honestly. Kind of resonated with me.
@Robert-lu3wc 4 years ago
Your intros are gold sir. Please do them more often :D
@gkcs 4 years ago
Thanks!
@vishalb1204 18 days ago
It is a sad moment when both employees got fired even though they shared their ideas and thought process.
@Emmanuel-px9lk 5 years ago
Love the intro
@gkcs 5 years ago
:D
@drakezen 5 years ago
Your videos are great and fun, not to mention informative. Keep them coming!
@gkcs 5 years ago
Thanks!
@RajkumarSingh-dq8be 5 years ago
Well, you have talked about the two-way heartbeat service before, but this video gives the elaborated explanation.
@gkcs 5 years ago
Yup!
@namratasrivastava3089 2 years ago
Man, your videos and the way you communicate make people feel that not a single second is wasted while watching. Thank you so much! I'm going to have my first Google interview round in a month.
@gkcs 2 years ago
All the best!
@namratasrivastava3089 2 years ago
@@gkcs Thank you 😊
@kunal_chand 5 years ago
Long time, since you solved some competitive coding problem. Can you do those more, they are quite helpful. Thanks in advance !!
@SamuelKarani 4 years ago
"Are you alive?" health checker "YES" dead machine
@Viralvlogvideos 3 years ago
This guy is awesome i really wanna learn from him
@craigfoster9996 2 years ago
This is great production quality, lol.
@vaibhavsharma4944 2 years ago
don't get it, you mentioned using NoSQL for faster reads (in intro), but in the NoSQL video you explained NoSQL databases read time is slower
@rashminagpal9886 4 years ago
Man, your videos! Thanks a bunch for creating information-rich content! Hands down :)
@gkcs 4 years ago
Thanks!
@vijay517501 5 years ago
you are rocking with your videos.
@gkcs 5 years ago
Thanks!
@s57452 5 years ago
Awesome. Please keep uploading such awesome videos often. Thanks a lot :)
@gkcs 5 years ago
I do! :D
@EbonySeraphim 3 months ago
I don't think you need a two-way heartbeat. Simply run a canary to hit all services and all basic operations at fixed intervals, and report metrics on them. If there is a failure, the operations team knows there's a problem in production and can see what actual customers are seeing through the canary logs. On the flip side, because you ensure your canary is always running, missing metrics (not just metrics in the alarm threshold) for something the canary exercises are a heads-up that you also have a problem. The only thing potentially missing is the case where a minority of hosts have gone down, such that services aren't missing metrics but are working through a depleted fleet. This isn't entirely a bad problem, as high load on the remaining hosts should be another set of metrics that is monitored and seen well before it becomes a problem.
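The canary approach described in this comment could be sketched roughly as below. The function name `classify_services`, the report shape `(timestamp, success)`, and the `max_age` cutoff are all assumptions for illustration; the point is only that a stale or absent report is treated as its own alarm, separate from a failed one.

```python
def classify_services(expected, reports, now, max_age):
    """Classify each expected service from the latest canary report.

    expected: iterable of service names the canary is supposed to exercise
    reports:  dict of service name -> (timestamp, success flag)
    now:      current time, in the same units as the timestamps
    max_age:  how old a report may be before it counts as missing (assumed)
    """
    status = {}
    for service in expected:
        report = reports.get(service)
        if report is None or now - report[0] > max_age:
            # No recent metrics at all: the canary or the service is in trouble.
            status[service] = "missing"
        elif report[1]:
            status[service] = "ok"
        else:
            status[service] = "failing"
    return status
```

Anything that is not "ok" would page the operations team in this sketch.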
@himanshutripathi7441 2 years ago
creativity is next level.
@NikhilTJK 4 years ago
Restart is the best solution in IT:p
@gkcs 4 years ago
😛
@NikhilTJK 4 years ago
@@gkcs A solution architect's best weapon 😛
@Vlad-xo8hm 3 months ago
Thank you for the video sir!
@cheequsharma7391 1 year ago
Deduct 50 rupees for the overacting.. Jokes apart. Great video, very concise and oriented to real-world problems. Thanks mate. Keep up the good work.
@gkcs 1 year ago
Thank you!
@pradipacharjee4915 5 years ago
Hi @Gaurav, you talked about service discovery with health checks, and it's really interesting. Doesn't Kubernetes have readiness and liveness probes that we can use? You are right that we can narrow down the problem by sending/collecting health information at some interval. But how do we handle the scenario below? 1. Suppose my health check interval is 30 seconds. 2. At the 0th second the pod indicated its health is OK. 3. The load balancer sent some requests (suppose 10) at the 10th second, and for some reason the pod crashed! In this scenario all 10 requests will fail! How do we continuously watch the health of the application (with no interval strategy)?
@gkcs 5 years ago
I don't think it's possible. If your communication lag is more than your tolerable limit of failed requests, you can't do much, apart from trying to reduce the delta between two consecutive health checks.
@pradipacharjee4915 5 years ago
@@gkcs Hmm, or maybe let the application process crash (for example, use assert for non-recoverable situations in application code), which will crash the container. Kubernetes will then automatically roll out a new instance/container, which will be available to Kubernetes's internal load balancer. I think this can be a viable solution (along with the interval strategy). But we have to be very careful when identifying non-recoverable situations (which are in fact very rare). Otherwise containers will keep crashing frequently, and that can be even worse. What do you say?
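The trade-off in this thread can be put in numbers. The sketch below assumes checks fire at fixed multiples of the interval, starting at t = 0, and that a check landing exactly on the crash instant still passes; the function names and the `failures_needed` parameter are hypothetical, chosen only to illustrate why shrinking the interval shrinks the window of failed requests.

```python
def seconds_until_detection(check_interval, crash_time, failures_needed=1):
    """Seconds between a crash and the moment the checker declares the pod dead.

    Assumes checks fire at t = 0, check_interval, 2 * check_interval, ...
    """
    # First check that happens strictly after the crash.
    next_check = (crash_time // check_interval + 1) * check_interval
    # Some checkers require several consecutive failures before acting.
    detected_at = next_check + (failures_needed - 1) * check_interval
    return detected_at - crash_time


def requests_at_risk(requests_per_second, check_interval, crash_time, failures_needed=1):
    """Upper bound on requests routed to the dead pod before detection."""
    window = seconds_until_detection(check_interval, crash_time, failures_needed)
    return requests_per_second * window


if __name__ == "__main__":
    print(seconds_until_detection(30, 10))  # 20: crash at t=10, next check at t=30
    print(seconds_until_detection(5, 10))   # 5: same crash with a 5-second interval
```

With the 30-second interval from the question, a crash at the 10th second goes unnoticed for 20 seconds; the window never reaches zero, it only gets smaller as the interval does.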
@raguramgopi595 5 years ago
If the S3 server is down, the load balancer itself takes care of it, right? So why do we need a health check service to notify the load balancer here?
@rahulsinghkhokhar 3 years ago
So that means the load balancer in S3 has a health check service built in, which is abstracted.
@agentNirmites 5 years ago
Hello, I am student in computer engineering field. 2 Years are completed, C/C++ Expert... Now what can I do with that ? Can you please make a video for this... I mean compiling and running programs on terminal is not the practical thing... I need to make some applications but how ? Please reply...🙏
@tommysuriel 2 years ago
Hey, this is great content, I was wondering, do you have any videos that are focused on security?
@NBJavaDev 5 years ago
Great job brother
@DarshitSuratwala 5 years ago
Finally, Service discovery 😇
@gkcs 5 years ago
I always keep my promises 😋
@PACHUNURI 5 years ago
haha love how gkcs popped up!
@SubhSingh 5 years ago
In my opinion, you can just introduce Kubernetes to solve these problems without any pain :)
@gkcs 5 years ago
Best to know about internals though :)
@ravichauhan03 4 years ago
Why do we need any health service at all, when you already have an LB. LB can do the health check, all you need to do is just configure the health URL. Do not understand the need for health service at all. It seems redundant.
@AhmedMohamed-xs5ij 5 years ago
You are amazing
@Ramesh-cn6pf 5 years ago
Hey gaurav, send that two fired people so that I can easily win most of the competitive programming contest and they can take 80% of prize money😂😂😂
@gkcs 5 years ago
Hahaha!
@harisridhar1668 3 years ago
Hi Gaurav - I notice that heartbeat mechanisms tend to be paired with load balancers. Is this a typical system design pattern done out of convenience ( as load balancers are already connected to most servers )?
@EliasCassab 2 years ago
The load balancer needs to know which servers are still alive at least. Otherwise, it might send requests to a dead server.
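The point in this reply can be illustrated with a small round-robin balancer that skips servers the health service has marked down. `RoundRobinBalancer`, `mark_down`, and `mark_up` are hypothetical names for this sketch, not the API of any real load balancer.

```python
class RoundRobinBalancer:
    """Round-robin routing that only returns servers currently marked healthy."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)  # updated by the health service
        self._next = 0                    # index of the next candidate

    def mark_down(self, server):
        """Health service reports that a server stopped responding."""
        self.healthy.discard(server)

    def mark_up(self, server):
        """Health service reports that a known server is back."""
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        """Return the next healthy server, skipping dead ones."""
        for _ in range(len(self.servers)):
            candidate = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy servers available")
```

Without the `healthy` set, `pick()` would keep handing out the dead server on every third request.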
@indrajeetnandy2157 2 years ago
Why is the load balancer storing these IP addresses? Shouldn't something like a registry service store this information? The load balancer should just distribute load properly, right?
@IndishCholleti 5 years ago
Hi Gaurav, awesome video. For this problem, can’t we just push the tasks or requests on to a message queue which will be consumed only by the active servers?
@gkcs 5 years ago
Why wouldn't you want to monitor server health always? Also, not every architecture allows a message queue push pull model.
@IndishCholleti 5 years ago
Okay, let's assume the architecture can support message queues. As per your message queue video regarding the pizza shop system, at 7:50 you stated that it will provide a heartbeat mechanism as well. Do we need an exclusive health check service even then?
@gkcs 5 years ago
Yeah I would prefer one. Because it's a matter of money (Those servers aren't running, which means wasted compute power). People get fired for less :P
@IndishCholleti 5 years ago
I kind of misunderstood the problem. Just one other question: instead of having the two-way heartbeat, we could do a health check with a GET API request to the service, with a response of just true or false, or an error when the service is down, so that it can acknowledge that the service is still active. This is to make sure the service is not dead even if the server is alive.
@gkcs 5 years ago
@@IndishCholleti Hmm. Watch the video again and tell me why we have the two way heart beat.
@ayushjain8490 1 year ago
What is the issue in the situation below? If, in a 2-way heartbeat, one service (A1) on server (S1) did not send a heartbeat after 5 seconds, then HealthService can restart A1 only on S1. Can it mark A1 as critical and dead in the same way as it does for servers, and maintain a similar cron job for health-checking services as well?
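One way to picture the per-service tracking asked about here is a state function keyed by (server, service) pairs. The 5-second "critical" threshold comes from the comment; the 15-second "dead" threshold and both function names are assumptions for this sketch.

```python
def classify(seconds_since_heartbeat, critical_after=5.0, dead_after=15.0):
    """Map heartbeat silence to a state. dead_after is an assumed value."""
    if seconds_since_heartbeat < critical_after:
        return "alive"
    if seconds_since_heartbeat < dead_after:
        return "critical"
    return "dead"


def services_to_restart(last_seen, now, critical_after=5.0, dead_after=15.0):
    """Return the (server, service) pairs a periodic job should restart.

    last_seen: dict of (server, service) -> timestamp of the last heartbeat
    """
    return sorted(
        key for key, seen_at in last_seen.items()
        if classify(now - seen_at, critical_after, dead_after) == "dead"
    )
```

Because the key includes the server, restarting A1 on S1 leaves A1 on other servers untouched.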
@indikakularatne5339 3 years ago
Awesome mate
@pallavisingh2912 5 years ago
Thank you for the awesome explanation. What happens when the health check service itself goes down?
@gkcs 5 years ago
Thanks! The health service should be a distributed service, so a single node going down is fine.
@user-rz5jj5vq4f 4 years ago
Thank you man
@user-oy4kf5wr8l 4 years ago
So damn good video man.....
@AbhideepChakravarty 4 years ago
Querying the LB? I have never seen any LB that gives a way to query it for services. Are you mixing up service mesh with LB?
@user-wq3gw1lg6f 3 years ago
Your thinking is a bit confusing (properly differentiate between a server and a service).
@cristianouzumaki2455 3 years ago
Gaurav we understand that some parts are meant to be funny but words like "fired" maybe sensitive to some people and they may not take it very well. Why not use other ideas ?
@gkcs 3 years ago
Please find me a word that people won't take "offense" to. In any case, if I want to accurately transmit information and ideas, I need my full set of tools, which includes the breadth of language.
@cristianouzumaki2455 3 years ago
@@gkcs Of course, we are here for knowledge only . Just wanted to offer a suggestion. Don't take it otherwise. :) If you need, I will delete the comment.
@gkcs 3 years ago
@@cristianouzumaki2455 No it's good to put your points forward, I appreciate that 😁
@chengu8922 5 years ago
Laughed when the person video in
@gkcs 5 years ago
:P
@SujeetGupta-hx6sl 4 years ago
Nice intro 👍 How do you make duplicate of yourself ?
@AnkitKumar-zu7cn 5 years ago
You said "...running on different machine...." by that, did you mean running in different environment? If so, does that mean there is at least one machine dedicated for each environment? These machines definitely can't be physically separate units. So what defines a machine? Is it like having multiple parking lanes in a parking lot, where each lane can take up certain number of vehicles?
@gkcs 5 years ago
Different machine means different physical compute box. A different computer. The other way is to containerise the services. Like a VM.
@rohan1456 4 years ago
Cache would be a better option, Service's state UP/DOWN can be pushed at regular intervals and can be queried using some API .
@tvnathreviews 3 years ago
2:00 S1 goes down because it had enough with so many "are you alive" questions from the health check...
@gkcs 3 years ago
:p
@pramodroy8137 5 years ago
How do we make sure the health service is always running and able to send and receive heartbeats?
@gkcs 5 years ago
Distribute it? Use something like zookeeper for consensus.
@tarunrawat1281 5 years ago
I have seen people using InfluxDB and Grafana for this purpose. Is my understanding correct?
@br4676 5 years ago
superb sir
@gkcs 5 years ago
Thanks!
@tryitnow5944 5 years ago
Superb bro
@vivekvarma8367 4 years ago
Kubernetes already has this heartbeat monitoring with failover restartability. I started seeing this problem when we moved to small, tiny container deployments; we never saw this issue at our company when we deployed on VMs.
@gkcs 4 years ago
Thanks for sharing this Vivek :)
@karthikreddy2017 5 years ago
What you talk about in this video forms the basis for load balancing at any large scale internet company. This also forms the basis for a service mesh as well. cloud.google.com/traffic-director/ and even aws app mesh are two notable examples of service mesh control planes.
@gkcs 5 years ago
True, although the service mesh is richer in terms of metric transmissions and routing.
@abeynjose 5 years ago
Nice👍
@fcx1439 4 years ago
how do you make the video 3 of yourself? can you share the trick please?
@gkcs 4 years ago
Adobe Premiere Pro. You crop carefully :)
@deepakgupta1344 5 years ago
Boht hard!! :p
@gkcs 5 years ago
Hahaha!
@asdfghjkl1770 5 years ago
For the 0.1% people who see this, I hope you follow your dreams. My dream is to be a successful youtuber and inspire others.
@01coolACE 5 years ago
Can you please make video on google sheets system design.
@gkcs 5 years ago
Google docs is similar. I have that in my list of videos to do :)
@01coolACE 5 years ago
@@gkcs Cool, that was quick. Thank you. Just subscribed :)
@charan775 5 years ago
curious to know does this happen in real life? firing people this quick?
@aicnn3035 5 years ago
Isn't this what Kubernetes and Zuul do?
@DinukaWanasinghe 5 years ago
I heard of something called a Service Registry. Is this heartbeat check service part of that?
@gkcs 5 years ago
Yup, this is it :)
@nick-sx2zn 3 years ago
Maan! The intro😅😅
@Manwithsteelnerves 4 years ago
Intro - How did you do that :? ?
@gkcs 4 years ago
Adobe Premiere Pro :)
@varunjain7360 4 years ago
is one of the servers sending the snapshot (IPs, ports) to the LB?
@gkcs 4 years ago
All the servers are sending their health status. Have a look at my microservices coding tutorial for a better understanding.
@nakshatranahar 5 years ago
true
@gkcs 5 years ago
Hahaha
@konarklohat8195 4 years ago
Hey Gaurav, does this really happen, a person getting fired like this?
@gkcs 4 years ago
Nope :P
@konarklohat8195 4 years ago
@@gkcs Its relieving then. lol
@dokwme1211 5 years ago
Try reading bin logs
@gkcs 5 years ago
Lol. Takes forever, and doesn't restart the server automatically.
@dokwme1211 5 years ago
@@gkcs Haha, I replied in relevance to the data pipeline. Btw, great content
@krishnasumanth007 5 years ago
Hi Gaurav, when I tried to apply the coupon code gkcs, I got "unable to apply coupon code". Can you look into it?
@gkcs 5 years ago
The coupon for "grokking the system design interview" was valid for the first 100 users only. It's still a course worth purchasing though :)
@sagarpatil-js1fy 5 years ago
what kind of companies ask questions on system design?ty
@gkcs 5 years ago
All?
@vijayhathimare4520 5 years ago
@@gkcs Almost :P :P
@eeshwarasai 5 years ago
cool!😊
@gkcs 5 years ago
:D
@runfunmc64 5 years ago
Instead of killing itself, S4 should've tried to get some help :(
@gkcs 5 years ago
Prepare yourself Goku, it's time.
@pulkitbajpai01 5 years ago
can u make a video on kafka
@gkcs 5 years ago
It's on my list of things to do 😁
@saurabhsharma7123 3 years ago
Gloomy start😅
@gkcs 3 years ago
Lol
@avanishsingh123 5 years ago
Interesting day to release this video. Did you want to teach the FB/Insta/WhatsApp people how to handle outages 😂
@gkcs 5 years ago
Did something happen today?
@avanishsingh123 5 years ago
@@gkcs Yes. Major outage for 1+ hours globally. Even FB's authentication service was down thereby taking down dependent services like tinder.
@gkcs 5 years ago
Crazy stuff...
@ZalexMusic 5 years ago
I love you
@pranavkumar188 5 years ago
I am making network intrusion detection system as my project of final year , can you help me sir Please it would be very helpful in project
@SaiReddyDubbaka 4 years ago
You take a big topic like this one, reliability and resiliency, and without dwelling on the failure modes of a particular architecture you discuss too many failure modes, which doesn't make any sense without context (the architecture of the system).
@shubha12m345 5 years ago
Love you sir , inspired by you for competitive programming , but sir what happen with #GennadayKorotkevich interview video why you deleted and by any chance its available again,because i was excited to watch that interview video and you deleted immediately,so does interview video of #tourist will available again????
@gkcs 5 years ago
Thanks Shubham! You should ask Codechef. Read up on the updates I post on this channel, I made one when updating the video permissions.
@shubha12m345 5 years ago
@@gkcs So what you mean by it sir??Sorry for my english🤗🤗
@kunalr_ai 5 years ago
The intro was harsh
@sarothsundar 4 years ago
Hyderabad Uber office
@shankarsundaram5513 5 years ago
Do some low level design videos#gkcs
@gkcs 5 years ago
They will come soon 😁
@geedhaipriyan8418 5 years ago
Cool
@mohitbanerjee2417 4 years ago
What's wrong with speaking in Bengali?
@AshutoshSharma1812 5 years ago
You speak too fast; please slow down and make things clearer.