System Design of Doordash: Geo-Hashing and WebSockets for Location Based Services

  Рет қаралды 92,248

Gaurav Sen

Gaurav Sen

Күн бұрын

We go through a popular interview question: Design Doordash.
The system design of Doordash (similar to Swiggy and Zomato in India) involves matching food orders to riders. A rider has to be selected based on their location proximity to the restaurant (since the time taken to deliver a order from restaurant to customer wont change on rider).
For this, we shard geographical locations using hashes, known as Geo-Hashing. An area can be recursively broken down using a geo hash. Riders within a goehash region can be asked to pick an order on demand.
Tracking a delivery is done using server-side events or WebSockets. I suggested the idea of WebRTC for this, but it seems like overkill. Let me know your thoughts :D
Jordan's KZfaq Channel: ‪@jordanhasnolife5163‬
Location-based databases: • Designing a location d...
System Design Website: interviewready.io
00:00 Intro
01:35 Functional Requirements
02:50 Capacity Estimations
06:45 API Endpoints
08:10 Data Sources
11:30 Onboarding a restaurant
12:20 GeoHashes
23:20 Driver Updates
27:00 Data Consistency
28:30 Consistent Hashing
36:40 Optimizing Deliveries
40:20 Delivery Tracking
44:44 WebRTC
46:18 Concluding thoughts
You can follow me at:
Github: github.com/coding-parrot/
Instagram: / applepie404
LinkedIn: / gaurav-sen-56b6a941
Quora: www.quora.com/profile/Gaurav-...
Twitter: / gkcs_
#SystemDesign #InterviewReady #Coding

Пікірлер: 67
@binwelbeck1482
@binwelbeck1482 Жыл бұрын
Jordan Knows what he is taking about ; I like his confidence and the way he explain the scenario. thanks gaurav for inviting Jordan
@modernsimplelife8706
@modernsimplelife8706 Жыл бұрын
Great video! It's very helpful to see a video where both of the interviewer and the interviewee know what they're talking about in a mock. These are some thoughts about WebRTC from my experience. I've been using and implemented WebRTC for 3 years. WebRTC uses UDP under the hood, however it uses SCTP protocol on top of UDP (probably will be replaced by QUIC eventually) to create a TCP-like experience for data channels. The developer is allowed to configure either guaranteed delivery (TCP) or lossy (UDP). WebRTC requires a signaling server to build a root of trust, and connection information for both the user and the driver. WebRTC and Websocket are stateful, meaning the server needs to keep the socket connection open for the entire session, otherwise the clients must reconnect. The stateful nature makes the solution harder to scale, and more complicated. As you guys mentioned, most users likely don't track on their food delivery as often as they would with Uber. I think WebRTC is overkill in this use case. The WebRTC connection likely gets broken (from closing the app) many times in the food delivery tracking session, and cause load churns to the server. I personally prefer doing HTTP polling with HTTP/3 transport (which is also based on UDP and support 0-RTT connection establishment). I think it is the simplest solution with reasonable low load to the server and easier to load balance horizontally. But, at the end of the day, we should look at metrics to decide the best solution for the problem. Strategically, starting with the simplest solution to quickly gather important metrics would be a good idea. So, I would start with HTTP polling, collect the data, then re-evaluate if WebRTC is needed. Finally, collect the data between HTTP polling and WebRTC. HTTP polling might be the most optimal solution already, simpler solutions can do wonder sometimes 😀. There are also ways to reduce states in WebRTC.
@hardikmenger7360
@hardikmenger7360 2 жыл бұрын
This guy was randomly seen on my feed and since then i am following him. His channel is the most no bs system design walkthrough channel.
@jordanhasnolife5163
@jordanhasnolife5163 2 жыл бұрын
Thanks Hardik!
@jackkohnjj
@jackkohnjj Жыл бұрын
This was a realistic interview in the sense that the interviewer realizes that the scale they are asking to design for is completely insane (500K drivers, 10M orders a day) but still proceeds as a thought experiment. I had an interview recently for a game platform and the interviewer said I should design for 50M concurrent users. LMAO.
@givemeabreakbro
@givemeabreakbro Жыл бұрын
Jordan knows what he's talking about. Every answer sounds calculated and well reasoned. Gaurav + Jordan is a killer combo. Looking forward to more such collabs
@payalgupta3487
@payalgupta3487 2 жыл бұрын
Digging deeper into single component is such a good idea. Rightly said that while discussing complex systems on the whole we tend to oversimplify. Enjoyed the video thoroughly. Thanks.
@geekydanish5990
@geekydanish5990 2 жыл бұрын
I have been watching Jordan’s content for quite a while now and dude you are just killing it thanks Gaurav for inviting him
@jordanhasnolife5163
@jordanhasnolife5163 2 жыл бұрын
Thanks Geeky! I appreciate it! 😙
@SoumikPradhan
@SoumikPradhan 2 жыл бұрын
This collab was worth the wait. Nice discussion.
@sachin_yt
@sachin_yt 2 жыл бұрын
It's amazing watching both of you! Learned a lot! Thanks Gaurav! Def need more of these :)
@jordanhasnolife5163
@jordanhasnolife5163 2 жыл бұрын
I appreciate it!
@TheHp6515b
@TheHp6515b 2 жыл бұрын
Great content. Jordan is very articulate and able to seamlessly translate theoretical knowledge to concrete solution. Quick note, consistently hashed cluster is great when keys evenly distribute across the hash space. In this case if we use `geo-hash` directly as the key, we get uneven distribution since small number of prefixes will have most of the data (e.g. geo-hashes representing Manhattan, Seattle etc) i.e. small number of keys having larger data. We could use a more dynamic range based sharding (involves a seperate manager service etc) to split key ranges based on load. Alternatively, we can hash the geo-hash itself to get even spread but loose colocation of relatively closer grids which could be important as we zoom-out for finding drivers. Either works but worth discussing the tradeoff especially considering latency is critical. (fan-out vs fan-in). Also wasn't clear what approach was proposed. Also, Jordan touched about point related to use of pure peer to peer communication for tracking (user and driver directly exchange communicate). Few reasons why this isn't practical generally, 1> Security & Firewalls. Exposed open ports in user's phones are prime target for DOS attacks. Most routers or n/w infra will actually block any new external incoming connection by default and will require a user override to whitelist/forward port. If it's a public router etc there won't be anyway at all to receive connection requests. 2> Relaying through a central service gives more reliable (in terms of throughput) network path (aws/gcp have interconnects with major isps and dedicated n/w infra), whereas for direct connections require to go through congested public interconnects. This is less of a problem when people are in close geo proximity so insignificant in this case, however relevant for whatsapp etc. Understandably not every aspect can be discussed in detail and interviewee isn't even expected to be aware of every aspect. Hence FAANG(at least F,G,U) gives interviewee leeway in picking one aspect of their choice for deep dive besides getting overall high level design. Side note. It is ok to say i don't know internals of x or y (say websockets or loadbalancers or storage engine etc) and move the interview to what you know. i.e. play by your strengths, not knowing deeply about few things is expected and doesn't result in -ve points but vaguely/incorrectly answering something will lead to -ve point and more followups around the same thing eventually leading to botched interview.
@ayonkar1534
@ayonkar1534 Жыл бұрын
This is one of the best videos I have come across for SD Excellent interview candidate.
@hitmusicworldwide
@hitmusicworldwide 2 жыл бұрын
In the USA geohash fixed addresses by zip code+4. The post office has done all the work. A more sophisticated analysis of verified address locations means that you aren't sharding for a lake, or other uninhabited area, saving tables, and flagging possible false orders and incorrect data.
@suhani7137
@suhani7137 2 жыл бұрын
hi gaurav! thank you for the amazingg content you are providing. i have binge watched your entire content in the past few months.on point explanation...your teaching skills are really goood. just a suggestion if you may...you should collaborate with people who amplify your reach ...like aman dhattarwal..or maybe launch some live courses on their platform...his reach is MASSIVE..it will also help you increase sales for your startup Idk if the suggestion i gave is even possible😅but just felt like this precious content and your courses should be reaching more and more people!
@ayushraj6525
@ayushraj6525 2 жыл бұрын
Learned a lot from this collab.. Thanks a lot Gaurav! 🔥
@colton8971
@colton8971 2 жыл бұрын
Jordan is a funny dude and not ugly!
@user-pp6fi2bt4w
@user-pp6fi2bt4w 4 ай бұрын
excellent explanation about geoshashing, thanks guys!
@devam9999
@devam9999 Жыл бұрын
Would have been great to see details about the DB/tables design, and then going for the data storage estimations based on that. Is that something generally expected, or is the high level data storage estimation without getting into details about schema design, as done by Jordan here, also acceptable in an interview?
@tanveer.shaikh
@tanveer.shaikh 2 жыл бұрын
hi Gaurav, for every video can you also please mention the pre-requisite concepts , i felt few of the things going over head
@okidokiyowyow356
@okidokiyowyow356 2 жыл бұрын
Hahaha. The introduction is really funny
@michahubert1179
@michahubert1179 3 ай бұрын
Thats a great content ! One thing that happens notoriously without noticing are units. 5Mb is 5 mega bits "b" not bytes "B".
@apoorvchaturvedi2493
@apoorvchaturvedi2493 Жыл бұрын
Thanks Gaurav for this great video. Just one question as redis is being used here as geo-sharded database how concurrency will be handled with redis.
@imranshaikh115
@imranshaikh115 Жыл бұрын
Very nice 👌 I appreciate 🙏 💛 your efforts
@nishantgarg2815
@nishantgarg2815 2 жыл бұрын
Nice one. Learnt a lot. Keep up the good work
@MrMAhmed22
@MrMAhmed22 10 ай бұрын
Very nice effort, felt like gaurav missing some stuff on geo sharing and asking off questions was a bit confusing for the interviewee
@ankitnmnaik229
@ankitnmnaik229 2 жыл бұрын
Bhaiya, i just passed clas 12 th..and i want to get into programming (ml/Ai) but i have never been good at problem solving..so how can I improve my problem solving from now on...can u suggest ways..pls..
@RahulDasAdhikary-usa
@RahulDasAdhikary-usa Жыл бұрын
Nice work! First example is nice - Sri Lanka to India Food delivery 😀 ha ha.. Anyways, I like your channel Gaurav.😊
@secondIncomePartTime
@secondIncomePartTime Жыл бұрын
jordan is genius
@sanketsardesai1961
@sanketsardesai1961 Жыл бұрын
I would have used PostGIS definitely, store the data in geometry and use all the spatial functions, indexing and what not
@RedBeardRetroTech
@RedBeardRetroTech Жыл бұрын
Thanks for the vid
@user-pj9ck8ef1j
@user-pj9ck8ef1j Жыл бұрын
why this video can not be found on gaurav main page home->system design tag?
@deepak655655
@deepak655655 2 ай бұрын
Gaurav talking about UDP and his voice went berserk, talking about coincidence !!!
@kaustabpaul4514
@kaustabpaul4514 4 ай бұрын
thanks for sharing it was awesome
@kamalsmusic
@kamalsmusic 2 жыл бұрын
Why do we need to do range queries within the redis instance in order to find dashers? If each shard of redis handles a range of geohashes, then can each redis shard just have a hashmap which maps a geohash -> [list of dasher ids]? You can use consistent hashing to locate the shard, then just do a hashmap lookup within that shard. As long as given a geohash, I can calculate the 8 surrounding box geohashes, this should be sufficient right? I think most libraries for geohash can calculate this quickly
@jordanhasnolife5163
@jordanhasnolife5163 2 жыл бұрын
I think it really just depends on the size of the geoshard that you use. If they are big, then you'll have to do a range query to quickly find a nearby driver to a given geohash. If they are the perfect size, then you could just select any of the dashers located in that node.
@abhineetsingh6720
@abhineetsingh6720 Жыл бұрын
great video
@nitishkumarsingh6020
@nitishkumarsingh6020 Жыл бұрын
Which tool has been used for drawing?
@constantinekhvalin6038
@constantinekhvalin6038 Жыл бұрын
is it secure to have peer-to-peer connections between dashers and clients?
@mansivishwakarma361
@mansivishwakarma361 2 жыл бұрын
💕💕💕💕
@joaoluismoraes7215
@joaoluismoraes7215 9 ай бұрын
Hi Gaurav, this interview feature by Interview Ready looks nice. How did you guys build it? WebRTC? Have you written any tech deep dive on implementing it?
@gkcs
@gkcs 9 ай бұрын
We wrote the code in Golang, and accept http requests only. My next video is a deep dive on the architecture, stay tuned!
@chinmaysharma1867
@chinmaysharma1867 2 жыл бұрын
Do I need to know Web development before study system design ???
@poshettiakhil2664
@poshettiakhil2664 2 жыл бұрын
Hi Gaurav, This question is not related to doordash system design, Please can you explain the difference between message bus ,message queue and message broker
@protyaybanerjee5051
@protyaybanerjee5051 Жыл бұрын
Message broker can be a distributed system which provides a reliable and fault tolerant way of delivering messages b/w multiple producer and consumer. A message broker can handle many to many mapping between a producer and consumer
@hazemabdelalim5432
@hazemabdelalim5432 Жыл бұрын
Isn't MYSQL harder to shard ? i believe no-sql is easier to shard in that case so i would choose it .
@gkcs
@gkcs Жыл бұрын
It has to be sharded manually, yes.
@jackkohnjj
@jackkohnjj Жыл бұрын
FYI, lat/lng to geohash is simple arithmetic. It would be ridiculous to call a service to do that.
@joaoluismoraes7215
@joaoluismoraes7215 9 ай бұрын
I've worked at a company that solved a similar problem but at a smaller scale (it had 200k users and more than half weren't active). Instead of Geohashing, they did it with MongoDB Geospatial queries. Why is Redis geohashing better in this case? When does geospatial queries start to become a problem?
@mhmdshaaban
@mhmdshaaban 8 ай бұрын
MongoDb supports both 2d Index where it uses b-tree and 2dsphere Index where it uses similar approach "geo hasing" I'm curious why do I have to implement that from the get-go and It's already implemented
@rkara2
@rkara2 2 жыл бұрын
Why can't you just have all the actors (drivers, customers, businesses) subscribe (websockets) to the system and ping their location (lat/long) in real-time? Then when a customer / business needs a driver, they are getting the real-time picture of whos active in the system and where they are. So at the end of the day you don't care where drivers / customers are in the world, you just care about their availability. So you can have one table called Availability that you match everybody through.
@fabioschubert
@fabioschubert 11 ай бұрын
Lots of good stuff here but should've at the very least *mentioned* storing the Restaurant's data, menus, building the order, etc.
@tusharabbott9946
@tusharabbott9946 2 жыл бұрын
now i know why swiggy was not able to find a delivery partner for my order.
@protyaybanerjee5051
@protyaybanerjee5051 Жыл бұрын
Interview Ready UX could do with some improvement
@gkcs
@gkcs Жыл бұрын
Yes. Thanks for the feedback. Any suggestions for improving the UX?
@VireshGehlawat
@VireshGehlawat Жыл бұрын
The content was great, the tool needs to be improved. I can see both of them struggling with the interface. @Gaurav - As the interviewer in a real world scenario, the interviewer won't really be drawing as much as you were in this case.
@sipwhitemocha
@sipwhitemocha Жыл бұрын
Aye shoutout to J. Epstein
@KiritiSai93
@KiritiSai93 2 ай бұрын
The comment about geohash being ordered and doing binary search is a bit flawed. Remember that geohash has this property: 'abc' is ALWAYS closer to 'abd', but 'def' for example might actually be also closer to 'abc' as well. So closer in edit distance is a sufficient condition but not a necessary one. Read System Design book by Alex Xu for more info on how this could happen.
@gkcs
@gkcs 2 ай бұрын
kzfaq.info/get/bejne/hcmFfql6z86vpWQ.html
@KiritiSai93
@KiritiSai93 2 ай бұрын
Thank you! Looks like you've already covered in the topic :)
@anirudhsilverking5761
@anirudhsilverking5761 Жыл бұрын
I think I gained 30 IQ points just listening to this
@gkcs
@gkcs Жыл бұрын
Damn!
@timurakhalaya6289
@timurakhalaya6289 3 ай бұрын
poor performance i would say, in term of technology picked and in terms of planning as well ...
@aparnaprabhu
@aparnaprabhu Жыл бұрын
I read this as system design of Doordarshan 🤦‍♀️🤦‍♀️
@gkcs
@gkcs Жыл бұрын
Lol
Choosing a Database for Systems Design: All you need to know in one video
23:58
What it feels like cleaning up after a toddler.
00:40
Daniel LaBelle
Рет қаралды 41 МЛН
Зачем он туда залез?
00:25
Vlad Samokatchik
Рет қаралды 2,7 МЛН
아이스크림으로 체감되는 요즘 물가
00:16
진영민yeongmin
Рет қаралды 58 МЛН
Google system design interview: Design Spotify (with ex-Google EM)
42:13
IGotAnOffer: Engineering
Рет қаралды 1 МЛН
System Design: Design a URL Shortener like TinyURL
16:00
Code Tour
Рет қаралды 81 М.
System Design Interview: TikTok architecture with @sudocode
45:35
Designing a location database: QuadTrees and Hilbert Curves
22:22
20 System Design Concepts Explained in 10 Minutes
11:41
NeetCode
Рет қаралды 919 М.
System Design: TINDER as a microservice architecture
36:41
Gaurav Sen
Рет қаралды 1,2 МЛН
UPI System Design Mock Interview with Gaurav Sen & @sudocode
37:10
What it feels like cleaning up after a toddler.
00:40
Daniel LaBelle
Рет қаралды 41 МЛН