Grokking the Uber System Design Interview - Ride Sharing Service Design

No video

Grokking the Uber System Design Interview - Ride Sharing Service Design | OLA System Design

Рет қаралды 51,841

Күн бұрын

This is the sysem design video about Uber System Design. In this video we are discussing how to tackle the system design interview question about designing Uber or Lyft ride sharing services. In this video, we will be discussing the micro-services architecture of Uber and will discuss design of several micro-services that Uber is comprises of. E.g.
1. Map Service
2. User Service
3. Routing Service
4. Driver Location Service
5. Trip Service etc.
Instead of skipping the video, you can actually play the video at 1.25x speed and in this way avoid missing any important points. However, if you like to overview of what I am discussing in the video then please check:
0:00:00 - Introduction
0:01:00 - Functional Requirements of Uber System Design
0:06:15 - Non-Functional Requirements of Uber System Design
0:08:50 - API Specs
0:16:15 - High-level Architecture of Uber System
0:18:50 - Design of Map Service
0:25:50 - Design of User Service
0:27:35 - Design of Routing Service
0:37:05 - Design of Driver Location Service
0:49:50 - Design of Trip Service and very important discussion sharding of secondary indices
1:01:05 - Final Remarks
Distributed System Design Interviews Bible | Best online resource for System Design Interview Preparation is now online. Please visit: www.thinksoftw...?
Please follow me on / think.software.community if you like to get notified about new course chapters getting added or when we will start another round of mock interviews and you want to participate in mock interviews or any other updates. I will also take your suggestions there about the course and the channel.
Please check my other videos for more information about the following topics:
1. How to generate a unique id: • System Design Intervie...
2. Distributed Cache Design: • Distributed Cache Syst...
3. The right way to tackle the system design interviews: • The Format of Distribu...
Check out our following articles:
- How to Ace Object-Oriented Design Interviews: / how-to-ace-object-orie...
- Elevator System Design - A tricky technical interview question: / elevator-system-design...
- System Design of URL Shortening Service like TinyURL: / tinyurl-design-from-th...
- File Sharing Service Like Dropbox Or Google Drive - How To Tackle System Design Interview: / how-to-tackle-system-d...
- Design Twitter - Microservices Architecture of Twitter Service: / design-twitter-microse...
- How to Effectively Use Mock Interviews to Prepare for FAANG Software Engineering Interviews: / how-to-effectively-use...
- Payment Gateway System Design - How does the Stripe work: / payment-gateway-system...
- Selecting the best database for your service: / selecting-the-best-dat...
#SystemDesign #DistributedSystems #FAANG #Facebook #Google #Amazon #Apple #Microsoft #Uber #Netflix #Oracle #Lyft #Interview #ComputerProgramming

Пікірлер: 217

@ThinkSoftware 4 жыл бұрын

Thanks for watching the video.If you need to, you can play the video at 1.25x speed instead of skipping the video and in that case skip some important point. Please let me know in the comments below if you like this video. Thanks.

@winning_aadict 4 жыл бұрын

This is awesome. Thank you sir

@motivationmentoring2298 4 жыл бұрын

Thank you it is amazing work keep it up

@vishalsarda9764 3 жыл бұрын

Does the map service store some data? If yes, what data and in what format?

@ThinkSoftware 3 жыл бұрын

Map Service stores map data. There are different formats of map data that you can find the details in the course :)

@namduong1580 3 жыл бұрын

Great tip, thank you!!

@neosapien247 4 жыл бұрын

This is the best video for this topic. A lot of videos describe how Uber is designed, not how you, the "candidate" would design Uber. They are not looking for your understanding of Uber's real architecture. They want to know how creative and logical YOU are if given the task. This is the perfect tutorial for that.

@ThinkSoftware 4 жыл бұрын

Thanks for the comment 🙂

@snehalchavan5796 3 жыл бұрын

The only reason I watched this video was because I wasn't satisfied with the solution in 'grokking the system design interview'. I must say that you have done a very good job at explaining the ride share system design and it'll definitely help me in my interviews.

@ThinkSoftware 3 жыл бұрын

Thanks for the comment 🙂. You should also check my course on System Design Interviews Bible then.

@TarunPrasadKottary 4 жыл бұрын

Finally, some fantastic videos on System design!

@ThinkSoftware 4 жыл бұрын

Thanks for the comment

@bostonlights2749 3 жыл бұрын

For those who made it till the end, This is quality stuff

@ThinkSoftware 3 жыл бұрын

Thanks for the comment 🙂

@zvezdalion 2 жыл бұрын

Learning so much about how to think about system design. Your style of teaching really emphasizes that, rather than trying to cram all the specifics of how this or that software was made. Thank you!

@ThinkSoftware 2 жыл бұрын

Thanks for the comment

@Brandonstiles 4 жыл бұрын

Great video! Good call on pointing out the need to pre-plan analytics and monitoring

@ThinkSoftware 4 жыл бұрын

Thanks for the comment.

@binitasinha2695 4 жыл бұрын

I have gone through many system design videos but only subscribed to your channel as I never have seen such an informative video ever. Thank you for sharing your knowledge. kudos

@ThinkSoftware 4 жыл бұрын

Many thanks for the nice words 😊

@deepakchhabra5932 3 жыл бұрын

To answer your question in the end which Partition approach is preferable for secondary indexes: Local vs Global - then the obvious choice is global because if we want to query by tripId we already got a PK based partition, and when we need trips for driver or passengers then we should have Global index so that we don't do scatter gather, idea is to get to the details quickly. Now it also means my writes to trip table will be slower because of additional global indexes but Trip table is not so time critical like driver location or pricing etc, as long as trip creation is successful we can tolerate little bit of delay here, also, we picked relational DB as a choice for Trips because we want consistency so if we made that decision right then little bit of slowness is acceptable. I'm not sure if at DB level some kind of optimization is possible where secondary indexes could be built async- could be explored. Nevertheless, great content as always, tips you discussed are nice. Amazing work. Thank you.

@ThinkSoftware 3 жыл бұрын

Thanks for the comment 🙂

@sivakumaresan9602 3 жыл бұрын

Excellent video! Thank you for the detailed explanation. I do have one comment. At 59:30 you say that the 'Global index' method of maintaining secondary indexes would allow all records belonging to 1 driver to exist within 1 partition. That may not be the case since the data is partitioned based on the primary index, i.e., Trip ID. Hence, there is no way for us to enforce all data belonging to a driver to exist in the same partition. What makes 'global index' method different is that it would store the secondary indexes in a separate location (outside the data partitions) that would allow the queries to be routed to the exact partition(s) where the data exists. In other words, the mapping between each driver id and the location within the (Trip ID-based) partitions is stored separately as a Global index. Similarly there will be another global index for passenger ID. These global secondary indexes can be further partitioned based on it's values, i.e., based on Driver ID or Passenger IDs. I guess you were referring to my last point in your video, but it came out different. :) Also, note that 1 driver ID could exist across multiple (Trip ID-based) partitions. The global index for driver ID will let us know what partitions to look for, instead of doing a scatter-gather approach with Local indexes.

@ThinkSoftware 3 жыл бұрын

Thanks for the comment 🙂

@RY-it3de 3 жыл бұрын

LMAO the shade thrown at the Grokking course about the Driver Location service discussion. I too was confused about the QuadTree / HT suggestion

@ThinkSoftware 3 жыл бұрын

thanks for the comment :)

@mohajeramir Жыл бұрын

Really appreciate you doing this. There is a wealth of information here.

@ThinkSoftware Жыл бұрын

Thanks for the comment 😊

@tarun29061990 3 жыл бұрын

Wow!! Such a clear explanation about each and every service. Great work mate :) I have watched lot of videos on Uber system design but your video made it super clear. Great part is you compare the pros and cons of each approach. Please keep doing this in your future videos. Thanks :)

@ThinkSoftware 3 жыл бұрын

Thanks for the comment :) More videos will be coming soon. Till now I was busy in releasing my course.

@tarun29061990 3 жыл бұрын

Think Software share the course link as well

@ThinkSoftware 3 жыл бұрын

@@tarun29061990 I have my latest video discussing the course with some discount coupons.

@kaushalrjain6326 4 жыл бұрын

Fantastic Explanation. Was waiting for this. Really helpful. Thnx.

@ThinkSoftware 4 жыл бұрын

You are welcome 🙂

@arpitamathur4859 3 жыл бұрын

In the Grokking the system design course, i think it will get the initial positions of the drivers from the quad tree and would then subscribe the customer to those drivers. Hence later it can use the DriverLocation HT to get the updated locations.

@ThinkSoftware 3 жыл бұрын

There are many loose ends. First of all the main purpose of keeping DHT was that it is slow to update/query the Quadtree. Then it means Quadtree may have stale data. Then secondly how the subscription will work? And then thirdly how to find nearby drivers to send trip requests?

@Funsiestype 2 жыл бұрын

Initial set of drivers don’t matter. You can show some random fake cars. What’s the purpose of it other than exposing useless privacy info. You have to ask clarifying questions for requirements. Just because you can doesn’t mean you should.

@ThinkSoftware 2 жыл бұрын

Exactly ^

@brgowd Жыл бұрын

Very informative

@shabnamkoosha 2 жыл бұрын

By far the best design video I have watched. Very detailed and organized. Thank you.

@ThinkSoftware 2 жыл бұрын

Thanks for the comment 🙂

@MrManjax 2 жыл бұрын

Excellent design video with a number of useful details covered 👌 Thanks for your efforts. In regards to your question about partitioning Trip info, I would suggest your approach wherein you use a combination of {Trip id, driver id} to identify the shard/partition. Since it's more likely to query all trips related to a driver, it would enhance the locality of data lookup and have the potential of improving System throughput. As regards your other question about handling hot zones in a busy metropolitan area, it would be great to incorporate something like this - Define a certain threshold of the number of drivers within a certain area at the passenger's request. If the lookup discovers a greater number of drivers, it should automatically sub-divide that particular cell into a finer granularity (microcell?) in order to find the most optimal driver match to provide a response to the Dispatch service. BTW, would it be possible to publish a picture with all the services (along with their sub-services) called out in one place along with the cache/datastore they are tied to?

@ThinkSoftware 2 жыл бұрын

Thanks for the comment 🙂. More details are in the course.

@npriyanka89shivu 3 жыл бұрын

Amazing explanation. Thank you!

@ThinkSoftware 3 жыл бұрын

Thanks for the comment 🙂

@chanm2147 7 ай бұрын

Hello, what is the algorithm you mentioned at 53:28 to find the global optimized solution? Is it K-nearest neighbor?

@AkashJadhavIT 4 жыл бұрын

Thank you for all these amazing videos on systems design. We have learned great concepts from you sir. One request from my side. Would it possible for you to make a series on "object oriented design interview". eg. 'Designing the parking lot'. Any resources you would recommend for the same?

@ThinkSoftware 4 жыл бұрын

Thanks for the comment. I didn't find any good single resource to cover all topics and that is why started this channel. Right now you can find the information spread all over internet and even not complete. I have also noted your request and will make video on object oriented design soon.

@AkashJadhavIT 4 жыл бұрын

@@ThinkSoftware thanks. looking forward to hear from you

@pujabhandari5471 Жыл бұрын

Very informative and interesting video I really like it

@rng19 2 жыл бұрын

Thanks a lot. Your videos are very helpful.

@ThinkSoftware Жыл бұрын

Thanks for the comment :)

@manju528 3 жыл бұрын

I liked your system design videos! They all are explained in very simple language. I yet to finish watching all of them though. Thanks for making these videos.

@ThinkSoftware 3 жыл бұрын

Thanks for the comment 🙂

@dhwanip1000 2 жыл бұрын

Thanks for the wonderful video. My question here is what areas to focus on in actual interviews?. Suppose, I am asked this question in the interview, how would you suggest to start and dive deep into each part? Also, not everyone would know everything about these services, in that case, how would you advice a candidate to tackle that situation?

@ThinkSoftware 2 жыл бұрын

I have discussed this in some of my other videos and course. You should also look at mock interview videos that I have uploaded in this channel and the course.

@dhiwakarna8509 Жыл бұрын

Highly appreciate your content, learning tons! I have a question around the routing/proxy service. Shouldn't its responsibility be very high level? (like authenticating all requests, rate limiting and routing requests to appropriate services). Here it seems to do very specific things like create a user connection object and maintaining it which is needed for a specific use case (trip matching). can you shed more light on this?

@ThinkSoftware Жыл бұрын

I believe this is discussed in the course. The routing service is nothing but an API gateway. It performs all the functionality of an API gateway. However, not all API gateway support websockets or long pull. In order to support them, the architecture discussed in routing service is required.

@dhiwakarna8509 Жыл бұрын

@@ThinkSoftware That makes sense. In a way we are doing this. In order to maintain a websocket, the request needs to stick to a specific instance of the api gateway and that is where all this book keeping comes in. The api gateways are not stateless here.

@TheRohkan 3 жыл бұрын

the right amount of depth .. thanks for this

@ThinkSoftware 3 жыл бұрын

Thanks for the comment 🙂

@KangJangkrik 3 жыл бұрын

I would like to do correction for map service. This is simplified steps of finding closest route: 1. Calculate angle of attack (AoA) from A to B with atan(y/x) * (180/PI) formula 2. Calculate each reachable points' AoA 3. Compare it by substracting each of those point's AoA with A-B AoA, results angle delta 4. Go forward to point with smallest angle delta 5. If there is no B point on current closest points, then repeat step 1 In reality, we have to deal geometric equations unless we use Google's Route API which is costs a lot of money. Cheers!

@ThinkSoftware 3 жыл бұрын

Thanks for the comment 🙂

@KangJangkrik 3 жыл бұрын

@@ThinkSoftware you're welcome! I found this channel useful, so +1 subscriber for you

@OmkarGhanekar 2 жыл бұрын

Once the websocket connection is established, for every new piece of data that the client is sending to the service, how does the LB ensure that the data gets sent to the specific server which is holding the websocket connection for the user? I'm assuming a dumb RoundRobin LB can't be used in this situation.

@ThinkSoftware 2 жыл бұрын

Thanks for the question. Short answer: yes a LB need to support websocket. If a LB does not support websocket then it cannot be used in this scenario. Long answer: for this I may have to make a video :)

@wayneli7814 4 жыл бұрын

Hi @Think Software, thanks for your great video! I want to share some of my thoughts about the drive location service, you make an assumption before you calculate the grid id from Long/lat, which is the grid size is fixed. But in practical, the grid size is dynamic(because the density of different area is different), so it should be very hard to get the gridId by long/lat in constant time. I would say the best way of the lookup service is still to transverse the quadtree, but what we can do is to find a way to optimize the update issue. What is your thoughts?

@ThinkSoftware 4 жыл бұрын

Thanks for the comment. Usually the grid size will remain the same but the search size would be increased in that case and please note the grids which are empty will not have any record in the in-memory datastore/cache.

@wayneli7814 4 жыл бұрын

Think Software I see, thanks for your reply. That makes sense, we can have the grid to cover a very small area, and don’t store the grids that locate in ocean. BTW, I just checked more info about google s2, looks like they are using the “Hilbert curve” algorithm to convert the coordinations into a 64 bit number, then they can find the nearby grids by just simply increment/decrement the number.

@navoriion 4 жыл бұрын

Takes lot of effort & time to make videos. Keep up the good work. Do you read company blogs to understand the architecture of the system ?

@ThinkSoftware 4 жыл бұрын

Yes making a video usually takes atleast 3-4 weeks.. 2/3 weeks for all the research. 1/2 weeks to come up with my own design and solution for the problem and then 1 week for video production/editing/upload etc.

@priyavenkatesan1804 4 жыл бұрын

Excellent explanation! One small improvement I would do wrt to driver location service is to use slippy tiles for id calculation. No biggie. Learnt a lot. Thank you very much

@ThinkSoftware 4 жыл бұрын

Thanks for the comment. How would you figure out the tile or grid Id if using slippy tiles? And how would you know the id's of neighboring tiles if using sleepy tiles?

@priyavenkatesan1804 4 жыл бұрын

wiki.openstreetmap.org/wiki/Slippy_map_tilenames#Tile_servers. Key in the cache will be the tile number and value will be a list of drivers in that tile. Same as you have, except that the key is a number. Does that work?

@ThinkSoftware 4 жыл бұрын

Is there a way to calculate tile Id from location without using some lookup table? Also i think all the tiles should be assumed to be at the same zoom level which is then nothing but equal to what I have suggested?

@priyavenkatesan1804 4 жыл бұрын

Yes, we can write a small function that returns the tile id based on lat, lon. True, the zoom level will be the same. Like you say .7 * .7 sq m might map to zoom 15 or so. The approach you have is great. The main advantage of this calculation is that you can use a number as the key. May be this helps with partition. I can't tell. The other advantage is offers is in a situation where you have dense operation, you can dynamically recalculate the tiles based on zoom level. Say you decrease the zoom level, then you might have more drivers mapped to that area. The negative to this approach could be the time taken for dynamic recalculation. I haven't determined the cost in this case. However, in your approach you are quering for more neighboring tiles. This is not bad either. Just a different approach.

@ThinkSoftware 4 жыл бұрын

OK thanks for clarification 🙂

@YegorChumakov 3 жыл бұрын

Great videos! Thank you. I've subscribed.

@ThinkSoftware 3 жыл бұрын

Thanks 🙂

@preetiduhan9937 3 жыл бұрын

A big dislike to those who disliked the video.If you can't appreciate someone's effort atleast don't discourage them.

@ThinkSoftware 3 жыл бұрын

Thanks for the comment 🙂

@user-oy4kf5wr8l 3 жыл бұрын

Buddy, keep doing these amazing videos plz on low level design too. A question plz, for sd1, i think those concepts covered by your video are enough, right? for sde 2, i will buy your course ^ ^

@ThinkSoftware 3 жыл бұрын

Thanks for the comment :)

@anuragkothare6181 4 жыл бұрын

Good Explanation.

@ThinkSoftware 4 жыл бұрын

Thanks

@privacywanted434 4 жыл бұрын

This has been a great explanation video!

@ThinkSoftware 4 жыл бұрын

Thanks for the comment 🙂

@sthirumalai 3 жыл бұрын

Hi, The presentation and thought process was excellent. On scaling the driver location module wrt. the grid service. In case of hot nodes which are partitioned based on the grid id. I think we can go for the auto scaling on the compute layer where as on the data layer which serves the grid vs drivers key/val can't we shard the grid by adding more precision to the lay/long pairs? By that way the hot data will be distributed even to more shards and the app may scale well. Please correct me if I am wrong.

@ThinkSoftware 3 жыл бұрын

Thanks for the comment

@karanbhatia2834 Жыл бұрын

I have a question regarding the routing service. We have 2 types of servers: proxy servers and the dispatch servers. What is the reason for having these 2 types? What is the difference in their role that they are categorized separately> Can't we just have one layer of servers that basically talk to other services in the system?

@ThinkSoftware Жыл бұрын

Proxy servers receive request from the client. They are sitting behind load balancer and forwards the request to the internal microservices based on the endpoint used in the API call. Dispatch servers, on the other hand, receive requests from the internal services (when internal services want to send message to some client app). A request from an internal service goes to a dispatch server behind load balancer (load balancer on the internal side facing internal services). The dispatch server need to dispatch the request to the right proxy server where a websocket connect or a long pull request is held for a particular client.

@amolnagotkar3037 2 жыл бұрын

thnx for this video

@ThinkSoftware 2 жыл бұрын

You are welcome 🙂

@xinlongzhang1187 4 жыл бұрын

Very nice video!

@ThinkSoftware 4 жыл бұрын

Thanks. Appreciate your feedback.

@RaunakShrivastava 3 жыл бұрын

Why don't the passenger or Driver communicate to User and Map services via the Routing service?

@ThinkSoftware 2 жыл бұрын

Both solutions are possible. I am using different approaches in different videos for curious minds like you to ponder upon :)

@amit1114 4 жыл бұрын

Awesome explanation of the system. Thank you for sharing your knowledge! Regarding Routing Service: How would you make routing service highly available if you are using long polling or websocket and your passenger/driver is connected to 1 App Server. If that app server were to go down, all passengers/drivers connected to that App Server will loose connection and the data in Global Cache will be incorrect.

@ThinkSoftware 4 жыл бұрын

Thanks for the comment. Regarding your question: there are two things that will be happening. Once an app server goes down then the client app will see that the web socket or long poll connection has been terminated so in that case will retry to reconnect and this time the connection request will go to some other app server by the LB and that app server will update the cache. If the app does not come to connect then there will be a stale entry in the cache. When dispatch servers will try to use that entry to send notification to the client app, they will know that since the app server is down so the entry is stale and at that time entry will be removed from the cache.

@ankuj2004 4 жыл бұрын

@@ThinkSoftware what happens in a scenario, where the proxy server connected to client dies. The entry in cache is stale. The dispatch server comes back before a new connection is established .

@ThinkSoftware 4 жыл бұрын

when the dispatch server will contact the proxy server which has just come up then the proxy server will return connection not found for that client and then dispatch server will invalidate the cache entry if it is same as what it read before contacting the proxy server.

@ankuj2004 4 жыл бұрын

@@ThinkSoftware but dispatch server needs to respond back to the client. Even if it invalidates the cache entry, how will it communicate back the request response to client. It cant keep holding the response till the time entry in the cache is updated

@ThinkSoftware 4 жыл бұрын

In that case it will just send a push notification to the client app that there is new event.

@kamalsmusic 4 жыл бұрын

So regarding using the global in memory cache to store transient information... we would have to make the cache quite large to ensure that the data is not evicted a lot, right? If there are frequent evictions, then we may have to re-determine the drivers location from the client and then insert again into the cache?

@ThinkSoftware 4 жыл бұрын

Yes. The other option would be store this in database but that would put a lot of pressure on the database. So basically, the distributed in memory case here is used to store transient information as there is no point storing them in the database and so the purpose of using cache here is not to remove pressure from DB as there is no DB but cache here itself is being used as transient data store.

@metallaser4409 3 жыл бұрын

Nice video! Can you explain the difference between an API gateway and routing service? Would you still have an API gateway in your architecture to route requests to either map/user/routing service?

@ThinkSoftware 3 жыл бұрын

You can consider them as same.

@metallaser4409 3 жыл бұрын

@@ThinkSoftware Thanks for responding. If I were to use Amazon's API gateway, would that be able to handle sessions/websocket connections for me? I thought API gateways were stateless. Is that incorrect?

@ThinkSoftware 3 жыл бұрын

It depends on the implementation. I think Amazon API gateway supports websockets. Also what do you mean by stateless here in case of API gateway? To me it seems like you have wrong understanding of what stateless mean.

@franciscog3378 3 жыл бұрын

very helpful. Thank you

@ThinkSoftware 3 жыл бұрын

Thanks for the comment 🙂

@sunny-14689 2 жыл бұрын

Nice video, can a driver be assigned to multiple nearby gridids, may help avoid larger area query if none found in first attempt since a driver can belong to multiple lists thereby increasing chances of finding nearby driver.

@GokhanArik 3 жыл бұрын

I don't think they proposed distributed hash table to be used in the lookup. Their argument is the driver reports location every 3 seconds, but we update QuadTree every 15 seconds. 15 seconds is still a short time and drivers' location will not change much. To get nearby drivers we will still use QuadTree and hash table will only be used to keep QuadTree up to date. Correct me if I'm wrong

@ThinkSoftware 3 жыл бұрын

I don't know if they have updated the text in between. The video is several months old now. Right now if they are storing in DHT to be update Quadtree every 15 sec later then it would be better to let the driver send the update every 15 sec directly to Quadtree. The reason for keeping to a DHT would be if they want to batch update the Quadtree but they didn't talk about batch update.

@MrAnshulji 3 жыл бұрын

Also the sharding for quadtree was based on driver_id (stored in leaf nodes), so it won't be like one node of the quadtree is on one server and the child node is on another. It was more like, the quadtree would exist in its totality on all servers, but some servers would have a subset of the driver ids at the leaf node. And eventually scatter-gather could be used to retrieve the drivers. In any case, that approach has been rendered inefficient by this grid-based solution that you described. I wonder why no one thought of this or suggested this C-Squares based approach in any of the other design videos.

@taishancorp7720 3 жыл бұрын

To solve hot partition problem of grid key ranges, can we first shard by key ranges and then and have a mapping of partitions where this key range could land ? Majority of cases there will be single partition, but for hot partitions we may need to scan more than one.

@ThinkSoftware 3 жыл бұрын

what advantage we would get with this?

@taishancorp7720 3 жыл бұрын

@@ThinkSoftware I mean for hot partitions , we can have finer precision shards on those. Key ranges don't need to be uniformly sized.

@letstalkcareerwithradhika9028 3 жыл бұрын

Could you help me understand the function of dispatch server farm in the routing service? I know you said that it will send the info from trip and other service to proxy farm. Why don’t have only proxy servers?

@ThinkSoftware 3 жыл бұрын

This is discussed in more detail in the course

@cbest3678 3 жыл бұрын

@Think software.. thanks for this video. I have few doubts about this: - So after this flow passenger -> trip dispatcher -> Driver location service ->trip dispatcher will apply matching -> will send those matched driver notification about the matcher passenger. How driver will get this notification about the matched passenger for the trip request ?

@ThinkSoftware 3 жыл бұрын

This is discussed in the course. The short answer is that the Trip Dispatcher service will call dispatch servers in the Routing service that will look up the address of the host in the cache where the WebSocket connection is held for the particular driver and then forward the request to that proxy server. The proxy server will then forward the request via WebSocket.

@cbest3678 3 жыл бұрын

@@ThinkSoftware Thanks for the response.. wondering how all service will call dispatch service .. I mean dispatch is in middle layer..so you mentioned when any service done processing the request it will send that response to dispatch service and then the dispatch service will see the cache and will contact appropriate app server in routing server. why do we need extra dispatch service is it adding more complexity ? I mean how other service will call dispatch service to send the response back ?

@user-gc5kv3cd7j 2 жыл бұрын

Nice! For the driver location service, when grid id is used as key, list of drivers used as value, it relies on driver client to remove driver from list if location changes grid, but if the driver client dies, we might get stale data i.e. return driver who might be off work. How to avoid this issue?

@ThinkSoftware 2 жыл бұрын

We can have last heartbeat time and then if it is more than some threshold then we ignore and remove those driver entries.

@harrylin6282 4 жыл бұрын

Hi Think Software, when the DB is partitioned by trip id and a driver wants to list the trip histories, it is slower since it needs to go to all the partitions to grab data. But I am a little bit confused why it is slower, we can create multiple threads and query each partition in parallel ( each thread can go to each partition ), so the query time should be the same as querying in one partition. is it correct?

@ThinkSoftware 4 жыл бұрын

Thanks for the comment. There will not be a single customer query that you can parallelize. Think about thousands of customers querying their trips. Now if for all the customers you will scatter/gather on all partitions then it will be way slower than different customers queries going to different partitions.

@mehranbehbahani3050 2 жыл бұрын

There is absolutely no way that this level of complexity is expected in a 45-minute interview. This is a solution that a team of engineers would work on for a week at least. Correct me if I am wrong.

@ThinkSoftware 2 жыл бұрын

For L6+ definitely. The interviewer though may not cover everything but could go in depth in one or two things.

@mehranbehbahani3050 2 жыл бұрын

@@ThinkSoftware I see. You know what would be a killer video? Why don't you make a video that explains the complexity and details expected by the interviewer based on the level of the position? Like if you are interviewing for an l5, talk about this and that. If it's l6, talk about this and that in more detail. If l7, you need to have a deep understanding of topics A B and C for example.

@user-nt9dk5xl5z 10 ай бұрын

Thank you very much for your videos. Good explanations. Just One thing you are not discussing about selection type of db as part of design. Why?

@ThinkSoftware 9 ай бұрын

There are other videos discussing the DB selection. Thanks

@winning_aadict 4 жыл бұрын

Sir, can you also add a transcript of the video for people who prefer reading. Also, this would enable us to look up words that are new to us and may not be clear in the audio for various reasons.

@ThinkSoftware 4 жыл бұрын

Thanks for the comment. I am going to soon launch a book which will have this and more.

@Claudius025 4 жыл бұрын

why would you use a double for the trip price and not a float? I have the same question for the latitude and longitude.

@ThinkSoftware 4 жыл бұрын

Float can be used as well. Thanks for the comment.

@shikhil1 3 жыл бұрын

This is great video and explained the approach in depth, but you could improve/explain in depth about searching of location in 2D Grid using two decimal points and finding the near by grid id.

@ThinkSoftware 3 жыл бұрын

Thanks for the comment

@shikhil1 3 жыл бұрын

@@ThinkSoftware can you please answer my query also? how your are going for finding location in 2d grid using decimal point and near by location?

@iamavinashdaipule 4 жыл бұрын

Thanks for this amazing video. It answered so many of the questions I had about uber design or design in general. What happens in the driver location cache when the list of drivers is huge for id_latitue_longitude-->[list of the driver's id is 50,000 or more] does the cache have the capacity to store that many drivers ids in a field?

@ThinkSoftware 4 жыл бұрын

Thanks for the comment. The grid size is way smaller to have 50k drivers in it.

@iamavinashdaipule 4 жыл бұрын

@@ThinkSoftware Thanks for the reply, yes 0.01 *0.01 is a small area.

@RahulSingh-ln7bg 4 жыл бұрын

Bro, R Tree is hierarchical. How will you apply that here in distributed system? Matching based on this would be inefficient? You need better data structure to do it in real-time.

@ThinkSoftware 4 жыл бұрын

I didn't discuss using R-trees for this reason. There are other online resources that discuss using R-trees, quad trees etc. but I have already pointed out issues in that approach in this video.

@rishabhsharma1053 4 жыл бұрын

Awesome tutorial and explanation of the system!!! I have a doubt regarding the partitioning based on both driver & rider ids to be able to trip history for drivers & riders from a single partition. Will this implementation mean that entire trip history data would be duplicated across shards?

@ThinkSoftware 4 жыл бұрын

Thanks for the comment. What do you think?

@andrearmstrong3715 3 жыл бұрын

It seems to me that there will indeed be replication, but that's totally fine. Its a classical case of time vs. space complexity. It should be straightforward for a user, be it a rider or passenger, to find all of his trip history within the same shard. That being said, another option is to shard by timestamp, which could prevent data duplication. However, shards holdinh newer trips would get most of the load, so it is not a great design. I would combine the first sharding approach (duplicating data) with a local LRU cache of trip history, which benefits users who check it out constantly.

@taishancorp7720 3 жыл бұрын

In 35:20, you mentioned that dispatch servers need to find app server in routing service before it can send data back . Couldn't those routing service's app servers just used a long-poll request on the dispatch server ? Why we need the additional complexity here ?

@ThinkSoftware 3 жыл бұрын

Thanks for the comment. The driver apps even if using long poll they are waiting for trip requests and that request will be parked on a machine. How would you know in which machine the request is parked?

@taishancorp7720 3 жыл бұрын

@@ThinkSoftware Do you mean a request parked on a Dispatch server machine ? I feel this may be an implementation detail. From the HTTP frameworks I've worked with, when a HTTP request is received (here it's Dispatch server), it has the connection info in an object already. If it wants to support long-polling, it can "pause" the response for the connection and then when a driver has acknowledged (data is ready) "resume" the connection with the data in HTTP response. From client's point of view (Routing Service), I don't think it knows that this was a long-polled query versus a normal slow query.

@rmadhavmca1 4 жыл бұрын

could you let me know the tree name again?

@ThinkSoftware 4 жыл бұрын

Are you asking about Quad Tree, Kd-Tree etc?

@Bommutiwariplayz0170 3 жыл бұрын

Can you make video on map service and especially how type ahead location suggestion works

@ThinkSoftware 3 жыл бұрын

Thanks for the comment

@ChintanShah22 4 жыл бұрын

Excellent explanation, thank you ! Is it expected to come up with this level of detail in a 50 minute interview ? It seems if one is not already very familiar with a particular system, thinking and brainstorming itself will take up a majority of the time.

@ThinkSoftware 4 жыл бұрын

In an interview, you might not be able to cover everything and it is expected but the interviewer can direct you to go deeper in the internal design of one or two components. This also becomes very important at higher engineering levels like L6+ or architects. These interviews at these levels can very quickly validate whether a candidate is suitable for L6+/architects or not.

@gdthegreat 4 жыл бұрын

Will be this advantage if interviewee is with 1 year ex? For sde1 or L3?

@ThinkSoftware 4 жыл бұрын

More knowledge is always useful.

@ameyasawant5663 3 жыл бұрын

For trip service, can we use bloom filter for searching for trips?

@ThinkSoftware 3 жыл бұрын

Searching for which trips?

@ameyasawant5663 3 жыл бұрын

@@ThinkSoftware I meant with approach 1 we can use bloom filter to figure out which partition might have the trip-id. Just a suggestion. Your videos are awsome!

@ThinkSoftware 3 жыл бұрын

Please elaborate how?

@uditagrawal6603 3 жыл бұрын

Can we use geohash in place of quadtree?

@ThinkSoftware 3 жыл бұрын

Yes. This is discussed in the course as well.

@rishabhdugar1180 3 жыл бұрын

rateUser me trip id nahi li ? kyun ?

@ThinkSoftware 3 жыл бұрын

Why do you think rating a user require trip ID?

@rishabhdugar1180 3 жыл бұрын

because rating system works per trip not driver to rider or vice versa just once.

@ThinkSoftware 3 жыл бұрын

It depends on the requirements. If the requirement is to derive the driver rating from trip rating and also store the trip rating then yes. Otherwise both driver and passenger can be provide option to rate each other at the end of the trip and may not require trip ID in that case.

@abhishekjain96 3 жыл бұрын

@@ThinkSoftware That would be correct from the UI perspective where user has to act as per what is available on the screen. But from API perspective, not taking a tripId in the request would mean that a user can rate any driver(and vice versa) even if they haven't done a trip together(which shouldn't be allowed). And also a user can rate the same driver again and again

@rahulsharma5030 3 жыл бұрын

1. in design of routing service, once you upgrade http to websocket, after that let us say a client request, then the load balancer will forward to the backend server with which connection is established? 2. If answer to 1 is yes, then will load balancer store this mapping, like which client is mapped to which backend node for web socket connection? 3. Also i dont understand like if we upgraded to websocket, then what is the current type of connection between client and load balancer?If it is TCP or HTTP?As you mentioned in video i think its TCP, then how can we have millions of TCP with load balancer ,it is not possible.

@ThinkSoftware 3 жыл бұрын

Load balancer is only used for initial connection setup when the client made the initial http request. Once the connection is established, the load balancer is out of the picture and client directly talk with webserver with which it has TCP connection

@rahulsharma5030 3 жыл бұрын

@@ThinkSoftware thank you!!!:)

@rahulsharma5030 3 жыл бұрын

@Think Software. If i understand correctly: Then client sends request to load balancer. Then load balancer will decide which server can handle request and then it sends server name back to client.Now client directly sends HTTP upgrade request to the server(that is returned by load balancer)?

@ThinkSoftware 3 жыл бұрын

No. This is not correct.

@rahulsharma5030 3 жыл бұрын

@@ThinkSoftware then what will be flow you tole me previously? The other way can be the client sends upgrade request to load balancer and then load balancer sends same request to server ?and then server response send back to client via loadbalancer? I cant think of any other way.Can you please explain?

@rahulsharma5030 3 жыл бұрын

let us say that first time customer logsin it gets some list of drivers near it: 1. How will we maintain info that when the driver sends new lat long then we have to send to some customers which can track driver on their app? 2. How do we update that info when a new driver comes in same area or current driver leaves?

@ThinkSoftware 3 жыл бұрын

The initial 4/5 drivers that Uber app shows in the map has nothing to do with the functionality of the Uber service. Uber can show random drivers there and for booking a trip it does not matter.

@rahulsharma5030 3 жыл бұрын

@@ThinkSoftware it has to right?It tells me where the drivers on my screen moving,how much is their ETA to me.

@ThinkSoftware 3 жыл бұрын

Are you talking about initial 4/5 drivers or the driver that is assigned to your trip? For the first don't care but for the later you will poll the status periodically.

@rahulsharma5030 3 жыл бұрын

@@ThinkSoftware initial.When i open app,it shows me movement of nearby drivers on app(prior to booking also)

@ThinkSoftware 3 жыл бұрын

There is no ETA associated with them. Servers can you an initial list of 4/5 drivers and then you can poll them periodically

@Funsiestype 2 жыл бұрын

Get map service? Get ready to be put on spot on the need for this service. You need to speak about the choice you made for a single user service service instead of driver vs rider users. Need to speak about trade offs and reasoning. In the high level diagram, be ready to be picked on by users talking to user service directly. Depict Load balancer. Again, most interviewers are a-holes. Rambling begins at 37:10 without setting the context for how the quad tree is sharded based on which this homie is crapping on Grokking course. Even before that , I challenge the assertion that quadtree is not highly available. Be ready to defend why we can’t have replicas for quad tree. (If you say yes, be ready to explain how would you ensure replication works) For love of god, I can’t decipher what you are saying from 40:40 to 41:24. 🤷🏽‍♂️ (next time I suggest you write better notes for you to refer when making videos. I notice you did that here. Just make sure you formulate your thoughts. Your design has grid ID’s pre broken partitioned by grid ID either by hash or range. You don’t really provide an answer after all that crapping on the other author. Slick. If hash partitioned by grid ID , won’t you need to scatter gather for neighbors? I’m some areas like Nebraska where it’s less dense it’s Inefficient . If partitions are hot, you need a way to re partition . Consistent hashing (add new nodes) and replication could be options. 49:18 yup.. nice work. See you sound coherent when you check notes. Good job champ. 49:28 after all the crapping on other author you come back to the original problem and mention that we need to add more replicas in hot partitions. Ohhh Bhai… Maro mujhe. Mujhe maaro. Mazaak ho Gaya yaar. You need to talk about what happens if cache servers crap out. How would you build the quadtree again during crash stop faults? Be ready to talk about this. I went and read the grokking’s design. I don’t think you understood what that authors proposal was clearly. Good job on talking about partition by document And partition by term indexing. I am not saying anything negative about your videos, your videos are great. Just trying to question as you requested. Bye boo.

@Funsiestype 2 жыл бұрын

On a serious note, anyone who is watching this, my two cents, prepare well and make sure your fundamentals are strong. You need to discuss trade offs in an interview. Nothing is perfect and there are no silver bullets for problems at scale.

@ThinkSoftware 2 жыл бұрын

Thanks for the comments.

@curiousspirit3947 2 жыл бұрын

@@Funsiestype i think it's clear what he says from 40:40 to 41:24. If you shard your distributed cache by driver id it makes write easy, but the reads need to ask every node (scatter gather), which is inefficient. "Your design has grid ID’s pre broken partitioned by grid ID either by hash or range. You don’t really provide an answer after all that crapping on the other author. Slick. If hash partitioned by grid ID , won’t you need to scatter gather for neighbors? " => the difference is that now he only does a scatter gather for neighbors IF not enough drivers are found in the grid of the user. and the scatter gather is only on the neighbors, not all the shards.

@umer7936 3 жыл бұрын

Can your company or someone who you may know develop a software like Uber for my rideshare company? please reach out to me.

@syedabdul1168 4 жыл бұрын

hi sir..u have mentioned that you like to mentor..please let me know is there anyway to contact you..

@ThinkSoftware 4 жыл бұрын

Hi. You can contact me through the KZfaq website contact button. It will show my email address.

@bostonlights2749 3 жыл бұрын

6th Sept 2020 6,836 views 1.87 K Subs

@ThinkSoftware 3 жыл бұрын

are you trying to track the progress of this video? :D

@bostonlights2749 3 жыл бұрын

:) Brilliant Content btw, I've seen other channels which just post directly the content from the company's engineering blog . This was unique and from scratch.. Wish you all the best!

@ThinkSoftware 3 жыл бұрын

Thanks for the nice words 😊

@ThinkSoftware 3 жыл бұрын

Thanks

@kiaradas8355 2 жыл бұрын

Does anyone know his name? :)

@theghostwhowalk 4 жыл бұрын

24.14 sec Q how can we use Dijkstra to find shortest path. We can compute S to D total straight line distance. Say it comes to 20Km, then we will have a threshold say 1.5, all the nodes which are 30Km from S or D will be considered and fed into Dijkstra to find the optimal path. At 46.35: about hot partitions: We can pre identify hot spot areas (say downtown or sport arena) and split those areas into even smaller grid for use. That way we can make sure limit is not exceeded. Similarly if there is some mountain or forest then we can combine multiple grids into one. At 59: it depends on if writes of trips are more or reads for driver ids are more. I would think #1 is more frequent. So writes may need to be fast. So prefer option 1 as slower reads for driver may be OK. Most Welcome for any comment to improve/correct me 😀

@ThinkSoftware 4 жыл бұрын

Thanks for the comment 🙂

@theghostwhowalk 4 жыл бұрын

Think Software thanks for all the great content! It’s precious. I can see years of experience and weeks of efforts to come with 1 hour content. Also If you think there is flaw in above, please do let me know. 😀

@ThinkSoftware 4 жыл бұрын

Thanks for the nice words🙂