Cracking the Dropbox / Google Drive System Design Interview Part 1 of 2

14,860 views

1 day ago

As part of our systems design series, this week Liz kicks off a comprehensive deep dive on designing file upload and sharing systems such as Dropbox or Google Drive. She walks through how to scope down the problem by thinking about specific components and use cases, demonstrates how to use capacity estimation to inform her database design, and touches on topics such as ACID, meta servers, and iterative design.
The Challenge:
Design a system such as Dropbox or Google Docs in which a user can store their data on remote servers in the cloud. The remote servers should store files durably and securely, and these files should be accessible anywhere with an Internet connection. The users should be able to make changes to files locally and have the changes be automatically reflected in the cloud and on other clients.
Video Overview:
2:01 The challenge
2:55 How to approach system design questions
8:10 Scoping the problem
13:18 ACID requirements
21:00 Capacity estimates
23:14 Diagramming an iterative component interview
31:10 Client usage
36:06 Meta server
38:39 Block server
41:08 Not server
45:40 Iterative database design
53:40 References
If you have any recommendations for videos you’d like to see, please comment below.
Reference Links:
* Scaling Dropbox Lecture at Stanford -
• How We've Scaled Dropbox
* Dropbox System Design Article
systemdesignprimer.com/dropbo...
* Magic Pocket - file content storage
dropbox.tech/infrastructure/i...
* Dropbox Architecture Overview
www.dropbox.com/business/trus...
Additional Resources:
* Practice hundreds of real coding challenges at coderbyte.com/
* Need more practice? Check out our channel for more videos on preparing for a coding interview / coderbytedevelopers

Comments: 23

@ralucaioan7609 2 years ago

This is simply awesome. Thanks a lot!

@RakeshYadav-jc2nr 1 year ago

Thanks a lot! This was simple, comprehensive and most helpful, as compared to others. The ref links are awesome too!

@durgadeep4988 2 years ago

Lots of very good information! Thank you!

@rishisimply 5 months ago

This was immensely useful. Good that you took ideas from the Stanford presentation.

@nehasht2 2 years ago

It was a really good video 😊 learnt a lot

@Kamrun4U 1 year ago

Thanks for sharing this video.

@miriyalajeevankumar5449 2 years ago

Great work

@sivakumartm 2 years ago

Nice, thanks for sharing this. When will the part 2 video be published?

@AM-uc1sw 1 year ago

Thanks a lot, it was great!

@_jatin_mittal 6 months ago

amazing video! 😇

@wuaaron662 2 years ago

Thank you so much. This is the best tutorial I can find for designing Dropbox. The solution is logically correct, while other videos only give you a general intro. And if you pay attention, their solutions are not even workable, due to missing pieces. One YouTuber says there should be a request queue and a response queue, blablabla. Compared with your solution, that one is a piece of 💩. A queue cannot send a message to a client without a consumer.

@8888vampire8888 21 days ago

Did we forget an arrow for feeding information from the S3 cluster to the queuing service? How will a new client that just logged in get the actual modified chunk locally on their machine?

@bhavyabansal1143 1 year ago

Thanks for the amazing video. I have a question, if you can please clarify. When a user, let's say, creates a new file, I have this workflow in mind (please correct me if wrong): 1. Chunker creates chunks 2. Watcher notices and updates indexer 3. Indexer updates DB on client and notifies synchronizer 4. Synchronizer informs block server to upload chunks to S3 5. Block server, post upload, informs meta server to update meta DB. Given this, it looks like a client will never interact with the meta server directly, but in the diagram I see that connection. Am I missing anything here?

@yuhengcai5600 7 months ago

1. The client will need to interact with the metadata server to create/update metadata, with the status of the file being "syncing" 2. Then it will get a token to interact with the block server, to upload the file chunk by chunk 3. Once that is done, the block server will update the metadata DB. The status will be "synced", and we should also have the references of the chunks passed into the DB
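
The three steps in this reply can be sketched in a few lines of Python. This is only an in-memory illustration under stated assumptions, not the design from the video: `MetaServer`, `BlockServer`, `begin_upload`, `complete_upload`, the 4-byte chunk size, and the use of SHA-256 hashes as chunk references are all hypothetical names and choices made for the example, and a dict stands in for S3.

```python
import hashlib
import secrets

CHUNK_SIZE = 4  # bytes; tiny on purpose so the example is easy to follow


class MetaServer:
    """Hypothetical metadata service: tracks file status and chunk references."""

    def __init__(self):
        self.files = {}   # path -> {"status": str, "chunks": list of hashes}
        self.tokens = {}  # upload token -> path

    def begin_upload(self, path):
        # Step 1: record the file as "syncing" and hand back an upload token.
        self.files[path] = {"status": "syncing", "chunks": []}
        token = secrets.token_hex(8)
        self.tokens[token] = path
        return token

    def complete_upload(self, token, chunk_hashes):
        # Step 3: called by the block server once every chunk is stored.
        path = self.tokens.pop(token)
        self.files[path] = {"status": "synced", "chunks": list(chunk_hashes)}


class BlockServer:
    """Hypothetical block service: stores chunks keyed by content hash."""

    def __init__(self, meta):
        self.meta = meta
        self.store = {}  # sha256 hex digest -> chunk bytes (stand-in for S3)

    def upload(self, token, chunks):
        # Step 2: accept the file chunk by chunk, then report back to metadata.
        hashes = []
        for chunk in chunks:
            digest = hashlib.sha256(chunk).hexdigest()
            self.store[digest] = chunk
            hashes.append(digest)
        self.meta.complete_upload(token, hashes)
        return hashes


def split_into_chunks(data, size=CHUNK_SIZE):
    """Client-side chunker: fixed-size slices of the file contents."""
    return [data[i:i + size] for i in range(0, len(data), size)]


meta = MetaServer()
blocks = BlockServer(meta)
token = meta.begin_upload("/notes.txt")
status_during_upload = meta.files["/notes.txt"]["status"]
blocks.upload(token, split_into_chunks(b"hello world"))
```

After the last line, the metadata entry for `/notes.txt` is marked "synced" and lists one hash per chunk, in order, so a second client could fetch the chunks by hash and reassemble the file.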

@yuhengcai5600 7 months ago

Why do we need a block service to chunk the file when there is already a chunker on the client side?

@dnavas7719 2 years ago

that primary key change was genius

@totsubo2000 2 years ago

Some questions/feedback: 1. In the diagram that shows the meta server communicating with the not server using a queue, the queue has out-arrows to both the not server *and* the Metadata DB. I'm pretty sure you don't want two systems processing off the same queue. When one system pops an item off the queue, it will no longer be available for the other system to process. 2. Curious why the not server uses HTTP long polling instead of WebSockets. I would think WebSockets would be more resource efficient than long polling.
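
Point 1 can be reproduced with Python's standard `queue.Queue`, which behaves like a simple point-to-point queue. This is a sketch of the concern only: whether the queue in the video's diagram is point-to-point or a publish/subscribe topic is not stated, and the consumer names and the message text below are made up for the example.

```python
import queue

events = queue.Queue()
events.put("file-42 changed")

# Consumer A (say, the not server) takes the only message.
seen_by_notification_consumer = events.get_nowait()

# Consumer B (say, the metadata DB writer) now finds the queue empty.
try:
    seen_by_metadata_consumer = events.get_nowait()
except queue.Empty:
    seen_by_metadata_consumer = None

# One common alternative: fan out, giving each consumer its own queue.
notification_queue, metadata_queue = queue.Queue(), queue.Queue()
for q in (notification_queue, metadata_queue):
    q.put("file-42 changed")
```

With the single shared queue, the second consumer gets nothing. With one queue per consumer, both see the same message, which is roughly what a publish/subscribe setup provides.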

@pawankishorsingh 2 years ago

At timestamp 40:07, why is there a need for the block server to connect to the metadata database? Isn't the meta server a facade to this DB, and shouldn't all requests go via this facade?

@varshard0 2 years ago

Answer to your question is kzfaq.info/get/bejne/hqtkmtqk2b3MnZc.html.

@varshard0 2 years ago

This diagram is based on their design back in 2008. They had to make the block server call the Meta DB directly to reduce the number of round trips, because the block server and S3 are in a different region. They even had to copy a lot of APIs from the Meta Server into the Block Server.

@pawankishorsingh 2 years ago

I feel that at the 25:17 timestamp and in many other places you are displaying "Client" inside a cloud kind of logo and the server in a square box. I am OK with the servers, but "client" should not be drawn like this. Clients are generally user devices, and hence that cloud thingy around it is not right.

@pawankishorsingh 2 years ago

At 42:52, why is there a need to close the connection and then reestablish it? Why can't it stay open while the client is connected (like in the case of WebSockets)?

@varshard0 2 years ago

That's because Dropbox uses long polling, and that's how long polling works. I also wonder why they use long polling over WebSockets. Probably for simplicity, and because a WebSocket doesn't reestablish the connection automatically.
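
The close-and-reconnect behaviour asked about above follows from how long polling works: each request is held open until the server has something to say or a timeout expires, the response ends that request, and the client immediately sends a new one. A minimal simulation, assuming an in-process `queue.Queue` in place of a real HTTP server (the names `pending`, `long_poll`, and `client_loop` and the timings are hypothetical):

```python
import queue
import threading

pending = queue.Queue()  # stand-in for the server's per-client event buffer


def long_poll(timeout_s):
    """One request/response cycle: block until an event or the timeout."""
    try:
        return pending.get(timeout=timeout_s)
    except queue.Empty:
        return None  # server answers "nothing new"; the client must ask again


def client_loop(max_requests, timeout_s=0.05):
    received, requests_made = [], 0
    while requests_made < max_requests:
        requests_made += 1          # each pass is a fresh request
        event = long_poll(timeout_s)
        if event is not None:
            received.append(event)  # the response ended this request
    return received, requests_made


# Publish an event shortly after the client starts waiting.
threading.Timer(0.01, pending.put, args=["file-42 changed"]).start()
received, requests_made = client_loop(max_requests=3)
```

The client makes three separate requests to receive one event. A WebSocket would instead keep a single connection open and push each event over it, at the cost of the client handling reconnects itself.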