Design Google Drive or Dropbox (Cloud File Sharing Service) | System Design Interview Prep

80,610 views


Visit Our Website: interviewpen.com/?...
Join Our Discord (24/7 help): / discord
Join Our Newsletter - The Blueprint: theblueprint.dev/subscribe
Like & Subscribe: / @interviewpen
This is an example of a full video available on interviewpen.com. Check out our website to find more premium content like this!
Problem Statement:
Design a simple cloud file-sharing (& backup) service (analogous to Dropbox, Google Drive, etc).
*Starting Requirements/Information:*
- Users will have a desktop client
- All files in a specific folder will be synced to the cloud
- Other users need to be able to see changes to files
- Users will pay for storage (on a metered basis)
*Rough Starting Assumptions:*
- *Userbase Size:* 100M users (1M Daily Active Users)
- *Average File Size:* 10MB
- *Average Files Stored (per User):* 10
- *Average Clients (per User):* 2
- *Average Edits/Day (per User):* 100
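A rough sketch of the arithmetic these assumptions imply. It assumes decimal units (1 PB = 10^9 MB), that every registered user stores files, and that only daily active users generate edits; the variable names are illustrative and not taken from the video.

```python
# Back-of-the-envelope estimates from the stated starting assumptions.
TOTAL_USERS = 100_000_000
DAILY_ACTIVE_USERS = 1_000_000
AVG_FILE_SIZE_MB = 10
AVG_FILES_PER_USER = 10
AVG_EDITS_PER_DAY = 100
SECONDS_PER_DAY = 86_400

# Storage: assume every registered user stores files, active or not.
total_storage_mb = TOTAL_USERS * AVG_FILES_PER_USER * AVG_FILE_SIZE_MB
total_storage_pb = total_storage_mb / 1_000_000_000  # 1 PB = 1e9 MB (decimal)

# Edit rate: assume only daily active users generate edits.
edits_per_day = DAILY_ACTIVE_USERS * AVG_EDITS_PER_DAY
edits_per_second = edits_per_day / SECONDS_PER_DAY

print(f"total storage ~ {total_storage_pb:.0f} PB")
print(f"edits per day = {edits_per_day:,}")
print(f"edits per second ~ {edits_per_second:.0f}")
```

Under these assumptions the totals come to about 10 PB of storage and roughly 1,150 edits per second.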
*Resources*
- Binary Large Object (BLOB): en.wikipedia.org/wiki/Binary_...
Table of Contents:
0:00 - System Requirements
0:57 - Assumptions/System Statistics
1:40 - Visit interviewpen.com
1:59 - Naive Implementation
4:51 - Issues: Bandwidth Usage
5:19 - Issues: Network
5:40 - Chunking Updates
8:09 - Rough Calculations
10:21 - Summary: Rough Calculations
12:05 - Handling Subscriptions
13:45 - Database I/O
15:22 - Scaling the Database
17:09 - Options for Scaling
18:37 - High-Level System Diagram
20:19 - Update Notifications
22:05 - Database Sharding
24:13 - Further Considerations
26:21 - Key Moments Recap
28:04 - Visit interviewpen.com
Socials:
Twitter: / interviewpen
Twitter (The Blueprint): / theblueprintdev
LinkedIn: / interviewpen
Website: interviewpen.com/?...

Comments: 85

@interviewpen 1 year ago

Thanks for watching! Visit interviewpen.com/? for more great Data Structures & Algorithms + System Design content 🧎

@shemleong7571 1 year ago

Great overview. There are a few tweaks I would make: 1) Have the ingest service return a presigned URL so the client can upload directly to S3. This offloads the bandwidth problem to the client. 2) Tap into event triggers to handle the post-upload activities. 3) Instead of that first queue, rate limiting or API throttling might be a more appropriate way to manage the load.

@interviewpen 1 year ago

Agreed, using presigned S3 URLs is a great solution to manage load on the ingest API. That would also potentially eliminate the need for the queue in front of that API. Thanks for watching, stay tuned for more!

@meprateek24 7 months ago

If the file is uploaded directly to S3, we might face the same issue that was explained at the beginning of the video: if the connection breaks, the whole upload has to start again. Will the S3 upload also happen in chunks?

@drhdev 6 months ago

@meprateek24 Not our problem

@justinchan4810 1 month ago

@meprateek24 It's safe to assume the file upload can also be done in chunks, especially since the design stores the file in chunks in S3 (shown at 7:58)
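To make the chunked-upload idea in this thread concrete, here is a minimal sketch of how a client might decide which chunks still need uploading after an edit or a dropped connection. It assumes fixed-size chunks and a server that can report the chunk hashes it already holds; `chunk_hashes`, `chunks_to_upload`, and the 1 MB default are hypothetical and not taken from the video.

```python
import hashlib

CHUNK_SIZE = 1024 * 1024  # 1 MB; an assumed chunk size


def chunk_hashes(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Split data into fixed-size chunks and return each chunk's SHA-256 hex digest."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def chunks_to_upload(local: list[str], remote: list[str]) -> list[int]:
    """Return indices of local chunks that are missing or different on the server."""
    return [
        i for i, digest in enumerate(local)
        if i >= len(remote) or remote[i] != digest
    ]


# Tiny 4-byte chunks keep the example readable.
old_version = b"aaaabbbbcccc"
new_version = b"aaaaXXXXccccdddd"
remote = chunk_hashes(old_version, 4)
local = chunk_hashes(new_version, 4)
print(chunks_to_upload(local, remote))  # [1, 3]
```

If the connection breaks partway through, the client can ask the server for its current chunk hashes again and resend only the indices still reported as missing, rather than restarting the whole file.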

@buntysingh7315 1 year ago

thanks for taking the effort!

@interviewpen 1 year ago

sure!

@teetanrobotics5363 11 months ago

I hope this message finds you well. I wanted to take a moment to express my sincere gratitude for the exceptional content you've been sharing on your KZfaq channel. Your recent series of five top-notch and in-depth system design videos have been an absolute treasure trove of knowledge. The clarity and depth with which you explain complex concepts are truly commendable. Your videos have been instrumental in helping me grasp the intricacies of system design and architecture. The practical examples you provide, along with your lucid explanations, have made learning a pleasure. I want to encourage you to continue creating such invaluable content. Your unique ability to break down complex topics into understandable components is a true gift. If possible, I would love to see more of these insightful system design videos from you in the future. Additionally, it would be fantastic if you could consider curating these videos into a playlist. Having them organized in one place would be tremendously helpful for both newcomers and those looking to revisit certain concepts. Once again, thank you for your dedication and hard work in sharing your expertise. Your contribution to the learning community is truly appreciated. I eagerly await more of your enlightening videos.

@interviewpen 11 months ago

Yes, we do have a “System Design” playlist on this KZfaq, as well as more videos on interviewpen.com. Thanks for the kind words & thanks for watching 👍

@dd-qz2rh 5 months ago

bro went straight ahead and utilized that sweet chatgpt power

@yxawp 9 months ago

IOPS: (1M)(100) / 86,400 is ~ 1150/sec. Not clear why it was calculated to 115,000/sec in Handling Subscriptions section.

@interviewpen 9 months ago

Good catch, thanks!

@mecury007 3 months ago

Yeah I lost 20 minutes trying to understand why 1M * 100 = 10B and not 100M

@kumar_gautam24 1 year ago

Thanks, great content

@interviewpen 1 year ago

Glad you liked it, more content is on the way!

@khanhtoanle8396 1 year ago

Nice video!

@interviewpen 1 year ago

sure!

@Oz1111 8 months ago

These system design vids are great. Given your expertise and how well you cover these topics, can you do a basics of system design explaining different services and common parts of system design? I know there are other channels that do this but I'd love to have you do one as your content is super clear and easy to follow. Thanks.

@interviewpen 8 months ago

Thanks! If you're looking for a full course, check out interviewpen.com!

@hackaholic01 1 year ago

For the storage usage validation step, you can remove all of the overhead as follows: the client already has the file stats, so it can request the user metadata and check whether any storage is available before uploading the file.

@interviewpen 1 year ago

Thanks for watching. I might be misunderstanding, but it sounds like you're suggesting having the client itself validate whether it has bought enough storage. You're right that doing this could reduce some overhead, but it would defeat the entire purpose of that step since the client could simply lie to the service about how much storage it has when uploading a file. It's important to make sure logic like this happens on the server side since clients are inherently untrusted.
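A minimal sketch of the server-side check described in this reply, assuming the service keeps its own record of each account's purchased and used storage. `Account`, `reserve_upload`, and `QuotaExceeded` are hypothetical names introduced for illustration; nothing here comes from the video.

```python
from dataclasses import dataclass


class QuotaExceeded(Exception):
    """Raised when an upload would push an account past its purchased storage."""


@dataclass
class Account:
    quota_bytes: int  # storage the user has paid for, as recorded by the server
    used_bytes: int   # storage currently consumed, as recorded by the server


def reserve_upload(account: Account, upload_bytes: int) -> None:
    """Validate against the server's own numbers; never trust a client-reported quota."""
    if upload_bytes < 0:
        raise ValueError("upload size must be non-negative")
    if account.used_bytes + upload_bytes > account.quota_bytes:
        raise QuotaExceeded(
            f"need {upload_bytes} bytes, only "
            f"{account.quota_bytes - account.used_bytes} available"
        )
    account.used_bytes += upload_bytes


account = Account(quota_bytes=1_000, used_bytes=900)
reserve_upload(account, 100)  # fits exactly
try:
    reserve_upload(account, 1)  # would exceed the quota
except QuotaExceeded as exc:
    print("rejected:", exc)
```

A client can still run the same check locally to fail fast, but the server-side check has to remain the one that decides.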

@firezdog 9 months ago

security was not mentioned at all, nor anything about concurrent writes -- but much better than anything i could have done, that being said.

@interviewpen 9 months ago

There’s a limited amount of information we can convey in one video, but yes, security and concurrency are both super important things to consider here! Thanks for watching.

@user-jz9fe6bz1x 1 year ago

Thanks for the video love it

@interviewpen 1 year ago

Thanks for watching!

@AdarshMadrecha 1 year ago

Good insights

@interviewpen 1 year ago

thanks for watching!

@sivam5204 1 month ago

The chunking concept could be explained more. :)

@harshraj22_ 1 year ago

Assuming by Queue you meant a message queue, I would like to know your thoughts about using Kafka instead of a queue for the notification service, with the pros and cons. Btw, great video :)

@interviewpen 1 year ago

We never specified what platform we would use for queues, but Kafka is a great choice for a system like this. The distributed nature of Kafka queues means they can be horizontally scaled to handle an extremely high load, and that would enable the system to handle the high traffic requirements. Glad you enjoyed the video!

@nealpan 10 months ago

Great

@interviewpen 10 months ago

Thanks!

@amirafshari1613 1 year ago

@interviewpen What do you think of mentioning managed solutions instead? For example, instead of a manually sharded DB, a Cosmos DB managed Postgres that auto-shards, or a Citus distributed SQL cluster that auto-shards?

@interviewpen 11 months ago

Totally! There's usually managed solutions for most of the services that we discuss in these designs, but we try to keep the videos general so you can understand the concepts regardless of how they're deployed. Thanks for watching!

@hfspace 1 year ago

One thing that has not been touched on, and that comes to mind immediately, is that the way chunking is handled here has room for improvement. What if someone changes a file in the middle and adds loads of data to it (which would result in multiple new chunks in multiple different locations in the file)? Then you could either reload the complete file or implement some more complex indexing for the chunks, I guess, and do a reindex operation.

@interviewpen 1 year ago

Yes, this is correct - we just skimmed over it and said "do chunking", but the chunking is a mini research paper in itself. We find this is the case with a lot of concepts we cover! So we just try our best to hit the major details! Thanks for watching - more coming!!!
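One common answer to the mid-file insertion problem raised here is content-defined chunking, where chunk boundaries depend on the bytes themselves instead of on fixed offsets. The sketch below is a toy: it uses a rolling sum over a tiny window, while real systems typically use a stronger rolling hash (such as Rabin fingerprints) plus minimum and maximum chunk sizes. The function name and parameters are hypothetical, and the video does not say which scheme it would use.

```python
def content_defined_chunks(data: bytes, window: int = 2, divisor: int = 4) -> list[bytes]:
    """Cut a chunk wherever the sum of the last `window` bytes is divisible by `divisor`."""
    chunks: list[bytes] = []
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= window:
            rolling -= data[i - window]  # drop the byte that left the window
        if rolling % divisor == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes after the last boundary
    return chunks


old = bytes([1, 2, 3, 1, 5, 6, 2, 2, 7, 1, 3, 4])
new = old[:5] + bytes([9, 9, 9]) + old[5:]  # insert three bytes mid-file

old_chunks = content_defined_chunks(old)
new_chunks = content_defined_chunks(new)
known = set(old_chunks)
changed = [chunk for chunk in new_chunks if chunk not in known]
print([list(chunk) for chunk in changed])  # [[5, 9, 9, 9, 6, 2]]
```

Only the chunk containing the insertion is new; the boundaries after it resynchronize because they depend on local content only. With fixed-size chunks, the same insertion would shift every later chunk.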

@Wei-up2jn 2 months ago

Great Video! Thanks for sharing! One question about the sharding: if we are sharding by UserId + FileId, doesn't it mean we still have to do scatter-gather if we want to get the full file list of a user?

@interviewpen 2 months ago

Yes, that is correct. There's never a perfect way to shard a DB, so that's the tradeoff with this approach--we'd have to fetch files from every node to get all of a user's files.
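The tradeoff in this exchange can be illustrated with a small sketch. It assumes simple hash-based shard placement over a fixed number of shards; `NUM_SHARDS`, `shard_for`, and the key formats are hypothetical and not taken from the video.

```python
import hashlib

NUM_SHARDS = 8  # assumed cluster size


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a shard key to a shard index with a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def shards_touched_by_composite_key(user_id: str, file_ids: list[str]) -> set[int]:
    """Shards holding a user's files when the shard key is user ID + file ID."""
    return {shard_for(f"{user_id}:{file_id}") for file_id in file_ids}


def shards_touched_by_user_key(user_id: str, file_ids: list[str]) -> set[int]:
    """Shards holding a user's files when the shard key is the user ID alone."""
    return {shard_for(user_id) for _ in file_ids}


file_ids = [f"file-{n}" for n in range(100)]
print(len(shards_touched_by_user_key("user-42", file_ids)))  # 1
print(len(shards_touched_by_composite_key("user-42", file_ids)))
```

With the composite key, listing one user's files has to fan out to many shards (the scatter-gather described above), while writes spread evenly. Keying on user ID alone keeps listings on a single shard but concentrates a heavy user's load there.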

@gordonli4946 9 months ago

18:55, can the client write directly into a queue? Shouldn't it upload chunks to a service first, and then the service processes them with S3/the DB? I'm wondering what queue that is in front of the backend service. And at 23:00, why won't userid + fileid have the same scatter-gather issue as fileid alone? Each user has lots of fileids/chunkids, and we need at least a table for user/fid/cid anyway.

@interviewpen 9 months ago

For the first point, you're absolutely right. We'd need some sort of interface between the client and our queue to enable this. For the second, we need to shard on both user ID and file ID separately (user ID could be a global index), enabling us to query on the field we're looking for. Thanks!

@ravikant-hi8mz 1 year ago

What software do you use? Including the grey board thing to draw. Please suggest what you are using🙂

@interviewpen 1 year ago

GoodNotes. Thanks for watching - more coming!

@semenivanoff8615 1 year ago

How do you update a zip archive by chunks? Or an encrypted file? The DB is sharded, OK. Why isn't S3 sharded and geo-replicated? Also, you rely on S3 provided by AWS and will be paying for a virtual capacity of 10 PB, when it could be more practical and cheaper to have your own servers and storage colocated in several DCs that do compression and deduplication, which can provide a lot more virtual capacity and have lower running costs in 2-3 years. But that is arguable. You mentioned a queue to manage chunks, but those are 1MB chunks. Which queue can use messages of that size? Or should it be a custom-developed queue?

@interviewpen 1 year ago

Yep, chunking absolutely breaks down with certain file types...but for others it can be very helpful. That said, we can still upload zips, etc. in chunks, even if we do have to upload every chunk. Under the hood, S3 is absolutely distributed and georeplicated. With 10PB of storage, S3 would cost $210k/month...so an on-prem object store would likely be a better option at this scale. Good point! Kafka can technically manage 1MB messages...but it's a bit of an anti-pattern, so there might be better ways to manage congestion in this system (perhaps something custom developed). Thanks for watching!

@Pebblejo 8 months ago

If you use "user+fileID" as the shard key, doesn't that mean you still need to query multiple nodes to retrieve the info for all the files belonging to the same user? How is that better than using only the fileID?

@interviewpen 8 months ago

Yep, since file IDs are already unique, adding the user ID to the shard key has very little effect. Thanks for watching!

@Tony-dp1rl 9 months ago

With the latency and buffering inherent in the queue usage and file IO and user notifications, I doubt there is a need to shard the database at all, and if there was due to load, then SQL isn't a good choice, but storage-backed Redis would be much better. SQL is a terrible choice for generic metadata.

@interviewpen 8 months ago

Well, we'd likely have pretty high error rates if we tried to send that many writes to a single shard. On the second point, you're right that SQL isn't ideal in many use cases; it's hard to shard due to its relational model. There's tons of options for NoSQL sharded databases that could be used in this system. Thanks!

@ebu7 1 year ago

Please make a video about NAS (Network Attached Storage) system design.

@interviewpen 1 year ago

We'll add it to the list. Thanks for watching, more content is on the way!

@semenivanoff8615 1 year ago

NAS is storage accessible over IP (CIFS or NFS). What is so special about it? Or do you mean a specific model of storage system, like NetApp?

@fatcat22able 11 months ago

I feel kind of dumb - what is meant by "edit" in this context? Great video!

@interviewpen 11 months ago

I'm not sure which part of the video you're referring to specifically, but an edit is just a single change to a file that triggers a chunk of data to be updated in the system. Thanks!

@fatcat22able 11 months ago

@interviewpen Thank you for the response! I guess I'm having trouble understanding how a file would be changed in the context of this application. My immediate thought was that a change to a file would entail a full reupload. But I could understand it if the service worked like this: if I've uploaded an image to the service and then I make a change to that image locally, those changes would be uploaded as chunks in order to update the image in the system, as opposed to reuploading & replacing the full image, correct? And this change is what we call an edit? Please let me know if I'm understanding this correctly. Thank you!

@gxo-mt5vo 11 months ago

Useful video but focused too much on back of envelope calculations, and we have 100 mil writes per day, not 10 bil

@interviewpen 11 months ago

There's 100 million users, each performing 100 edits per day => 10B edits per day. The back of the envelope math might seem grueling, but it's really important to make sure we choose the right solutions to scale the system. Thanks for watching, and for the feedback!

@vinaychavadi7411 6 months ago

@interviewpen DAU is 1 million users, 100 edits per day per user => 100 million edits per day.

@sagarmantri4743 6 months ago

At 28:39, the calculation of IOPS seems wrong. (1M)(100)/86900 => 115000/sec? It should be roughly 1e6 * 100 / 1e5 = 1000/sec, am I missing something?

@interviewpen 6 months ago

Yes, you're right. Should've been 1150, not 115000. Good catch :)

@nvskiran 9 months ago

S3 already provides an option to upload in chunks. Why are you not using that?

@interviewpen 9 months ago

Yes, manually chunking our files gives us some more control (especially around updating pieces of the file), but multipart uploads could certainly work in this same design. Thanks for watching!

@ashiquehoque762 1 year ago

Could you please share "how QR CODE WORKS?"

@interviewpen 1 year ago

Thanks for watching! We'll add that to the list of things to cover. But from a basic perspective, a QR code reader looks for predefined patterns in the image; then it reads the black/white squares in a specific order. Each square is read as a bit, 1 or 0, and all together they form a binary representation of a URL or other message.

@vinayak6564 15 days ago

I feel it doesn't make sense to put chunks in a queue; a client having direct access to a message queue system is not a good idea in practice from a security perspective. Also, it doesn't reduce load in any way, as the message queues also need to be scaled, if not the ingestion servers, so it is just adding an extra layer for the sake of adding one. Correct me if I am wrong.

@vinayak6564 15 days ago

Only messaging queue for notification service makes sense.

@interviewpen 15 days ago

The idea behind this was that if there are bursts of load, it wouldn't slow down users uploading their data. But I fully agree with you that it doesn't make sense for a client to have direct access, so it's not a very useful solution in this case. A better solution might be to use a tiered storage system behind our BLOB store which can provide very fast reads and writes for frequently accessed data while moving older data to cheaper storage mediums. Thanks for watching!

@vinayak6564 15 days ago

@interviewpen Thanks for the prompt response and answer! Great content, btw. I finished watching the blob storage system design video after this.

@zuowang5185 1 month ago

Is this prep for a new grad level?

@interviewpen 1 month ago

System design questions are more likely to be asked for more senior level interviews, but companies are more and more starting to ask these types of questions in more junior roles as well! Either way, it's a good idea to have some understanding of these concepts for any role.

@pradgarimella 1 month ago

Too much emphasis on calculations. In a real system design interview, candidates will spend 2 mins max on calculations. Anything more and you are screwed.

@interviewpen 1 month ago

I don’t agree: one of the most important parts of the system design interview is showing that you can translate product requirements into a solution that fits the use case. This means understanding the load that will be placed on each part of the system. Thanks for watching!