12: Design Google Docs/Real Time Text Editor | Systems Design Interview Questions With Ex-Google SWE

Views: 17,458

1 day ago

I swear Kate Upton and Megan Fox wrote I was handsome and sexy, you guys just didn't use two phase commit for your document snapshots and version vectors so you never received those writes on your local copy (since your version vector was more up to date than the document snapshot)!

Comments: 62

@felixliao8314 18 days ago

Thanks for the great content, Jordan. I have a thought and I'd like your input. Since your text CRDT basically eliminates conflicts, you don't need a version vector to resolve conflicts again. But I think we can still use it to: 1. Avoid unnecessarily merging the CRDTs (i.e. if two nodes have the same VV, they don't need to merge, and if one vector is strictly smaller than the other, that node can simply discard its own state and take the other's CRDT values). 2. Filter out unnecessary writes (I think you covered this implicitly). 3. Create a partial order and thus achieve causality, although I can't be sure whether we need causality. (Conflict-free should already promise convergence, so we might not care how we eventually get there, unless we want document versioning?)

@jordanhasnolife5163 17 days ago

So this isn't a state-based CRDT (sending the whole document), it's an operation-based CRDT (just sending incremental changes), which is why I believe the version vector is important here: operation-based CRDT messages are not necessarily idempotent. It enables us to 1) make sure we don't apply any message twice and 2) make sure we apply all messages.
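To make the two guarantees in this reply concrete, here is a minimal Python sketch, assuming each document server stamps its operations with its own consecutive sequence number starting at 1. The `Replica` class, its field names, and the string operations are hypothetical and are not taken from the video.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Applies operation-based CRDT messages at most once and in per-server order."""
    # Version vector: server id -> highest sequence number applied from that server.
    version_vector: dict = field(default_factory=dict)
    # Operations that arrived early, keyed by (server_id, seq), waiting for a gap to fill.
    pending: dict = field(default_factory=dict)
    # Log of applied operations, standing in for the real document state.
    applied: list = field(default_factory=list)

    def receive(self, server_id: str, seq: int, op: str) -> None:
        last = self.version_vector.get(server_id, 0)
        if seq <= last:
            return  # duplicate delivery: already applied, so ignore it
        self.pending[(server_id, seq)] = op
        # Apply every consecutive operation from this server that is now available.
        nxt = last + 1
        while (server_id, nxt) in self.pending:
            self.applied.append(self.pending.pop((server_id, nxt)))
            self.version_vector[server_id] = nxt
            nxt += 1


replica = Replica()
replica.receive("s1", 1, "insert 'a'")
replica.receive("s1", 1, "insert 'a'")   # duplicate, dropped
replica.receive("s1", 3, "insert 'c'")   # early, buffered until seq 2 arrives
replica.receive("s1", 2, "insert 'b'")   # fills the gap, so 2 and 3 are applied
```

The duplicate check covers "don't apply twice", and the buffer covers "apply all messages": an operation is only applied once every earlier operation from the same server has been applied.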

@AP-eh6gr 5 months ago

this is production level detail - definitely requires a second sweep to memorize better!

@jiangnan1909 6 months ago

Hey Jordan, just wanted to drop a huge thank you for your system design videos! They were crucial in helping me land an E4 offer at Facebook Singapore (I did product architecture instead of system design). Really appreciate the knowledge and insights you've shared. Cheers!

@jordanhasnolife5163 6 months ago

That's amazing man! Congrats, your hard work paid off!

@hl4113 6 months ago

Are you able to give me some guidance on what to expect and the aspects that you felt were important to cover for the Product Architecture interview? I have one coming up and I'm at a complete loss as to where they'll steer the conversation.

@jiangnan1909 6 months ago

@hl4113 1. Contact your recruiter for a detailed outline of the interview's structure, focusing on timelines and key areas. 2. Use Jordan's channel and the Grokking the API design course for preparation. All the best!

@venkatadriganesan475 2 months ago

Excellent detailed coverage of online text editor. And you made it easy to understand the concepts.

@gangsterism 6 months ago

writing an ot has operationally transformed my free time into wasted free time

@jordanhasnolife5163 6 months ago

Sounds like a solid operation to me!

@Crunchymg 6 months ago

Huge help in landing L4 at Netflix. Much thanks!

@jordanhasnolife5163 6 months ago

Legend!! Congrats :)

@renzheng7845 6 months ago

dang this guy is really good! Thanks for making the video!

@nowonderwhy200 6 months ago

Got an offer from LinkedIn. Your videos were great help in system design interview ❤.

@jordanhasnolife5163 6 months ago

Legend!! Congratulations!

@DevGP 6 months ago

Jordan! Great video as always 🎉. I have a question: have you considered expanding into dissecting an open source product in a video, explaining why certain design decisions were made and discussing how you would alternatively try to solve them? Once again, love all the work you put in, this is GOLD. Thanks!

@jordanhasnolife5163 6 months ago

That's an interesting idea! To tell you the truth, while I'm curious about doing this, the amount of time that I'd probably have to put into looking into those codebases would be pretty wild haha. Not to mention that the guys working on open source software are a lot more talented than me!

@khushalsingh576 2 months ago

Great video, and the line at 07:54 ("Fortunately there are engineers who have no life ..." 😂😂) added a practical touch.

@VijayInani 22 days ago

I was searching for a comment quoting this. What subtle low-profile sarcasm! :D ... Loved it!

@soumik76 1 month ago

Hands down the most in-depth coverage of the topic! One question that I had: is MySQL a good choice for the write DB, considering that it will be write-heavy?

@jordanhasnolife5163 1 month ago

Well, maybe not, since I wonder how much write throughput we can get with an ACID database using B-trees. That being said, I'm sure it's fine realistically.

@fluffymattress5242 3 months ago

The level of detail in this video makes me want to burn all those stupid superficial bs i have been reading all these years. Imma name my 3rd kid after your channel dude ;).... the 2nd one is gotta be martin tho

@jordanhasnolife5163 3 months ago

"Jordan has no life" is a great name for a kid, I endorse

@user-tm5uo5vf6k 1 day ago

What if we used some kind of gossip protocol to send writes to clients? I.e.: - Each server that receives writes from clients sends them to x random servers (and those servers forward the writes to their connected clients). This is useful since we want a persistent connection with the client to send them data anyway, and each server doesn't have to know about all clients. - This obviously means we need to know about other servers using something like ZK, which we will likely do anyway because we want all writes to reach other servers (maybe not in the later section, where all writes go to a DB).

@user-vz3zp2qg9q 4 months ago

Thank you for this video! Pretty cool

@priteshacharya 3 months ago

Great video Jordan. Two questions on final design screen: 1. Write DB sharding: What is the difference between sharding by DocId vs DocId+ServerId? 2. Document Snapshot DB: We are sharding by docID and indexing by docId+character position, is this correct?

@jordanhasnolife5163 3 months ago

1) Sharding by doc ID alone means every write for a document is bottlenecked on a single database node. If we shard by doc and server ID, each server can write to a nearby database. 2) Yep!
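A rough Python sketch of the first point, assuming hash-based shard placement. The shard count, key format, and function names are hypothetical, and a real deployment might map each server to a nearby database node rather than hashing.

```python
import hashlib

NUM_SHARDS = 8  # hypothetical shard count


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a key to a shard with a stable hash (Python's built-in hash() is salted per process)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def shard_by_doc(doc_id: str) -> int:
    # Every write to a document lands on the same shard, whichever server took it.
    return shard_for(doc_id)


def shard_by_doc_and_server(doc_id: str, server_id: str) -> int:
    # Writes to one document can land on different shards, one per accepting server.
    return shard_for(f"{doc_id}:{server_id}")


servers = [f"server-{i}" for i in range(20)]
doc_only = {shard_by_doc("doc-42") for _ in servers}
doc_and_server = {shard_by_doc_and_server("doc-42", s) for s in servers}
```

Here `doc_only` always contains a single shard, while `doc_and_server` will typically contain several, so a hot document's write load is no longer pinned to one node. The trade-off is that reading all writes for a document now means querying every shard that holds part of it.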

@user-wj1wy6ph5q 6 months ago

🙇 interesting concepts covered. Thank you

@sarvagya5817 6 months ago

Thank you amazing video 🎉

@antonvityazev4823 3 months ago

Hey Jordan! You did a great job with this one, thanks for your hard work! After watching this video and looking at the final design, I didn't quite get where a reader would connect to receive updates about new changes in the document. I see that there are arrows to the cache, the vectors DB, the snapshots DB, and the write DB, but I don't see any WS server or similar. Could you clarify, please?

@jordanhasnolife5163 3 months ago

The reader first gets a snapshot with a version vector from the vectors and snapshot DBs, and from there subscribes to changes on the document servers, applying any subsequent updates.

@antonvityazev4823 3 months ago

Much appreciated

@user-id1sf2ib3s 6 months ago

Hi Jordan! Just watching the CRDT part of the video where you mention giving fractional IDs to the characters, between 0 and 1. I was wondering how and at what point these IDs are assigned. For instance, if you create a blank document and start typing, what would it look like? And if you then add a few paragraphs at the end, how would these new indexes be assigned? The example you gave (and that I've seen in other places) treats it as an already existing document with already assigned indexes, and you're just inserting stuff in between. I was thinking it might be a session thing, i.e. the first user that opens a connection to the file gets these assigned and stores them in memory or something, but I watched another video where you mention it being indexed in a database. I'd love to know!

@user-id1sf2ib3s 6 months ago

I think I understood in the end, maybe? Indexes 0 and 1 don't actually exist: your first character will be around 0.5, the second around 0.75, and so on. You're only going to get indexes < 0.5 if you go back in the text and add characters before the first character you added. If you write without stopping or going back, you'll get 0.5, 0.75, 0.875, 0.9375, and so on?

@jordanhasnolife5163 6 months ago

Hey! I think this is probably implementation dependent, but I imagine the idea here is that there's some frontend logic to batch quick keystrokes together so that they're all assigned similar indices, as opposed to constantly bisecting the outer characters (see the BIRD and CAT example).
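The naive bisecting scheme discussed in this thread can be sketched in a few lines of Python. This is only an illustration under the assumption that positions are exact fractions strictly between sentinels 0 and 1; `index_between` and the sentinel names are hypothetical, and the keystroke batching mentioned in the reply is not modelled.

```python
from fractions import Fraction


def index_between(lo: Fraction, hi: Fraction) -> Fraction:
    """Return a position strictly between two neighbours by bisecting the gap."""
    if not lo < hi:
        raise ValueError("lo must be strictly less than hi")
    return (lo + hi) / 2


START, END = Fraction(0), Fraction(1)  # sentinels; no character ever sits at 0 or 1

# Typing "abcd" at the end of an empty document: each character bisects
# the gap between the previous character and the END sentinel.
positions = []
prev = START
for _ in "abcd":
    prev = index_between(prev, END)
    positions.append(prev)

# Going back to insert between the first and second characters.
inserted = index_between(positions[0], positions[1])
```

The positions come out as 1/2, 3/4, 7/8, 15/16, and the later insert lands at 5/8. Exact fractions are used here because repeated halving eventually exhausts floating-point precision, which is one reason real implementations tend to use variable-length identifiers instead.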

@hl4113 6 months ago

Hey Jordan, is there any way you could make some content on how to tackle the product architecture interview? I have one from Meta coming up and couldn't find many sources of examples for content more focused on API design, client-server interactions, extensibility, etc. There are no examples I can find related to this on YouTube. Thank you for all your content!

@jordanhasnolife5163 6 months ago

Hey! I've never done this interview myself so perhaps I'm not the most qualified. But considering that I've had multiple people on here say that they've passed meta interviews, I imagine it's pretty similar to systems design.

@joshg7097 6 months ago

I wonder why you would use another DB plus two-phase commit for the version vector table, rather than using the same DB with transactions.

@jordanhasnolife5163 6 months ago

If I have to partition the data for a big document over multiple tables I need a distributed transaction

@jordanhasnolife5163 6 months ago

If we assume all documents can fit on a single database totally agree that's a much better approach

@joshg7097 6 months ago

The version vector for a document can exist on the same partition as the document's partition. If we assume a document can only reach megabytes and not gigabytes, it's safe to assume a single document can exist on a single partition. Even if a single document has to be chunked, we can still colocate the version vector for that chunk.

@jordanhasnolife5163 6 months ago

@joshg7097 Hey Josh, you can co-locate it, but it still becomes a distributed transaction, which needs to use 2PC. Also, ideally we don't have to be too rack-aware in how we store writes, because if we were to use something like AWS we don't necessarily have those controls. I agree with your point though: probably 99.9% of the time a document won't span multiple partitions, and in that case you should store the version vector local to its partition and you don't need 2PC.
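For the case where the snapshot and its version vector do live on different nodes, the two-phase commit being discussed looks roughly like the Python sketch below. It is an in-memory illustration only: the `Participant` class and `two_phase_commit` function are hypothetical, and coordinator logging, timeouts, and crash recovery are all omitted.

```python
class Participant:
    """In-memory stand-in for one database taking part in two-phase commit."""

    def __init__(self, name: str, can_commit: bool = True):
        self.name = name
        self.can_commit = can_commit
        self.committed = {}   # durable state
        self.staged = None    # write held between prepare and commit

    def prepare(self, key, value) -> bool:
        if not self.can_commit:
            return False      # vote "no"
        self.staged = (key, value)
        return True           # vote "yes"

    def commit(self) -> None:
        key, value = self.staged
        self.committed[key] = value
        self.staged = None

    def abort(self) -> None:
        self.staged = None


def two_phase_commit(writes) -> bool:
    """writes: list of (participant, key, value). Returns True if all committed."""
    prepared = []
    # Phase 1: ask every participant to stage its write and vote.
    for participant, key, value in writes:
        if participant.prepare(key, value):
            prepared.append(participant)
        else:
            # Any "no" vote aborts everyone who already staged.
            for p in prepared:
                p.abort()
            return False
    # Phase 2: everyone voted yes, so tell all of them to commit.
    for p in prepared:
        p.commit()
    return True


snapshot_db = Participant("snapshot_db")
vector_db = Participant("version_vector_db")
ok = two_phase_commit([
    (snapshot_db, "doc-42", "hello"),
    (vector_db, "doc-42", {"s1": 5}),
])

flaky_vector_db = Participant("version_vector_db", can_commit=False)
other_snapshot_db = Participant("snapshot_db")
failed = two_phase_commit([
    (other_snapshot_db, "doc-7", "draft"),
    (flaky_vector_db, "doc-7", {"s1": 1}),
])
```

In the second run the snapshot write is rolled back because the version vector store voted no, so a reader can never see a snapshot without its matching vector. When both rows sit on one partition, a local transaction gives the same guarantee without the extra round trips.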

@joshg7097 6 months ago

@jordanhasnolife5163 I accepted an L5 Meta offer a few months ago. I watched every single one of your videos. Huge thanks to the gigachad 😁

@rjarora 1 month ago

I guess Cassandra is a good choice for Snapshot DB since we can use the character position as the clustering key. WDYT?

@jordanhasnolife5163 1 month ago

I think it's an interesting idea, though my thinking was we really want a single leader here so that snapshots are consistent with the entry in the version vector DB

@rjarora 1 month ago

@jordanhasnolife5163 Would you also use something like S3 to store big docs' snapshots in your system?

@levyshi 3 months ago

Great video! just curious what might be different if this was for a google sheets like product, rather than a document.

@jordanhasnolife5163 3 months ago

Frankly, I think you'd have fewer collisions, which probably means you can get away with using a single leader node and not be that bottlenecked. If for some reason you did need multiple leaders, you'd basically need a way of combining writes to the same cell, which doesn't really make much sense intuitively. I'd say if you want to do multi-leader you should probably at least incorporate a distributed lock, so that if two people decide to edit cells at the same time, we conclusively know which one came first.

@levyshi 3 months ago

@jordanhasnolife5163 Was thinking the same thing: have them write to the same leader, and let the leader's own concurrent write detection decide.

@asyavorobyova2960 5 months ago

Hey Jordan, first of all, thanks for the great video! I have a question: can we use an event-sourcing design approach instead of CDC? That is, use Kafka topics as the main source of truth instead of the writes DB. We can consume from Kafka to build the snapshots DB, and users can also consume from the relevant Kafka partition to get the latest document changes. That way we automatically get an order for writes inside any single partition and have persistence for writes. WDYT?

@jordanhasnolife5163 5 months ago

Absolutely! Keep in mind though that this implies that the single kafka queue becomes a point that all writes need to go through, which we want to avoid. If we do event sourcing with multiple kafka queues and assign ids to each write based on the queue id and the position in the queue, then use the vector resolution logic that I discuss, I think that this would be better!

@asyavorobyova2960 5 months ago

@jordanhasnolife5163 Thanks! Of course, I have in mind using separate Kafka partitions for each document (or set of documents), and storing the topic's offsets for use with the snapshots. I'm not sure, though, whether we can use only one topic with multiple partitions for all writes, because having too many partitions in one topic can increase latency. Maybe it's better to split the incoming data somehow and use many topics to avoid this problem.

@firezdog 4 months ago

I’m 30 minutes in. Got the sense each client just gets all these messages from other clients and applies them using some merge function that guarantees the result of applying messages in the order received makes sense - with a little bit greater consistency (via version vectors) for writes from the same client. But I’m wondering - is there any sync point at which all users are guaranteed to see the same version of the document? Because if not clients could just diverge more and more over time…

@jordanhasnolife5163 4 months ago

Yep, there isn't any sync point. If we wanted to, we could poll the DB on an interval to ensure clients don't get too out of whack.

@evrard90 1 month ago

Easy peasy

@LeiGao-im7ii 4 months ago

Beautiful！！！

@jasdn93bsad992 4 months ago

19:15 the result of interleaving of "cat" and "bird" should be "bciartd", right?

@jordanhasnolife5163 4 months ago

Ah yeah maybe a typo on my end

@jasdn93bsad992 4 months ago

@jordanhasnolife5163 Yeah, right. No worries. Great video, thanks man
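The interleaving in this thread can be reproduced with a small Python sketch, assuming both clients start from the same empty document, each appends by bisecting the gap up to 1, and ties on position are broken by client ID. The tie-break rule and the `type_word` helper are assumptions for illustration, not details confirmed by the video.

```python
from fractions import Fraction


def type_word(word: str, client_id: str):
    """Append a word to an empty document by repeatedly bisecting the gap up to 1."""
    ops, prev = [], Fraction(0)
    for ch in word:
        prev = (prev + 1) / 2
        # (position, client_id) is the sort key; client_id breaks position ties.
        ops.append((prev, client_id, ch))
    return ops


# Two clients start from the same empty document and type concurrently.
ops = type_word("bird", "A") + type_word("cat", "B")
merged = "".join(ch for _, _, ch in sorted(ops))
```

Both clients pick 1/2, 3/4, and 7/8 for their first three characters, so the sorted result alternates between them and `merged` is "bciartd". Every replica converges on the same text, but the words are scrambled, which is the problem the video raises with naive bisecting.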

@RobinHistoryMystery 5 months ago

Dayum boi

@ShreeharshaV 2 months ago

Thanks for the video, Jordan. At kzfaq.info/get/bejne/j6maiax125begY0.html, how does a new client that has fetched no content so far get the content from the Snapshot DB directly? What does it ask the Write DB or Document DB at this point?

@jordanhasnolife5163 1 month ago

You go to the snapshot DB, get a snapshot at some time T, and then poll the writes db for writes beyond that snapshot until you're caught up (e.g. incoming writes to the document are the next ones on top of what you already have).
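The catch-up step described here might look like the following Python sketch, assuming the snapshot is stored with a version vector (server ID to last sequence number folded in) and the writes DB returns rows of `(server_id, seq, op)`. The function name and row shape are hypothetical, and the sketch assumes the log it reads has no gaps.

```python
def catch_up(snapshot_vector, write_log):
    """Return (ops_to_apply, new_vector) for a client that just loaded a snapshot.

    snapshot_vector: server id -> last sequence number already folded into the snapshot.
    write_log: iterable of (server_id, seq, op) rows read from the writes DB.
    """
    vector = dict(snapshot_vector)
    to_apply = []
    # Process each server's writes in sequence order so the vector only moves forward.
    for server_id, seq, op in sorted(write_log, key=lambda w: (w[0], w[1])):
        if seq <= vector.get(server_id, 0):
            continue  # already reflected in the snapshot
        to_apply.append(op)
        vector[server_id] = seq
    return to_apply, vector


snapshot_vector = {"s1": 2, "s2": 1}
write_log = [
    ("s1", 1, "a"), ("s1", 2, "b"), ("s1", 3, "c"),
    ("s2", 1, "x"), ("s2", 2, "y"),
]
ops, vector = catch_up(snapshot_vector, write_log)
```

Only "c" and "y" are applied, and the client's vector advances to `{"s1": 3, "s2": 2}`. From that point the client can switch to live updates and keep using the same vector to drop anything it has already seen.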