Comments
@sachinreddy2836 • 13 hours ago
You should start a Discord one day
@innazhogova3621 • 16 hours ago
I got a bit confused by the Compression & Compression cont. part at 5:10. Do I actually need to memorize/know this algorithm, or can I just let it slide? (Also, loving your videos man, thanks!)
@rembautimes8808 • 23 hours ago
Great video, thanks for sharing
@amazhobner • 1 day ago
Starting with this playlist today. Plan on doing 5 videos per day. Lessgoo❤
@manojgoyal-y3k • 1 day ago
Keeping an in-memory heap for the priority queue may not be durable when a node does crash recovery. What if we build an index on the priority column of the table in SQL, so the queue is stored on disk itself?
@jordanhasnolife5163 • 18 hours ago
I think your solution works. I also think a normal write-ahead log with background-thread checkpointing may work a bit better.
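A minimal sketch of the write-ahead-log approach described in this reply (class and method names are hypothetical, not from the video): every push/pop is appended to a log before the in-memory heap is touched, so a recovering node can rebuild the queue by replaying the log. The background checkpointing thread, which would periodically snapshot the heap and truncate the log, is omitted here.

```python
import heapq


class DurablePriorityQueue:
    """In-memory heap backed by a write-ahead log for crash recovery."""

    def __init__(self, log):
        self.log = log  # append-only list standing in for a disk-backed log
        self.heap = []

    def push(self, priority, task):
        self.log.append(("push", priority, task))  # make it durable first
        heapq.heappush(self.heap, (priority, task))

    def pop(self):
        priority, task = heapq.heappop(self.heap)
        self.log.append(("pop", priority, task))
        return task

    @classmethod
    def recover(cls, log):
        """Rebuild the heap by replaying the log after a crash."""
        q = cls(log=[])
        for op, priority, task in log:
            if op == "push":
                heapq.heappush(q.heap, (priority, task))
            else:  # "pop": the entry was already handed out before the crash
                q.heap.remove((priority, task))
                heapq.heapify(q.heap)
        q.log = list(log)
        return q
```

With checkpointing, recovery would load the latest snapshot and replay only the log suffix written after it.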
@davidcswpg • 1 day ago
I love ur jokes WAY more than what u have to say about system design 😂 please go get some sleep
@jordanhasnolife5163 • 1 day ago
I wish
@SystemDesign-pro • 2 days ago
This is it! This is the video where you surpassed Stefan
@jordanhasnolife5163 • 1 day ago
It took this long huh
@subee128 • 2 days ago
thanks
@gokukakarot6323 • 2 days ago
How do we handle updates when it's the reverse of popular users? Like one person following a million users.
@jordanhasnolife5163 • 2 days ago
The same way as normal
@saileshsirari2014 • 2 days ago
Awesome as always!
@SystemDesign-pro • 2 days ago
Hey bro, I'm a big fan of Stefan and his channel. However, your channel actually complements his a lot. He has good depth but doesn't cover all aspects of SD, and your videos are very easy to watch and beginner friendly. So don't get discouraged; you have way more followers than he does. He said that because he wants to bring out his value proposition so that he can charge. You're the true philanthropist. Something for you to learn is how he markets himself and monetizes his channel. Love you bro!
@jordanhasnolife5163 • 2 days ago
Yep - I definitely think I understand how to monetize this channel further, however I'm not enamored by the idea of selling courses in a topic where I don't really feel like I'm contributing that much, feels sleazy to me. Would prefer to continue to post for free and help people out, and when the time comes that I'd like to monetize, hopefully I have an audience that has open ears for a potential product.
@thunderzeus8706 • 2 days ago
Hi Jordan, thanks for another great video! This one is really interesting and beefy. I have a question about the "hot comments" section. It's clear to me that when there is a new upvote, the CDC triggers the +1 to the corresponding nodes in the db. However, how do we know when it is time to expire some upvotes? Continuously monitoring the end of each linked list is expensive and wasteful, while getting rid of stale upvotes only when new upvotes come in causes inaccurate, stale "hot" comment rankings. Neither option looks good to me.
@jordanhasnolife5163 • 2 days ago
I really don't think it's expensive to monitor each linked list, it's a single background thread that can be run on any configurable interval of your choosing.
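A rough illustration of that background-sweep idea (all names are hypothetical): upvote events sit in a time-ordered deque, and a sweep, which would run on a background thread at whatever interval you configure, expires anything older than the window and decrements the per-comment counts.

```python
import collections


class HotComments:
    """Sliding-window upvote counts: a time-ordered deque of
    (timestamp, comment_id) events plus a periodic expiry sweep."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = collections.deque()   # oldest upvotes at the left
        self.counts = collections.Counter()

    def upvote(self, comment_id, now):
        self.events.append((now, comment_id))
        self.counts[comment_id] += 1

    def sweep(self, now):
        """Background-thread body: drop upvotes older than the window.
        Cost is proportional only to the number of expired events."""
        while self.events and now - self.events[0][0] > self.window:
            _, comment_id = self.events.popleft()
            self.counts[comment_id] -= 1
```

Because events are appended in time order, the sweep only ever inspects the head of the deque, which is why running it periodically stays cheap.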
@jordiesteve8693 • 2 days ago
You inspired me a lot. I'm preparing for ML system design interviews, and I found so many gaps in the resources out there that I decided to fill them. I'll start small and hopefully someone will benefit
@iknowinfinity • 3 days ago
Hey Jordan, how will you modify the priorities of the existing elements with the way that you have partitioned the db? It seems like we will have to search through all the partitions to find the element and then update its priority!
@jordanhasnolife5163 • 2 days ago
Rather than doing load balancing, you can just hash some aspect of the data and send it accordingly. That way, to modify a priority, you can compute its hash and route to the proper node.
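A tiny sketch of that hash-routing idea (the function name is hypothetical): hashing a stable key of the element means a later priority update deterministically lands on the partition that holds it, with no cross-partition search.

```python
import hashlib


def partition_for(task_id: str, num_partitions: int) -> int:
    """Route by hashing a stable key, so a priority update for a task
    always goes to the same partition that originally stored it."""
    digest = hashlib.sha256(task_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions
```

Both the insert path and the update path call the same function, which is the whole trick: determinism replaces searching.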
@chandlerbing8164 • 3 days ago
15:00 when batman visits his grandma
@timothyh1965 • 3 days ago
Jordan when you mention snapshot versioning right at the beginning are you talking about vector versioning? Or are you simply referring to how a db may store the previous write along the current write?
@jordanhasnolife5163 • 2 days ago
Would you mind giving me a timestamp
@timothyh1965 • 17 hours ago
@@jordanhasnolife5163 Around 2:12. I think the main idea is that every write has a version number associated with it. In DDIA, they give an example on the leaderless replication chapter (pg 189) between two clients trying to write to a single replica. I think vector versioning (later on, in that same chapter) is this same concept but applied to multiple replicas. Does that make sense?
@williamzhang5336 • 3 days ago
Hi Jordan, thanks for the amazing videos, I've really learned a lot from you. One question I feel confused about: there are many nodes in the design, like the task queuing node and the scheduler node. To me they are just services, like a task enqueuing service or a scheduling service. Is there any reason you draw them as "nodes"? Thank you!
@jordanhasnolife5163 • 3 days ago
I think that service is a fine word to use here
@popricereceipts4279 • 3 days ago
This video really helps with understanding the need for stream processing frameworks.
@prakharrai1090 • 3 days ago
Chose this playlist over Netflix and I'm enjoying it so far!
@ShortGiant1 • 3 days ago
Top notch! Although I’m not sure the 1B number is accurate. For example just Wikipedia has about 60M websites! Not just trying to find a fault in your video, just trying to reason about the design. With such a high number, it might not be possible to crawl all of Wikipedia from just 1 host..
@jordanhasnolife5163 • 3 days ago
Yep! Definitely the case that for certain hosts we might have to repartition them even further!
@slover4384 • 3 days ago
About final architecture diagram: Where do you store the actual data about which URLs were processed, or failed to process, and which S3 results links go with each URL? I'd think there needs to be some datastore to maintain all this. Also, if a crawler dies, how do you recover the robots.txt information to prevent any new crawler from breaking the rate limits?
@jordanhasnolife5163 • 3 days ago
Oof jeez yeah lazy diagram. Will certainly have to be the case that we store S3 links to each URL. Technically, you could store those per flink instance and then access them directly, but agree that writing them to a DB upon completion is preferable in order to have views per crawler iteration. If a crawler dies, we ideally should have its state checkpointed in flink. This either means that we already have the robots.txt info pre-cached, or in the event that we don't, we just go fetch it again on the first hostname load.
@slover4384 • 3 days ago
The anti-entropy mentioned at 16:54 doesn't seem to be actual anti-entropy. Anti-entropy implies that data on each node should almost always be in sync but occasionally goes out of sync, so we detect the out-of-sync state with a "cheap" detection protocol and only occasionally follow up and fix misaligned state with an expensive operation. In this case, the data on each node is almost always way out of sync, so anti-entropy wouldn't be what we use. We'd literally just share state without any cheap detection mechanisms. So maybe I'd call it "gossip-based sharing of hashed state" instead, because what you are really implying (I think) is that you will share hashed values between the nodes using an epidemic protocol versus a broadcast or centralized one. This is orthogonal to the entropy comment, which isn't really what is going on here.
@jordanhasnolife5163 • 3 days ago
It's not anti entropy in the dynamo sense, you're right. This is just normal CRDT behavior.
@raghavedrag1527 • 3 days ago
12%4 should be 0 rather than 2
@jordanhasnolife5163 • 3 days ago
Oof yeah lol nice catch.
@gokukakarot6323 • 4 days ago
Could you also discuss a raffle system in a system design video like this later? It's something Nike does to handle bots. I mean, a third-party service does it for them, but anyway, it seems alright.
@jordanhasnolife5163 • 3 days ago
I probably won't make a video on this in a while but I'd just batch everything in kafka, and then have some consumer that after a certain point in time picks the winners and then notifies the winners via some sort of fan out (email, notifications, etc).
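A hedged sketch of the pick-the-winners step in that flow (names hypothetical; the Kafka consumption and fan-out notification parts are elided): once the entry window closes, the consumer dedupes the accumulated entries so each user gets one ticket, then samples winners uniformly.

```python
import random


def pick_winners(entries, num_winners, seed=None):
    """Dedupe batched entries (one ticket per user), then sample
    winners uniformly at random from the resulting pool."""
    unique = sorted(set(entries))  # sort so the pool order is deterministic
    rng = random.Random(seed)      # seed allows an auditable, replayable draw
    return rng.sample(unique, min(num_winners, len(unique)))
```

Seeding the draw from something published after entries close (a block hash, a timestamped random beacon) is one common way to make the raffle auditable; that choice is mine, not from the video.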
@shubhamjain1153 • 4 days ago
How does receiving a message happen once it reaches the Flink node? Is someone subscribing to this Flink node? You mentioned a load balancer receiving the message; isn't that going to cause a thundering herd problem?
@jordanhasnolife5163 • 3 days ago
Flink sends it to the chat servers associated with (actively connected to) the recipients of the messages. For Q2: in theory a load balancer can be a SPOF, or you could run many in an active-active configuration, listening to ZooKeeper.
@prohitsaichowdary5966 • 4 days ago
Great video! In the final design, where is the rate limiting logic implemented? Is it in between the load balancer and the redis database? Or is it implemented on the backend services? Or is it just for us to decide based on the constraints?
@jordanhasnolife5163 • 3 days ago
I'd say mostly for you to decide based on constraints. There are tradeoffs to each approach between how independently our rate limiting service can scale and the latency at which we can perform rate limiting.
@shubhamjain1153 • 4 days ago
Hi Jordan, since you talked about showing the messages of a particular chat to the user: if we sort the messages server-side, then instead of returning all the messages and sorting on the device, we could lazy-load them. The server can send a paginated response and load more as the user scrolls. Does that make sense?
@jordanhasnolife5163 • 3 days ago
I believe I'm proposing the same thing. That being said when you're at the bottom of the messages you need to see them come in real time.
@shangma9176 • 4 days ago
I don't think you can pre-materialize for idempotency. If a user clicks the pay button twice, the second pay request will ask your "pre-materialized key service" for a new idempotency key. Now the problem goes back to the original one: how can you generate an idempotency key for the request in the first place?
@jordanhasnolife5163 • 3 days ago
The idempotency key is generated on page load, not on user click. If they reload the page that's a different story.
@dirty-kebab • 4 days ago
Had me in the first half 😭
@slover4384 • 4 days ago
The calculation of runtime at 13:40 is wrong. Runtime is O(log(n) + m), where n is the total words in the dictionary and m is the total words matched by the query.
@jordanhasnolife5163 • 3 days ago
I assume you're using this to account for the actual computation cost of finding the start and end of where we're reading. Good point. That being said, for the m piece I think it would be m * log(k), where k is the number of typeahead suggestions to return, as we need to formulate our heap.
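The m * log(k) piece of that reply can be sketched as follows (the helper is hypothetical): keep a size-k min-heap over the m matched words, so each candidate costs at most O(log k) instead of sorting all m matches.

```python
import heapq


def top_k_suggestions(matches, k):
    """matches: (word, frequency) pairs found within the prefix range.
    Maintains a size-k min-heap, for O(m log k) total instead of O(m log m)."""
    heap = []  # min-heap of (frequency, word); the root is the weakest survivor
    for word, freq in matches:
        if len(heap) < k:
            heapq.heappush(heap, (freq, word))
        elif freq > heap[0][0]:
            heapq.heapreplace(heap, (freq, word))  # evict the weakest
    # sort the k survivors by descending frequency for display
    return [word for freq, word in sorted(heap, reverse=True)]
```

The binary search for the start of the prefix range contributes the O(log n) term; this function covers only what happens after the range is found.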
@fallencheeto4762 • 4 days ago
Bro looking nice and chubby today, Jkjk nice video 😂😅
@jordanhasnolife5163 • 3 days ago
I'm at the lowest point of my cut :cry:
@fdddd2023 • 4 days ago
Hi Jordan, can you do designs for a Price Drop Tracker (like camelcamelcamel), Privacy & Visibility Controls, and Internationalization? Those are popular questions recently and there are almost no materials about them.
@jordanhasnolife5163 • 3 days ago
Can you generate functional requirements for each of these?
@Osama-k7q • 4 days ago
Great video, please keep it up
@JuliaT522 • 4 days ago
Oh! I thought "chubby" was self-criticism, but it's a system name lol
@jordanhasnolife5163 • 3 days ago
It's both :)
@hazemabdelalim5432 • 4 days ago
I like chubby
@jykimmm • 4 days ago
Hi Jordan, thanks for the video. I feel like we haven't touched on how the producer-to-broker portion of streaming is fault tolerant? As in, what if the producer goes down before receiving acknowledgement from the broker, or if the broker goes down, or any failure scenario in general: how can we ensure the message was processed?
@jordanhasnolife5163 • 3 days ago
Flink isn't really responsible for making producers fault tolerant. It just ensures that once the message hits kafka, it will be processed. If you want to make a producer fault tolerant, you can do the same things we normally look to do (replicate it, have a mechanism for failover).
@rahul10anand1 • 4 days ago
Pretty solid video. Really loved the optimisation you discussed.
@lagneslagnes • 4 days ago
At 22:52, you showed a CDC from "user-following" table (so deriving a table from an already derived table). But your final architecture diagram does not have this. That whole section of final architecture is a bit hard to understand and relate to the rest of the video.
@jordanhasnolife5163 • 3 days ago
It does have it (see user follower table middle left and user following table top right)
@lagneslagnes • 3 days ago
@@jordanhasnolife5163 I don't see a CDC off user-following table like in 22:52 .
@jordanhasnolife5163 • 3 days ago
@@lagneslagnes it's off the last replica and into flink
@lagneslagnes • 3 days ago
@@jordanhasnolife5163 From what I see, the earlier slide had the CDC off the ING table (deriving data from already derived data as you say during that slide) to generate the verified cache, but we only have a CDC off the ER table in final diagram :) Not a big deal, your video had novel ideas and deserves 2 likes.
@jordanhasnolife5163 • 3 days ago
@@lagneslagnes Haha thanks, perhaps a typo on my end, I don't really think which one is the "derived" vs the "source" matters in this particular case, nice catch!
@connornusser-mclemore631 • 4 days ago
good stuff dad
@parteeks9012 • 4 days ago
Good stuff daddy*
@jordanhasnolife5163 • 3 days ago
that's father to you
@Balaji-uz8kp • 5 days ago
Thanks Barry Keoghan for taking time out of your busy acting job to teach us LLD
@jordanhasnolife5163 • 3 days ago
Haven't heard this one before
@youtube10yearsago22 • 5 days ago
How will Flink send the messages to the load balancer or the server? Aren't we supposed to use Kafka here between Flink and the load balancer?
@jordanhasnolife5163 • 3 days ago
1) HTTP 2) Why?
@youtube10yearsago22 • 3 days ago
@@jordanhasnolife5163 You mean we'll send each and every message to the user over HTTP going through the load balancer, not even a websocket???
@miry_sof • 5 days ago
What device do you use for the hand drawing?
@jordanhasnolife5163 • 3 days ago
iPad, OneNote
@felixliao8314 • 5 days ago
Thanks for the great content, Jordan. I have a thought and I want your input: since your text CRDT basically eliminates conflict, you don't need a version vector for resolving conflicts again. But I think we can still use it to: 1. avoid unnecessarily merging the CRDTs (i.e., if two nodes have the same VV, they don't need to merge, or if one vector is strictly smaller than the other, it can simply discard its own state and take the other's CRDT values); 2. filter out unnecessary writes (I think you covered this implicitly); 3. create a partial order and thus achieve causality, although I can't be sure whether we need causality? (Conflict-freedom should already promise convergence; we might not care about how we eventually get there, except that we want document versioning?)
@jordanhasnolife5163 • 3 days ago
So this isn't a state based CRDT (sending the whole document), it's an operation based CRDT (just sending incremental changes), which is why I believe the version vector is important here, operational based CRDT messages are not necessarily idempotent. It enables us to 1) Make sure we don't duplicate apply any messages 2) Make sure we apply all messages
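A minimal sketch of the version-vector check for an operation-based CRDT, matching the two guarantees named in this reply (names hypothetical): each replica numbers its ops sequentially, and the receiver applies an op only when it is the next expected one from that replica, which both rejects duplicates and detects gaps.

```python
def should_apply(version_vector, op):
    """Delivery check for an op-based CRDT. `op` carries (replica, seq),
    its position in the originating replica's op stream. Mutates
    version_vector in place when the op is accepted."""
    replica, seq = op["replica"], op["seq"]
    expected = version_vector.get(replica, 0) + 1
    if seq < expected:
        return False  # duplicate: this op was already applied
    if seq > expected:
        return False  # gap: an earlier op from this replica is missing
    version_vector[replica] = seq
    return True
```

In a real system the gap case would trigger buffering or a re-request of the missing ops rather than a silent drop; that handling is omitted here.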
@krish000back • 5 days ago
Thanks a lot for the great content, and especially for your dedication to keep producing it. Quick question: you mentioned Riak can be used so that if a conflict occurs, we assign one of the conflicting writes the next available sequence. But at that point we might have already given the short URLs to both users, which is the same issue as with Cassandra? Am I missing something about using Riak?
@jordanhasnolife5163 • 3 days ago
Can you give me a timestamp here? In retrospect I don't think Riak is a good choice here lol, would recommend the 2.0 version of this video
@MohamedAzmyazmy92 • 5 days ago
Thanks for the video!! I have one question: why don't we partition the chat servers by a combination of video-id/user-id, or even just by user-id? That way we won't have an overloaded chat server for popular streams.
@jordanhasnolife5163 • 5 days ago
Because we only want to have to read from one place for a given chat
@Anonymous-ym6st • 5 days ago
Modern systems being more CPU-bound than network-bound: not sure I understand this correctly. If it's about latency, the network definitely takes more time. QPS-wise, being CPU-bound can be solved by adding more nodes, but network bandwidth is fixed? (Open to discussion; I don't have any experience with storage myself.)
@jordanhasnolife5163 • 5 days ago
It basically means that in something like AWS, if we want to perform a large analytical query, the main thing slowing us down is the ability of CPUs to parse through the data, as opposed to actually moving data from host to host over the network in order to parse it.
@Anonymous-ym6st • 5 days ago
I'm curious whether it's common to use two types of DB in a real use case. Of course for a big company like YouTube it's worth it, but considering we are designing for a team's or an org's tech work, maybe optimizing based on MySQL would be more realistic than adopting Cassandra?
@jordanhasnolife5163 • 5 days ago
Fair enough, consistency of DB choice can be a real draw in some places.
@AnkitaNallana • 5 days ago
Excited for you! And excited for what comes next! I will await those deep dives so I can CRUSH my next sys design interview (my one motivation to consume what you're planning next is that one interview I failed where, as a junior SWE, I was asked to reason along the lines of a research paper -.- yeah I'm petty, I just wanna get back). Good luck!!! And we'll see you again!!! :)
@Anonymous-ym6st • 5 days ago
If we use video id + ts as the index for comments, could it be the case that some comments are posted at exactly the same ts?
@jordanhasnolife5163 • 5 days ago
I mean you can always add the user id of the comment poster if you're afraid that duplicates are going to overwrite one another.