
Twitter/Instagram/Facebook Design Deep Dive with Google SWE! | Systems Design Interview Question 2

  11,824 views

Jordan has no life

1 day ago

I could def do a better job designing and running Twitter than Elon, just saying.
Timestamps:
00:00 Introduction
00:52 Functional Requirements
02:43 Capacity Estimations
05:22 API Design
07:09 Database Schema
09:22 Architectural Design

Comments: 83
@Garentei · 1 year ago
Thank you for putting out all of these videos. I will watch most of the videos on your channel for my system design preparation, since other YouTubers seem to just present the final architecture "as a matter of fact" and fail to explain the nuances. I think you do an extraordinary job at explaining the justification for every decision you make, unlike most channels that just seem to re-explain whatever they read in a book.
@jordanhasnolife5163 · 1 year ago
Thanks Thomas! That's why I made the channel :)
@ahnjmo · 1 year ago
Hey Jordan! Thanks for the video. I had three questions: 1. Why do we have to split up the following/followers table in Cassandra? What's the problem with keeping them as just one usersFollowers table? 2. How would we scale the "feed cache" per user? I think I'm still a bit confused, because given our requirements we may have 200 million active users. Do we create a partition for all 200 million users in our cache? 3. What if we didn't use Flink? Couldn't we have some sort of pub/sub method with Kafka to directly notify our feed caches to be updated once a user follows someone? Once again, thank you so much for the video, the content is super helpful!
@jordanhasnolife5163 · 1 year ago
1) I think it is useful to be able to quickly query to find all of a user's followers and all of the people they are following - in Cassandra you cannot have multiple different sort orders (for an index); in a relational DB your solution would be better. 2) You can have many different feed caches, and use consistent hashing to split the range of userIds over caches to keep the load on each relatively constant. 3) I think that there are Kafka-only solutions, but you'll have to do some amount of stream processing to figure out a user's list of followers when they tweet so that those tweets can be sent to all of the proper caches.
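The consistent-hashing idea in (2) can be sketched in a few lines; the node names and the use of MD5 here are illustrative, and a production ring would add virtual nodes to smooth out the load:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Map userIds onto feed-cache nodes via a hash ring."""

    def __init__(self, nodes):
        # Place each node at a deterministic point on the ring.
        self.ring = sorted((self._hash(n), n) for n in nodes)
        self.points = [p for p, _ in self.ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, user_id):
        # First node clockwise from the key's position (wrapping around).
        idx = bisect.bisect(self.points, self._hash(user_id)) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["cache-a", "cache-b", "cache-c"])
assignment = ring.node_for("user42")
```

The payoff versus plain modulo hashing is that removing a node only remaps the users that lived on it; everyone else stays on their current cache.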
@axings1 · 10 months ago
@@jordanhasnolife5163 Can you elaborate on how to "split up" the following/followers table in Cassandra? Is there going to be a single table? If so, would each user have a partition key, and would each partition then contain a full list of that user's followers and followees? In other replies you mentioned that there are no transactions in a NoSQL database to ensure multiple records are updated atomically, so how would you add a follower-followee relationship to this Cassandra table in a consistent way?
@huangethan2993 · 1 year ago
I'm one of those who don't usually comment, but I'd say your system design videos, Jordan, are the most "normal" and hence the greatest of all the system design videos out there IMO. I've watched at least 10 Twitter system design videos and I've had enough of the "fan-out" and "celebrity issue" stuff... Somehow I feel the others didn't really have real experience and just put together materials out there - the way you talk through things is more like how experienced engineers in real life would talk about systems and architectures. So thank you very much! Just one question for this particular one: is there any specific consideration for the "Post Service" path to first write to an in-mem queue, instead of directly writing to the "Post Table/DB" and then using Change Data Capture to trigger the following path to Flink? I'm thinking you have to store these posts in the DB anyway, so you could possibly get rid of the in-mem queue and the complexity along with it.
@jordanhasnolife5163 · 1 year ago
Seems reasonable to me!
@pashazzubuntu · 2 years ago
Google SWE in an Amazon T-shirt? Woohoo! Seriously though discovered u through Blind. I'm preparing for uber and amazon onsite so your info is extremely helpful! Will eagerly wait for further vids.
@jordanhasnolife5163 · 2 years ago
Haha that's where I worked first, glad to hear it! Good luck!!
@semperfiArs · 2 years ago
Brother keep up the quality work. Don't bother about the views right now since you are still a growing channel
@jordanhasnolife5163 · 2 years ago
Appreciate it Prath!
@semperfiArs · 2 years ago
@@jordanhasnolife5163 I love the way you talk with confidence, Jordan. And I hardly see any cuts while you are talking. Are you reading from somewhere, or do you practice this over and over? Because you barely fumble. Asking because I want to get better at talking in front of the camera.
@jordanhasnolife5163 · 2 years ago
@@semperfiArs Nope, I just have some notes that I look at! You'll get better with practice! The more prepared you are as well (feeling like you really understand what you're presenting), the easier it is to drone on about it.
@StevenCodeCraft · 1 month ago
Jordan thank you man! You built different for real tho // I also have no life and will be going through all of your videos and taking notes
@jordanhasnolife5163 · 1 month ago
Lfgo!
@Dun1007 · 2 years ago
Another quality video. Wish me luck for Amazon L5 onsite next friday 😝
@jordanhasnolife5163 · 2 years ago
You got this beast!
@ROFEL · 1 year ago
How did it go?
@fokkor · 10 months ago
Great content, very different from other YouTubers out there. Can you make a video on extended requirements? 1. How to design Reddit-like comments (nested comments). 2. How to design something like a privacy filter for a post on FB (only friends, private, public, etc.).
@jordanhasnolife5163 · 10 months ago
I definitely like the reddit comments remark, I'll do this one eventually!
@jordanhasnolife5163 · 10 months ago
Probably same for privacy filter
@TheMr9414041667 · 2 years ago
Hello Jordan, can you explain why you have used a Flink consumer?
@jordanhasnolife5163 · 2 years ago
Any fault-tolerant, stateful stream processing technology would work - I just know Flink the best. It is useful here because, by storing follower relationships locally, you can have greater data locality and reduce the number of necessary network calls to the DB.
@Roshmore7 · 1 year ago
HTTP streams might be a better option than WebSockets, as the latter have scalability challenges, and we don't need a bidirectional connection - just one request and a stream of responses.
@jordanhasnolife5163 · 1 year ago
I'll have to look into these, but they sound similar to server sent events, thanks for the feedback
@kamalsmusic · 2 years ago
For the following relation, why not just have a mapping of userId -> array of userIds they follow, instead of having a table with userId1/userId2? This can be achieved with some document-based DB, so you don't need an additional index on a table to grab all the friends for a given userId.
@jordanhasnolife5163 · 2 years ago
Because not only do you want to see who someone follows, you also want to be able to quickly figure out all the followers of a given user, which is hard that way. You could maintain a second document DB table that has all the followers of a given user, but then for each follow or unfollow operation you'd have to modify both tables, which keep in mind is in a NoSQL database so there are no transactions to guarantee that both operations succeed or fail together.
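A toy model of the two-mapping scheme being described (all names hypothetical). The two separate writes in `follow` are exactly the pair that a NoSQL store can't make atomic, which is the consistency gap mentioned above:

```python
class FollowStore:
    """Two denormalized mappings so both directions are a single-key lookup."""

    def __init__(self):
        self.following = {}  # userId -> set of userIds they follow
        self.followers = {}  # userId -> set of userIds who follow them

    def follow(self, follower, followee):
        # Two separate writes: in a store with no multi-key transactions,
        # a crash between them leaves the two tables inconsistent.
        self.following.setdefault(follower, set()).add(followee)
        self.followers.setdefault(followee, set()).add(follower)

    def unfollow(self, follower, followee):
        self.following.get(follower, set()).discard(followee)
        self.followers.get(followee, set()).discard(follower)

store = FollowStore()
store.follow("alice", "bob")
```

With this shape, "who does alice follow?" and "who follows bob?" are both O(1) lookups, at the cost of the dual write.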
@ahnjmo · 1 year ago
Hey Jordan! I got a question - would there be an instance where Apache Flink, since it's not a persistent data store, would not have the information about a user's followers? For example, after the first time user A follows user B, Flink would have this information. However, let's say 3 months later user B makes a post - would it be possible that Flink no longer has this data? In that case, how is Flink going to know which followers the tweet should be sent to in the feed cache?
@jordanhasnolife5163 · 1 year ago
Hey Stephen! As long as the message is being sent to the right partition, the data should be there! Flink can replay the messages from an incoming log-based message broker in order to reassemble a dataset and make sure that things are up to date :)
@arovitnarula4113 · 2 years ago
Good video. What if the in-mem broker crashes? The tweet never makes it to the Post DB, but the user thinks the tweet was successful?
@jordanhasnolife5163 · 2 years ago
The in-memory queue is still replicated, so you shouldn't have to worry about that.
@ShivangiSingh-wc3gk · 1 month ago
Do you have any place where you have documented this? I am getting confused about how you suggested we could handle the verified user with millions of followers. What I understood: 1) Move work to the read path - the followers get posts from their cache and do a get for some ??
@jordanhasnolife5163 · 1 month ago
What do you mean document? You can also watch the newer version of this video in my 2.0 series, perhaps the iPad might be easier to understand.
@julianosanm · 8 months ago
Great video dude. Question, what about sharding/partition the posts DB by time of creation? That seems to be a common property of both user feeds and user profiles (reading one single users posts). We usually want to see more recent posts first in both scenarios, and navigate reverse-chronologically. What do you think?
@jordanhasnolife5163 · 8 months ago
I think based on what you're saying here the move is to shard by userId and then locally index on timestamp, which I'd agree with
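A sketch of "shard by userId, then locally index on timestamp": route each user to one shard, and keep that user's posts time-sorted so recent-posts queries never cross shards. All names are hypothetical, with a sorted list standing in for a clustering index:

```python
import bisect

class PostShard:
    """One shard: each user's posts kept sorted by timestamp."""

    def __init__(self):
        self.by_user = {}  # userId -> list of (ts, post), sorted by ts

    def insert(self, user_id, ts, post):
        bisect.insort(self.by_user.setdefault(user_id, []), (ts, post))

    def recent(self, user_id, n):
        # Newest-first scan of a single user's locally sorted posts.
        return [p for _, p in reversed(self.by_user.get(user_id, [])[-n:])]

NUM_SHARDS = 4
shards = [PostShard() for _ in range(NUM_SHARDS)]

def shard_for(user_id):
    # All of one user's posts land on one shard, so no scatter-gather.
    return shards[hash(user_id) % NUM_SHARDS]

shard_for("alice").insert("alice", 2, "second")
shard_for("alice").insert("alice", 1, "first")
```

Sharding by timestamp instead would spread one profile's reads across every shard and make the newest shard a write hotspot, which is the trade-off the comment is pointing at.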
@prabhatsharma284 · 2 years ago
Btw in this whole design I didn't get the Flink consumer part. It seems like a very specific technology is being used here instead of a general conceptual component satisfying a particular technical requirement.
@jordanhasnolife5163 · 2 years ago
Yes, there are multiple stream consumer technologies; as long as it supports holding state, it should work. It doesn't have to be Flink, you're right - I just chose that one out of popularity.
@fartzy · 2 years ago
@@jordanhasnolife5163 Which other ones would work? I know I could google it, but I'm just interested to hear your thoughts.
@jordanhasnolife5163 · 2 years ago
@@fartzy Spark Streaming (although it runs in microbatches), Samza, Storm
@ninad11081991 · 8 months ago
How do we deal with causality in terms of our follow and unfollow operations if we're using Cassandra?
@jordanhasnolife5163 · 8 months ago
I'd say probably best effort here; agreed that to be perfect we'd need single-leader replication.
@maxvettel7337 · 11 months ago
This channel is underrated
@jordanhasnolife5163 · 11 months ago
I dunno ppl have been a little too nice to me in the comments recently it might be overrated now, I'm mid
@maxvettel7337 · 11 months ago
@@jordanhasnolife5163 I'm preparing for an interview and I have combed all youtube for system design videos. I can say that your explanation is short and at the same time is meaningful. It's much better than 90% of the others.
@rishirajsingh688 · 7 months ago
Hi @jordanhasnolife5163, thanks for offering a design solution that looks 'real' and follows sound logic behind each decision. I have a question, though, on the follower-followee relation table. Wouldn't it be simpler to keep the user and follower-followee tables in SQL? If we have indexing on top of both columns (follower and followee) and a cache in front of the SQL DB, would that not simplify the design for update & query? Or do you see an I/O scale problem if the cache goes down?
@jordanhasnolife5163 · 7 months ago
Yeah I think I agree with you here actually (and in my remake of this one I changed this), however do note that follower and followee tables would probably have to be on different nodes and you'd likely have to derive one from another if you want to shard in the best way possible
@Piyush-ky9ee · 2 years ago
Great video! I am still not quite convinced why you chose Cassandra over something like a sharded MySQL DB, which has better read performance than Cassandra because of B-tree-based indexing. Since the system is going to be very read-heavy, isn't a B-tree-based DB a better choice? Or are you saying that we're gonna cache almost everything and the DB choice is relatively insignificant? Could you please elaborate more on that? Thank you
@jordanhasnolife5163 · 2 years ago
I think that in theory you're right that MySQL would be better for read performance; however, my logic here was to maximize the write throughput because I figured that would allow the DB to ingest document changes more quickly. I could see making the argument for either though, and being able to argue for your choice is exactly what these interviews are about! Great point!
@Piyush-ky9ee · 2 years ago
@@jordanhasnolife5163 I see, appreciate your reply. The other question I have is why you don't have an arrow from the "Feed Service" to the "Posts Cassandra DB"? Assuming not all of the feed is cached, if I scroll down to see older posts on my feed, how would the feed service get them?
@jordanhasnolife5163 · 2 years ago
@@Piyush-ky9ee Actually, you know what, I thought you were commenting on the Google Docs video lmao. MySQL totally makes sense, my bad, sorry about that.
@jordanhasnolife5163 · 2 years ago
@@Piyush-ky9ee As for Q2, most servers will hit the cache first, and if the cache doesn't have the item it will load it from the DB, using LRU as the eviction policy, and then return the result.
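That read path (hit the cache, fall back to the DB on a miss, evict LRU) can be sketched with an `OrderedDict`; the capacity and data below are made up:

```python
from collections import OrderedDict

class ReadThroughCache:
    """Check the cache first; on a miss, load from the DB and evict LRU."""

    def __init__(self, db, capacity=2):
        self.db = db          # fallback store: key -> value
        self.capacity = capacity
        self.cache = OrderedDict()

    def get(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)     # mark as most recently used
            return self.cache[key]
        value = self.db[key]                # cache miss: load from the DB
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used
        return value

db = {"post1": "hello", "post2": "world", "post3": "!"}
cache = ReadThroughCache(db, capacity=2)
```

A real feed cache would be a distributed store like Redis rather than one process's dict, but the miss-then-fill logic is the same.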
@Piyush-ky9ee · 2 years ago
@@jordanhasnolife5163 I see. I forgot about the write through cache. Thanks
@bet_more_pls · 1 year ago
Another clarification about the purpose of Flink holding the follower/followee relations - it seems like this is acting as a cache for the follow relationships, and the only reason we even have a Cassandra table for them is durability. Is this the case? If so, why not use a distributed cache (Redis KV store) instead of something like Flink? Is it because Flink supports set operations (e.g. is this user_id in my set of followers) vs. Redis?
@jordanhasnolife5163 · 1 year ago
The reason we're using flink is because we can store a copy of the follower relationships on it. You are correct that this is effectively like a cache. It should be faster than a cache however, because it is literally located on the nodes doing the computation.
@pl5778 · 1 year ago
Hey Jordan - great video and thanks for explaining this. A question I have about the feed cache: is it going to be populated for all the DAU with all of their followees' posts? And if so, is that what Flink is updating? If the feed cache is sharded by userId, how will Flink push a new tweet to all of a userId's followers?
@jordanhasnolife5163 · 1 year ago
Yep, you're correct - the point is we will have to push the tweet to many places. For tweets from users with millions of followers, a hybrid approach where we fetch the tweet from a posts database will serve us better.
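The hybrid approach can be pictured as a merge at read time: tweets already fanned out to the user's feed cache, plus celebrity tweets pulled from the posts table. A sketch with hypothetical data:

```python
import heapq

def build_feed(pushed, followed_celebrities, celebrity_posts, limit=10):
    """Merge the precomputed (pushed) feed with pulled celebrity tweets.

    pushed: list of (ts, tweet) already fanned out to this user's feed cache.
    celebrity_posts: userId -> list of (ts, tweet), fetched at read time.
    """
    pulled = [p for c in followed_celebrities for p in celebrity_posts.get(c, [])]
    # Newest first across both sources.
    merged = heapq.nlargest(limit, pushed + pulled, key=lambda p: p[0])
    return [tweet for _, tweet in merged]

cache = [(5, "friend: lunch"), (1, "friend: hi")]
celeb = {"elon": [(4, "elon: 42"), (2, "elon: hmm")]}
feed = build_feed(cache, ["elon"], celeb, limit=3)
```

This keeps write amplification bounded: a celebrity tweet is written once and pulled by readers, instead of being pushed into millions of caches.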
@squeakyymouse · 1 year ago
Hey Jordan, Thanks for the video! I actually had a question about your architecture diagram. I thought that for the verified users case, we'd use a hybrid approach using SQL and noSQL as our solution, where we would do a join and query the SQL DB to obtain the Verified User's tweet so we don't have to fan out the Verified User's tweet to everyone's newsfeed cache. However, in the architecture diagram you drew, I didn't understand how obtaining the Verified User's tweet was addressed. Would you just do a simple query by User ID on the Cassandra Post table to get a Verified User's tweet? Hopefully that makes sense!
@jordanhasnolife5163 · 1 year ago
Yep basically! Just querying the posts table of the verified users you follow should do it. You may be right in the sense that you want to store those relationships in a relational db.
@rajrsa · 1 year ago
Thanks for this! In the diagram, it shows the client directly hitting the CDN instead of even going to the LB. Does it actually work like that? I think the service request would give the S3 link from the DB or something, and then it would be shown to the user? (Yes, I am a noob.)
@jordanhasnolife5163 · 1 year ago
I believe that typically the CDN link would be in the HTML page when you load the video, so yeah at some point when you click to open the video you'll have to get that link first
@franklinyao7597 · 1 year ago
Can you explain the use of two different types of message queue? You said "the reason why we want a log-based broker is that we can replay the changes if it were to go down" and "once we post a tweet, it goes to an in-memory message broker; we use it this time because we want faster processing and we don't care about the order of messages." Why weren't you worried about the in-memory message queue going down? If it goes down, you will lose posts. Do you want fast processing because posts are much more frequent than user-relation changes?
@franklinyao7597 · 1 year ago
The only thing I can think of is that you need to keep user relationship data in Flink for a long time and only keep each post's data for a short time. If the user relation data is lost in Flink, you need to replay those messages. But if a post is processed and then lost, you don't need to do anything, because you only need to process each post once. Is my understanding correct?
@jordanhasnolife5163 · 1 year ago
I don't really care if we lose a post while it's in the message queue - the user could see it's not delivered and just tweet again. I have a couple of videos explaining the differences between message queues, but the point is that log-based ones are slower, but persist the messages in a log and maintain ordering within a single queue. In-memory ones do the opposite.
@jordanhasnolife5163 · 1 year ago
User relationship data needs to be kept locally in the tweet consumers, as it makes life a lot easier to have locally cached data so we don't need to poll a DB every time to get that information.
@NikhilKekan · 2 years ago
You mentioned that instead of doing heavy joins we have multiple caches, and when a tweet arrives we write that tweet into all the caches (for a normal user without many followers). Are we talking about one cache per user here?
@jordanhasnolife5163 · 2 years ago
Yeah, I guess it really depends on our data estimates. I think in reality Twitter may actually do something like that, but if you have enough space to store many feeds on a single machine, you can just use consistent hashing to spread things equally.
@prabhatsharma284 · 2 years ago
Dude, I had the exact same thought that sharding by Tweet/PhotoId is just stupid while reading Grokking. In the case where we have to find all, or say the recent 100, posts of a user, it'll be a cross-shard query, which will be highly expensive in terms of latency. My opinion is to shard on UserId as the partitioning key and make photoId the range key.
Problem: we'll have hot shards due to a few popular users.
Solution: Do logical sharding, which allows us to scale physical shards easily. For e.g. initially have, say, 1000 logical shards with only 100 physical shards. UserId % 1000 gives the logical shard; then logicalShard % 100 gives the physical shard. This is a naive way of deciding the shard, but you can use consistent hashing for a more even distribution. Once we see our physical shards start to get overwhelmed, we can increase the number of physical shards, and then we just have to change the mapping between logical and physical shards.
Problem: We need to be able to sort tweets/photos by time.
Solution: Use Snowflake-style IDs for tweets/photos and make that the range key.
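The two-level scheme described above (1000 logical shards over a smaller number of physical ones) is a few lines of arithmetic; resizing only changes the logical-to-physical step, never which logical shard a user hashes to:

```python
LOGICAL_SHARDS = 1000

def physical_shard(user_id: int, physical_count: int) -> int:
    """Stable logical shard, then a remappable logical -> physical step."""
    logical = user_id % LOGICAL_SHARDS
    return logical % physical_count

# With 100 physical shards, user 1999 (logical shard 999) lands on shard 99.
before = physical_shard(1999, 100)
# Growing to 200 physical shards remaps logical shards, not user hashing.
after = physical_shard(1999, 200)
```

As the comment notes, the modulo in the second step is naive; replacing it with a consistent-hash or explicit lookup table keeps the same two-level structure while moving fewer logical shards on resize.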
@jordanhasnolife5163 · 2 years ago
Seems that we agree on that; I'll have to look into a Snowflake-style key though.
@fischlump · 2 years ago
I don't understand the Flink consumer part either. Do you keep the state of the user-follows relation as Flink state? Why not just do a lookup on the Cassandra followers table instead?
@jordanhasnolife5163 · 2 years ago
Yep - the whole point of using something like Flink is that you can basically replicate the state of another table into it for stream joins. This heavily decreases the latency of the service because it reduces the number of database calls that you have to make to an external table like Cassandra. You could definitely do the lookup; it would just be slower and put a lot of load on the DB.
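A toy model of that stream-state join: follow events build local state inside the consumer, so a tweet event fans out with a single in-memory lookup instead of a Cassandra read. Flink's real keyed-state API looks quite different; this only illustrates the data locality:

```python
class TweetFanoutConsumer:
    """Stateful stream consumer: follow events build local state,
    tweet events are joined against it with no external DB call."""

    def __init__(self):
        self.followers = {}   # local state: userId -> set of followerIds
        self.deliveries = []  # stand-in for writes to per-user feed caches

    def on_event(self, event):
        if event["type"] == "follow":
            # State-building side of the join.
            self.followers.setdefault(event["followee"], set()).add(event["follower"])
        elif event["type"] == "tweet":
            # Stream-state join: one local lookup fans the tweet out.
            for follower in sorted(self.followers.get(event["user"], ())):
                self.deliveries.append((follower, event["text"]))

consumer = TweetFanoutConsumer()
for e in [
    {"type": "follow", "follower": "bob", "followee": "alice"},
    {"type": "follow", "follower": "carol", "followee": "alice"},
    {"type": "tweet", "user": "alice", "text": "hi"},
]:
    consumer.on_event(e)
```

In the real system this state is partitioned by userId across consumers and rebuilt by replaying the log-based broker, which is why losing a consumer doesn't lose the follow relationships.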
@shubh0000071 · 1 year ago
Hey, it's difficult to follow along with just your thoughts and words - do think about including diagrams as you walk through stuff.
@jordanhasnolife5163 · 1 year ago
Yeah - understood, perhaps one day when I have the time/resources I'll do a refactored animated version of these
@franklinyao7597 · 1 year ago
Why don't we use a graph DB to store user relationships?
@jordanhasnolife5163 · 1 year ago
I'd really only need to use a graph DB if I was planning on traversing the graph, not really doing any of that here
@janezhou4641 · 1 year ago
Hi, I feel this is a small typo. Should it be "one of my followees" instead of "one of my followers" at 12:42?
@jordanhasnolife5163 · 1 year ago
I don't believe so - should be sent to all of the people who are following me!
@SunilKumar-jr6yd · 1 year ago
How did you know, 5 months in advance, about Elon firing folks at Twitter? 🙄
@jordanhasnolife5163 · 1 year ago
They call me the oracle