
Typeahead Suggestion Design Deep Dive with Google SWE! | Systems Design Interview Question 6

6,245 views

Jordan has no life

1 day ago

Imma need me some typeahead real soon that's for sure
00:00 Introduction
01:15 Functional Requirements
02:15 Capacity Estimates
03:21 API Design
04:13 Database Schema
05:32 Architectural Overview

Comments: 30
@senatusconsultumultimum7815 2 years ago
What a cool data structure. I love the bottom-up update, very clean.
@OKJazzBro 1 year ago
At around 18:00 you state that the client can itself compute the hash of the string in order to avoid the linear-time (linear in the length of the string) look-up in the server's local cache. I raise 2 problems with this: 1. It doesn't really make a difference where the hash is computed as far as E2E latency is concerned. After all, hashing is a prerequisite to a hash table look-up, so it needs to happen sequentially before the look-up. 2. With hash tables, hashing is just the first step. After a hash match you have to do an exact key match to rule out the possibility of a hash collision / false match. So no matter what, it will not be a constant-time look-up and will depend on the length of the string.

Another thing: your approach of storing just references to leaf nodes to reconstruct the actual full words, instead of storing full words at each trie node, is very interesting. But I'm curious, do you know if it's done this way in a real-world application or is it something you came up with? I think it could be costlier than we might think to walk up a trie like that, because each node is a different hash table and it would amount to a ton of CPU cache misses. I'm inclined to think that simply storing full words at each trie node would be more congenial to the latency requirements (and this is how it's described in Alex Xu's book). With that approach (where we discard your "store trie leaf references" idea), the only way the trie helps us is by keeping the data structure more compact compared to a full-blown hash table that maps each possible prefix to the top K words.

Sorry if these are aspects you discussed in the video. Since most of the video is talk without much illustration (not complaining or anything, this is free stuff after all :) ), I end up having to replay bits and pieces of it, so it's plausible I missed the bigger picture.
@jordanhasnolife5163 1 year ago
1) Fair point! Looking back on it, I think I was just referring to reducing load on the cache, but you're definitely correct here; sometimes I don't even fully understand what I was saying. 2) Yeah, but that's usually bounded by a constant factor based on the load factor of the map. 3) This is actually the solution from Grokking. I haven't read Alex Xu's book, but I'm glad you told me so that I can check it out! I don't think you would even be using the CPU cache here, because while you need to store hash tables to go down the trie, going up the trie just requires a single pointer to a parent node in memory. Agreed with your point that storing all the top suggestions directly is ideal, but I believe one of the challenges of the problem is that all of the terms are too much to store (at least in memory).
@OKJazzBro 1 year ago
@jordanhasnolife5163 The CPU cache comes into the picture when you dereference those pointers to get the characters, right? It's in the same sense that a linked list is known to perform worse as a FIFO queue than a circular array. The data are scattered all around memory (not in contiguous blocks), so the CPU cache is often missing what you want and the look-up goes to memory.
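A short sketch to make the structure in this thread concrete: each trie node keeps a child hashmap for walking down, a single parent pointer for walking back up, and references to the leaf nodes of its top suggestions, which are reconstructed into full words on read. This is an illustrative Python layout only, not the exact one from the video or from Grokking.

```python
class TrieNode:
    def __init__(self, char="", parent=None):
        self.char = char          # character on the edge leading into this node
        self.parent = parent      # single pointer used to walk back up the trie
        self.children = {}        # char -> TrieNode, used to walk down the trie
        self.is_word_end = False
        self.top_leaves = []      # references to word-end nodes of the top suggestions


def insert(root, word):
    """Walk down the trie, creating nodes as needed; return the word-end node."""
    node = root
    for ch in word:
        if ch not in node.children:
            node.children[ch] = TrieNode(ch, parent=node)
        node = node.children[ch]
    node.is_word_end = True
    return node


def reconstruct(leaf):
    """Rebuild the full word by following parent pointers up to the root."""
    chars = []
    node = leaf
    while node.parent is not None:
        chars.append(node.char)
        node = node.parent
    return "".join(reversed(chars))


root = TrieNode()
leaf = insert(root, "dog")
root.children["d"].top_leaves.append(leaf)   # prefix "d" references the leaf, not the string
print(reconstruct(leaf))                     # -> "dog"
```

The cache-miss concern above is about reconstruct: each parent dereference can land in a different region of memory, whereas storing full strings at each node trades memory for a contiguous read.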
@scbs2007 1 year ago
++ Could clarify the requirements: touch upon Bloom filters to forget about one-hit wonders.
@jordanhasnolife5163 1 year ago
Sounds good, will do next time around
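A minimal sketch of the one-hit-wonder idea, in case it helps: only count a query once a Bloom filter says it has (probably) been seen before, so queries that appear exactly once never reach the trie-building pipeline. The filter here is hand-rolled for illustration; a real system would size it from the capacity estimates or use a library.

```python
import hashlib

class BloomFilter:
    def __init__(self, size_bits=1 << 20, num_hashes=4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


seen_once = BloomFilter()
counts = {}  # stand-in for the real count store

def record_query(query):
    # First sighting only marks the filter; counting starts on the second sighting,
    # so one-hit wonders never take up space in the counts.
    if seen_once.might_contain(query):
        counts[query] = counts.get(query, 0) + 1
    else:
        seen_once.add(query)
```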
@ryandsouza9093 10 months ago
Great video, love the way you articulate your thoughts and your quick capacity estimates! Any reason why we are using REST? REST cannot stream data. We could use a WebSocket connection instead for bi-directional streaming, minimizing latency by avoiding the overhead of establishing a new TCP connection for each character.
@jordanhasnolife5163 10 months ago
I think that's reasonable! Though I guess it also means that you're maintaining a persistent connection to the server for as long as you have a text field clicked. Totally see your point though!
@De1n1ol 1 month ago
You don't establish a new TCP connection for each new character because of the keep-alive header, which is included by default since HTTP/1.1 and keeps the TCP connection open. WebSockets are redundant here.
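To make the keep-alive point concrete, here is a sketch using Python's requests library: a single Session reuses the underlying TCP connection across per-keystroke requests, which is the HTTP/1.1 behavior described above. The endpoint URL and query parameters are made up for illustration.

```python
import requests

# One Session keeps the underlying TCP connection alive (HTTP/1.1 keep-alive),
# so each keystroke reuses it instead of doing a fresh TCP handshake.
session = requests.Session()

def fetch_suggestions(prefix):
    # Hypothetical endpoint and params; the real API shape depends on the design.
    resp = session.get(
        "https://suggestions.example.com/v1/typeahead",
        params={"prefix": prefix, "limit": 10},
        timeout=0.2,
    )
    resp.raise_for_status()
    return resp.json()

# As the user types "d", "do", "dog", each call reuses the same connection.
for prefix in ["d", "do", "dog"]:
    print(fetch_suggestions(prefix))
```

A WebSocket would still save the per-request HTTP framing and allow server push, so the trade-off is mostly about connection management rather than TCP setup cost.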
@yuwang3736 17 days ago
Kinda confused about the cache part and HDFS: how often do you expect to update the cache? And is adding a query the only way the data in HDFS changes?
@jordanhasnolife5163 16 days ago
The cache is updated like once per day. I don't know what you mean by "change the HDFS", but I'm assuming you mean add data to it, in which case yes: a new query to our service will go through our stream processing solution and eventually be sunk to HDFS.
@yuwang3736 16 days ago
@jordanhasnolife5163 Cool, thanks. That is exactly the point. I thought some new queries could come from the input of the users.
@SureshBabu-wb1dg 1 year ago
Why do we need to compute the hash of the query string when we're using range-based (dynamic) partitioning for the trie? Could you explain a little bit more? Is it only required for the caching layer?
@jordanhasnolife5163 1 year ago
Yep, exactly - it's just for caching; the partitioning is probably best kept by range. We can cache based on a hashed string to avoid hot caches if need be.
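A small sketch of the split being described, with made-up shard boundaries and node counts: the trie servers are partitioned by prefix range so a whole prefix lives on one shard, while the cache key is a hash of the prefix so hot prefixes spread evenly across cache nodes.

```python
import bisect
import hashlib

# Range-based (dynamic) partitioning for the trie servers: each shard owns a
# contiguous range of starting characters, so a prefix maps to exactly one shard.
SHARD_UPPER_BOUNDS = ["f", "m", "s", "{"]  # "{" sorts just after "z"

def trie_shard_for(prefix):
    return bisect.bisect_right(SHARD_UPPER_BOUNDS, prefix[:1].lower())

# Hash-based key for the caching layer: spreads hot prefixes evenly across
# cache nodes regardless of where they fall lexicographically.
NUM_CACHE_NODES = 8

def cache_node_for(prefix):
    digest = hashlib.md5(prefix.encode()).hexdigest()
    return int(digest, 16) % NUM_CACHE_NODES

print(trie_shard_for("dog"), cache_node_for("dog"))
```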
@raj_kundalia 7 months ago
thank you!
@cambriandot3665 1 year ago
14:56 Architecture
@neek6327 2 years ago
Good stuff, man. Wanted to get your thoughts on an idea I had. Since the amount of data is so small, do you think it would be advantageous to store the tries (or the prefix->suggestions map) directly on the fetch servers to avoid a network call to Redis? I think we could apply the same ideas w.r.t. replication/sharding/load balancing to the tries stored on the fetch servers.
@jordanhasnolife5163 2 years ago
Seems pretty reasonable to me!
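A minimal sketch of what that would look like on a fetch server, with placeholder data: the prefix-to-top-K mapping lives in process memory, so the read path is a local dictionary lookup with no network call, and a background job would swap the dict out when a new periodic build lands.

```python
# Suggestions held directly in the fetch server's memory; a background refresh
# would replace this dict wholesale when the periodic rebuild finishes.
SUGGESTIONS = {
    "d":   ["dog", "door", "dominos"],
    "do":  ["dog", "door", "dominos"],
    "dog": ["dog", "dog park", "dog food"],
}

def lookup(prefix, k=10):
    # Local dictionary lookup; no call out to Redis on the read path.
    return SUGGESTIONS.get(prefix.lower(), [])[:k]

print(lookup("do"))  # -> ['dog', 'door', 'dominos']
```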
@blumaukko1790 1 year ago
How do you store a trie on Redis? AFAIK that data structure is not supported.
@jordanhasnolife5163 1 year ago
Tries are implemented with nested hashmaps, which I believe you can have in Redis.
@nomnom8692 6 months ago
@jordanhasnolife5163 I know this video is old, but in Redis you also can't store nested hashes. You can if you store it as a serialized string, but then your application has to deserialize it. I guess this doesn't change the design much, but instead of fetching from Redis, we could just store the trie in the (application) server itself, and every so often the server can re-fetch it from Redis to update the trie (repeating the serialization/deserialization).
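A minimal sketch of the workaround described in this thread, assuming the trie (as nested dicts) is serialized to JSON under a single Redis key and the application keeps a local copy that it refreshes on a timer. The key name and refresh interval are made up; the redis-py calls used (set/get) are the standard ones.

```python
import json
import time
import redis  # redis-py client

r = redis.Redis(host="localhost", port=6379)

# The trie as plain nested dicts, e.g. produced by the periodic batch build.
trie = {"d": {"o": {"g": {"_top": ["dog", "dog park"]}}}}

# Writer side: Redis has no nested hash type, so serialize the whole structure
# into one string value.
r.set("typeahead:trie", json.dumps(trie))

# Fetch-server side: keep a local deserialized copy and refresh it every so
# often, paying the deserialization cost once per refresh rather than per request.
local_trie = None
last_refresh = 0.0
REFRESH_SECONDS = 3600

def get_trie():
    global local_trie, last_refresh
    if local_trie is None or time.time() - last_refresh > REFRESH_SECONDS:
        local_trie = json.loads(r.get("typeahead:trie"))
        last_refresh = time.time()
    return local_trie
```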
@samatzhussipov1139 1 year ago
Good job! Can you release it in practice?)
@jordanhasnolife5163 1 year ago
Sorry, can you elaborate on that?
@ryan-bo2xi 9 months ago
@jordanhasnolife5163 lol 😂😂
@allaboutsystemdesign 1 year ago
Why the Hadoop File System for the add-query service?
@jordanhasnolife5163 1 year ago
Just so you can periodically run batch jobs on the data held there
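For a concrete picture of such a batch job, here is a sketch that takes the raw query log (one query per line, as it might be sunk to HDFS by the stream processor), counts frequencies, and emits the top-K suggestions per prefix for the next trie/cache build. Reading from HDFS itself is elided; the log format and constants are assumptions.

```python
from collections import Counter, defaultdict
import heapq

K = 5
MAX_PREFIX_LEN = 3

def build_top_k(log_lines):
    # Count how often each query appears in the day's log.
    counts = Counter(line.strip().lower() for line in log_lines if line.strip())

    # Attach each query's count to every prefix of it (up to a max length).
    per_prefix = defaultdict(list)
    for query, count in counts.items():
        for end in range(1, min(len(query), MAX_PREFIX_LEN) + 1):
            per_prefix[query[:end]].append((count, query))

    # Keep the top K queries per prefix, ready to load into the trie/cache.
    return {
        prefix: [q for _, q in heapq.nlargest(K, pairs)]
        for prefix, pairs in per_prefix.items()
    }

sample_log = ["dog", "dog", "door", "dominos", "dog park"]
print(build_top_k(sample_log))
```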
@yuwang3736 17 days ago
By adding a query, do you mean adding a term to the trie?
@erythsea 1 year ago
Stop letting the ladies sit on your face 🙄
@jordanhasnolife5163 1 year ago
I prefer when the dudes do it tbh