System Design for Recommendations and Search // Eugene Yan // MLOps Meetup #78

63,823 views


Join us at our first in-person conference on June 25 all about AI Quality: www.aiqualityconference.com/
MLOps Community Meetup #78! Last Wednesday we talked to Eugene Yan, an Applied Scientist at Amazon.
//Abstract
What does system design for industrial recommendations and search look like? In this talk, Eugene Yan shares how it's often split into:
- Latency-constrained online vs. less-demanding offline environments, and
- Fast but coarse candidate retrieval vs. slower but more precise ranking
We'll also see examples of system design from companies such as Alibaba, Facebook, JD, DoorDash, LinkedIn, and maybe do a quick walk-through on how to implement a candidate retrieval MVP.
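The retrieval-then-ranking split in the abstract can be sketched as a two-stage funnel. This is a minimal, illustrative sketch, not the speaker's implementation. It assumes item embeddings were already produced offline (random vectors stand in for learned ones), uses brute-force top-k dot products as a stand-in for an approximate nearest-neighbor index, and uses a hypothetical sigmoid scorer in place of a real ranking model with extra features. `ITEM_EMB`, `retrieve`, and `rank` are hypothetical names.

```python
import heapq
import math
import random

random.seed(0)
DIM = 8
# Offline artifact (assumed): item embeddings produced by a batch training job.
# Random vectors stand in for learned ones here.
ITEM_EMB = {f"item_{i}": [random.gauss(0, 1) for _ in range(DIM)] for i in range(10_000)}


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def retrieve(user_vec, k=200):
    """Fast but coarse: one cheap score per item, keep the top k.
    Brute force stands in for an approximate nearest-neighbor index."""
    return heapq.nlargest(k, ITEM_EMB, key=lambda item: dot(user_vec, ITEM_EMB[item]))


def rank(user_vec, candidates, n=10):
    """Slower but more precise: a richer (here hypothetical) scorer run only on the candidates."""
    def score(item):
        emb = ITEM_EMB[item]
        cosine_like = dot(user_vec, emb) / math.sqrt(dot(emb, emb))
        return 1.0 / (1.0 + math.exp(-cosine_like))  # sigmoid -> probability-like score
    return sorted(candidates, key=score, reverse=True)[:n]


# Online request path.
user = [random.gauss(0, 1) for _ in range(DIM)]
candidates = retrieve(user)   # 10,000 items -> 200 candidates
top = rank(user, candidates)  # 200 candidates -> 10 results
```

The design point is cost: the cheap score touches every item, while the expensive one touches only the few hundred survivors. In a real ranker, the extra precision would come from additional user, item, and context features.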
//Bio
Eugene Yan designs, builds, and operates machine learning systems that serve customers at scale. He's currently an Applied Scientist at Amazon. Previously, he led the data science teams at Lazada (acquired by Alibaba) and uCare.ai. He writes & speaks about data science, data/ML systems, and career growth at eugeneyan.com and tweets at @eugeneyan.
// Relevant links
eugeneyan.com
applyingml.com
www.oreilly.com/library/view/...
-------------- ✌️Connect With Us ✌️ ------------
Join our slack community: go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: go.mlops.community/register
Catch all episodes, Feature Store, Machine Learning Monitoring and Blogs: mlops.community/
Connect with Demetrios on LinkedIn: / dpbrinkm
Connect with Eugene on / eugeneyan
Timestamps:
[00:10] System Design for Recommendations and Search
[01:37] Why: Batch vs. Real-time
[02:05] Batch
  - Recommender (key-value DB)
  - Recommendations refreshed periodically
[02:21] Real-time
  - Recommender (REST/gRPC)
  - Recommendations generated in real time
[02:37] Batch benefits
  - Pre-computed
  - Decouple compute from serving
  - Lower operational load
[03:25] Real-time benefits
  - Responsive to time-sensitive context
  - Reduce cost on non-visiting users
[06:50] Focus on real-time, aka on-demand
[07:00] Offline vs. online aspect
[07:11] Offline aspect
  - Hosts batch processes such as training and index/graph building
  - Loads data into feature stores
[07:23] Online aspect
  - Uses artifacts from the offline environment to serve requests
  - Candidate retrieval and ranking
[07:40] Retrieval
  - Fast but coarse
  - Searches millions of items to get hundreds of candidates
  - Approximate nearest neighbors (ANN), graphs, etc.
[08:05] Ranking
  - Slower but more precise
  - Ranks hundreds of candidates
  - Adds more features
  - Classification or learning to rank
[08:49] Online Retrieval
[09:37] Offline Ranking
[10:50] Online Retrieval
[11:15] Offline Retrieval
[12:25] How: Industry Examples
[12:45] Building item embeddings for candidate retrieval (Alibaba)
[15:31] Building a graph network for ranking (Alibaba)
[17:06] Building embeddings for retrieval in search (Facebook)
[19:10] Building graphs for query expansion and retrieval (DoorDash)
[22:32] Unnecessary real-time over-engineering
[25:05] Real-time timely decision
[26:27] How: Industry Examples (Retrieval)
[26:43] Collaborative Filtering
[30:32] Candidate Retrieval at YouTube (via penultimate embedding)
[32:06] Candidate Retrieval at Instagram (via word2vec)
[33:53] How: Industry Examples (Ranking)
[33:56] Ranking at Google (via sigmoid)
[35:00] Ranking at YouTube (via weighted logistic regression)
[35:31] Ranking at Alibaba (via Transformer)
[36:16] How: Building an MVP
[36:22] Training: Self-supervised Representation Learning
[37:21] Retrieval: Approximate nearest neighbors
[38:40] Ranking: Logistic Regression
[39:00] Serving: Multiple instances + Load Balancer (or SageMaker)
[39:38] From two-stage to four-stage
[41:54] Further reading
[43:44] Applied ML page
[52:52] Keeping the habit
[55:26] Recommended books for machine learning
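The batch vs. real-time trade-off from the first part of the outline can be sketched as follows. This is a minimal sketch under stated assumptions: a Python dict stands in for the key-value DB, and `recommend` / `recommend_with_context` are hypothetical models. The batch job computes results for every user, including those who never visit, while the real-time path computes only when a request arrives and can use request context.

```python
from typing import Callable, Dict, List

Recs = List[str]


def batch_job(users: List[str], recommend: Callable[[str], Recs]) -> Dict[str, Recs]:
    """Offline: precompute for every user and write to a key-value store (a dict here)."""
    return {user: recommend(user) for user in users}


def serve_batch(kv_store: Dict[str, Recs], user: str, fallback: Recs) -> Recs:
    """Online: a single key lookup, no model call; users missing from the last run get a fallback."""
    return kv_store.get(user, fallback)


def serve_realtime(user: str, context: Dict[str, str],
                   recommend_with_context: Callable[[str, Dict[str, str]], Recs]) -> Recs:
    """On demand: compute per request, so time-sensitive context (e.g. the current query) can be used."""
    return recommend_with_context(user, context)


# Hypothetical models and fallback list, for illustration only.
POPULAR = ["i1", "i2", "i3"]


def recommend(user: str) -> Recs:
    return [f"{user}_pick_{n}" for n in range(3)]


def recommend_with_context(user: str, context: Dict[str, str]) -> Recs:
    return [f"{context['query']}_{n}" for n in range(3)]


store = batch_job(["alice", "bob"], recommend)
print(serve_batch(store, "alice", POPULAR))  # ['alice_pick_0', 'alice_pick_1', 'alice_pick_2']
print(serve_batch(store, "carol", POPULAR))  # ['i1', 'i2', 'i3']
print(serve_realtime("carol", {"query": "shoes"}, recommend_with_context))  # ['shoes_0', 'shoes_1', 'shoes_2']
```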

Comments: 39

@MLOps 2 years ago

Sorry for my audio quality. I had the nice mic set up and was talking into it the whole time, but Zoom was set to receive audio input from my EarPods... 🤦‍♂️

@ankitbhatia6736 2 years ago

Great content, no distractions, to the point. Thanks a lot.

@fuzzywuzzy318 5 months ago

This is a Singaporean channel! Nice to see high-quality Singaporean YouTube content!

@leoxiaoyanqu 2 years ago

Great talk, with lots of great explanations and diagrams all in one! Thanks for sharing!

@ahsanshafiqchaudhry 2 years ago

Very interesting talk! I like how questions are answered based on evidence and use cases, e.g., how real-time recommendation is a bit of overkill.

@Public_Daniel 2 years ago

Eugene is a legend, great interview!

@MLOps 1 year ago

yes he is!

@WangRuinju 2 years ago

Great talk! Thanks for sharing!

@danielhe539 2 years ago

Great details and examples, Eugene.

@Rbtamaki 5 months ago

Really insightful. Thank you very much for putting the time and effort into the presentation. I really appreciated it and learned from the video.

@shilinwang1847 1 year ago

IT WAS SO COOL AND INSIGHTFUL! MANY THANKS!

@RenZhang88 2 years ago

@31:39 On this: I think the last linear layer projects the data to the number of videos for the softmax. The weights of that layer associated with each video are the vector for that video. Intuitively, if the user vector has a large dot product with a video's vector, that video gets a large logit in the softmax and is thus most likely a match.
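The comment's reading can be checked with a toy example. This is a minimal sketch, assuming a final softmax layer whose per-video weight vectors act as video embeddings; the numbers and names are hypothetical. Because softmax is monotonic in the logits, ranking videos by softmax probability gives the same order as ranking by dot product with the user vector. That is why, at serving time, the full softmax can be replaced by a top-k dot-product (nearest-neighbor) search.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax_over_videos(user_vec, video_vecs):
    """Output layer as described in the comment: each video's weight vector gives one logit,
    logit = dot(user_vec, video_vec), followed by a softmax over all videos."""
    logits = {vid: dot(user_vec, vec) for vid, vec in video_vecs.items()}
    max_logit = max(logits.values())  # subtract the max logit for numerical stability
    exps = {vid: math.exp(logit - max_logit) for vid, logit in logits.items()}
    total = sum(exps.values())
    return {vid: e / total for vid, e in exps.items()}


user = [1.0, 0.0]
videos = {"a": [2.0, 0.0], "b": [0.0, 2.0], "c": [-1.0, 0.0]}  # hypothetical toy weights
probs = softmax_over_videos(user, videos)
by_prob = sorted(probs, key=probs.get, reverse=True)
by_dot = sorted(videos, key=lambda v: dot(user, videos[v]), reverse=True)
print(by_prob, by_dot)  # ['a', 'b', 'c'] ['a', 'b', 'c']
```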

@50sKid 6 months ago

This was an amazing presentation and there's a reason it's your most popular video now. Thank you.

@maryamaghili1148 2 years ago

Very interesting talk! Thanks for sharing.

@gpprudhvi 2 years ago

Pretty clear and interesting!

@madhubagroy 2 years ago

This is gold!

@goelnikhils 1 year ago

Hi Eugene, thanks for the great video. One question that has been troubling me: for a recommendation engine, why can't we simply use a GNN to generate user and item embeddings and then use a similarity measure such as cosine or dot product to rank items, compared with a classical two-tower model? Embeddings can be generated for all the user and item metadata, the implicit user-item interactions (clicks, purchases, etc.), and other contextual ranking signals. These embeddings can be concatenated, and then a dot product with each item can be used to rank and serve online. Do you see any challenges with this? Please advise on priority, as I am preparing for an int. Thanks in advance.
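One practical detail in the approach described in this question is dimensional. Concatenating several embeddings usually yields a longer vector than each item embedding, unless the sizes are chosen to match, so a dot product or cosine needs a projection back to the item dimension. If that projection is learned on user-side inputs, it arguably starts to resemble the user tower of a two-tower model. The sketch below uses toy, hypothetical vectors and a hand-picked projection matrix standing in for a learned one.

```python
import math


def concat(*parts):
    """Concatenate several user-side embeddings into one vector."""
    return [x for part in parts for x in part]


def project(vec, matrix):
    """Map the concatenated vector to the item-embedding size.
    `matrix` (one row per output dimension) is assumed to be learned."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def rank_items(query_vec, item_embs):
    return sorted(item_embs, key=lambda item: cosine(query_vec, item_embs[item]), reverse=True)


# Toy, hypothetical inputs: a 2-d GNN user embedding and a 2-d context embedding.
user_gnn = [1.0, 0.0]
context_emb = [0.0, 1.0]
query = concat(user_gnn, context_emb)          # length 4
query = project(query, [[1.0, 0.0, 0.0, 0.0],
                        [0.0, 0.0, 0.0, 1.0]])  # length 2, matches the items
items = {"x": [1.0, 1.0], "y": [1.0, -1.0], "z": [-1.0, 0.0]}
print(rank_items(query, items))  # ['x', 'y', 'z']
```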

@TheEmanrese 2 years ago

Great content!

@bharatsharma2907 2 years ago

Great! Thanks for sharing

@MLOps 2 years ago

Thanks for watching

@Fordance100 1 year ago

Great overview.

@bowang1825 2 years ago

Great talk

@hby4pi 1 year ago

Great content, man

@advaitdubhashi9825 1 year ago

Great session!!

@apekshapriya1650 1 year ago

Thanks for this wonderful talk! There is one point, though, that I would like to clarify. At 14:50, when you talk about the request coming from a user, the items in the user's browsing history are also used to get the candidate sets. At that point, is the item the user is currently looking at also used as input?

@chineduezeofor2481 1 year ago

Awesome interview

@TheSiddhaartha 1 year ago

Which types of databases can be used to store vetted content and the rankings produced by deep learning? Is there any video or article that recommends databases?

@ApdullahYAYIK 2 months ago

A minor correction: Skipgram already uses Negative Sampling @MLOps
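For context on this comment: in word2vec, skip-gram is the model and negative sampling is one of its training objectives (hierarchical softmax is the other). Skip-gram with negative sampling (SGNS) is the combination commonly used to learn item embeddings from sessions. Below is a minimal sketch of the SGNS loss for a single (center, context) pair with sampled negatives, using toy 2-d vectors (hypothetical values).

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def log_sigmoid(x):
    """log(sigmoid(x)) = -log(1 + exp(-x))."""
    return -math.log1p(math.exp(-x))


def sgns_loss(center, context, negatives):
    """Skip-gram negative-sampling loss for one (center, context) pair:
    -log sigmoid(context . center) - sum_k log sigmoid(-negative_k . center)."""
    loss = -log_sigmoid(dot(context, center))
    for neg in negatives:
        loss -= log_sigmoid(-dot(neg, center))
    return loss


center = [1.0, 0.0]
good = sgns_loss(center, context=[1.0, 0.0], negatives=[[-1.0, 0.0]])
bad = sgns_loss(center, context=[-1.0, 0.0], negatives=[[1.0, 0.0]])
print(good < bad)  # True: lower loss when the true context is aligned and the negative is not
```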

@ApdullahYAYIK 2 months ago

The sum of user scores for CFI2I and SWINGI2I should be in the numerator; please correct me if I am wrong.
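For reference, here is a sketch of a Swing-style item-to-item score, assuming the commonly cited form from Alibaba's work. For items i and j, every pair of users (u, v) who both interacted with i and j contributes 1/(α + |I_u ∩ I_v|), where I_u is the set of items user u interacted with. This does not settle the point about the slide. The toy data, α = 1, and counting each unordered user pair once are assumptions; some versions also add per-user weights.

```python
from itertools import combinations


def swing(i, j, user_items, alpha=1.0):
    """Swing-style similarity between items i and j.
    Each unordered pair of users (u, v) who both interacted with i and j adds
    1 / (alpha + |I_u & I_v|): pairs whose item sets overlap less count more."""
    both = [u for u, items in user_items.items() if i in items and j in items]
    score = 0.0
    for u, v in combinations(both, 2):
        score += 1.0 / (alpha + len(user_items[u] & user_items[v]))
    return score


# Toy interaction data (hypothetical).
user_items = {
    "u1": {"a", "b"},
    "u2": {"a", "b"},
    "u3": {"a", "b", "c", "d"},
    "u4": {"a", "c"},
}
print(swing("a", "b", user_items))  # 3 pairs, each overlapping on {a, b}: about 1.0
print(swing("a", "c", user_items))  # 1 pair (u3, u4), overlap {a, c}: about 0.333
```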

@ray811030 1 year ago

You put the candidate retrieval and ranking models on the same machine (for example, using SageMaker, "SM"). Under SM: user_id -> invoke ANN (DB) to get candidates (a bunch of item_ids) -> invoke the feature store (FS) with item_id and user_id to get features separately -> invoke the ranking model -> return the items with scores, sorted in descending order. Everything should be done within 200 ms at p99.
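The request flow in this comment can be sketched in-process. This is a minimal sketch, not SageMaker code. It assumes in-memory dicts as hypothetical stand-ins for the ANN index and the feature store, exact top-k in place of ANN, and made-up logistic-regression weights. `handle_request` and the other names are hypothetical.

```python
import heapq
import math
import time

# Hypothetical in-memory stand-ins for the ANN index and the feature store (FS).
ITEM_VECS = {"i1": [1.0, 0.0], "i2": [0.7, 0.7], "i3": [0.0, 1.0], "i4": [-1.0, 0.0]}
USER_VECS = {"u1": [1.0, 0.2]}
USER_FEATURES = {"u1": [0.5]}
ITEM_FEATURES = {"i1": [0.1], "i2": [0.9], "i3": [0.4], "i4": [0.2]}
# Hypothetical logistic-regression parameters: one weight per feature [user_feat, item_feat].
LR_WEIGHTS = [1.5, 2.0]
LR_BIAS = -1.0


def ann_candidates(user_id, k):
    """Stand-in for the ANN lookup: exact top-k by dot product."""
    u = USER_VECS[user_id]
    return heapq.nlargest(k, ITEM_VECS, key=lambda i: sum(a * b for a, b in zip(u, ITEM_VECS[i])))


def lr_score(features):
    z = LR_BIAS + sum(w * x for w, x in zip(LR_WEIGHTS, features))
    return 1.0 / (1.0 + math.exp(-z))


def handle_request(user_id, k=3):
    start = time.perf_counter()
    candidates = ann_candidates(user_id, k)
    # Fetch user and item features separately, then concatenate per candidate.
    user_feats = USER_FEATURES[user_id]
    scored = [(item, lr_score(user_feats + ITEM_FEATURES[item])) for item in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # descending by score
    latency_ms = (time.perf_counter() - start) * 1000.0
    return scored, latency_ms


scored, latency_ms = handle_request("u1")
print([item for item, _ in scored])  # ['i2', 'i3', 'i1']
```

A p99 budget applies to the latency distribution over many requests, so in practice `latency_ms` would be recorded per request and its 99th percentile compared against the 200 ms target.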

@ray811030 1 year ago

Also, how can we expose our candidate generation and ranking services via generic APIs, so other users can mix-and-match as required? We’ll want to consider these in the long-term roadmap. I'm wondering sh
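One way to get the mix-and-match property this comment asks about is to agree on narrow interfaces for each stage. Below is a minimal sketch using `typing.Protocol`. The interface and class names are hypothetical, and in a real system each stage would likely sit behind a REST/gRPC endpoint rather than an in-process call.

```python
from typing import Dict, List, Protocol


class CandidateRetriever(Protocol):
    def retrieve(self, user_id: str, k: int) -> List[str]: ...


class Ranker(Protocol):
    def rank(self, user_id: str, candidates: List[str]) -> List[str]: ...


class PopularityRetriever:
    """Hypothetical retriever: same popular items for everyone."""
    def __init__(self, popular: List[str]) -> None:
        self.popular = popular

    def retrieve(self, user_id: str, k: int) -> List[str]:
        return self.popular[:k]


class ScoreTableRanker:
    """Hypothetical ranker: sorts by a precomputed score table (unknown items score 0)."""
    def __init__(self, scores: Dict[str, float]) -> None:
        self.scores = scores

    def rank(self, user_id: str, candidates: List[str]) -> List[str]:
        return sorted(candidates, key=lambda c: self.scores.get(c, 0.0), reverse=True)


def recommend(user_id: str, retriever: CandidateRetriever, ranker: Ranker, k: int = 100) -> List[str]:
    """Any retriever can be paired with any ranker that satisfies the interfaces."""
    return ranker.rank(user_id, retriever.retrieve(user_id, k))


result = recommend("u1", PopularityRetriever(["a", "b", "c"]),
                   ScoreTableRanker({"c": 0.9, "a": 0.5}), k=3)
print(result)  # ['c', 'a', 'b']
```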