No video

Advanced SQL Questions From Amazon (Handling complex logic in data science interviews)

  Рет қаралды 30,674

StrataScratch

StrataScratch

Күн бұрын

Пікірлер: 94
@vibhutikathuria7857
@vibhutikathuria7857 2 жыл бұрын
Hi Nate, Don't know if you're gonna see this. But I religiously followed your youtube channel and even took a subscription at StrataScratch for an amazon interview for the role of data engineer. I can't believe that I cracked it, just got the mail that I got selected!!! Thank you so much. Everything you taught was put to right use and came in handy during my interviews! THANK YOU SO MUCH OMG ILY
@stratascratch
@stratascratch 2 жыл бұрын
Congrats on your new role! Very happy for you and I'm glad you found the videos useful. Thank you so much for the kind words. Good luck on the start of your new job!
@priyankalad7789
@priyankalad7789 3 жыл бұрын
that was a great explanation for a tough problem, Really could not think of all scenarios. I agree there isn't much good quality content of these series out there. Thanks for doing these
@stratascratch
@stratascratch 3 жыл бұрын
Thanks so much Priyanka! I'm glad I can add to the technical content out there on KZfaq.
@yashvirriar8833
@yashvirriar8833 3 жыл бұрын
Dude, seriously helpful stuff here to walk through your thought process. Very hard to translate practical experience into words, but it was extremely beneficial into breaking this down into digestible parts!
@stratascratch
@stratascratch 3 жыл бұрын
Thanks so much for watching and for the kind words! Hope you stick around. Will be creating more and more videos!
@LeviewFPV
@LeviewFPV 3 жыл бұрын
Man, thanks for these videos. I'm in an interview process and always get nervous or mix up in tech interviews. These videos are really helpful to structure the questions and the answers! Already subbed!
@stratascratch
@stratascratch 3 жыл бұрын
Thanks for watching and I'm glad you found this channel! I'll keep posting more videos for sure. Let me know if you have any topics you'd like me to cover. I always try to make videos that my audience wants.
@LeviewFPV
@LeviewFPV 3 жыл бұрын
@@stratascratch keep going with your current content IMO. As you said in another video, there isn't much of this on YT! (or good quality at least)
@stratascratch
@stratascratch 3 жыл бұрын
@@LeviewFPV Then I'll keep doing what I do! Thanks for watching and for your input.
@rodrigozuniga9
@rodrigozuniga9 2 жыл бұрын
I find your videos very useful but in this case I think your solution is a little more complicated than needed. You could select the user if and their min purchase date and create a table with just what they bought the first time. You can then join on the min date and eliminate all the records of the first purchase date. You then do the same join but using product id. You are then left with a table that exclude for each user their first date purchase and any latter purchase that includes any of the products purchased the first date. The count of distinct user is gives you the result of how many users were reached by your campaign.
@Manoj-nn8gd
@Manoj-nn8gd 19 күн бұрын
Plz make more videos with advanced SQL querys
@sauravkumar9454
@sauravkumar9454 3 жыл бұрын
These kinda questions will only improve the ability to solve them. Please keep posting advance SQL questions
@stratascratch
@stratascratch 3 жыл бұрын
Thanks for watching! Will keep doing these videos!
@Qornv
@Qornv 3 жыл бұрын
Very clear and nice explanation. I'm watching these out of pure curiosity to discover new ways I can approach problems. If problems/explanations on the platform are just as real-world oriented it might be worth to sign up 😏
@stratascratch
@stratascratch 3 жыл бұрын
Thanks for watching! Yes, those questions on the platform are real interview questions from companies. Many of them are straight out of interviews, while others are re-structured but still test for the same concepts. There are a few difficult ones I put on the platform to show you what a few real world questions would look like on the job. Hope they are helpful! There's 50 free questions on the platform so just do those until you want more=)
@akashchandra2223
@akashchandra2223 3 жыл бұрын
@@stratascratch are those 50 free ones important too?
@stratascratch
@stratascratch 3 жыл бұрын
@@akashchandra2223 Yes, they are. They are/were interview questions from companies.
@Juan-Hdez
@Juan-Hdez 2 жыл бұрын
Very useful. Thank you!
@stratascratch
@stratascratch 2 жыл бұрын
You're welcome. I am glad it helped you.
@jameshizon4861
@jameshizon4861 3 жыл бұрын
This one is pretty good. I needed to improve on understanding the logic of solving difficult SQL questions. Will have to practice more. Recap: 1. Identify all scenarios and edge cases. 2. Individually map out solution for each scenario. 3. Keep the logic separate from each other.
@amadysvlog3449
@amadysvlog3449 2 жыл бұрын
I love how you break the question down and explained it . Thank so for this video and for sharing the link to that question.
@stratascratch
@stratascratch Жыл бұрын
Glad it was helpful!
@vsmanyamvadali7344
@vsmanyamvadali7344 Жыл бұрын
excellent way of explanation👍👌
@stratascratch
@stratascratch Жыл бұрын
Thank you!
@RevenueRocketeers
@RevenueRocketeers 2 жыл бұрын
Thank you so much for this content . Can you start similar series for python interview for data science.
@saikatdasgupta2006
@saikatdasgupta2006 3 жыл бұрын
Easier solution select count(distinct m.user_id) from marketing_campaign m, (select user_id, min(created_at) from marketing_campaign group by user_id) a where a.user_id = m.user_id and m.product_id not in ( select product_id from marketing_campaign where created_at=a.min and user_id=a.user_id )
@stratascratch
@stratascratch 3 жыл бұрын
Much more efficient. Thanks for watching the video.
@tarekelias463
@tarekelias463 3 жыл бұрын
Still trying to understand it but looks really good
@laurak5196
@laurak5196 2 жыл бұрын
I think the reason this is confusing for me to read is that there’s a subquery in the from clause along with a table…..but I’m glad you shared this. It’s always helpful to see the different ways to solve the same problem, keeping in mind efficiency too. I’d also like to know how long it took people like you and Nate to be this good at SQL lol. I’m still new but sometimes it’s frustrating when I don’t learn everything as fast as I’d like to.
@kaushikshankar6051
@kaushikshankar6051 2 жыл бұрын
I have written a similar code but im getting 1 extra user id (24 instead of 23). Could anyone tell me what is wrong. select count(distinct(a.user_id)) from marketing_campaign a inner join (select user_id,created_at,b.product_id from marketing_campaign b group by 1,2,3 )b on a.user_id = b.user_id and a.created_at < b.created_at and a.product_id b.product_id ;
@muzafferckay2609
@muzafferckay2609 2 жыл бұрын
Since you can see multiple solutions for the same problem this is one of my favorite platforms to practice SQL skills.
@stratascratch
@stratascratch 2 жыл бұрын
Thanks for the kind words! It's one of the features everyone loves.
@datalearningsihan
@datalearningsihan 2 жыл бұрын
I will always miss out on many products many days but same as the first purchase scenario. No matter how much i revise this same question and answer.
@majubodas6206
@majubodas6206 Жыл бұрын
Thanks a lot Nate!
@heyMrSIDD
@heyMrSIDD 2 жыл бұрын
this is very helpful thanks
@stratascratch
@stratascratch 2 жыл бұрын
You're welcome. Visit our channel for more helpful videos.
@459B
@459B 3 жыл бұрын
A difficult but inspiring question
@stratascratch
@stratascratch 3 жыл бұрын
Probably one of the more difficult questions you'd see out there. The last part in handling one of the edge cases was definitely difficult so I hope I explained it right.
@debjyotiroy842
@debjyotiroy842 2 жыл бұрын
Man this channel should be given a Nobel prize
@stratascratch
@stratascratch 2 жыл бұрын
=)
@rick2591
@rick2591 2 жыл бұрын
what about using CTEs... with fst as ( select user_id, min(created_at) firstDate from marketing_campaign group by user_id), firstOrders as ( select * from marketing_campaign a where exists (select 1 from fst b where b.user_id=a.user_id and b.firstDate=a.created_at)), newOrders as ( select * from marketing_campaign a where not exists (select 1 from firstOrders b where b.user_id=a.user_id and (b.created_at=a.created_at or b.product_id=a.product_id))) select count(distinct user_id) from newOrders
@stratascratch
@stratascratch 2 жыл бұрын
put it in the platform and see if it validates! =)
@caiyu538
@caiyu538 2 жыл бұрын
Thank you for explanation, this question is quite difficult.
@winnumber101
@winnumber101 3 жыл бұрын
The actual process for uncovering this is so elegant though-the initial word problem made me discouraged tbh
@stratascratch
@stratascratch 3 жыл бұрын
It's definitely not great phrasing on the problem. I may have reworded it a bit on the platform but it's probably still pretty difficult to really understand. This was a last rounder, in-person interview question that's meant to be comprehensive so I tried my best to word the problem as concisely as possible. Thanks for the feedback and thanks for watching!
@yujiefu2251
@yujiefu2251 2 жыл бұрын
I have learned a lot from you. I am trying to use your framework to in my following interviews.
@stratascratch
@stratascratch 2 жыл бұрын
Thanks great. Glad you watched the videos!
@devin12428
@devin12428 Жыл бұрын
the last scenario says multiple products on multiple days, shouldn't 46 not be in included in the marketing campaign as well ?
@himanish2006
@himanish2006 2 жыл бұрын
After getting a job will stratscratch help me to become expert in database or this is just for interviews ?
@shipragupta4382
@shipragupta4382 3 жыл бұрын
Amazing Nate. Thanks for this
@stratascratch
@stratascratch 3 жыл бұрын
Thank you for watching the video!
@DeepanshuKalra
@DeepanshuKalra 3 жыл бұрын
Hi Nate, love your channel, and I have binge-watching your videos for the last 3 days. More power to you, for helping all of us. I came up with this solution using a self join: select COUNT(DISTINCT md1.user_id) from marketing_campaign md1 --day one JOIN marketing_campaign md2 --day two ON md1.user_id=md2.user_id --same user purchasing again AND md1.created_at::date
@stratascratch
@stratascratch 3 жыл бұрын
That's a nice solution. You're missing an edge case where a user cannot purchase the same items as the items in their first purchase. You have user 25 in your output but if you look at the raw data, user 25 purchase both item id 114, 115 during their 1st purchase. And then purchased the same items in the future. So 114 and 115 should not count and thus user 25 should not be counted. My code: concat((user_id),'_', (product_id)) not in (SELECT user_product FROM (SELECT *, rank() over(PARTITION BY user_id ORDER BY created_at) AS rn, concat((user_id),'_', (product_id)) AS user_product FROM marketing_campaign Handles this exact edge case.
@xl2274
@xl2274 2 жыл бұрын
Great video! But, I'm wondering whether there is a candidate who can finish this problem within 30 min during a Amazon tech interview.
@stratascratch
@stratascratch 2 жыл бұрын
This is a question that is likely for a white boarding session that would take place on an on-site interview. I believe you could finish the problem in 30-min on a white board since it's more collaborative with the interviewer than coding on a text editor via Zoom. =)
@laurak5196
@laurak5196 2 жыл бұрын
Nate this is a really helpful video, as are all of these where you work through a real problem. As someone wanting to become a data analyst, I’m wondering how in-depth my SQL interview questions are likely to be, vs those for a data science position. Any idea?
@stratascratch
@stratascratch 2 жыл бұрын
For SQL, there is no difference between a DA and DS. The DS position differs because you need to know much more modeling and statistics.
@laurak5196
@laurak5196 2 жыл бұрын
@@stratascratch thanks Nate!
@gill4u
@gill4u 3 жыл бұрын
With userFirstPurchase(user_id, first_purchase_dt,created_at, product_id) as (select user_id, min(created_at) over (partition by user_id), created_at, product_id from marketing_campaign ) select count(distinct user_id) from userFirstPurchase where created_at > first_purchase_dt and concat(user_id,product_id) not in (select concat(user_id,product_id) from userFirstPurchase where first_purchase_dt=created_at )
@Datapassenger_prashant
@Datapassenger_prashant Ай бұрын
I still don't understand,why 46 is in the marketing campaign.. if on next date he bought the same product?
@aakashbansal314
@aakashbansal314 3 жыл бұрын
QQ Nate: If we remove 1st subsql, then also code should work. What is the purpose to select user_id from the subsql? We should just find those combination of concat(userid,productid) that are bought on 1st day, and just filter out from main table. please suggest.
@stratascratch
@stratascratch 3 жыл бұрын
I believe you can do it that way. My code isn't the most efficient. A lot of that has to do with how I would explain the code to an interviewer. It's sometimes easier to write less efficient code but have the code follow logic step by step, especially if you're trying to walk someone through your solution. The last part of the interviews are usually talking about how to optimize the code, so that's where your version would work.
@aakashbansal314
@aakashbansal314 3 жыл бұрын
@@stratascratch thank you for your advise.
@followmycrafts8811
@followmycrafts8811 3 жыл бұрын
Thanks a lot Sir
@stratascratch
@stratascratch 3 жыл бұрын
Thanks for watching!
@shekharbagwe
@shekharbagwe 2 жыл бұрын
How about this? select count(distinct user_id) from marketing_campaign where concat(user_id,'_',product_id) not in( select usr_prd from (select concat(user_id,'_',product_id) usr_prd, created_at, rank() over(partition by user_id order by created_at asc) rnk from marketing_campaign ) up where rnk = 1 ); subquery creates a list of user_id+product_id for all first transactions. I am using rank window function to identify all first transactions. After that using "not in" clause to select all other combinations.
@stratascratch
@stratascratch 2 жыл бұрын
Looks like it would work! Test it out on the platform!
@shekharbagwe
@shekharbagwe 2 жыл бұрын
@@stratascratch yeah i tested this, works fine :)
@Sarthak631
@Sarthak631 2 жыл бұрын
Will it work if instead of concatenating the user_id and product_id column, we just take product_id from the subquery and do product_id NOT IN subquery?
@stratascratch
@stratascratch 2 жыл бұрын
That sounds like a good approach. You can code it up on the platform and see if it produces the same output.
@Sarthak631
@Sarthak631 2 жыл бұрын
@@stratascratch Yes, I will try it and share the results with you in the comments.
@Sarthak631
@Sarthak631 2 жыл бұрын
@@stratascratch I got your point about why we need to use CONCAT. If we don't create a new column, we will lose all the required rows as wel because they have the same ID's as well. Very nice question and an even better explanation.
@yashsinha16
@yashsinha16 2 жыл бұрын
select count(distinct mc1.user_id) from marketing_campaign mc1 join marketing_campaign mc2 on mc1.user_id=mc2.user_id and mc2.created_at>=mc1.created_at+interval '1 day' and mc2.product_id!=mc1.product_id i am getting 24 and not 23? can anyone tell
@nargisparvin4267
@nargisparvin4267 3 жыл бұрын
Thank You !!!
@stratascratch
@stratascratch 3 жыл бұрын
Thanks for watching. Please let me know if you have any topics you'd like me to cover!
@tahiliani22
@tahiliani22 3 жыл бұрын
@@stratascratch advanced SQL. basically, continue with these videos, please.
@AmanSingh-od2ue
@AmanSingh-od2ue 3 жыл бұрын
Hey Nate, I don't understand why do we have to remove all the users with rank 1 in the second part of the query. In that, not all users buy the same product on a different date!
@stratascratch
@stratascratch 3 жыл бұрын
I'm actually only removing user-product combo for the user's 1st purchase because I don't care about their 1st transaction. If the user purchased the same items on a different date, then those transactions are also removed, which is what we want to do according to the question. What is kept are the products that are different than those the user purchased in a future transactions. This is definitely a difficult question to keep all cases in your head. What I'd do is follow a few of the users that I mentioned in the video. Head over to the platform and for each logic in the query, output the user and products purchased. Then see how specific rules filter out users. Hope that helps!
@AmanSingh-od2ue
@AmanSingh-od2ue 3 жыл бұрын
@@stratascratch Yeah, that helps now. I got confused for a while but I got it now. Thanks!!
@athraaproductions
@athraaproductions 3 жыл бұрын
Do you need video editor? I'll do it for you
@stratascratch
@stratascratch 3 жыл бұрын
I'm okay for now. Thanks man!
@BummerSlug
@BummerSlug 2 жыл бұрын
User 50 bought product 118 x4 at a price of 35 Then next day product 118 x4 at a price of 29 Then three weeks later product 118 x5 at a price of 299 Then next day product 118 x2 at a price of 199 Must have been GameStop stocks.
@goingfurther8092
@goingfurther8092 2 жыл бұрын
My answer; SELECT COUNT(DISTINCT user_id) FROM( SELECT user_id ,created_at ,product_id ,last_order ,DATEDIFF(created_at,last_order) AS date_diff FROM( SELECT user_id ,created_at ,product_id ,LAG(created_at) OVER(PARTITION BY user_id ORDER BY created_at) last_order FROM( SELECT user_id ,created_at ,product_id ,ROW_NUMBER()OVER(PARTITION BY user_id,product_id ORDER BY created_at) AS row_num FROM marketing_campaign)a WHERE row_num = 1)b)c WHERE date_diff > 0
@usmanyousaf841
@usmanyousaf841 3 жыл бұрын
Hi Nate! Hope you are doing fine. Please check your email.
@imvikraman
@imvikraman 26 күн бұрын
WITH CTE AS ( SELECT user_id, product_id, created_at, RANK() OVER (PARTITION BY user_id ORDER BY created_at) AS rn FROM marketing_campaign ), firstDayPurchases AS ( SELECT * FROM CTE WHERE rn = 1 ), subsequentPurchases AS ( SELECT * FROM CTE WHERE rn > 1 ) SELECT COUNT(DISTINCT user_id) FROM subsequentPurchases WHERE (user_id, product_id) NOT IN (SELECT user_id, product_id FROM firstDayPurchases)
@plekkchand
@plekkchand 2 жыл бұрын
Hello. Not "hey guys".
@stratascratch
@stratascratch 2 жыл бұрын
Hello
@vijaycvcU
@vijaycvcU 3 жыл бұрын
Exists - ensures that the current row is not the first transaction for the user Not exists - ensure that the current product in the row is bought previously Select count(distinct user_id) from Market_Campaign MC Where exists (select 1 from Market_Campaign MCI where MCI.user_id = MC.user_id and MC.created_at > MCI.created_at) and not exists (select 1 from Market_Campaign MCI where MC.user_id = MC.user_id and MC.created_at > MCI.created_at and MC.product_id = MCI.product_id)
@prasadmundewadi
@prasadmundewadi 2 жыл бұрын
Nice explanation. But while testing it seems that it is missing one edge case. A user bought same product on Day 1 and Day 2 but he bought different products on Day3 and Day4. To cover all edge cases you can use below query: Here is dataset (check user_id 6): User_id Prod_id Created_at 1 11 1/1/2022 1 12 1/2/2022 2 13 4/1/2000 2 13 2/1/2022 3 14 3/1/2022 4 15 3/15/2022 4 16 3/15/2022 5 17 5/15/2022 5 18 5/16/2022 5 18 5/17/2022 SELECT COUNT (DISTINCT USER_ID) FROM USER3 WHERE USER_ID IN ( SELECT USER_ID FROM USER3 GROUP BY USER_ID HAVING COUNT (DISTINCT CREATED_AT) > 1 AND COUNT (DISTINCT PRODUCT_ID) > 1) AND USER_ID NOT IN ( SELECT USER_ID FROM (SELECT PRODUCT_ID, USER_ID, RANK () OVER (PARTITION BY USER_ID ORDER BY CREATED_AT) RN FROM USER3) A WHERE RN IN (1, 2) --rank 1 and 2 are the 1st and 2nd orders which should be different GROUP BY USER_ID HAVING COUNT (DISTINCT PRODUCT_ID) = 1 )
@techiewithcamera
@techiewithcamera Жыл бұрын
My Solution: with first_user_purchase as (select * from (select user_id, created_at, string_agg(distinct product_id::varchar, ';#') first_product, dense_rank() over (partition by user_id order by created_at::date) rank from marketing_campaign group by 1, 2)x where rank=1) select * from (select t1.user_id, t1.created_at, t1.product_id, t2.first_product from marketing_campaign t1 join first_user_purchase t2 on t1.user_id=t2.user_id and t1.created_att2.created_at)x where first_product not like '%'|| product_id ||'%';
Blue Food VS Red Food Emoji Mukbang
00:33
MOOMOO STUDIO [무무 스튜디오]
Рет қаралды 19 МЛН
ISSEI & yellow girl 💛
00:33
ISSEI / いっせい
Рет қаралды 25 МЛН
Violet Beauregarde Doll🫐
00:58
PIRANKA
Рет қаралды 34 МЛН
SQL Case Statements For Data Science Interviews in 2021
14:44
StrataScratch
Рет қаралды 45 М.
Common Date Manipulations on Data Science SQL Interviews
15:28
StrataScratch
Рет қаралды 24 М.
Advanced Facebook Data Science SQL interview question [RANK()]
24:27
Three Tricky Analytics Interview Questions with Andrew
25:03
Jay Feng
Рет қаралды 80 М.
Blue Food VS Red Food Emoji Mukbang
00:33
MOOMOO STUDIO [무무 스튜디오]
Рет қаралды 19 МЛН