Data Scientist answers 30 Data Science Interview questions

Views: 26,804

A day ago

Let's look at some data science interview questions!
RESOURCES
[1] Simplilearn's 50 interview questions: www.simplilear...
[2] Approximate Nearest Neighbor (ANNOY) from Spotify: github.com/spo...
[3] What is a p-value? (‪@kozyrkov‬ ) • What is a p-value?
[4] Eigen Vectors and Eigen Values (‪@3blue1brown‬ ): • Eigenvectors and eigen...
[5] Model Calibration - Why logistic regression doesn't return probabilities: • Why Logistic Regressio...
JOIN US ON DISCORD: / discord
SPONSOR
Kite is a free AI-powered coding assistant that will help you code faster and smarter. The Kite plugin integrates with all the top editors and IDEs to give you smart completions and documentation while you’re typing. I've been using Kite. Love it!
Learn more: www.kite.com/g...

Comments: 45

@nay_codes 2 years ago

I'm not one to write comments on YouTube, but I have to say I really love your content. And an Interview Questions series would be awesome.

@CodeEmporium 2 years ago

Thanks a lot! Gonna be making more of these and hope you like the future ones too

@kachrooabhishek 2 years ago

Blessed to like this video. Dude, these are some serious scenarios which are not covered by the major channels. Bless you :)

@yk4993 2 years ago

Feedback mechanism simply refers to the fact that true labels are known and during training the model gets feedback about the error, hence correcting it via gradient descent. Sounds like a tautology as it is just related to the fact that the data is labeled.

@CodeEmporium 2 years ago

Oh interesting. Never would have really thought to mention that. But that's good to know. Thank you :)

@sirgodricenwardsaier9074 2 years ago

Aren't there unsupervised models that use gradient descent without the need for labeled data though? t-SNE and node2vec come to mind as examples of cases where SGD doesn't require labels. That said, this is niche enough that it probably doesn't matter for typical interviews.

@user-mn8th3ie1t 5 months ago

A mistake that the majority of data scientists make is claiming that, because the outcome variable is a probability in [0, 1], you should automatically use logistic regression. That's incorrect. An outcome in [0, 1] is a necessary condition, not a sufficient one, for modelling with logistic regression. Another factor to check is whether the predictor variable exhibits a “threshold effect”, which is the reason for the sigmoid shape of the response as the predictor values change.

@user-mn8th3ie1t 5 months ago

The p-value is the probability, assuming the null hypothesis is true, of observing a result at least as extreme as the one in your data. Let's say the p-value for the regression coefficient of a predictor (e.g. age as an independent variable to predict income) is 0.03. That means that if age truly had no effect (a coefficient of 0), you would see an estimate at least this far from 0 only 3% of the time. It does not mean you are 97% confident in the age effect, but at the usual 5% significance level you would reject the hypothesis that age's regression coefficient is 0, i.e. that age has no explanatory power.
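
For anyone who wants to see this in code, here is a minimal sketch that estimates a p-value for a regression slope with a permutation test. This is a swapped-in technique for illustration, not the t-test that regression packages usually report. The p-value here is the share of label shuffles (which simulate "age has no effect") that give a slope at least as extreme as the observed one. The `ages` and `incomes` numbers and the helper names `slope` and `permutation_p_value` are made up for this example.

```python
import random

def slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

def permutation_p_value(xs, ys, n_perm=2000, seed=0):
    """Two-sided p-value for the null hypothesis 'slope is 0'."""
    rng = random.Random(seed)
    observed = abs(slope(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        # Shuffling ys breaks any link with xs, which simulates the null.
        rng.shuffle(shuffled)
        if abs(slope(xs, shuffled)) >= observed:
            hits += 1
    # The +1 terms count the observed data as one of the permutations.
    return (hits + 1) / (n_perm + 1)

# Hypothetical data: age in years, income in thousands.
ages = [22, 25, 28, 31, 35, 38, 42, 45, 50, 55]
incomes = [28, 31, 30, 38, 41, 40, 47, 52, 50, 58]

p = permutation_p_value(ages, incomes)
print(f"slope = {slope(ages, incomes):.3f}, p-value = {p:.4f}")
```

A small p-value says the observed slope would be rare if age had no effect. It does not give the probability that the null hypothesis is true.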

@karandoshi115 2 months ago

Thanks for this explanation

@sourajitsaha3845 2 years ago

Feedback mechanism in this context basically means that, in the supervised setting, you get to compare your model's output with the provided labels (think of loss functions) in order to update the weights of your model (a.k.a. learning). In the unsupervised setting you can't do that, because you don't have labels to compare to, so you update the weights of your model without explicitly comparing its output with labels.
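
A rough sketch of that feedback loop, assuming a one-feature logistic regression trained with plain gradient descent. The data, learning rate, and step count are made up for illustration. The labels `ys` enter only through the error term `sigmoid(w * x + b) - y`, and that error is the "feedback".

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(w, b, xs, ys):
    """Average cross-entropy between predictions and the true labels."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(xs)

def train(xs, ys, lr=0.1, steps=500):
    """Gradient descent: each step uses the prediction error as feedback."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical labeled data: negative x tends to class 0, positive x to class 1.
xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

loss_before = log_loss(0.0, 0.0, xs, ys)
w, b = train(xs, ys)
loss_after = log_loss(w, b, xs, ys)
print(f"loss before: {loss_before:.3f}, loss after: {loss_after:.3f}")
```

Without `ys` there is no error term to compute, so an unsupervised method has to define its objective from the inputs alone.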

@CodeEmporium 2 years ago

Yep! I guess to me, that sounds like a restatement of "has labels" and "doesn't have labels", just in a fancier tone.

@harshparikh5871 2 years ago

Yooo, imma use this to study for some upcoming interviews. This video really dumbed down some of this stuff for me a lot.

@newbie8051 A month ago

Ah, quick refresher. Thanks

@zyladd6176 8 months ago

Very well made video that adds details onto standard answers for DS interviews. Good analysis.

@NaManCoo 4 months ago

Very good quality video!

@tejareddy199 11 months ago

Excellent work!

@paragjain2762 2 years ago

11 Oct is too far, buddy! I have an interview on Friday! Anyway, better late than never! Thanks for doing this.

@CodeEmporium 2 years ago

It's here now :)

@pearlmarysamuel4809 2 years ago

How was the interview?

@paragjain2762 2 years ago

@pearlmarysamuel4809 It went well, moved to the next round. Thank you Ajay for the commentary in this video, it provided really useful insights.

@pearlmarysamuel4809 2 years ago

Congratulations. God bless.

@mapa5000 6 months ago

Great video!

@fahnub 2 years ago

Please do more, and also include case-based problems if possible.

@mallikarjunshettar7976 10 months ago

Bro, why are you stressing yourself by simply reading solutions? Just share the link and we will go through the answers. Simply a waste of time and a nonsense video.

@RPiao 2 years ago

You rock, dude! Thank you, YouTube RECOMMENDATION system. Are you using ANNOY, YouTube?

@leotrisport 2 years ago

Maybe the most important thing to keep in mind in order to improve generalisation (avoid overfitting) is to first check whether the validation and training sets come from the same probability distribution. I mean, no amount of regularisation would sort out this issue.

@CodeEmporium 2 years ago

Yep. Very true

@clapdrix72 2 years ago

If you have a sufficiently large sample then random assignment (in non time series problems) will basically ensure they are from the same distribution. I would want to make sure my data sample was generated from one process (or at least sufficiently similar processes so that conditioning on features will reconcile the two).
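
One hedged way to check this for a single numeric feature is a two-sample Kolmogorov-Smirnov statistic, which is the largest gap between the two empirical CDFs. The sketch below is pure Python with simulated data; the 80/20 split, the Gaussian feature, and the `+ 1.0` shift are assumptions made up to illustrate the point, and `ks_statistic` is a hand-rolled helper rather than a library call.

```python
import random

def ks_statistic(a, b):
    """Largest gap between the empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Move both pointers past every value <= x, then compare the CDFs.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        gap = max(gap, abs(i / len(a) - j / len(b)))
    return gap

rng = random.Random(0)
feature = [rng.gauss(0.0, 1.0) for _ in range(2000)]

# Random assignment: train and validation come from the same process.
rng.shuffle(feature)
train, val = feature[:1600], feature[1600:]
d_random = ks_statistic(train, val)

# Simulated drift: the validation set is generated by a shifted process.
val_shifted = [v + 1.0 for v in val]
d_shifted = ks_statistic(train, val_shifted)

print(f"random split: {d_random:.3f}, shifted validation: {d_shifted:.3f}")
```

A small statistic for the random split and a much larger one for the shifted set matches both comments above: random assignment handles the split, but it can't fix data that came from a different process.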

@2mitable 2 years ago

Make this kind of series.

@psiddartha7115 2 months ago

I am a non-engineer. How should I prepare?

@clapdrix72 2 years ago

This is the second video in which the creator has emphasized model interpretability as a universal virtue so I have to call this out. While I agree it's nice to have and in cases of causal inference it's all that really matters, in 60-70% of the modeling done in DS we don't care about interpretability AT ALL provided a black box algorithm is statistically significantly better than the interpretable one in predicting or forecasting. Where is this coming from?

@CodeEmporium A year ago

Sorry. I am late. And you make a good point. My views on this have changed a little over time; so I agree with you more and more. :)

@rohitchan007 2 years ago

This was really helpful. Can you please make videos on reinforcement learning (MDPs, Model Free Learning, Monte Carlo tree)?

@CodeEmporium 2 years ago

Reinforcement learning, huh. I haven't used it too much as a data scientist, but I'll think about the kind of content I can create that's useful for everyone. Thanks for the suggestion!

@rohitchan007 2 years ago

@CodeEmporium Thank you for that. It'll also help with my master's in AI course too😅

@hellstenlight9454 10 months ago

Zero creativity, 100% copy paste.

@davidcho8877 2 years ago

This is the best interview question review video.

@CodeEmporium 2 years ago

Thank you for the kind words! More to come :)

@dr.mikeybee 2 years ago

Feedback may refer to loss.

@hardikvegad3508 2 years ago

Hey, I have a question. When is an outlier considered important? If we can't drop it, then what techniques should we use to deal with that outlier? I hope I'll receive an answer because I was asked this in an interview.

@CodeEmporium 2 years ago

Here is an application: outliers can skew averages. One thing you could do is take the lower 99% of the groups you are comparing (but also be sure to report the outlier case). Typically, you aren't just dealing with numbers. Each number may represent a user. If so, you want to understand why the 1% behaves the way it does. In many situations, the reason these outliers exist is explainable. Note: this answer is purely from a data science standpoint, not a hardcore stats standpoint. But I hope this helps.
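
A small sketch of the "lower 99%" idea, with made-up session lengths where one user is extreme. The helper `split_at_percentile` is hypothetical and does a simple sort-and-slice cut. It returns the outliers instead of discarding them, so they can still be reported and investigated.

```python
def mean(values):
    return sum(values) / len(values)

def split_at_percentile(values, pct=99):
    """Return (lower pct% of values, the rest), both sorted ascending."""
    ordered = sorted(values)
    cutoff = max(1, int(len(ordered) * pct / 100))
    return ordered[:cutoff], ordered[cutoff:]

# Hypothetical session lengths in minutes: 99 typical users, 1 extreme user.
sessions = [10] * 99 + [5000]

kept, outliers = split_at_percentile(sessions)

print(f"mean of all users:       {mean(sessions):.1f}")
print(f"mean of lower 99%:       {mean(kept):.1f}")
print(f"outliers to investigate: {outliers}")
```

Here the single extreme user pulls the overall mean to about six times the typical value, which is why reporting the trimmed mean and the outlier case together gives a fairer picture.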