Fraud models in the energy sector
12 hours ago
Gender Bias in Machine Learning
14 hours ago
Leveraging GANs for Building Synthetic Data
MLOPS in Financial Services
16 hours ago
Data quality for big datasets
21 hours ago
AI, integrity and the academy
Simple and constrained LLM agents
How to increase recycling using data
Fighting Skin Disease with AI
@HKNAGPAL7 6 hours ago
Have a credit risk modelling interview in 20 min.
@dipanjanpalchowdhury6012 11 days ago
Very helpful, thank you.
@howardtaylor9114 22 days ago
Great stuff! A timely and interesting subject, presented clearly - great tempo - and clear, practical advice.
@shailendraawasthi8091 2 months ago
Thank you, lots of insight and practical interpretation.
@souravbarua3991 3 months ago
Nice presentation. Please use Python's faker library to produce fake data in real time.
@n.adityakrishnanneelakanta9083 3 months ago
dataset please
@NugrohoBudianggoro 4 months ago
bookmarking 23:08
@anveshikakamble3717 4 months ago
Without the data, I am unable to see any estimands. For all 3 estimands it shows "no such variables found". How can I know which variables to adjust?
@user-kr1no4eb7z 2 months ago
Good challenge - you can try to create synthetic data (column names provided) based on your assumptions for distributions/rules and see what will happen ;)
@nallym82 5 months ago
Very useful, thank you!
@sroy2138 5 months ago
This is a highly informative and useful presentation. It is clear, concise, and to the point.
@DataScienceFestival 5 months ago
Glad to hear it! 🎉
@eyemazed 6 months ago
How would you get around this problem: you have 2 sets of results from 2 different search engines, for example one vector and one full-text. It just so happens that the vector search results are super good but the full-text search results are really crappy for this particular query (not always). Now you apply the Reciprocal Rank Fusion algorithm and it blends together crap and quality instead of keeping more quality and discarding more crap. I wish there was a way to address this other than Elastic's custom "script_score", which is basically a static function and assumes that the same scoring algorithm will be applied regardless of the input (results).
@leelasaivoonna1728 1 month ago
Better to use a Reranker on top of it
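For readers following this thread, here is a minimal sketch of Reciprocal Rank Fusion in plain Python. The function name `reciprocal_rank_fusion`, the sample document ids, and the optional per-list `weights` argument are illustrative assumptions, not something from the talk or from Elastic's API; `k=60` is the constant commonly used with RRF. The weights only show how one list could be down-weighted if some query-level quality signal were available, which is a different approach from the reranker suggested above.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60, weights=None):
    """Fuse ranked lists of document ids.

    Each document scores sum(weight / (k + rank)) over the lists it
    appears in, with ranks starting at 1. `weights` is a hypothetical
    extension: plain RRF uses a weight of 1.0 for every list.
    """
    if weights is None:
        weights = [1.0] * len(rankings)
    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (k + rank)
    # Highest score first; ties are broken by document id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results: a good vector list and a poor full-text list.
vector_hits = ["a", "b", "c"]
fulltext_hits = ["x", "a", "y"]

fused = reciprocal_rank_fusion([vector_hits, fulltext_hits])
weighted = reciprocal_rank_fusion([vector_hits, fulltext_hits], weights=[1.0, 0.2])
```

With equal weights the poor full-text hit "x" lands second in `fused`; with the full-text list down-weighted it drops behind all vector hits. How to pick such weights per query is left open here.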
@jimbocho660 6 months ago
Excellent presentation. Thank you.
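@chungrandy780 6 months ago
Here are timestamps and a summary on data science:
00:47 🧐 The presentation focuses on advanced feature engineering in data science, emphasizing strategy rather than code.
02:08 📈 The return on investment in analytics, big data and AI is very high, which drives interest in the field.
03:20 📉 According to Gartner's hype cycle, AI and data science are currently in the "trough of disillusionment", which signals challenges.
05:39 📊 Reportedly, as many as 87% of data science projects fail to reach production, highlighting a major problem.
07:42 🧮 Effective feature engineering is essential for realizing the value of data science projects and overcoming these challenges.
10:59 🌟 Selecting and processing features is often more important than choosing the algorithm.
14:10 🎯 The main goals of feature engineering include handling outliers, missing values, scaling, dimensionality reduction, smoothing, and handling null or zero values.
16:13 🧩 In feature engineering, simplification is important for reducing model complexity and potential points of failure.
19:30 🔄 Automated testing of different transformations of a single variable can streamline the feature engineering process.
21:10 🤖 Consider treating each individual variable as a potential model, exploring different feature engineering transformations.
22:29 🎯 Target mean encoding can be used for categorical variables, replacing each category with the mean of the target variable.
23:09 📊 Decision trees can help determine the best bin boundaries for continuous variables, improving the binning process.
24:47 🌲 Random forests can automatically generate a feature importance list, which helps with dimensionality reduction; they can also be seen as a set of decision trees or as feature engineering.
24:59 🌍 Location data such as latitude and longitude can be crucial features in modelling, even when the values appear close together. Transformation packages can help handle this data effectively.
25:27 📅 Dates are very valuable for feature engineering, especially for understanding purchasing behaviour. Considerations include day of the week, holidays and regional differences.
26:50 💎 Sometimes binary encoding (zero or one) can be more powerful than a continuous variable, especially for detecting specific events such as a purchase at a high-end store.
27:31 📊 Decision trees can be used to create bins for histograms, with each leaf node representing a distinct category.
28:43 🏛 Collaboration between data scientists and implementation teams is essential for reducing the failure rate of data science projects.
29:24 🌳 Starting with a decision tree makes model development easier to understand and easier to communicate to stakeholders.
30:33 📈 Unsupervised learning can help you explore the data and improve feature engineering, especially with time series data.
32:13 📉 When encoding sparse, high-dimensional, comma-separated features such as URL visit history, consider grouping hierarchically for better results.
34:28 📊 Decision trees can be used to determine the bin limits for converting continuous variables into categorical ones.
35:21 🗳 Address class imbalance by undersampling the majority class to balance the data and improve model performance.
50:23 📊 When handling missing values, consider using different models for records with missing values and records with complete data, or use a tree-based approach that creates a separate category for nulls.
52:29 📊 Handling many categorical values, especially at high cardinality, may not be a major problem for most algorithms, because many of those values may be eliminated during feature selection.
These timestamps provide valuable information about data science and feature engineering, emphasizing strategy, challenges and the importance of handling data effectively. In particular, there are practical methods and suggestions for handling missing values and categorical variables that can help you succeed in data science projects. If you have more specific questions or need further explanation, feel free to ask.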
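The 22:29 item in the summary above (target mean encoding) can be sketched in a few lines of plain Python. This is an illustration under assumptions, not the speaker's code: the function name `target_mean_encode`, the `smoothing` parameter, and the sample data are all hypothetical. In practice the encoding should be fitted on training data only, otherwise the target leaks into the feature.

```python
from collections import defaultdict


def target_mean_encode(categories, targets, smoothing=0.0):
    """Replace each category with the mean target value of its rows.

    `smoothing` (hypothetical parameter) pulls the mean of rare
    categories towards the global mean; 0.0 gives the plain mean.
    Returns the encoded column and the category -> value mapping.
    """
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1
    encoding = {
        category: (sums[category] + smoothing * global_mean)
        / (counts[category] + smoothing)
        for category in counts
    }
    return [encoding[category] for category in categories], encoding


# Hypothetical training data: a colour feature and a binary target.
colours = ["red", "blue", "red", "green", "blue", "red"]
bought = [1, 0, 1, 0, 1, 0]

encoded, mapping = target_mean_encode(colours, bought)
```

Here "red" maps to 2/3, "blue" to 0.5 and "green" to 0.0, so the categorical column becomes a single numeric feature.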
@chrstfer2452 7 months ago
What's funny looking back at this now is that moment Google stepped back. That was when they first got BERT to a pre-RLHF GPT-3 level of competence, but the rumor is some execs got spooked and backburnered it. And 2.5ish years on, people started unironically, intentionally using Bing for the first time since they downloaded Chrome. I expect those execs got canned, but I haven't followed closely.
@harjassgambhir 7 months ago
That bias issue is pretty interesting, particularly how to know when to retrain. I guess the model could be deployed on about 40-50 percent of the current users who are newly signing up. Then run another iteration of the A/B test on those results, and repeat in a loop until a certain threshold has been reached where mostly everyone is signing up after the calls. It would still not be 100%, as nothing can be 😅, but it might provide more data for further iterations of the model, with less bias than if we just deployed it on all users at once and then tried to retrain it.
@youtubeuser4878 7 months ago
Awesome presentation. Can anyone suggest resources (books, courses) to upskill in data science, specifically in the marketing related domain?
@rezamahmoudi163 7 months ago
Could you please share the slides?
@user-pw6hk6yf2m 7 months ago
Nice sum up of these packages for feature engineering
@ResilientFighter 7 months ago
Such an underrated video
@TommasoFerracina 7 months ago
Thank you Jacqui for this useful and well delivered presentation 😊
@GustavoSuto 8 months ago
Excellent thought: "Visualizations will act as a campfire around which everyone will gather to tell stories."
@mrvincefox 8 months ago
Audio sucks
@TechwithSaad-of4ure 8 months ago
You are my inspiration, Lisa! I have been getting so much following your pathways and your learning resources are extremely helpful. Your dedication and commitment to learning inspired me as well. Keep up the magnificent work in data, and may your journey be filled with continued growth and success!
@deep.extrospection 8 months ago
After following this presentation, I took 3 minutes to run stacknet and it moved me up about 25 positions on the leaderboard. By doing feature engineering & selection I think performance will increase even more.
@travelsandbooks 8 months ago
@Data Science Festival where is the link to the pack referenced in the talk, please?
@DataScienceFestival 8 months ago
Hey! Julia has kindly supplied us with all relevant resources to this talk to share with our community. You can find these linked on her Summer School session, on our website:
@agnejokubonyte2655 11 months ago
Can you provide options for restaurants to indicate whether they start preparing food straight away or, if they are busy in their own restaurant, start working on delivery orders after 5, 10 or 15 minutes?
@junal27 11 months ago
Excellent, thank you
@Bellis692 1 year ago
What an impeccable presentation! As an expert from deep tech myself, this is simply the best talk among all the sessions I attended on that DSF day.
@andreymelnik384 1 year ago
It would be great to have the speaker's name and affiliation in the description.
@DataScienceFestival 1 year ago
Hey Andrey, you can find out about the speaker here:
@andreymelnik384 1 year ago
@@DataScienceFestival Thanks for the prompt response! (And for the event and sharing the talks in the first place.) After working out the speaker's name by googling, I noticed that names are included in thumbnails, but trying to read them on desktop is such a pain. I wonder if you could add names or links to the descriptions of all videos?
@DataScienceFestival 1 year ago
@@andreymelnik384 Thanks, I'll pass this onto the team and suggest it as it would be helpful :)
@anggipermanaharianja6122 1 year ago
Very useful!
@soumilyade1057 1 year ago
It would be great if the tutorial had timestamps.
@Andromeda26_ 1 year ago
Thank you for providing such informative insights. Undoubtedly, the utilization of graph databases is essential for professionals working in the field of data.
@howardtaylor9114 1 year ago
Excellent. Thank you Nichola. Interesting to hear the practical issues you are encountering and solving.
@howardtaylor9114 1 year ago
Great stuff! Timely, interesting and clear. Thank you Kris. I experimented with ggerganov/whisper.cpp. In case it helps anyone: Audacity WAV exports have to be signed 16-bit PCM to work. WAV files are quite large, even for small samples. Congratulations on the promotion :)
@rafaelvalerofernande 1 year ago
Very interesting! Old project. Give feedback. Creation of a learning plan. Why rather than what? Keeping it relevant. Protect time for learning. Create passion, as in spark joy.
@lisa_data 1 year ago
Thank you for the feedback and summary <3 Great summary!
@urbannomads6485 1 year ago
Thank you for this great video, very informative.
@AndrejAndrejev 1 year ago
I kind of disagree about the call_item_item() function. Because we are searching for similar items of the same kind (candidate and candidate), instead of using a dot product we need to use something like tf.keras.losses.cosine_similarity() to find nearest neighbors. For functions like call_user_items() or call_item_users() we can use tf.keras.layers.Dot(), because those compare query and candidate items.
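To illustrate the point in this comment, here is a small plain-Python sketch showing how a dot product and cosine similarity can rank the same item-to-item neighbours differently. The vectors and names (`anchor`, `candidates`) are made up for illustration and are not taken from the talk's code. If the Keras function mentioned above is used instead, note that as a loss it returns the negative of the cosine similarity, so the sign needs flipping before ranking.

```python
import math


def dot(u, v):
    """Plain dot product; it grows with vector length as well as alignment."""
    return sum(a * b for a, b in zip(u, v))


def cosine_similarity(u, v):
    """Dot product of the L2-normalised vectors; depends on direction only."""
    norm_u = math.sqrt(dot(u, u))
    norm_v = math.sqrt(dot(v, v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot(u, v) / (norm_u * norm_v)


# Hypothetical item embeddings: "long" has a large norm but points elsewhere,
# "aligned" is short but points in the same direction as the anchor item.
anchor = [1.0, 1.0]
candidates = {"long": [10.0, 0.0], "aligned": [0.5, 0.5]}

best_by_dot = max(candidates, key=lambda name: dot(anchor, candidates[name]))
best_by_cosine = max(
    candidates, key=lambda name: cosine_similarity(anchor, candidates[name])
)
```

The dot product picks "long" because of its magnitude, while cosine similarity picks "aligned", which is the behaviour the comment argues for when comparing items with items.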
@karatemoscow 1 year ago
terrible indian english 🤮
@attranquoc3999 1 year ago
I really like your project. I want to understand the algorithm clearly. Can you share some documents with me?
@mariahameed3386 1 year ago
Where is the GitHub link?
@DataScienceFestival 1 year ago
Hi Maria here is the link for GitHub
@spicytuna08 1 year ago
Thanks. How would I apply L1/L2 to resolve the overfitting problem?
@jamesche616 1 year ago
1:22:21 - Some users have more than 1 product. Randomly generating numbers does not guarantee that the sampled products are not already owned by the user. For example, user 'PIXcm7Ru5KmntCy0yA1K' has 3 products, namely [10524048, 9870070, 11574730]. A randomly generated index can unfortunately end up being 9870070. You can write the code so that these 10 indices never end up in that situation.
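One way to do what this comment suggests is to remove the user's own products from the pool before sampling. The sketch below is an assumption about how that could look, not the code from the talk: the function name `sample_negatives` and the small `catalogue` are hypothetical, while the three owned product ids are the ones quoted in the comment.

```python
import random


def sample_negatives(catalogue, owned, n, rng=None):
    """Sample n distinct product ids that the user does not already own."""
    rng = rng or random.Random()
    owned_set = set(owned)
    # Build the candidate pool first, so an owned product can never be drawn.
    candidates = sorted(set(catalogue) - owned_set)
    if n > len(candidates):
        raise ValueError("not enough unowned products to sample from")
    return rng.sample(candidates, n)


# Product ids owned by the user quoted in the comment.
owned = [10524048, 9870070, 11574730]
# Hypothetical catalogue: the owned products plus twenty other ids.
catalogue = owned + list(range(1, 21))

negatives = sample_negatives(catalogue, owned, n=10, rng=random.Random(0))
```

Filtering the pool up front avoids a retry loop; for very large catalogues, rejection sampling (draw, then redraw on a hit) may be cheaper than building the full candidate list.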
@tomewing2456 1 year ago
The Github repo containing the code from the talk is here: The Slides used in the talk are here: The Dask site is here:
@odilev8315 1 year ago
Great presentation 👍🏾👍🏾👍🏾
@surendrabarsode8959 2 years ago
Excellent presentation by Ailish. She explained the various measures very clearly with detailed examples. Usually, no one likes to explain such concepts so clearly with examples; instead these so-called 'experts' talk round and round using jargon. However, it is amusing to watch her discomfort when answering questions! In fact, the questions were very easy to answer compared to what she explained.
@vaish1134 2 years ago
Wow such an informative video ! 💯🙌🏻
@nicolalee9367 2 years ago
It helps a lot on my case study of Zopa ! REALLY APPRECIATE 🤩
@soylentpink7845 2 years ago
Great presentation! Can the notebook + data be found somewhere?
@DataScienceFestival 2 years ago
Thanks for the positive feedback! We are unable to provide the above information, but you can always try reaching out to the speakers (listed above) on LinkedIn.
@ememobongekpenyong8576 2 years ago
Very interesting presentation. I am inspired!