
Introducing MLflow for End-to-End Machine Learning on Databricks

21,818 views

Databricks



Solving a data science problem is about more than making a model. It entails data cleaning, exploration, modeling and tuning, production deployment, and the workflows governing each of these steps. In this simple example, we’ll take a look at how health data can be used to predict life expectancy. It starts with data engineering in Apache Spark and data exploration, then moves on to model tuning and autologging with Hyperopt and MLflow. It continues with examples of how the Model Registry governs model promotion, and simple deployment to production with MLflow as a job or REST endpoint. This tutorial covers the latest innovations from MLflow 1.12.
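The tuning-and-logging loop mentioned in the description can be pictured with a small, self-contained sketch. It is not the notebook from the video and it does not call Hyperopt or MLflow: it assumes plain random search in place of Hyperopt's Bayesian optimizer, an in-memory list in place of an MLflow tracking server, and a made-up `validation_loss` function in place of real model training. The names `validation_loss`, `tune`, and the `run` dictionary layout are hypothetical and chosen only for illustration.

```python
import math
import random
from typing import Dict, List


def validation_loss(learning_rate: float) -> float:
    """Toy stand-in for training a model and returning its validation loss.

    The loss is smallest near learning_rate = 0.1 (an arbitrary assumption).
    """
    return (math.log10(learning_rate) + 1.0) ** 2


def tune(num_trials: int, seed: int = 0) -> List[Dict]:
    """Random search over the learning rate, recording every trial as a 'run'."""
    rng = random.Random(seed)
    runs = []
    best_loss = float("inf")
    for trial in range(num_trials):
        # Sample the learning rate log-uniformly between 1e-4 and 1.
        learning_rate = 10 ** rng.uniform(-4, 0)
        loss = validation_loss(learning_rate)
        best_loss = min(best_loss, loss)
        # Each run keeps its parameters and metrics together, so runs can
        # be compared afterwards (the role a tracking server would play).
        runs.append({
            "run_id": trial,
            "params": {"learning_rate": learning_rate},
            "metrics": {"loss": loss, "best_loss": best_loss},
        })
    return runs


runs = tune(num_trials=50)
best_run = min(runs, key=lambda run: run["metrics"]["loss"])
print("best run:", best_run["run_id"], best_run["params"], best_run["metrics"]["loss"])
```

Because every trial is stored with both its parameters and its loss, picking the best candidate is a single comparison over the recorded runs. The running `best_loss` value can only stay flat or decrease as more trials are logged.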
About:
Databricks provides a unified data analytics platform, powered by Apache Spark™, that accelerates innovation by unifying data science, engineering and business.
Read more here: databricks.com...
See all the previous Summit sessions:
Connect with us:
Website: databricks.com
Facebook: / databricksinc
Twitter: / databricks
LinkedIn: / databricks
Instagram: / databricksinc
Databricks is proud to announce that Gartner has named us a Leader in both the 2021 Magic Quadrant for Cloud Database Management Systems and the 2021 Magic Quadrant for Data Science and Machine Learning Platforms. Download the reports here. databricks.com...

Comments: 11
@billyreich1121 1 year ago
Great lesson! Thanks for taking the time to create it and post it here.
@raghukundurthi1288 3 years ago
Thanks! A comprehensive Databricks notebook that serves many purposes: data engineering and EDA, data cleansing (imputation et al.), and a how-to introduction to pivoting to Scala, SQL, and Python (~4:11), with Plotly, inside a Databricks notebook. Above all, it covers the salient features of setting up a model and gives exposure to Delta Lake tables (bronze, silver, gold) and saving data to a Delta Lake (the RDBMS CRUD equivalent), which are the bread and butter of legions of ETL engineers. It walks through joins and tables (~4:59), CREATE DATABASE IF NOT EXISTS (6:05), registering a Delta table, consistent reads (7:33), handling missing data (8:35), viewing the data with Seaborn (9:43) as part of an EDA exercise, a repeated run of a model to generate a table with data integrity (the gold table, 11:04) and registering it (a.k.a. saving, for CRUD connoisseurs), setting the stage for modeling (11:14), and Koalas, a data manipulation tool (11:23) that distributes work across clusters. Then comes the meat of the presentation: building models in parallel across Spark clusters, with cardinal model features like XGBoost, Hyperopt, and MLflow, building multiple models, lambda, learning rate, et al. (from ~13:00), and THE MOST critical aspect, parameter tuning with Hyperopt (13:35). The author highlights three main strengths here: the Bayesian optimizer, Spark integration, and logging of the algorithm to MLflow. Actually, a fourth point he nonchalantly drops in is the 'best loss' going down as the model works on the data.
Logging to MLflow allows an in-depth look at and analysis of the algorithm's steps (~15:00), and he shows how to interpret the logs: comparing the 96 runs of the model in the example, with a focus on lower loss (~15:59), logging the model and feature explanation (~16:54), a possible timeline to get SHAP for free (17:13), declaring victory and roping in DevOps with a webhook for the pickled model (17:45), managing the promotion to production deployment via the registry (17:58), the practical steps to Dockerize (a new verb, if it did not exist), productionize (registry), and automate via a webhook (18:58), interpreting the SHAP model visual (19:20), promoting to production (19:58), and, wow, roping in the model as a Spark UDF (21:00), deploying in Databricks, Azure ML, Kubernetes, a REST API, etc. (22:00), and surfacing a consumable dashboard (22:57) for the business users and subject matter experts. What is compelling is Owen's delivery style and storytelling, which is lucid and simple! Thank you, Sean!
@NewGirlinCalgary 1 year ago
Great video, Sean! Concise and clear! Thanks a lot.
@MrTulufan 22 days ago
Nice video, but the first half is just a walkthrough of a typical data science project. The real MLflow introduction starts at 14:30.
@21Gannu 3 years ago
Great video, Sean...
@jhngearsns5089 2 years ago
Mind sharing a notebook copy? Thanks!
@papachoudhary5482 3 years ago
Thanks, sir!
@ankitbarsainya 8 months ago
Great video. Can you share the notebook link? @Databricks
@sumitbhalla2321 3 years ago
Are there any API code snippets to enable model serving? I want to automate enabling model serving. Please help. Thanks.
@siobhanahbois 3 years ago
This video is 1 hour old, how can it have 5 likes already?