Data quality for big datasets

5,838 views

Data Science Festival

1 month ago

A talk by Akshay Dineshkumar Jain from Innovate UK.
The talk covers how large organisations automate data quality and reliability checks on big datasets in real time, using data profiling and machine learning techniques. The demo uses the open source library Deequ, the Spark framework, and reporting and notification tools to catch data issues proactively. I will cover an example of a framework I developed at Amazon and Visa to validate customer-facing data, and its integration with notification tools based on statistical methods.
Technical Level: Technical practitioner
This session was part of the Data Science Festival MayDay event 2024. Find out more at datasciencefestival.com/event...
The Data Science Festival is the place for data-driven people to come together, share cutting-edge ideas, and solve real-world problems. We run monthly events, meet-ups, and the biggest free-to-attend data festivals in the UK. Join the community at datasciencefestival.com/
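
To make the idea of profiling-based checks with statistical alerting concrete, here is a minimal sketch in plain Python. It is not Deequ's API and not the framework shown in the talk: the function names, the `customer_id` column, the z-score threshold and the print-based `notify` are all assumptions chosen for illustration. A real setup would compute these metrics on Spark and send alerts through an actual notification tool.

```python
from statistics import mean, stdev


def completeness(rows, column):
    """Fraction of rows where `column` is present and not None."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if r.get(column) is not None) / len(rows)


def uniqueness(rows, column):
    """Fraction of non-null values in `column` that are distinct (simplified metric)."""
    values = [r[column] for r in rows if r.get(column) is not None]
    if not values:
        return 0.0
    return len(set(values)) / len(values)


def is_anomalous(history, current, max_z=3.0):
    """Flag `current` if it is more than `max_z` standard deviations from the mean of `history`."""
    if len(history) < 2:
        return False  # not enough history to estimate spread
    sigma = stdev(history)
    if sigma == 0:
        return current != history[0]
    return abs(current - mean(history)) / sigma > max_z


def run_checks(rows, row_count_history):
    """Run the hypothetical checks on one batch and return a list of issue messages."""
    issues = []
    if completeness(rows, "customer_id") < 1.0:
        issues.append("customer_id has missing values")
    if uniqueness(rows, "customer_id") < 1.0:
        issues.append("customer_id has duplicates")
    if is_anomalous(row_count_history, len(rows)):
        issues.append("row count deviates from recent history")
    return issues


def notify(issues):
    """Stand-in for a real notification tool (assumption: alerts are just printed)."""
    for issue in issues:
        print(f"[data-quality alert] {issue}")


if __name__ == "__main__":
    batch = [
        {"customer_id": 1, "amount": 10.0},
        {"customer_id": 2, "amount": 12.5},
        {"customer_id": 2, "amount": 7.0},
        {"customer_id": None, "amount": 3.0},
    ]
    notify(run_checks(batch, row_count_history=[1000, 1020, 990, 1010]))
```

The split between fixed constraints (completeness, uniqueness) and a history-based check (row count against recent batches) mirrors the two ingredients the description names: data profiling and statistical methods that drive notifications.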
