AWS Tutorials - Data Quality Check in AWS Glue ETL Pipeline

8,259 views

AWS Tutorials

2 years ago

AWS Tutorials - Data Quality Check using AWS Glue DataBrew - • AWS Tutorials - Data Q...
AWS Tutorials - Building ETL Pipeline using AWS Glue and Step Functions - • AWS Tutorials - Buildi...
The Code and Configuration - github.com/aws-dojo/analytics...
Maintaining data quality is very important for a data platform. Bad data can break ETL jobs, crash dashboards and reports, and reduce the accuracy of machine learning models through bias and error. Learn how to configure a data quality check in an AWS Glue ETL pipeline.

Comments: 22
@yenbui2697 1 year ago
I really love your videos; they help me a lot in learning Glue. Amazing work, thanks a lot for that. It would be great if you could make a video about building with CDK and how to set up IaC/CI/CD for a Glue pipeline that we can deploy to different environments. Looking forward to hearing from you soon.
@prannoyroy5312 1 year ago
Great demo and example of this type of integration
@AWSTutorialsOnline 1 year ago
Thanks
@MahmoudAtef 2 years ago
That was extremely helpful, thank you!
@AWSTutorialsOnline 2 years ago
Glad it was helpful!
@arunasingh8617 2 years ago
You are doing a wonderful job!! It's extremely informative (y)
@AWSTutorialsOnline 2 years ago
So nice of you
@manojt7012 2 years ago
This is really useful content. Thanks a lot. Have you ever worked with PyDeequ, and can you make a video on it?
@nlopedebarrios 4 months ago
Hi, I've noticed that you use the catalogued S3 bucket as the target in the Glue job, instead of the actual bucket. Are there any advantages to doing that?
@jeety5 2 years ago
Thank you so much! I am trying to build a data quality framework for all our ETL pipelines (batch and real time). Can we hold the rules for different ETLs in a data store (DynamoDB, S3, etc.) and then call those rules based on the pipeline? I planned to use Deequ until I came across this video, which seems a much easier option than handling it in a library, as long as it provides most of the same APIs as Deequ. Kindly advise.
@ShivaKrishnagskrishna 1 year ago
Hi Samy, have you used Deequ or any other frameworks to test ETL pipelines?
@terrcan1008 2 years ago
Thanks for such an informative session on this Glue pipeline. Would it be possible for you to put the steps on your AWS Dojo website (minus the code), as you did for previous videos? It is really helpful to check the steps we followed against your steps in case we run into an error.
@AWSTutorialsOnline 2 years ago
Yes, absolutely. I will plan for it.
@user-jh9ig3sl3h 6 months ago
How can I take a ruleset defined in DynamoDB items and add it to a data quality job?
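One possible approach, sketched here under assumptions: the rules are meant for AWS Glue Data Quality and each DynamoDB item stores one rule as a DQDL string. The item layout (`pipeline`, `rule`) and the helper `build_ruleset` are hypothetical, and the items are hard-coded so the sketch runs without AWS access.

```python
from typing import Dict, List

# Hypothetical item layout: one item per rule, keyed by pipeline name.
# In a real job these items would be read from DynamoDB instead.
ITEMS: List[Dict[str, str]] = [
    {"pipeline": "orders", "rule": 'IsComplete "order_id"'},
    {"pipeline": "orders", "rule": 'ColumnValues "quantity" > 0'},
    {"pipeline": "customers", "rule": 'IsUnique "customer_id"'},
]


def build_ruleset(items: List[Dict[str, str]], pipeline: str) -> str:
    """Join the rules stored for one pipeline into a DQDL ruleset string."""
    rules = [item["rule"] for item in items if item["pipeline"] == pipeline]
    if not rules:
        raise ValueError(f"no rules stored for pipeline {pipeline!r}")
    return "Rules = [\n    " + ",\n    ".join(rules) + "\n]"


orders_ruleset = build_ruleset(ITEMS, "orders")
print(orders_ruleset)
```

The resulting string has the `Rules = [ ... ]` shape that a Glue Data Quality ruleset uses, so each pipeline can look up its own rules by name at run time.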
@veerachegu 2 years ago
Can we implement the same scenario with a Glue workflow instead of Step Functions?
@AWSTutorialsOnline 2 years ago
You can, but then you need to use the boto3 API to call the reusable Step Functions workflow from the Glue job. Also build the reusable Step Functions workflow as an Express workflow. I would recommend you use Step Functions for the ETL pipeline if you intend
@veerachegu 2 years ago
@AWSTutorialsOnline Yes, but the issue is that we need to implement everything through CFT only. If I go with Step Functions, many Lambda functions need to be implemented, which feels a little tricky to do in CFT. That's why we prefer a Glue workflow.
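A minimal sketch of the boto3 call mentioned in this thread, assuming the reusable workflow is an Express workflow started with `start_sync_execution`. The helper name, the ARN, and the payload are hypothetical, and a fake client stands in for `boto3.client("stepfunctions")` so the sketch runs offline.

```python
import json
from typing import Any, Dict


def start_reusable_workflow(sfn_client: Any, state_machine_arn: str,
                            payload: Dict[str, Any]) -> Dict[str, Any]:
    """Run an Express workflow synchronously and return its parsed output."""
    response = sfn_client.start_sync_execution(
        stateMachineArn=state_machine_arn,
        input=json.dumps(payload),
    )
    if response["status"] != "SUCCEEDED":
        raise RuntimeError(f"workflow ended with status {response['status']}")
    return json.loads(response["output"])


class FakeSfnClient:
    """Offline stand-in that echoes the input back as the workflow output."""

    def start_sync_execution(self, stateMachineArn: str, input: str) -> Dict[str, str]:
        return {"status": "SUCCEEDED", "output": input}


# Hypothetical ARN and payload, for illustration only.
result = start_reusable_workflow(
    FakeSfnClient(),
    "arn:aws:states:us-east-1:123456789012:stateMachine:dq-check",
    {"jobname": "profile-job"},
)
print(result)
```

Inside a Glue job, the fake client would be replaced by a real Step Functions client, and the job's role would need permission to start the execution.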
@williamlatorre231 1 year ago
Hi, great video. One question: how do I pass the parameters from StartProfileJob to CheckDQOutput so that the Lambda function can read the job name, file name, etc.? Thanks.
@AWSTutorialsOnline 1 year ago
This video is quite old. I suggest you now use Data Quality rules in Glue. Here is a video about it - kzfaq.info/get/bejne/o9N8nM2muZjWfHk.html
@williamlatorre231 1 year ago
@AWSTutorialsOnline I found the issue: the "Resource" must be called with .sync to pass along all the parameters. Now it's working fine.
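The `.sync` fix described above can be sketched as a state machine definition built in Python. The state names come from this thread; the job name, the Lambda function name, and the overall shape of the workflow are assumptions, not the video's actual configuration. With the `.sync` suffix, Step Functions waits for the DataBrew job to finish, so the task result describes the completed run instead of only acknowledging that it started.

```python
import json

# Hypothetical job and function names; state names follow the thread.
definition = {
    "StartAt": "StartProfileJob",
    "States": {
        "StartProfileJob": {
            "Type": "Task",
            # ".sync" makes the task wait until the job run completes.
            "Resource": "arn:aws:states:::databrew:startJobRun.sync",
            "Parameters": {"Name": "my-profile-job"},
            "Next": "CheckDQOutput",
        },
        "CheckDQOutput": {
            "Type": "Task",
            "Resource": "arn:aws:states:::lambda:invoke",
            "Parameters": {
                "FunctionName": "check-dq-output",
                # Forward the whole output of the previous state to the Lambda.
                "Payload.$": "$",
            },
            "End": True,
        },
    },
}

definition_json = json.dumps(definition, indent=2)
print(definition_json)
```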
@user-jh9ig3sl3h 6 months ago
Can you please provide the script for the above feature implementation?
@hamzakazmi5150 1 year ago
It would be great if you could share the slides with us.