AWS Glue PySpark: Filter Data in a DynamicFrame

7,946 views

DataEng Uncomplicated


This video is a technical tutorial on how to use the Filter class in AWS Glue to filter data based on the values in a dataset's columns. The walkthrough demos the Filter class applied to a DynamicFrame in AWS Glue to filter on numbers in a column, on strings in a column, and on values from multiple columns simultaneously.
Timeline:
00:00 Introduction & Configuring Data Inputs
01:14 Filter by Number Range on a Column
03:25 Filter a Column by Values in a List
04:58 Filter on Multiple Column Values
GitHub Repo link: github.com/AdrianoNicolucci/d...
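
The three filters listed in the timeline can be sketched as plain Python predicates. This is a minimal illustration, not the code from the video or the repo: the column names (price, state, quantity), the thresholds, and the sample rows are all hypothetical. Each predicate takes one record and returns True to keep it, which is the shape of function the Glue Filter transform accepts through its f argument.

```python
# Hypothetical columns: "price" (number), "state" (string), "quantity" (number).
# Each predicate receives one record (dict-like) and returns True to keep it.

def in_number_range(record, low=10, high=100):
    """Keep records whose price falls inside [low, high]."""
    value = record["price"]
    return value is not None and low <= value <= high

ALLOWED_STATES = {"NY", "CA"}  # assumed list of accepted values

def in_allowed_list(record):
    """Keep records whose state is one of the allowed values."""
    return record["state"] in ALLOWED_STATES

def multi_column(record):
    """Keep records that satisfy conditions on two columns at once."""
    return in_allowed_list(record) and record["quantity"] > 5

# Small in-memory stand-in for the rows of a DynamicFrame.
rows = [
    {"price": 5, "state": "NY", "quantity": 10},
    {"price": 50, "state": "CA", "quantity": 2},
    {"price": 75, "state": "TX", "quantity": 8},
    {"price": 20, "state": "NY", "quantity": 6},
]

by_range = [r for r in rows if in_number_range(r)]
by_list = [r for r in rows if in_allowed_list(r)]
by_multi = [r for r in rows if multi_column(r)]

# Inside a Glue job, the same functions would be handed to the transform,
# assuming dyf is an existing DynamicFrame:
#   from awsglue.transforms import Filter
#   filtered = Filter.apply(frame=dyf, f=in_number_range)
```

Keeping the predicates as named functions rather than inline lambdas makes them easy to check locally against sample dictionaries before running them in a Glue job.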

Comments: 10
@sanjitkhasnobis6265 7 months ago
Very nice video
@rolandoperez7688 1 year ago
Excellent tutorial. Thank you.
@DataEngUncomplicated 1 year ago
Thanks Rolando, I have more videos on AWS Glue in the pipeline, so stay tuned!
@ashishsinha5338 1 year ago
Really crisp and clean! Can you make more videos like this on Glue ETL transformations with complex data? Thanks. 😊
@DataEngUncomplicated 1 year ago
Hi Ashish, thanks for your feedback! I'm glad you found this helpful. Yes! I actually plan on covering every single Glue transformation with complex data. I'm making a Glue transformation series/playlist.
@ashishsinha5338 1 year ago
@DataEngUncomplicated Thank you so much.
@joelluis4938 1 year ago
Great video! Have you thought about creating a video showing a project from start to end, getting data and processing it? I wonder why you would do this step in Glue instead of in Athena with SQL. I'm new to AWS, so I'm trying to see the difference between the two services for this task.
@DataEngUncomplicated 1 year ago
Hi Joel, it's on my video list to do a project from start to finish! This video is part of my series where I'm walking through all the various AWS Glue transforms that can be used to manipulate your data. Athena can be used to query your data for ad hoc analysis, but if your pipeline's goal is to write the data somewhere, you still need to write it out to your data lake or your warehouse. This is where AWS Glue comes in.
@joelluis4938 1 year ago
@DataEngUncomplicated So it would be a good approach to use Athena to prepare my data for a one-time ad hoc report? But for my weekly and monthly reports, I would use Glue to create and automate my pipeline so all the data is ready for those reports?
@DataEngUncomplicated 1 year ago
Yeah, you got it! AWS Glue to bring in the data, while Athena is perfect for doing ad hoc queries or one-time reports.