AWS Glue Job Import Libraries Explained (And Why We Need Them)

16,792 views

DataEng Uncomplicated

2 years ago

This video explains the six import statements in a boilerplate AWS Glue script, to help data engineers understand why we need them and what they do.
#aws #awsglue #pyspark
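For reference, the six imports that AWS Glue typically generates for a new Spark script are import sys, from awsglue.transforms import *, from awsglue.utils import getResolvedOptions, from pyspark.context import SparkContext, from awsglue.context import GlueContext and from awsglue.job import Job. The awsglue modules only exist inside the Glue runtime, so this runnable sketch illustrates just one pairing: why import sys sits next to getResolvedOptions. Glue passes job parameters on the command line as --NAME value pairs, and the parser picks out the ones the script asks for. The function resolve_options is a hypothetical, simplified stand-in, not the real library code, and the argument values are made up.

```python
import argparse


def resolve_options(argv, option_names):
    """Hypothetical, simplified stand-in for awsglue.utils.getResolvedOptions.

    Picks the requested --NAME value pairs out of argv and returns them as
    a dict, tolerating any extra arguments the job happens to receive.
    """
    parser = argparse.ArgumentParser(add_help=False)
    for name in option_names:
        parser.add_argument(f"--{name}", required=True)
    # argv[0] is the script path; unrecognised arguments are collected and ignored
    known, _ignored = parser.parse_known_args(argv[1:])
    return {name: getattr(known, name) for name in option_names}


# In a real job this list would be sys.argv; the values here are invented.
example_argv = ["glue_script.py", "--JOB_NAME", "orders-etl",
                "--TempDir", "s3://example-bucket/tmp"]
parsed = resolve_options(example_argv, ["JOB_NAME"])
print(parsed)  # → {'JOB_NAME': 'orders-etl'}
```

Everything beyond the argv-to-dict idea (error messages, any special handling of Glue's own arguments) is deliberately left out of this sketch.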

Comments: 30
@mohammedgt8102 · 1 year ago
Perfect and straight to the point. I got in 5 min what I couldn't get in an hour.
@DataEngUncomplicated · 1 year ago
Thanks, Mohammad! That's the style of video I go for on my channel. I try to make my videos as short and concise as possible.
@sukulmahadik0303 · 1 year ago
Cool explanation. I had never paid attention to these boilerplate statements.
@BeABetterDev · 2 years ago
Short and sweet. Thanks.
@DataEngUncomplicated · 2 years ago
I learned from the best 😉
@mickyman753 · 2 months ago
Just found your channel. Can we have a complete playlist, a course of sorts, or a one-shot video (or videos)? You explain in depth, and I found your videos better than the other tutorials on YouTube.
@DataEngUncomplicated · 2 months ago
Thanks! Check out my playlists; I have one for each AWS service I have made videos about. It sounds like that's what you are looking for.
@danielchicaiza7698 · 5 months ago
Liked, subscribed and commented! Thank you very much for your help! Greetings from Colombia!
@DataEngUncomplicated · 5 months ago
Gracias, amigo! (Thanks, friend!)
@sanchitgarg5275 · 1 year ago
Nice video! I am struggling to find a way to set the script location path in the Jupyter notebook. There seems to be no magic command for it, and AWS does not allow any manual changes under the "Job details" tab. Can you help me if there is any way?
@nikhilgupta110 · 2 years ago
Loved this video. Just a question: isn't import * bad coding practice? If you have already created a video on the practical use of those 24 classes, please share the link; if not, I request that you make a video on that. "I took the one less traveled by, And that has made all the difference."
@DataEngUncomplicated · 2 years ago
Hi Nikhil! Thanks for the comment and feedback! Honestly, I wasn't sure whether people would find this video interesting. These are the boilerplate statements that AWS Glue provides when you create a job from scratch. You can remove or modify some of them if you want to keep the script more focused or don't need them. I don't have any videos on the 24 classes yet, but I'm happy to hear that you think there is value in creating videos on them. I will add it to my video backlog.
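On the import * question: a wildcard import copies every public name from a module into the script's namespace, which can silently shadow names the script already relies on. This sketch uses a hypothetical in-memory module called fake_transforms (its contents are invented and are not the real awsglue.transforms) that happens to define a function named map. The explicit form, for example from awsglue.transforms import ApplyMapping, brings in only the name you ask for.

```python
import sys
import types

# Hypothetical in-memory module standing in for a library imported with *;
# its contents are invented for illustration.
fake = types.ModuleType("fake_transforms")
fake.ApplyMapping = type("ApplyMapping", (), {})
fake.map = lambda *args: "shadowed!"  # public name that collides with a built-in
sys.modules["fake_transforms"] = fake

# Wildcard import: every public name lands in the namespace, including map.
wildcard_ns = {}
exec("from fake_transforms import *\nresult = map(str, [1, 2])", wildcard_ns)

# Explicit import: only ApplyMapping is added, so the built-in map still works.
explicit_ns = {}
exec("from fake_transforms import ApplyMapping\nresult = list(map(str, [1, 2]))", explicit_ns)

print(wildcard_ns["result"])  # → shadowed!
print(explicit_ns["result"])  # → ['1', '2']
```

In a short, single-purpose Glue script the risk is small, which is probably why the generated boilerplate gets away with it, but explicit imports make it obvious which transforms a job actually uses.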
@abdullahkheruwala9910 · 5 months ago
I have .gz files in an S3 bucket. Each gz file contains JSON records (each line is one record in JSON format). How can I read such a file using a Glue DynamicFrame?
@DataEngUncomplicated · 5 months ago
If you run a Data Catalog crawler on this folder, it should add the dataset to the Glue Data Catalog; you can then read it into a DynamicFrame and write it back out from AWS Glue. Check out my other videos where I walk through how to do this with other formats.
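Besides the crawler route, Spark-based readers generally decompress .gz input based on the file extension, so pointing a JSON reader (for example glueContext.create_dynamic_frame.from_options with format="json") at the S3 prefix may also work; treat that as an assumption to verify against the Glue documentation. To make the file layout itself concrete, here is a plain-Python, local-only sketch of "gzip-compressed JSON lines": one JSON object per line, with the whole file gzip-compressed. The file name and records are made up.

```python
import gzip
import json
import os
import tempfile


def read_gzipped_json_lines(path):
    """Yield one dict per non-empty line of a gzip-compressed JSON-lines file."""
    with gzip.open(path, mode="rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)


with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "events.json.gz")  # hypothetical file name
    rows = [{"id": 1, "event": "click"}, {"id": 2, "event": "view"}]
    # Write the sample the way the question describes: one JSON record per line
    with gzip.open(sample, mode="wt", encoding="utf-8") as fh:
        for row in rows:
            fh.write(json.dumps(row) + "\n")
    records = list(read_gzipped_json_lines(sample))

print(records)  # → [{'id': 1, 'event': 'click'}, {'id': 2, 'event': 'view'}]
```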
@Scott-s7f · 6 days ago
Nice video! What's the point of using jobs in notebooks, since bookmarks aren't supported there? Is there another benefit?
@DataEngUncomplicated · 5 days ago
Thanks! The notebook was just a way for me to talk through the content. I would say the benefit of using a notebook is a better development experience: you get feedback after every function you run instead of having to trigger the entire job.
@Scott-s7f · 5 days ago
@DataEngUncomplicated Oh, thanks, but I meant: what is the use of the Job import and calling job.init and job.commit in a notebook, since bookmarks aren't supported?
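As I understand it, the Job import and the job.init()/job.commit() pair exist mainly to drive job bookmarks: a run starts from the state saved by the last successful run, and commit saves the new state so the next run skips data it has already processed. This toy, in-memory model is hypothetical (ToyBookmark and its methods are invented names, and Glue's real bookkeeping is not shown), but it captures why progress only sticks after a commit.

```python
class ToyBookmark:
    """Hypothetical in-memory model of bookmark semantics; not Glue's implementation."""

    def __init__(self):
        self._committed = set()  # progress that survives between runs
        self._pending = set()    # progress recorded during the current run

    def init(self):
        # Loosely analogous to job.init(): start from the last committed state
        self._pending = set()

    def new_files(self, available):
        # Only files not covered by a previous commit are handed to the run
        return sorted(f for f in available if f not in self._committed)

    def mark_processed(self, files):
        self._pending.update(files)

    def commit(self):
        # Loosely analogous to job.commit(): only now is this run's progress saved
        self._committed |= self._pending
        self._pending = set()


bookmark = ToyBookmark()

bookmark.init()  # run 1 processes both files and commits
first = bookmark.new_files(["a.json", "b.json"])
bookmark.mark_processed(first)
bookmark.commit()

bookmark.init()  # run 2 processes c.json but never commits
second = bookmark.new_files(["a.json", "b.json", "c.json"])
bookmark.mark_processed(second)

bookmark.init()  # run 3 sees c.json again because run 2 never committed
third = bookmark.new_files(["a.json", "b.json", "c.json"])

print(first, second, third)  # → ['a.json', 'b.json'] ['c.json'] ['c.json']
```

Where bookmarks are unavailable, these calls have nothing to persist, which is presumably why they feel redundant in a notebook.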
@saksheegoel2654 · 1 year ago
Can we not create functions (def fn()) in streaming Glue jobs?
@DataEngUncomplicated · 1 year ago
Hi Sakshee, I haven't worked with streaming jobs yet, but I don't see why we wouldn't be able to create functions in streaming Glue jobs.
@MuhammadImran-lr5tn · 1 year ago
Hello sir, I am getting "No module named awsglue.context" when I write the above imports in an AWS Glue Python shell job. Can you please help? Thank you.
@DataEngUncomplicated · 1 year ago
Hi Muhammad, the Python shell doesn't come with PySpark; you need to create a job that uses the Spark script type instead of Python shell.
@MuhammadImran-lr5tn · 1 year ago
@DataEngUncomplicated Thank you for your reply. Can you please explain, step by step, what I should do to use the awsglue.context library in an AWS Glue Python shell job?
@DataEngUncomplicated · 1 year ago
What exactly are you trying to do in your script? If you need Spark, then you shouldn't be configuring a Python shell script. Select the PySpark script option instead.
@MuhammadImran-lr5tn · 1 year ago
@DataEngUncomplicated Thank you so much for your quick reply. Thanks to your guidance, I now understand what I was doing wrong. The only point I want clarification on: the awsglue library is something used in a PySpark context, related to PySpark and not to a simple Python shell. Am I right?
@DataEngUncomplicated · 1 year ago
@MuhammadImran-lr5tn You're welcome! Yes, that's my understanding. You don't need that library for a Python shell job.
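A quick way to check which kind of environment a script is running in is to ask whether pyspark and awsglue are importable at all. The helper name is hypothetical; it only uses importlib.util.find_spec from the standard library. Going by the reply above that the Python shell doesn't come with PySpark, a Python shell job (or a local IDE without the Glue libraries installed) would be expected to report pyspark as missing.

```python
import importlib.util

# Top-level packages a Glue Spark script's boilerplate imports from, beyond the stdlib.
REQUIRED_MODULES = ("pyspark", "awsglue")


def missing_glue_spark_modules(names=REQUIRED_MODULES):
    """Return the subset of top-level module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_glue_spark_modules()
if missing:
    print("Not a Glue Spark environment; missing:", ", ".join(missing))
else:
    print("pyspark and awsglue are both importable")
```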
@AmritAgarwal07 · 1 year ago
Can we update the data in a database using Glue jobs?
@DataEngUncomplicated · 1 year ago
I think you are asking whether we can update data in a database with AWS Glue? Yes, absolutely. It's one of the main use cases.
@AbhishekChauhan-kv7ds · 4 months ago
I'm new to AWS and I'm working on a project, but I'm unable to get it working. I'm getting "Unresolved reference 'awsglue'". Can you help me with this?
@DataEngUncomplicated · 4 months ago
Where are you developing your Glue job?
@Fight3211 · 1 year ago
Hi, I have a question about the interaction between creating a "normal" Spark session and Glue. I needed to import a JAR, and I got it working with:

    spark = SparkSession.builder \
        .appName("my-app") \
        .config('spark.jars.packages', 'graphframes:graphframes:0.8.2-spark3.2-s_2.12') \
        .getOrCreate()

I commented out:

    sc = SparkContext()
    glueContext = GlueContext(sc)
    spark = glueContext.spark_session

So the two things I'm missing out on are DynamicFrames and saved job state. How do I modify the original arguments so that I can bring GlueContext back in? Thank you