AWS Glue PySpark: Drop Fields
3:15
AWS Glue PySpark: Rename Fields
5:04
Comments
@mikitaarabei 1 day ago
Thanks for the content
@dinhhoangtu311 3 days ago
Well explained video. Thanks!
@DataEngUncomplicated 1 day ago
You're welcome!
@mummypoko1530 4 days ago
Hi, may I ask how to remove null values in AWS Glue Studio? I tried several ways but can't make it work 😢
@Scott-s7f 6 days ago
Nice video! What's the point of using jobs in notebooks since bookmarks aren't supported there? Is there another benefit?
@DataEngUncomplicated 5 days ago
Thanks, the notebook was just a way for me to talk through the content. I would say the benefit of using a notebook is a better development experience, as you get feedback after every function you run instead of having to trigger the entire job.
@Scott-s7f 5 days ago
@@DataEngUncomplicated Oh thanks, but I meant: what is the use of the Job import and doing job init and commit in a notebook, since bookmarks aren't supported?
@gudguy1a 6 days ago
(A bit late to this video.) But still, okay, very good, well done. I now wish I had come across you a year ago; I see from your videos that you do a good job. NO massive amounts of annoying face time in the videos, as too many others do. Clear and concise info in the videos AND a great narrating voice. I went through a lot of pain to first obtain the GCP Pro Data Engineer cert last year, but because of NOT having the REQUIRED "multiple years of hands-on GCP data engineering work experience" companies want, AND not many GCP DE roles near me (not doing full remote yet), AND the fall tech layoffs, I had to rotate back to the AWS world (after a decade as an AWS Cloud Architect/Engineer; renewed the Architect Pro cert last year). I spent the past 5 months studying my heinie off and then gaining the AWS DE cert (the reason why I wrote my June 15th paper - tinyurl.com/48e6skab - to help others out, to 'maybe' expedite their path...). Now the problematic deal is to find a company who will take me on with the double DE certs and without the REQUIRED "multiple years of hands-on data engineering work experience". My thought has been that I should be a magnet for multiple companies after gaining these two supposedly difficult DE certs... AWS was difficult and did not seem to be an Associate-level cert (I've acquired other AWS Pro certs, and this AWS Data Engineer cert most definitely seemed at that level; it pissed me off because I failed the exam back in April...).
@Nayak_Bukya_08 14 days ago
As I am working on Glue, this is super urgent for me. The question is: how do I add the source file name to the dynamic frame? It would be great if you could respond on a priority basis. Thank you
@DataEngUncomplicated 13 days ago
Hi, this is outside the scope of this video. I don't know if this is even possible, to be honest; with data frames the files are abstracted away from us. Please post your question on AWS re:Post to see if someone can help you out. Also, a Google search or reading the docs can be your friend!
@Nayak_Bukya_08 14 days ago
How do I get the input file name of a record in a Spark dynamic frame?
@DataEngUncomplicated 11 days ago
I don't think this method is possible in a DynamicFrame.
@blockchainstreet 18 days ago
Amazing Man!! Good one
@DataEngUncomplicated 17 days ago
Thanks!
@joelayodeji2533 20 days ago
Thank you for making videos like this.
@DataEngUncomplicated 19 days ago
Glad it was helpful!
@torpedoe1936 24 days ago
Thanks, sir!!!!!
@externalbiconsultant2054 24 days ago
Wondering if watching costs is really a data engineer's activity?
@DataEngUncomplicated 24 days ago
Yes, cost optimization is part of every role when working in a cloud environment. If you work for a large, well-funded organization that isn't coming down on costs, you might not feel it as much as at a startup that freaks out over an extra $100 in cloud costs.
@bartstough8201 25 days ago
Still a great overview. Makes everything a lot clearer. Thank you.
@DataEngUncomplicated 25 days ago
Glad it was helpful!
@dougkfarrell 1 month ago
This is fantastic! I'm new to AWS Glue and was really struggling to get traction developing an ETL script. Being able to develop locally, I don't really care about the costs, but the ability to debug, get feedback, and just the turnaround time to try things is amazing. Again, thanks. I'd like to ask you more questions, how can I do that?
@DataEngUncomplicated 1 month ago
Thanks, feel free to post your questions here. I or someone else might be able to help you out!
@dougkfarrell 1 month ago
@@DataEngUncomplicated Thanks! I'm using Glue ETL to read two different CSV files into Dynamic Frames, normalize and union them together. I need to write some SQL to an existing RDS MySQL database to query records to figure out if I need to update or insert data. Is there a good (as in fast) way to iterate over the normalized, unioned DynamicFrame and read and write to an RDS MySQL database? Thanks in advance for any help!
@admiralbenbow7677 1 month ago
I guess you forgot to show how to make a connection in pgAdmin first.
@DataEngUncomplicated 1 month ago
Can you explain why you think you need to make a connection in pgAdmin first? I walked through how to create the database connection in the Glue catalog.
@admiralbenbow7677 1 month ago
@@DataEngUncomplicated Sorry, I am new to this, so forgive me if I am asking silly questions. Isn't the data stored locally on your computer, so you have to make a connection there first? If not, how can Glue find where it is, and how did it automatically recognize etldemo?
@admiralbenbow7677 1 month ago
@@DataEngUncomplicated Oh, silly me, I got confused with data migration. My bad 😅
@nlopedebarrios 1 month ago
Now AWS includes AWSSDKPandas-Python312, so it's easier to add pandas to your Lambda function. However, I'm getting "Missing optional dependency 'fsspec'. Use pip or conda to install fsspec." I've followed these steps to install the latest version, but it failed: "Unable to import module 'lambda_function': No module named 'fsspec'". Any suggestions?
@rahulsrivastava9787 1 month ago
The concepts in this video went into my brain like a hot knife going into butter. Great video for someone like me who comes from a functional background. Great work... really appreciated.
@tejaswi3046 1 month ago
I am still facing the numpy import error and even used the AWSLambda-Python38-SciPy1x layer, but I am unable to resolve it. Kindly let me know if you have any inputs.
@DataEngUncomplicated 1 month ago
Strange, sorry; it worked for me and others with the Lambda layer I selected. Try selecting the specific version I used, perhaps?
@SonPhan1 1 month ago
I appreciate the really informative video! I followed everything, and I'm stuck on running the PySpark code in the dev container environment. When I launch the dev container in the same/new window, I don't see the extensions in the container environment. The Python interpreter doesn't show up either, and when I go to the extensions tab in the container environment, all the extensions are not installed. Are there additional configuration files in VS Code I need to modify to enable the already installed extensions to run from the dev container?
@sylarguo7741 1 month ago
I'm a newcomer to AWS as a data engineer. Your channel really helps me! Appreciate it!
@gilang6128 1 month ago
love this
@onuabah3001 1 month ago
You're an excellent teacher, keep it up... subscribing now.
@DataEngUncomplicated 1 month ago
Thanks for the kind words!
@asfakmp7244 1 month ago
Thanks for the video! I've tested the entire workflow, but I'm encountering an issue with the section on creating a DynamicFrame from the target Redshift table in the AWS Glue Data Catalog and displaying its schema. While I can see the updated schema reflected in the Glue catalog table, the code you provided still prints the old schema.
@user-bu1ct5kz9g 1 month ago
thank you man!!!
@DeepakSingh-of2xm 1 month ago
Can you please make a video showing practical implementation of event driven architecture using event bridge, sns, sqs and lambda? Thank you!
@DataEngUncomplicated 1 month ago
I'll add it to my backlog. Thanks for the suggestion.
@DeepakSingh-of2xm 1 month ago
@@DataEngUncomplicated Thank you, appreciate it. Can we create the infrastructure using Terraform, if possible?
@joudawad1042 1 month ago
One of the best channels on YouTube for data engineering! Great content, keep up the great work.
@DataEngUncomplicated 1 month ago
Wow, thanks for the kind words! It made my day 😄
@DeepakSingh-of2xm 1 month ago
Can you please make a video showing implementation of event driven architecture using event bridge, sns, sqs and lambda? Thank you!
@SimonLopez-hj2cj 1 month ago
How do I personalize the message that SNS sends?
@DataEngUncomplicated 1 month ago
In the SNS step there should be a box where you can customize the message.
@SimonLopez-hj2cj 1 month ago
@@DataEngUncomplicated Then how do I use the parameters of the job? For example, if I want to send "The job state is (~SUCCEEDED~ or ~FAILED~). At this time ~endtime~", thanks.
@victorsilva9000 1 month ago
Can Lake Formation be created from IaC, like CloudFormation/Terraform?
@DataEngUncomplicated 1 month ago
Yes, Lake Formation has Terraform resources!
@user-ij4ih8qp3e 1 month ago
Thank you so much. Your tutorial helps me a lot.
@muralichiyan 1 month ago
Are Databricks and Glue the same?
@DataEngUncomplicated 1 month ago
If you're asking whether Databricks and Glue are the same, then no, they definitely are not.
@prabhathkota107 1 month ago
I was able to run successfully with glueContext.write_dynamic_frame.from_options and glueContext.write_dynamic_frame.from_jdbc_conf, but I see an issue with glueContext.write_dynamic_frame.from_catalog. Getting the below error: "Error Category: UNCLASSIFIED_ERROR; An error occurred while calling o130.pyWriteDynamicFrame. Exception thrown in awaitResult: SQLException thrown while running COPY query; will attempt to retrieve more information by querying the STL_LOAD_ERRORS table". Could you please guide?
@DataEngUncomplicated 1 month ago
These errors can be tricky and require you to look further into the logs to see what is causing them. It sounds specific to your data.
@stevenjosephceniza8245 2 months ago
Thank you for this guide! I tried using PyCharm and my old computer cannot handle it. I almost purchased a subscription.
@DataEngUncomplicated 2 months ago
Hi, I'm not sure what you mean by "almost purchased a subscription", but you need Pro to use Docker in PyCharm.
@prabhathkota107 2 months ago
Beautifully explained setup. Understood how Docker works as well. Thanks a ton, subscribed.
@prabhathkota107 2 months ago
The Docker option is not available in PyCharm Community Edition, I guess.
@DataEngUncomplicated 2 months ago
Yes, that's correct, unfortunately.
@prabhathkota107 2 months ago
Didn't understand why cost is incurred, as it's running locally. And why keep nodes set to 2?
@DataEngUncomplicated 2 months ago
Your UI is local, but there is still a Spark cluster running in AWS.
@prabhathkota107 2 months ago
Very helpful. Thanks.
@DataEngUncomplicated 2 months ago
You're welcome!
@mihaicosmin866 2 months ago
Is it possible to do the same thing but with line features?
@DataEngUncomplicated 2 months ago
Hi, this should definitely be possible within FME.
@jovidog9573 2 months ago
Hello. I made a Glue job that performs ETL changes to data in an S3 bucket and exports the changed data to a Redshift database, but now I'm thinking of changing from Redshift to PostgreSQL. I know this video is about importing RDS data into Glue, but if I follow the video's instructions, would I also be able to export it back into RDS?
@DataEngUncomplicated 2 months ago
Hi, this video is only about how to add an RDS data source like Postgres to the AWS Glue Catalog. So if you establish your Postgres database connection, you should be able to read and write data to it.
@jzevakin 2 months ago
Thank you!!
@DataEngUncomplicated 2 months ago
You're welcome!
@rohithreddy41 2 months ago
Thank you for the video. I am unable to run the program because I do not see the run button after clicking "attach in current window".
@rohithreddy41 2 months ago
I had to install Python in the container, and now I see the run button.
@user-vb7im1jb1b 2 months ago
Thanks for this tutorial! I have a question: how do I store one big Parquet file in S3 without running into kernel-dying issues? I have already used df.convert_dtypes(), but it is still failing. My file has over 1.5 million rows. Files below 900k rows are not failing! Thanks
@bk3460 2 months ago
Sorry, what is wrong with df = spark.read.csv(path)?
@DataEngUncomplicated 2 months ago
That works too, but it's not using the AWS Glue library to do it.
@bk3460 2 months ago
@@DataEngUncomplicated Sorry, I'm new to Spark and Glue. Would you mind elaborating on the Glue library you are referring to? I know about the Glue Data Catalog, but it is not affected when I use df = spark.read.csv(path).
@DataEngUncomplicated 2 months ago
Give the AWS Glue API and the transformations that come with it a read: docs.aws.amazon.com/glue/latest/dg/aws-glue-api.html
@ajprasad6865 2 months ago
Clear and concise
@mickyman753 2 months ago
Just found your channel. Can we have a complete playlist, a sort of course, or a one-shot video/videos? You explain in depth, and I found your videos better than the other tutorials on YouTube.
@DataEngUncomplicated 2 months ago
Thanks! Check out my playlists; I have various ones for each AWS service I have made videos for. It sounds like that's what you are looking for.
@kckc1289 2 months ago
How would you recommend handling local dev and organization -> uploading to AWS for scripts with multiple files?
@kckc1289 2 months ago
Do you have a GitHub repo for this pytest example?
@DataEngUncomplicated 2 months ago
Hey, check out my videos on local development for AWS Glue. I covered topics like using interactive sessions, PyCharm, and VS Code with a Docker container with AWS Glue. To upload them, I recommend managing them with IaC, using Terraform or CDK.
@higiniofuentes2551 2 months ago
Thank you for this very useful video!
@DataEngUncomplicated 2 months ago
You're welcome!
@user-gi2kp9hz5u 2 months ago
How do you use Python in FME?
@DataEngUncomplicated 2 months ago
You need to use the PythonCaller or PythonCreator transformer.
@renyang2320 2 months ago
Your function-based job is quite straightforward. Would you consider organizing your Glue job in a Python class?
@DataEngUncomplicated 2 months ago
I made the script just for this YouTube video; sure, things could be organized into classes if it makes sense.
@DaveThomson 2 months ago
Do you do any consulting?
@DataEngUncomplicated 2 months ago
Hey David, I'm actually a full-time AWS D&A consultant for a company that is an AWS partner. Let me know if you want to chat.
@DaveThomson 2 months ago
@@DataEngUncomplicated I would like to chat. I too work full-time for a partner.
@DataEngUncomplicated 2 months ago
@@DaveThomson Great, feel free to contact me through the email I have posted on my channel.
@DaveThomson 2 months ago
@@DataEngUncomplicated Sent you an email.
@Angleito 2 months ago
How do you add third-party Python libraries?
@DataEngUncomplicated 2 months ago
I don't know an elegant way to do this, but you can go into the Docker container and install the Python libraries you need directly that way.