Author AWS Glue jobs with PyCharm Using AWS Glue Interactive Sessions

13,060 views

DataEng Uncomplicated

Days ago

This video is based on an AWS blog post tutorial on getting started with AWS Glue interactive sessions.
AWS Blog Article: aws.amazon.com/blogs/big-data...
timeline
00:00 tutorial overview
01:33 AWS permission configuration
03:53 PyCharm configuration
08:31 Run Jupyter notebook
#awsglue #aws

Comments: 54
@tahayusufkomur1710 4 months ago
That was so great man, thank you for sharing this. Subscribed!
@ericknavadel 1 year ago
Excellent video, thank you very much for sharing your knowledge with us.
@DataEngUncomplicated 1 year ago
Thanks Erick! I'm glad you found this helpful!
@ericsalesdeandrade9420 1 year ago
Hi, thanks for the tutorial. Super helpful. Would you know if it's possible (and how) to use an external library within these interactive session notebooks? I would like to use Pydantic and Pandera to validate the DataFrame schemas.
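One possible route, not confirmed anywhere in this thread: the Glue interactive sessions kernel has an `%additional_python_modules` magic that takes a comma-separated list of pip packages and is run in a notebook cell before the session starts. Below is a minimal sketch that only builds that magic line. The helper name `additional_modules_magic` is hypothetical, and whether Pydantic and Pandera install cleanly on a given Glue version is an assumption.

```python
def additional_modules_magic(modules):
    """Build the notebook magic line that asks Glue to pip-install extra modules.

    Hypothetical helper: it only formats the text you would paste into the
    first cell of the notebook, before any Spark code runs.
    """
    cleaned = [m.strip() for m in modules if m.strip()]
    if not cleaned:
        raise ValueError("at least one module name is required")
    return "%additional_python_modules " + ",".join(cleaned)


# Prints the line to paste into the first notebook cell.
print(additional_modules_magic(["pydantic", "pandera"]))
```

The modules are installed on the remote session, not in the local PyCharm environment, so local imports for editor support would still need a separate pip install.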
@DataEngUncomplicated 1 year ago
Hi everyone, for anyone recently getting errors when trying to start an interactive session: there appears to be a bug on Windows machines. Reverting aws-glue-sessions to 0.32 seemed to make it work again. You can revert by running the command pip install aws-glue-sessions==0.32 in the environment that has it installed.
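Before downgrading, it can help to confirm which version the active environment actually has. This is a small sketch using only the standard library; the function names are hypothetical, and the pinned version is simply the one reported as working in the comment above.

```python
from importlib import metadata

PINNED_VERSION = "0.32"  # version the comment above reports as working on Windows


def installed_version(package="aws-glue-sessions"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def pin_advice(found, pinned=PINNED_VERSION):
    """Describe what to do given the version that was found."""
    if found is None:
        return f"not installed: pip install aws-glue-sessions=={pinned}"
    if found == pinned:
        return f"already on {pinned}"
    return f"found {found}: pip install aws-glue-sessions=={pinned}"


# Run this with the same interpreter PyCharm uses for the project.
print(pin_advice(installed_version()))
```

Running it with the project's interpreter matters, because a global Python and the project's virtual environment can hold different versions.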
@AmritAgarwal07 1 year ago
Is the AWS CLI required to run the jupyter-kernelspec command?
@DataEngUncomplicated 1 year ago
Not that I'm aware of, since I believe this is not an AWS CLI command.
@tejaswinirao4873 1 year ago
Hi, I am not able to connect to the kernel. I can see that all the Glue libraries are installed correctly, but the Jupyter terminal says the kernel died. Please help me resolve this issue.
@DataEngUncomplicated 1 year ago
Strange, I'm not sure what the root of the problem is, but perhaps restarting the kernel might help?
@RenevanDuren 1 year ago
Finally got it to work. The comment from DataEng Uncomplicated solved my nightmare of not getting this to work on a Windows machine.
@DataEngUncomplicated 1 year ago
No problem! I felt your pain as I was reading this. I encountered the exact same issue, but when I first made the video the issue didn't exist. I'm surprised they haven't fixed it yet in the later Glue interactive sessions versions.
@jinmina 1 year ago
How do we revert glue_kernel_version to 0.32? Is there a command you can share? I get the following error message: C:\Users\***\PycharmProjects\GlueInteractiveSession1\venv\Scripts\python.exe: Error while finding module specification for 'aws_glue_interactive_sessions_kernel.glue_pyspark.GlueKernel' (ModuleNotFoundError: No module named 'aws_glue_interactive_sessions_kernel')
@jinmina 1 year ago
I learned that there is a bug in the current aws-glue-sessions. Please run the following command; that will do the trick: pip install aws-glue-sessions==0.32
@DataEngUncomplicated 1 year ago
If you are on a Windows machine, you can use pip to revert to 0.32.
@jeancarlovallejos2464 1 year ago
Hi! I have this error message: raise Exception(f"Valid Glue versions are {VALID_GLUE_VERSIONS}") Exception: Valid Glue versions are {'3.0', '2.0'}
@DataEngUncomplicated 1 year ago
Hi Jean, sounds like you have not entered a valid Glue version.
@jeancarlovallejos2464 1 year ago
@DataEngUncomplicated Thanks for the answer! How can I make sure I'm using the correct version of Glue? I followed the same steps as you did, installing Python 3.10, Spark and PySpark.
@DataEngUncomplicated 1 year ago
There is a Glue magic you can use to set it to 3.0.
@zhouhaozqq 1 year ago
I had the same problem, have you solved it? And how?
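The thread never names the magic. In the Glue interactive sessions kernel it is `%glue_version`, so a cell containing `%glue_version 3.0`, run before the first Spark statement, is the likely fix for the exception quoted above. The sketch below mirrors that check. The set of valid versions is copied from the error message and will differ in newer kernel releases, and `glue_version_magic` is a hypothetical helper.

```python
# Copied from the exception text in this thread; newer kernels accept other versions.
VALID_GLUE_VERSIONS = {"2.0", "3.0"}


def glue_version_magic(version):
    """Return the notebook magic line for a Glue version the kernel accepts."""
    version = str(version).strip()
    if version not in VALID_GLUE_VERSIONS:
        raise ValueError(f"Valid Glue versions are {sorted(VALID_GLUE_VERSIONS)}")
    return f"%glue_version {version}"


# Prints the line to put in a cell before any Spark code runs.
print(glue_version_magic("3.0"))
```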
@prabhathkota107 2 months ago
I didn't understand why a cost is incurred if it's running locally. And why keep the nodes set to 2?
@DataEngUncomplicated 2 months ago
Your UI is local, but there is still a Spark cluster running in AWS.
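To make the cost point concrete: the session is billed for the remote workers for as long as it stays alive, not for the local notebook. Here is a rough estimate, assuming a rate of 0.44 USD per DPU-hour and one DPU per worker. Both figures are assumptions for illustration; check current Glue pricing for your region and worker type.

```python
def estimate_session_cost(workers, minutes, dpu_per_worker=1.0, usd_per_dpu_hour=0.44):
    """Rough cost of an interactive session; the default rate is an assumption."""
    if workers < 0 or minutes < 0:
        raise ValueError("workers and minutes must be non-negative")
    hours = minutes / 60
    return workers * dpu_per_worker * hours * usd_per_dpu_hour


# Two workers kept alive for half an hour.
print(round(estimate_session_cost(workers=2, minutes=30), 2))
```

Keeping the worker count low, as in the video, and stopping the session when you are done (or setting a short `%idle_timeout`) keeps this number small.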
@SoumilShah 1 year ago
You can just run Glue locally on Docker; that way you don’t pay a dime.
@DataEngUncomplicated 10 months ago
This is a great point! I'm working on a tutorial to set this up in my next video. I learned that the documentation is not up to date and is missing important steps for getting the AWS Glue 4.0 Docker container set up! I will share a link to my video here when it's done.
@DataEngUncomplicated 10 months ago
kzfaq.info/get/bejne/Y5qKoa2cspO1dJ8.htmlsi=GfAggr80BlN5eDp3
@CHANTI8947 2 years ago
Is it possible to do the same using VS Code?
@DataEngUncomplicated 2 years ago
Yes, I believe so. I haven't tried it though.
@ryanyue5159 1 year ago
@DataEngUncomplicated Can you please make a video or instructions for doing this with VS Code? I have a problem where VS Code cannot find the PySpark kernel.
@mujadidkhalid3826 2 years ago
Can you please show an example of how we can put functions in a simple Python file and then use them in a notebook for a Glue interactive session? Thanks.
@DataEngUncomplicated 2 years ago
Hi Mujadid, it would be the same as developing a function in any other Jupyter notebook with PySpark. Sorry, the video was based on a tutorial by AWS.
@maximilianrausch5193 2 years ago
I have a similar question. Is it possible to run a Python script, or can you only use Jupyter?
@DataEngUncomplicated 2 years ago
@maximilianrausch5193 Hi Max, their documentation doesn't say it's only for Jupyter notebooks, but I haven't tested it with just a Python script. In their press release they say "Interactive Sessions let them process data interactively using the Jupyter-based notebook or IDE of their choice."
@maximilianrausch5193 2 years ago
@DataEngUncomplicated If you could make a follow-up video where you test a script, that would be really helpful! I’ll also put in a ticket with AWS support.
@DataEngUncomplicated 2 years ago
Sure, I have added this to my future video list. I'll try to play around with this feature to see if I can get it to work. Are you OK if I give you a shout-out in this future video? "This video was requested by Maximilian as a follow-up to my previous video."
@js3860 2 years ago
Nice video. If, like me, you are using federated access and an assumed role, this whole process will fail. Sadly, AWS hasn't built out their service for SSO customers :(
@DataEngUncomplicated 2 years ago
Thanks, ah that's a shame! Thanks for posting this point.
@thevijayraj34 2 years ago
Can we do this with PyCharm Community Edition?
@DataEngUncomplicated 2 years ago
I don't think Community Edition supports Jupyter notebooks, so it won't be possible as far as I know.
@thevijayraj34 2 years ago
@DataEngUncomplicated ok
@DataEngUncomplicated 2 years ago
Actually, it says Glue interactive sessions are supported by other IDEs, so it might be possible that we don't need Jupyter notebooks for this to work. I'm going to test this out; I don't want to give you the wrong answer.
@thevijayraj34 2 years ago
@DataEngUncomplicated Thanks mate. Actually I'm stuck with office credentials, I don't have free access to many things. 🥴
@trinath89 1 year ago
@DataEngUncomplicated Hi, I am stuck in a similar situation. I am using the latest PyCharm Community Edition. I configured everything mentioned up to 8:28, but then I cannot see the option to create a Jupyter notebook, so I am stuck here. Please help me.
@AmritAgarwal07 1 year ago
Jupiter-kernelspec: command not found
@DataEngUncomplicated 1 year ago
Hi Amrit, check out the blog post I included in the description for the code. Sounds like you might have missed a step.
@AmritAgarwal07 1 year ago
@DataEngUncomplicated I have gone through the blog. I am following the steps but still facing the same issue.
@AmritAgarwal07 1 year ago
@DataEngUncomplicated In the blog you are using EC2, but in the video EC2 is not there.
@DataEngUncomplicated 1 year ago
Hmm, did you make sure your AWS CLI is up to date? I'm not sure what version it says you need for this to work.
@AmritAgarwal07 1 year ago
@DataEngUncomplicated Can you help me resolve this error?
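A "command not found" for `jupyter-kernelspec` (all lowercase) usually means the shell cannot see the executable, for example because the virtual environment where Jupyter was installed is not activated. That diagnosis is an assumption, not something confirmed in this thread. Here is a small sketch to check what the current environment can see; the function name `find_command` is hypothetical.

```python
import shutil


def find_command(name):
    """Return the full path of an executable on PATH, or None if it is missing."""
    return shutil.which(name)


# Report whether the Jupyter entry points are visible from this environment.
for tool in ("jupyter", "jupyter-kernelspec"):
    path = find_command(tool)
    print(f"{tool}: {path if path else 'not found on PATH'}")
```

If both are missing, activating the environment or reinstalling Jupyter there is the first thing to try; `jupyter kernelspec list` then shows which kernels are registered.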
@shashankreddy8390 1 year ago
Hi buddy, this is a nice video, but everyone creates videos on reading from and writing to S3. 1. Can you create a video on how to use a Glue Studio notebook to read data from the AWS Glue Data Catalog and write the results to S3? 2. Can you please include every step, i.e. what kind of permissions we need to create to read and write? I also recommend doing a video on the Athena notebook editor reading data from the Glue Data Catalog using PySpark. (Please also include detailed permission steps.)