Why Data Engineers Should Develop AWS Glue Jobs Locally

6,026 views

DataEng Uncomplicated


10 months ago

If you're a data engineer, developer, or anyone working with AWS Glue, you'll know that building and testing ETL jobs can be complex and resource-intensive. However, there is an approach that offers more control, faster feedback, and greater flexibility in your development process: developing your AWS Glue jobs locally. I will cover the top reasons I think data engineers benefit from developing Glue jobs locally rather than using the UI of the AWS Glue service.
Buy Me a Coffee: www.buymeacoffee.com/dataengu
My Tutorials for Running AWS Glue Locally
Configure AWS Glue with Docker - PyCharm: • Develop AWS Glue Jobs ...
Configure AWS Glue with Interactive Glue Sessions: • Author AWS Glue jobs w...

Comments: 20
@user-vb7im1jb1b 9 months ago
Great, informative video. Thanks for sharing. By the way, do you also have a tutorial showing how to work with interactive sessions in JupyterLab/notebooks (Anaconda)?
@julianromero3359 4 months ago
Awesome and valuable information. Great option to develop locally.
@DataEngUncomplicated 4 months ago
Thanks Julian!
@user-nv9pq9ex8y 8 months ago
Hi, can you make a video on data migration from on-premises to the AWS cloud, covering the end-to-end process and the tools used?
@sjvr1628 10 months ago
Keep doing more 😊
@DataEngUncomplicated 10 months ago
Thanks! I will 😊 I have a lot of video ideas in the pipeline
@andrzejkozielec139 10 months ago
great video!
@DataEngUncomplicated 10 months ago
Thanks!
@AdamAdam-oq4fy 10 months ago
Well, my way of building Glue jobs:
- use a Glue notebook/Zeppelin to build all the logic
- use VS Code/PyCharm to wrap things up into classes/modules/methods, with all the VS Code extensions
- use CDK to deploy the Glue job, using the scripts created above and linking to the correct folder structure when deploying
- once deployed, I should have my Glue job ready on the console
- run, test, or modify when needed, but I encourage making the changes through code
@tello9504 6 months ago
Do you have a tutorial?
@mickyman753 3 months ago
My team does the same. I think if you have an established CI/CD setup, then this is the only way to add new Glue jobs.
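The "wrap things up into classes/modules/methods" step from this thread can be sketched roughly as below. This is only an illustration under stated assumptions: `clean_orders`, the `order_id` and `amount` fields, and the sample rows are hypothetical, and plain Python dicts stand in for DataFrame rows so the logic runs without Glue or Spark installed.

```python
from typing import Dict, Iterable, List


def clean_orders(rows: Iterable[Dict]) -> List[Dict]:
    """Drop rows without an order_id and convert the amount to float.

    Hypothetical transformation; plain dicts stand in for DataFrame rows
    so the logic can be exercised on a laptop or in CI.
    """
    cleaned = []
    for row in rows:
        if not row.get("order_id"):
            continue  # skip rows that are missing the key
        cleaned.append({**row, "amount": float(row.get("amount", 0))})
    return cleaned


def main() -> None:
    # In a real Glue script, this entry point is where the job would read
    # its input, apply the transformation, and write the result.
    sample = [
        {"order_id": "A1", "amount": "19.99"},
        {"order_id": None, "amount": "5"},
    ]
    print(clean_orders(sample))


if __name__ == "__main__":
    main()
```

Keeping the transformation in a function that knows nothing about the job's entry point is what makes it possible to test the logic locally before anything is deployed.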
@wilsonwaigant4827 10 months ago
Nice video! I'm currently working on a project, but I was worried about the cost of working on AWS. Now I have a question: if I start working locally, where could I store the data that I generate in the process? And how and when should I migrate the whole work to AWS? Thank you!
@DataEngUncomplicated 10 months ago
Thanks Wilson. If you configure your AWS credentials, you can store the data you generate in the process in Amazon S3 if you need to keep it. You should migrate your process to AWS when you are done developing and ready to run your job on the actual data. I'm assuming your data is large, which is why you might want to use PySpark and a larger cluster to process it all. The best way to migrate it to AWS is with infrastructure as code, such as CDK or Terraform. I am going to make a video on how to do this with Terraform soon.
@wilsonwaigant4827 10 months ago
@DataEngUncomplicated thank you! I'm waiting for your video to learn more about it
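One possible way to keep the same script usable both during local development and after migrating to AWS is to make the output location depend on an environment setting. The sketch below is an assumption, not something shown in the video: `LOCAL_DATA_DIR`, `S3_BUCKET`, and `resolve_output_path` are hypothetical names, and the bucket is a placeholder.

```python
# Hypothetical placeholders: replace with your own directory and bucket.
LOCAL_DATA_DIR = "./data"
S3_BUCKET = "my-example-bucket"


def resolve_output_path(dataset: str, env: str) -> str:
    """Return a local path while developing and an S3 URI once on AWS."""
    if env == "local":
        return f"{LOCAL_DATA_DIR}/{dataset}"
    if env == "aws":
        return f"s3://{S3_BUCKET}/{dataset}"
    raise ValueError(f"unknown environment: {env!r}")


if __name__ == "__main__":
    print(resolve_output_path("orders", "local"))
    print(resolve_output_path("orders", "aws"))
```

With AWS credentials configured, the `aws` branch could also be used from a local machine, which matches the suggestion above to store generated data in S3 when it needs to be kept.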
@harshadk4264 3 months ago
How do we orchestrate these aws glue jobs? Do we create the python code for eventbridge, lambda and step functions?
@DataEngUncomplicated 2 months ago
You have many options for orchestrating Glue jobs. Glue has an orchestration section in which you can orchestrate your Glue jobs. You can also orchestrate them in Airflow if your company is already using it. If your jobs are more complex and require triggering other AWS services along the way, it would probably be a good idea to leverage Step Functions.
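For the Step Functions option, a state machine definition that runs Glue jobs one after another might look roughly like the sketch below. It builds the Amazon States Language definition as a Python dict using the `glue:startJobRun.sync` service integration, which waits for each job to finish. The helper `glue_chain_definition` and the job names are hypothetical, and the sketch assumes job names are unique.

```python
import json
from typing import Dict, List


def glue_chain_definition(job_names: List[str]) -> Dict:
    """Build a state machine definition that runs Glue jobs in order.

    Assumes job names are unique, since they are used in state names.
    """
    if not job_names:
        raise ValueError("at least one job name is required")
    states = {}
    for index, name in enumerate(job_names):
        state = {
            "Type": "Task",
            # Service integration that starts a Glue job run and waits
            # for it to complete before moving on.
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {"JobName": name},
        }
        if index + 1 < len(job_names):
            state["Next"] = f"Run {job_names[index + 1]}"
        else:
            state["End"] = True
        states[f"Run {name}"] = state
    return {"StartAt": f"Run {job_names[0]}", "States": states}


if __name__ == "__main__":
    # Hypothetical job names, printed as JSON for inspection.
    definition = glue_chain_definition(["extract_job", "transform_job"])
    print(json.dumps(definition, indent=2))
```

The resulting JSON could then be supplied as the state machine definition through whichever deployment tool you already use.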
@ColdBlkPenguin 9 months ago
Great video! Thank you for making this - my only feedback would be that it feels like you are reading a script to me (which I am sure you probably are). The information you are providing is great, but the delivery can feel a bit "lecturer reading off the powerpoint slides"-y. The video would also feel more engaging if you were "making eye contact" with the camera. Keep up the good work
@DataEngUncomplicated 9 months ago
Thanks for the valuable feedback, I don't do too many talking head videos, it's definitely something I could improve on!
@externalbiconsultant2054 1 month ago
Wondering if watching costs is really a data engineer's activity?
@DataEngUncomplicated 1 month ago
Yes, cost optimization is part of every role when working in a cloud environment. If you work for a large, funded organization that isn't coming down on costs, you might not feel it as much as a startup that freaks out over an extra $100 in cloud costs.