Orchestrate Glue Jobs With Step Functions

  Views: 13,278

DataEng Uncomplicated

2 years ago

This is a step-by-step tutorial on how to create a Step Functions state machine that orchestrates one or more Glue jobs, and how to configure the IAM role.
#aws #awsglue #stepfunctions
IAM Permission Link: docs.aws.amazon.com/step-func...

Comments: 40
@PatrickPoplawska 8 months ago
Excellent video. To the point, called out common failure points. Well done all around.
@DataEngUncomplicated 8 months ago
Thanks for the comment Patrick, much appreciated!
@khandoor7228 2 years ago
I am really interested in Step Functions as well. Thanks for this, hope you do more!
@DataEngUncomplicated 2 years ago
Thanks! Absolutely! More videos to come!
@julioarenas7150 1 year ago
Thank you very much, very well explained and very precise. Greetings from Chile!
@DataEngUncomplicated 1 year ago
Thank you my Chilean friend!
@NehalVerma-zr4mq 1 year ago
Thanks Brother! You Great!
@DataEngUncomplicated 1 year ago
Thanks Nehal!
@jaffarahamed6089 2 years ago
Well explained... Thanks 👍🏻
@DataEngUncomplicated 2 years ago
Glad it was helpful!
@bhumisounds5107 1 year ago
The additional policies you mentioned adding helped a lot. My state machine was hanging.
@DataEngUncomplicated 1 year ago
You're welcome, glad you got it working.
@claytonvanderhaar3772 1 year ago
Hi, great tutorial as usual, but I am struggling to get a Choice state working. I am not sure how to get the result input path from the Glue job and then pass it on to the Choice state. If you know how to do this, I would really appreciate it.
@user-hv9wx2md3c 5 months ago
Could you please upload the complete AWS data engineering playlist? It would be helpful for us. Your tutorials are easy to watch and make it faster to grasp things. Thank you.
@DataEngUncomplicated 5 months ago
Hey, that's a good idea, I can put them all into one playlist. It will be a lot of videos though; I kind of broke them down by AWS service.
@cringe6006 1 year ago
Really great video. Thank you for posting. Hope you don't get demotivated by the view count 😭 Your videos are really good.
@DataEngUncomplicated 1 year ago
Thanks! Much appreciated!
@felixa4705 1 year ago
As of today, there are about 6k views! That's a lot more people than you could reach through normal means. I think they're doing a great job!
@joegenshlea6827 1 year ago
Thank you so much for this video. It was a huge help to show the IAM permissions for the Glue job. Is there anything about the "permission_to_glue_topic" permission that we should know? Also, in my Lambda invocation I'm pasting the Lambda "event" JSON object into the payload options, which seems to work beautifully. Is there a way to reference the event configuration in Lambda from the step function directly, without having to copy and paste?
@DataEngUncomplicated 1 year ago
Hi Joe, You're welcome! If you are trying to pass your event payload to your lambda function through step functions, when you are running your step function execution in the console manually, you can paste your test payload there. You should set up your step function so the payload gets passed directly to your lambda function with the parameters your lambda needs. I hope this is what you are looking for.
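A minimal sketch of what "passing the payload directly" could look like, assuming the Lambda optimized integration is used. The state machine is built as a Python dictionary and printed as JSON; the function name my-function and the state name InvokeLambda are placeholders, not names from the video.

```python
import json

# Hypothetical function name; replace it with your own Lambda function.
FUNCTION_NAME = "my-function"

# "Payload.$": "$" forwards the whole state input (here, the execution
# input) to the Lambda function as its event, so nothing is hard-coded.
invoke_state = {
    "Type": "Task",
    "Resource": "arn:aws:states:::lambda:invoke",
    "Parameters": {
        "FunctionName": FUNCTION_NAME,
        "Payload.$": "$",
    },
    "End": True,
}

state_machine = {
    "StartAt": "InvokeLambda",
    "States": {"InvokeLambda": invoke_state},
}

# Paste the printed JSON into the state machine definition editor.
print(json.dumps(state_machine, indent=2))
```

With a definition like this, whatever JSON you supply as the execution input (for example the test event you were pasting into the payload box) is what the Lambda function receives.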
@theroadbacktonature 1 year ago
Thanks for the demo. Can you provide more details on what Glue publishes to SNS? Does Glue automatically write success or failure depending on the run state, so that we don't have to send any custom JSON message to SNS from the Glue job?
@DataEngUncomplicated 1 year ago
Hi Pradeep, if you configure an EventBridge rule using the Glue sample event, it shows what the general payload passed to SNS looks like. For example:
    {
      "version": "0",
      "id": "66fbc5e1-aac3-5e85-63d0-856ec669a050",
      "detail-type": "Glue Job Run Status",
      "source": "aws.glue",
      "account": "123456789012",
      "time": "2018-04-24T20:57:34Z",
      "region": "us-east-1",
      "resources": [],
      "detail": {
        "jobName": "MyJob",
        "severity": "INFO",
        "notificationCondition": { "NotifyDelayAfter": 1 },
        "state": "STARTING",
        "jobRunId": "jr_6aa58e7a3aa44e2e4c7db2c50e2f7396cb57901729e4b702dcb2cfbbeb3f7a86",
        "message": "Job is in STARTING state",
        "startedOn": "2018-04-24T20:55:47.941Z"
      }
    }
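If a subscriber needs to turn that payload into something readable, a small helper along these lines might do. This is only a sketch: the function name summarize_glue_event is made up for illustration, and the sample event below is a shortened copy of the one above with a placeholder run ID.

```python
import json


def summarize_glue_event(event_json: str) -> str:
    """Build a one-line summary from a 'Glue Job Run Status' event."""
    event = json.loads(event_json)
    detail = event.get("detail", {})
    return (
        f"Glue job {detail.get('jobName')} "
        f"(run {detail.get('jobRunId')}) is {detail.get('state')}: "
        f"{detail.get('message')}"
    )


# Shortened sample event; jr_123 is a placeholder run ID.
sample_event = json.dumps(
    {
        "detail-type": "Glue Job Run Status",
        "source": "aws.glue",
        "detail": {
            "jobName": "MyJob",
            "state": "STARTING",
            "jobRunId": "jr_123",
            "message": "Job is in STARTING state",
        },
    }
)

print(summarize_glue_event(sample_event))
```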
@STEVEN4841 5 months ago
Very useful, thanks, but, if I need to call 5 glue have bs for example, I can tell crate a workflow an then call whit workflow from this same way?
@DataEngUncomplicated 5 months ago
Hi Steven, can you edit your sentence? I don't understand what you are trying to do.
@GiorgosBastoulis 5 months ago
Excellent video, thanks for sharing! I have a question, I want to run a bash script and trigger it via Lambda with Step Functions. Is that possible?
@DataEngUncomplicated 5 months ago
Yes, you can “wrap” your bash script within a supported language like Node.js or Python. For example, in Node.js, you can use the child_process module to execute a bash script. Remember to package your bash script and any other necessary files into a ZIP file and upload it to AWS Lambda. Also, ensure that your bash script has the appropriate permissions to be executable.
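The same wrapping idea in Python could look roughly like the sketch below. It assumes bash is available in the runtime image and that the script is packaged alongside the handler; the script_path event key, the default ./script.sh, and the run_script helper are illustrative names, not part of any AWS API.

```python
import subprocess


def run_script(script_path: str, timeout_seconds: int = 60) -> dict:
    """Run a bash script and capture its exit code and output."""
    completed = subprocess.run(
        ["bash", script_path],
        capture_output=True,
        text=True,
        timeout=timeout_seconds,
        check=False,  # inspect the return code ourselves
    )
    return {
        "returncode": completed.returncode,
        "stdout": completed.stdout,
        "stderr": completed.stderr,
    }


def lambda_handler(event, context):
    # Hypothetical event key; the default assumes script.sh ships in the ZIP.
    result = run_script(event.get("script_path", "./script.sh"))
    if result["returncode"] != 0:
        # Raising makes the Step Functions task fail, so Retry/Catch can react.
        raise RuntimeError(f"Script failed: {result['stderr']}")
    return result
```

Invoking the script through bash explicitly means the file does not strictly need its executable bit set, which sidesteps one common packaging problem.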
@oscarnegrete486 2 years ago
What are the permissions for the publish_to_glue_topic?
@DataEngUncomplicated 2 years ago
Hi Oscar, it just had the sns:Publish action. The full statement looks like this:
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "VisualEditor0",
          "Effect": "Allow",
          "Action": "sns:Publish",
          "Resource": "arn:aws:sns:us-east-1:account#:glue_jobs"
        }
      ]
    }
@Kaisean 4 months ago
What would be the rationale for using Glue in Step Functions vs. Glue Orchestration? If you're doing more than using GlueJob and GlueCrawler, Step Functions make sense, but is that all?
@DataEngUncomplicated 4 months ago
The choice between using AWS Glue in Step Functions vs. Glue Orchestration (Glue Workflows) depends on the complexity of your data pipeline and the services you're using.
AWS Glue Workflows are beneficial when you're chaining together multiple Glue jobs and/or crawlers. They are particularly useful for batch processing, where you can schedule workflows directly. However, Glue Workflows lack several features common in flow control tools, such as conditional branching, loops, dynamic maps, and custom steps.
On the other hand, AWS Step Functions are more suitable when the complexity exceeds simple triggers and the services used extend beyond Glue. Step Functions provide more advanced orchestration capabilities, including support for error handling, parallel execution, and conditional logic. They also integrate with over 220 AWS services, making them a more flexible choice for complex, multi-service workflows.
In addition, Step Functions can handle quick start and shutdown, which can manage a reasonable throughput. They also allow for the execution of parallel jobs, which is not possible in Glue Workflows.
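To make the error-handling point concrete, here is a rough sketch of a Glue task with Retry and Catch that falls back to an SNS notification. The job name, state names, and topic ARN are placeholders, and the retry settings are arbitrary example values rather than recommendations.

```python
import json

# Placeholder identifiers; substitute your own job name and topic ARN.
JOB_NAME = "my-glue-job"
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:glue_jobs"

state_machine = {
    "StartAt": "RunGlueJob",
    "States": {
        "RunGlueJob": {
            "Type": "Task",
            # .sync makes the state wait until the job run finishes.
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {"JobName": JOB_NAME},
            # Retry a failed run twice, doubling the wait each time.
            "Retry": [
                {
                    "ErrorEquals": ["States.TaskFailed"],
                    "IntervalSeconds": 30,
                    "MaxAttempts": 2,
                    "BackoffRate": 2.0,
                }
            ],
            # Once retries are exhausted, route any error to the notifier.
            "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "NotifyFailure"}],
            "End": True,
        },
        "NotifyFailure": {
            "Type": "Task",
            "Resource": "arn:aws:states:::sns:publish",
            # The caught error object carries "Error" and "Cause" fields.
            "Parameters": {"TopicArn": TOPIC_ARN, "Message.$": "$.Cause"},
            "End": True,
        },
    },
}

print(json.dumps(state_machine, indent=2))
```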
@InvestorKiddd 1 year ago
Is there any way to give an S3 path and database as input to the JobRun S3 step function?
@DataEngUncomplicated 1 year ago
Yes, you can pass the S3 path and database as input parameters to an AWS Step Functions State Machine that includes an AWS Glue JobRun S3 Step. When you define your Step Function state machine, you can include an input parameter section that specifies the input data that will be passed to the state machine when it is executed. You can define the input parameters as key-value pairs in JSON format.
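As an illustration of those key-value pairs, the sketch below maps fields from the execution input onto Glue job arguments. The input keys s3_path and database, the job name, and the bucket are assumptions made for the example.

```python
import json

# Placeholder job name; replace with your own Glue job.
JOB_NAME = "my-glue-job"

glue_task = {
    "Type": "Task",
    "Resource": "arn:aws:states:::glue:startJobRun.sync",
    "Parameters": {
        "JobName": JOB_NAME,
        # Keys ending in ".$" are resolved from the state input at run time.
        "Arguments": {
            "--s3_path.$": "$.s3_path",
            "--database.$": "$.database",
        },
    },
    "End": True,
}

state_machine = {"StartAt": "RunGlueJob", "States": {"RunGlueJob": glue_task}}

# Example execution input to supply when starting the state machine.
execution_input = {"s3_path": "s3://my-bucket/input/", "database": "my_database"}

print(json.dumps(state_machine, indent=2))
print(json.dumps(execution_input))
```

Inside the Glue script, the values can then be read with getResolvedOptions(sys.argv, ["s3_path", "database"]) from awsglue.utils.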
@InvestorKiddd 1 year ago
@DataEngUncomplicated thanks.
@Velben 1 year ago
I'm curious. How did you learn data engineering?
@DataEngUncomplicated 1 year ago
Working as a data engineer and in the data analytics field for 10 years. Also doing Udemy courses, AWS certifications and side projects to continue to learn as the field is changing so fast with new services coming out all the time.
@mallikarjunsangannavar907 1 year ago
How do I enable the step function to run the jobs in parallel?
@DataEngUncomplicated 1 year ago
Hi Mallikarjun, there is a Parallel state that lets you run any number of jobs in parallel.
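A rough sketch of such a Parallel state, with one branch per Glue job, is shown below. The job names job_a and job_b and the glue_branch helper are hypothetical.

```python
import json


def glue_branch(job_name: str) -> dict:
    """Return a single-state branch that runs one Glue job to completion."""
    state_name = f"Run {job_name}"
    return {
        "StartAt": state_name,
        "States": {
            state_name: {
                "Type": "Task",
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {"JobName": job_name},
                "End": True,
            }
        },
    }


# Hypothetical job names; each one becomes its own branch.
job_names = ["job_a", "job_b"]

state_machine = {
    "StartAt": "RunJobsInParallel",
    "States": {
        "RunJobsInParallel": {
            "Type": "Parallel",
            # The state finishes once every branch has finished.
            "Branches": [glue_branch(name) for name in job_names],
            "End": True,
        }
    },
}

print(json.dumps(state_machine, indent=2))
```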
@SimonLopez-hj2cj 2 months ago
How do I personalize the message that SNS sends?
@DataEngUncomplicated 2 months ago
In the SNS step there should be a box where you can customize the message.
@SimonLopez-hj2cj 2 months ago
@DataEngUncomplicated Then how do I use the parameters of the job? For example, if I want to send "The job state is (~SUCCEEDED~ or ~FAILED~). At this time ~endtime~". Thanks.