Manage AWS Glue Jobs with Step Functions

13,748 views

Knowledge Amplifier

2 years ago

This video explains from scratch how to use AWS Step Functions to orchestrate multiple Glue ETL jobs.
Prerequisite:
------------------------
AWS Glue Workflow in-depth intuition with Lab
• AWS Glue Workflow in-d...
Build and automate Serverless DataLake using an AWS Glue, Lambda, Cloudwatch
• Build and automate Ser...
Step 1:
--------
Create a crawler
Step 2:
--------
Start the crawler and get the crawler state in Step Functions
Step 3:
--------
Inspect the JSON output of the GetCrawler component to build the if-else (Choice) condition
Step 4:
--------
Create a Wait block
Step 5:
--------
Add the Glue Run Job component (Glue job code below).
(Configure the block as a synchronous component, i.e. call the service and have Step Functions wait for the job to complete.)
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME'])

# Set up the Spark and Glue contexts and initialise the job
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

# Read the table created by the crawler from the Glue Data Catalog
# (fill in the database and table names)
datasource0 = glueContext.create_dynamic_frame.from_catalog(
    database="{}", table_name="{}", transformation_ctx="datasource0")

# Write the data to S3 as Parquet (fill in the bucket and prefix)
datasink4 = glueContext.write_dynamic_frame.from_options(
    frame=datasource0, connection_type="s3",
    connection_options={"path": "s3://{}/{}/"},
    format="parquet", transformation_ctx="datasink4")

job.commit()
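Steps 2 to 4 above amount to a poll-and-wait loop around the crawler state. The following is a minimal Python sketch of that control flow, not code from the video. It assumes a caller-supplied `get_state` function (hypothetical; in practice it might wrap a GetCrawler call and return the `Crawler.State` field). The names `wait_for_crawler`, `max_polls` and the injectable `sleep` are illustrative assumptions.

```python
import time
from typing import Callable


def wait_for_crawler(get_state: Callable[[], str],
                     wait_seconds: float = 5,
                     max_polls: int = 1000,
                     sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the crawler is no longer RUNNING.

    Mirrors the GetCrawler -> Choice -> Wait cycle: while the state is
    RUNNING, wait and check again; any other state ends the loop.
    `max_polls` is an added safety guard (an assumption, not in the video).
    """
    for _ in range(max_polls):
        state = get_state()          # GetCrawler
        if state != "RUNNING":       # Choice: default branch
            return state
        sleep(wait_seconds)          # Wait, then loop back to GetCrawler
    raise TimeoutError("crawler still RUNNING after max_polls checks")
```

As in the state machine, the loop moves on for any state other than RUNNING, which also covers STOPPING; a stricter variant could keep waiting until the state is READY.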
Reusable Step Function JSON:
-------------------------------
{
  "Comment": "A description of my state machine",
  "StartAt": "StartCrawler",
  "States": {
    "StartCrawler": {
      "Type": "Task",
      "Parameters": {
        "Name": "{Write the Crawler name here}"
      },
      "Resource": "arn:aws:states:::aws-sdk:glue:startCrawler",
      "Next": "GetCrawler"
    },
    "GetCrawler": {
      "Type": "Task",
      "Parameters": {
        "Name": "{Write the Crawler name here}"
      },
      "Resource": "arn:aws:states:::aws-sdk:glue:getCrawler",
      "Next": "Choice"
    },
    "Choice": {
      "Type": "Choice",
      "Choices": [
        {
          "Variable": "$.Crawler.State",
          "StringEquals": "RUNNING",
          "Next": "Wait"
        }
      ],
      "Default": "Glue StartJobRun"
    },
    "Wait": {
      "Type": "Wait",
      "Seconds": 5,
      "Next": "GetCrawler"
    },
    "Glue StartJobRun": {
      "Type": "Task",
      "Resource": "arn:aws:states:::glue:startJobRun.sync",
      "Parameters": {
        "JobName": "{Write the Job name here}"
      },
      "End": true
    }
  }
}
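To reuse the JSON above without hand-editing the placeholders, one option is to generate it from a small Python helper. This is a sketch under stated assumptions, not the author's method: `build_definition` and `definition_json` are hypothetical helper names, and the `Comment` text is reworded for illustration. The state names, resources and parameters are copied from the definition above.

```python
import json


def build_definition(crawler_name: str, job_name: str, wait_seconds: int = 5) -> dict:
    """Build the crawler-then-job state machine definition as a dict."""
    return {
        "Comment": "Run a Glue crawler, wait for it, then run a Glue job",
        "StartAt": "StartCrawler",
        "States": {
            "StartCrawler": {
                "Type": "Task",
                "Parameters": {"Name": crawler_name},
                "Resource": "arn:aws:states:::aws-sdk:glue:startCrawler",
                "Next": "GetCrawler",
            },
            "GetCrawler": {
                "Type": "Task",
                "Parameters": {"Name": crawler_name},
                "Resource": "arn:aws:states:::aws-sdk:glue:getCrawler",
                "Next": "Choice",
            },
            "Choice": {
                "Type": "Choice",
                # Keep polling while the crawler is still running
                "Choices": [
                    {
                        "Variable": "$.Crawler.State",
                        "StringEquals": "RUNNING",
                        "Next": "Wait",
                    }
                ],
                "Default": "Glue StartJobRun",
            },
            "Wait": {"Type": "Wait", "Seconds": wait_seconds, "Next": "GetCrawler"},
            "Glue StartJobRun": {
                "Type": "Task",
                # .sync makes Step Functions wait for the job to finish
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {"JobName": job_name},
                "End": True,
            },
        },
    }


def definition_json(crawler_name: str, job_name: str) -> str:
    """Serialise the definition to a JSON string."""
    return json.dumps(build_definition(crawler_name, job_name), indent=2)
```

For example, `print(definition_json("demo-crawler", "demo-job"))` (both names are placeholders) prints JSON that could be pasted into the Step Functions console editor.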
Learn AWS Step Functions from Scratch:
• AWS Step Functions Sim...
Check this playlist for more AWS Projects in Big Data domain:
• Demystifying Data Engi...

Comments: 17
@josemanuelgutierrez4095 1 year ago
I have a question, my friend: I have 2 CSVs in my bucket, and when I run my crawler, my tables show both CSVs rather than the name of my bucket, as in yours. Do you think some steps are missing? Thx
@SimonLopez-hj2cj 2 months ago
How can I know the JSON output without executing the state machine?
@StephenNyatsine 4 months ago
Very helpful, but can anyone assist? I am getting the error below: "error":"States.Runtime" "cause":"Invalid path '$.Crawler.State': The choice state's condition path references an invalid value." }
@FaresTabet 1 year ago
Great video! Well prepared with examples, it helped me a lot
@KnowledgeAmplifier1 1 year ago
Glad to know the video is helpful to you Fares Tabet! Happy Learning :-)
@youdontneedmyname2298 1 year ago
Thank you!
@KnowledgeAmplifier1 1 year ago
You are welcome, buddy! Happy Learning
@InvestorKiddd 1 year ago
Hi, very nice video. Is there any way to provide the database name and table to Glue as input from the Step Function instead of hard-coding them inside the script? Same question for the crawler: can we provide the S3 object as input?
@InvestorKiddd 7 months ago
@VinayGanesh-nk2lk Yes, you need to create a script file, save it in an S3 bucket, and forward that key to Glue.
@InvestorKiddd 7 months ago
@VinayGanesh-nk2lk Will share on Monday; remind me if I forget.
@josemanuelgutierrez4095 1 year ago
Hi my friend, I have a question: the code that you put inside the Glue job converts CSV to Parquet, right?
@KnowledgeAmplifier1 1 year ago
@josemanuelgutierrez4095 Yes, correct.
@josemanuelgutierrez4095 1 year ago
@KnowledgeAmplifier1 Thank you, my friend. I like your videos; they help me improve my skills a lot :v
@KnowledgeAmplifier1 1 year ago
@josemanuelgutierrez4095 Glad to hear that. Happy Learning
@DanielWeikert 1 year ago
Do you know of / use any good documentation that shows what the JSON response looks like? It is needed in order to refer to e.g. $.Crawler.State. Thx
@KnowledgeAmplifier1 1 year ago
Hello Daniel Weikert, you can check the AWS documentation ( docs.aws.amazon.com/step-functions/latest/dg/welcome.html ), or a simpler way is to use a Pass block to check the response and then code accordingly, as I explained in this video 😊