Build and Automate a Serverless Data Lake Using AWS Glue, Lambda, and CloudWatch

8,412 views

Knowledge Amplifier

2 years ago

This video explains in depth, from scratch, how to create a fully automated data cataloging and ETL pipeline to transform your data.
Prerequisites:
-----------------------
Implement a CloudWatch Events Rule That Calls an AWS Lambda Function
• Implement a CloudWatch...
Using AWS Lambda with Amazon CloudWatch Events | Send notification when ec2 stops
• Using AWS Lambda with ...
Pipeline design with monitoring and alert functionalities using Cloudwatch Alarm , EC2 & Lambda
• Pipeline design with m...
Enable CloudWatch logs for API Gateway | Monitoring and Logging API Activity
• Enable CloudWatch logs...
Invoking State Machine with CloudWatch
• Invoking State Machine...
AWS Glue Workflow in-depth intuition with Lab
• AWS Glue Workflow in-d...
An automated data pipeline using Lambda, S3 and Glue - Big Data with Cloud Computing
• An automated data pipe...
Lambda Code to trigger Glue Crawler:
---------------------------------------------------------------
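import json
import boto3

# Create the Glue client once, outside the handler, so warm invocations reuse it.
glue = boto3.client('glue')

def lambda_handler(event, context):
    # Start the crawler; replace the placeholder with your crawler's name.
    response = glue.start_crawler(
        Name='{Put the Name of the Glue Crawler here}'
    )
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }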
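The handler above starts the crawler unconditionally, and `start_crawler` raises an error if the crawler is already running. The following is a minimal sketch, assuming you would rather skip the start in that case. The helper name `start_crawler_if_ready` and the choice to pass the client in as a parameter are assumptions made for illustration, not something shown in the video. It relies on the Glue `get_crawler` call, which reports the crawler state as `READY`, `RUNNING`, or `STOPPING`.

```python
def start_crawler_if_ready(glue_client, crawler_name):
    """Start the crawler only when it is idle; return True if it was started.

    glue_client is expected to be boto3.client('glue'). Passing it in is an
    assumption of this sketch so the function can be exercised with a stub.
    """
    # get_crawler returns {'Crawler': {'State': 'READY' | 'RUNNING' | 'STOPPING', ...}}
    state = glue_client.get_crawler(Name=crawler_name)['Crawler']['State']
    if state != 'READY':
        print(f"Crawler {crawler_name} is {state}; skipping start")
        return False
    glue_client.start_crawler(Name=crawler_name)
    return True
```

Inside a Lambda handler this could be called as `start_crawler_if_ready(boto3.client('glue'), 'your-crawler-name')`. The check and the start are two separate calls, so a small race window remains if several uploads arrive at the same moment.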
Lambda Code to trigger Glue Job:
----------------------------------------------------------
import json
import boto3

def lambda_handler(event, context):
    glue = boto3.client('glue')
    # Start the ETL job; replace the placeholder with your job's name.
    response = glue.start_job_run(JobName="{Put the Glue ETL Job name here}")
    print("Lambda Invoke")
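The handler above discards the response from `start_job_run`. As a hedged sketch, the variant below logs which crawler event triggered the invocation and returns the job run ID, which can help when tracing a run in the logs. The function name `start_job_for_event` and the injected `glue_client` parameter are assumptions for illustration. The `crawlerName` and `state` fields are the same ones used in the crawler rule in this description.

```python
def start_job_for_event(glue_client, event, job_name):
    """Start job_name and return its run ID, logging the triggering crawler event.

    glue_client is expected to be boto3.client('glue'); event is the
    CloudWatch Events payload that the Lambda function receives.
    """
    detail = event.get('detail', {})
    print(f"Crawler {detail.get('crawlerName')} reported state {detail.get('state')}")
    # start_job_run returns a dict that contains the new run's 'JobRunId'.
    response = glue_client.start_job_run(JobName=job_name)
    return response['JobRunId']
```

A handler could then return `{'JobRunId': start_job_for_event(boto3.client('glue'), event, 'your-job-name')}`.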
Glue Code:
---------------------
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)
# Read the crawled table from the Glue Data Catalog (fill in the database and table names).
datasource0 = glueContext.create_dynamic_frame.from_catalog(database="{}", table_name="{}", transformation_ctx="datasource0")
# Write the data to S3 as Parquet (fill in the bucket and prefix).
datasink4 = glueContext.write_dynamic_frame.from_options(
    frame=datasource0,
    connection_type="s3",
    connection_options={"path": "s3://{}/{}/"},
    format="parquet",
    transformation_ctx="datasink4",
)
job.commit()
CloudWatch rule to trigger the Lambda function when the Glue Crawler succeeds:
-----------------------------------------------------------------------------------------------------------------------
{
  "source": [
    "aws.glue"
  ],
  "detail-type": [
    "Glue Crawler State Change"
  ],
  "detail": {
    "state": [
      "Succeeded"
    ],
    "crawlerName": [
      "{Put your Crawler Name here}"
    ]
  }
}
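To see how a pattern like this selects events, here is a deliberately simplified matcher. It only handles exact-value lists and nested objects, which is all the pattern above uses; the real service supports more operators. The function name `matches`, the crawler name `my-crawler`, and the sample events are hypothetical.

```python
def matches(pattern, event):
    """Return True if event satisfies pattern (simplified exact-value matching)."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested object: the event must have a matching nested object.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            # Leaf: the pattern lists the allowed values for this field.
            return False
    return True


pattern = {
    "source": ["aws.glue"],
    "detail-type": ["Glue Crawler State Change"],
    "detail": {"state": ["Succeeded"], "crawlerName": ["my-crawler"]},
}

succeeded_event = {
    "source": "aws.glue",
    "detail-type": "Glue Crawler State Change",
    "detail": {"crawlerName": "my-crawler", "state": "Succeeded"},
}
failed_event = {
    "source": "aws.glue",
    "detail-type": "Glue Crawler State Change",
    "detail": {"crawlerName": "my-crawler", "state": "Failed"},
}

print(matches(pattern, succeeded_event))  # True
print(matches(pattern, failed_event))     # False
```

Fields that appear in the event but not in the pattern are ignored, so the rule fires only for the named crawler and only when its state is `Succeeded`.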
CloudWatch rule to trigger the SNS notification when the Glue Job succeeds:
---------------------------------------------------------------------------------------------------------
{
  "source": [
    "aws.glue"
  ],
  "detail-type": [
    "Glue Job State Change"
  ],
  "detail": {
    "jobName": [
      "{Put your Job name here}"
    ],
    "state": [
      "SUCCEEDED"
    ]
  }
}
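The rule above can also be created from code instead of the console. The following is a rough sketch, assuming a boto3 `events` client and an existing SNS topic. The function name `create_job_success_rule`, the target ID `sns-target`, and all argument values are hypothetical. It uses the `put_rule` and `put_targets` calls of the CloudWatch Events API.

```python
import json


def create_job_success_rule(events_client, rule_name, job_name, topic_arn):
    """Create a rule for successful runs of job_name and attach an SNS topic.

    events_client is expected to be boto3.client('events'); it is passed in
    (an assumption of this sketch) so the function can be tested with a stub.
    """
    pattern = {
        "source": ["aws.glue"],
        "detail-type": ["Glue Job State Change"],
        "detail": {"jobName": [job_name], "state": ["SUCCEEDED"]},
    }
    # EventPattern must be passed as a JSON string, not as a dict.
    events_client.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(pattern),
        State="ENABLED",
    )
    # "sns-target" is an arbitrary ID chosen for this sketch.
    events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "sns-target", "Arn": topic_arn}],
    )
    return pattern
```

The SNS topic's access policy also needs to allow `events.amazonaws.com` to publish to it; otherwise the rule matches but no notification is delivered.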
Check this playlist for more AWS projects in the Big Data domain:
• Demystifying Data Engi...

Comments: 17
@sjdreams_13615 1 year ago
It's a great job done by you explaining the serverless Glue ETL process. It's the best video I found on KZfaq on this topic so far 👍🏻
@KnowledgeAmplifier1 1 year ago
Thank you so much for your positive feedback, Sravan Kumar Jalluri! I am glad to hear that my video was helpful to you. Happy Learning
@Someonner 1 year ago
Most underrated video in AWS.
@deepakrawat5065 1 year ago
Thank you, Knowledge Amplifier, for sharing your knowledge in a simple and clear way.
@KnowledgeAmplifier1 1 year ago
You are welcome Deepak Rawat! Happy Learning
@rahulkakade1579 1 year ago
Hats off to you, brother. What a detailed explanation! Thanks for sharing this.
@KnowledgeAmplifier1 1 year ago
Thank you Rahul kakade15 for your inspiring comment ! Happy Learning
@adesuraj4649 1 year ago
Great explanation 🙂
@KnowledgeAmplifier1 1 year ago
Thank you Ade Suraj! Happy Learning
@SourabhDattalkar89 11 months ago
Great video! You have explained a 5-hour process in a few minutes 😂
@KnowledgeAmplifier1 10 months ago
Glad it helped!
@nagasabsreeshgontla4628 1 year ago
Running the crawler every time a CSV is uploaded is not required, right? It also increases the cost of the crawler.
@rahulkakade1579 1 year ago
Can you please make a video on what SNS, SQS, and EventBridge are, and when to use which? 🙂
@KnowledgeAmplifier1 1 year ago
OK, sure, Rahul kakade15, noted in the backlog...
@rahulkakade1579 1 year ago
@KnowledgeAmplifier1 thanks, buddy
@Ashisagrawall 2 years ago
Hello, I need to talk to you. Please let me know how to contact you. I need some help with an application we are building, and we want to use Snowflake.
@KnowledgeAmplifier1 2 years ago
Please post your doubt or requirements here, buddy. If I have knowledge in that domain, I will surely try to help you out :-)