Tracking Processed Data Using AWS Glue Job Bookmarks | Incremental ETL In-depth intuition

6,438 views

Knowledge Amplifier


2 years ago

Why limit incremental processing to ingestion? Why not make the complete pipeline incremental, from ingestion through curation to publishing?
If the business logic implemented in the curation layer does not depend on previously processed data, then the complete pipeline, not just ingestion, can be made incremental. AWS Glue makes this possible through one of its most powerful features: job bookmarks 😊
In this video, I discuss the job bookmark concept in Glue.
For details, refer to this documentation:
docs.aws.amazon.com/glue/late...
V.V.I. Note:
-----------------
To identify which files stored on S3 to process, job bookmarks check the last modified time of the objects, not the file names. If your input objects changed since the last time the job ran, then they are reprocessed when the job runs again.
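To make the note above concrete, here is a minimal Python sketch of the selection rule it describes. It is a simplified model and not Glue's actual implementation: the S3Object class and the select_new_objects and advance_bookmark helpers are hypothetical names introduced only for illustration, and the real bookmark state is richer than a single timestamp.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class S3Object:
    """Hypothetical stand-in for one entry of an S3 object listing."""
    key: str
    last_modified: datetime


def select_new_objects(objects, bookmark):
    """Return the objects modified after the bookmark timestamp.

    A bookmark of None models a first run (or a reset bookmark),
    in which case every object is selected.
    """
    if bookmark is None:
        return list(objects)
    return [obj for obj in objects if obj.last_modified > bookmark]


def advance_bookmark(processed, bookmark):
    """Move the bookmark to the newest last-modified time just processed."""
    if not processed:
        return bookmark
    return max(obj.last_modified for obj in processed)


def utc(day):
    """Helper for readable example timestamps (January 2023, UTC)."""
    return datetime(2023, 1, day, tzinfo=timezone.utc)


# First run: no bookmark yet, so both files are processed.
listing = [S3Object("raw/a.csv", utc(1)), S3Object("raw/b.csv", utc(2))]
first = select_new_objects(listing, None)
bookmark = advance_bookmark(first, None)

# Before the second run, a.csv is overwritten (same name, newer timestamp)
# and c.csv arrives; b.csv is untouched.
listing = [
    S3Object("raw/a.csv", utc(5)),
    S3Object("raw/b.csv", utc(2)),
    S3Object("raw/c.csv", utc(6)),
]
second = select_new_objects(listing, bookmark)
second_keys = [obj.key for obj in second]
print(second_keys)
```

Running it prints ['raw/a.csv', 'raw/c.csv']: the overwritten file is picked up again because its timestamp changed even though its name did not, while the untouched raw/b.csv is skipped.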
Prerequisite:
------------------------
An automated data pipeline using Lambda, S3 and Glue - Big Data with Cloud Computing
• An automated data pipe...
How to Use AWS Glue with Snowflake | PySpark-Snowflake Connectivity
• How to Use AWS Glue wi...
Set up the necessary AWS services to query the data inside an Amazon S3 (Datalake) using AWS Athena
• Set up the necessary A...
Transform Data Using AWS Glue and Amazon Athena
• Transform Data Using A...
Schema Evolution in AWS Glue using Glue Crawler | AWS Athena
• Schema Evolution in AW...
Simplify Amazon DynamoDB data extraction and analysis by using AWS Glue and Amazon Athena
• Simplify Amazon Dynamo...
AWS Glue as Hive catalog
• Using the AWS Glue Dat...
A very frequent technical requirement in the big data domain:
You have to write a Spark DataFrame with KMS encryption. If you are using Glue, this is one approach you can try to improve the security of your pipeline by enabling server-side encryption
• Security Configuration...
Incremental Glue crawling using Amazon S3 Event Notifications
• Incremental Glue crawl...
Check this playlist for more Data Engineering videos:
• Demystifying Data Engi...

Comments: 16
@yashgangrade5460 2 months ago
I ran the Glue crawler, but it gives this error: HIVE_INVALID_METADATA: Hive metadata for table raw is invalid: Table descriptor contains duplicate columns.
@manojt7012 2 years ago
Your consistency is just inspiring... Fan of your content 👌🏻
@KnowledgeAmplifier1 2 years ago
Thank you, Manoj T, for your continuous support! Happy Learning :-)
@FRUXT 6 months ago
How does the job bookmark know what to increment? Do we need to tell it to track a specific column?
@basavapn6487 2 months ago
Can you please make a video for this requirement: I get files in an S3 bucket daily, and I want to process the last 90 days of data present in S3 using Glue.
@balasakiran 1 year ago
Nice explanation, crisp and clear. I have a quick question: over a period of time, say after 2 months, if there is a need to do a history load (process all files), how can this be achieved?
@tcsanimesh 1 year ago
Superb explanation!! However, I have one question. Suppose we enable bookmarks for an incremental load, but the load runs weekly rather than daily. Will this concept still work in that case? That is, does AWS Glue read back only a fixed duration from the bookmarked timestamp, or does it read all files after the last bookmarked timestamp?
@farookshaik7462 2 years ago
Really useful. Keep going..
@KnowledgeAmplifier1 2 years ago
Thank you, Farook Shaik! Happy Learning :-)
@MatheusRibeiro-or2hq 1 year ago
Great Video!
@KnowledgeAmplifier1 1 year ago
Thank you, Matheus Ribeiro! Happy Learning
@trinath89 1 year ago
Hi, great video. Thanks for taking the time to create it. Please share the link for the incremental data load from RDS. Thanks
@ravikreddy7470 1 year ago
What's the difference between incremental job bookmarking and incremental crawling?
@KnowledgeAmplifier1 1 year ago
Ravi K R, incremental crawls prevent recrawling the same data from source systems; the crawler picks up only new data and makes it available in the Glue Catalog for processing. AWS Glue job bookmarks prevent the reprocessing of old data. One helps with crawling incrementally, the other with processing incrementally. Hope this gives you some idea. For more details, you can refer to these links:
Incremental crawls in AWS Glue: docs.aws.amazon.com/glue/latest/dg/incremental-crawls.html
Tracking processed data using job bookmarks: docs.aws.amazon.com/glue/latest/dg/monitor-continuations.html
Happy Learning
@ravikreddy7470 1 year ago
@KnowledgeAmplifier1 Are crawling and processing two different things?
@KnowledgeAmplifier1 1 year ago
@ravikreddy7470 Yes, the crawler creates the metadata that allows Glue jobs and services such as Athena to view the S3 data as a database with tables and process it.