Cricket Statistics Data Pipeline in Google Cloud using Airflow | Data Engineering Project

15,675 views

TechTrapture

7 months ago

Looking to get in touch?
Drop me a line at vishal.bulbule@gmail.com, or schedule a meeting using the provided link: topmate.io/vishal_bulbule

Cricket Statistics Data Pipeline in Google Cloud using Airflow, Dataflow, Cloud Functions and Looker Studio
Data Retrieval: We fetch data from the Cricbuzz API using Python.
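A minimal sketch of this retrieval step, assuming the Cricbuzz API is consumed through RapidAPI; the endpoint path, header names, query parameter, and response shape are assumptions for illustration, not the exact code from the repo:

```python
# Hedged sketch: fetch ODI batsmen rankings from the Cricbuzz API on RapidAPI.
# The endpoint, headers, and response shape are assumptions for illustration.
import requests

RAPIDAPI_KEY = "your-rapidapi-key"  # placeholder: your own RapidAPI subscription key

url = "https://cricbuzz-cricket.p.rapidapi.com/stats/v1/rankings/batsmen"
headers = {
    "X-RapidAPI-Key": RAPIDAPI_KEY,
    "X-RapidAPI-Host": "cricbuzz-cricket.p.rapidapi.com",
}

response = requests.get(url, headers=headers, params={"formatType": "odi"}, timeout=30)
response.raise_for_status()

# Keep only the fields the pipeline later loads into BigQuery: rank, name, country.
players = [
    {"rank": p.get("rank"), "name": p.get("name"), "country": p.get("country")}
    for p in response.json().get("rank", [])
]
print(players[:5])
```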
Storing Data in GCS: After fetching the data, we store it in a CSV file in Google Cloud Storage (GCS).
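A minimal sketch of the storage step using the google-cloud-storage client; the bucket and object names are placeholders, not the ones used in the video:

```python
# Hedged sketch: write the fetched records to a local CSV and upload it to GCS.
import csv
from google.cloud import storage

BUCKET_NAME = "your-cricket-data-bucket"   # placeholder bucket name
OBJECT_NAME = "batsmen_rankings.csv"       # placeholder object name

def upload_players_csv(players, local_path="/tmp/batsmen_rankings.csv"):
    # Write the list of dicts produced by the retrieval step to a CSV file.
    with open(local_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["rank", "name", "country"])
        writer.writeheader()
        writer.writerows(players)

    # Upload the CSV; this object-finalize event is what fires the Cloud Function later.
    client = storage.Client()
    client.bucket(BUCKET_NAME).blob(OBJECT_NAME).upload_from_filename(local_path)
```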
Cloud Function Trigger: Create a Cloud Function that triggers upon file upload to the GCS bucket. The function executes when a new CSV file is detected and triggers a Dataflow job.
Cloud Function Execution: Inside the Cloud Function, we will have code that triggers a Dataflow job. Ensure you handle the trigger correctly and pass the required parameters to initiate the Dataflow job.
Dataflow Job: The Dataflow job is triggered by the Cloud Function and loads the data from the CSV file in the GCS bucket into BigQuery. Ensure you have set up the necessary configurations.
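As a rough illustration of these three steps, here is a sketch of a GCS-triggered Cloud Function that launches the Google-provided GCS_Text_to_BigQuery Dataflow template; the project, region, bucket, and table names are placeholders, and this is not the exact code from the repo:

```python
# Hedged sketch: GCS-triggered Cloud Function that launches the
# "GCS_Text_to_BigQuery" Dataflow template via the Dataflow REST API.
from googleapiclient.discovery import build

PROJECT = "your-project-id"   # placeholder
REGION = "us-central1"        # placeholder
TEMPLATE = "gs://dataflow-templates/latest/GCS_Text_to_BigQuery"

def trigger_dataflow(event, context):
    # event["bucket"] and event["name"] describe the CSV that was just uploaded.
    input_file = f"gs://{event['bucket']}/{event['name']}"

    dataflow = build("dataflow", "v1b3")
    request = dataflow.projects().templates().launch(
        projectId=PROJECT,
        location=REGION,
        gcsPath=TEMPLATE,
        body={
            "jobName": "cricket-csv-to-bq",
            "parameters": {
                "inputFilePattern": input_file,
                "JSONPath": "gs://your-bucket/bq_schema.json",                      # BigQuery schema file (placeholder path)
                "javascriptTextTransformGcsPath": "gs://your-bucket/transform.js",  # UDF mapping a CSV line to JSON (placeholder path)
                "javascriptTextTransformFunctionName": "transform",
                "outputTable": f"{PROJECT}:cricket_dataset.batsmen_rankings",       # placeholder dataset.table
                "bigQueryLoadingTemporaryDirectory": "gs://your-bucket/temp",
            },
        },
    )
    request.execute()
```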
Looker Dashboard: BigQuery serves as the data source for your Looker Studio dashboard. Configure Looker to connect to BigQuery and create the dashboard based on the data loaded.
Github Repo for all code used in this project
github.com/vishal-bulbule/cri...
============================================
Associate Cloud Engineer -Complete Free Course
• Associate Cloud Engine...
Google Cloud Data Engineer Certification Course
• Google Cloud Data Engi...
Google Cloud Platform(GCP) Tutorials
• Google Cloud Platform(...
Generative AI
• Generative AI
Getting Started with Duet AI
• Getting started with D...
Google Cloud Projects
• Google Cloud Projects
Python For GCP
• Python for GCP
Terraform Tutorials
• Terraform Associate C...
Linkedin
/ vishal-bulbule
Medium Blog
/ vishalbulbule
Github Repository for Source Code
github.com/vishal-bulbule
Email - vishal.bulbule@techtrapture.com
#dataengineeringessentials #dataengineers #dataengineeringproject #airflow #dataflow #cloudcomposer #bigquery #looker #googlecloud #datapipeline

Comments: 36
@dhananjaylakkawar4621 7 months ago
I was thinking of building a project on GCP, and your video arrived. Great work, sir! Thank you
@bernasiakk 7 days ago
This is great! I followed your video step-by-step, and now it's time for me to do a project of my own based on your stuff! Will use something more European though, like soccer or basketball haha :D Thanks!!!
@techtrapture 7 days ago
True...better for you not to use Cricket 😅😅
@ajayagrawal7586 4 months ago
I was looking for this type of video for a long time. Thanks.
@venkatatejanatireddi8018 7 months ago
I sincerely recommend this to people who want to explore DE pipeline orchestration on GCP.
@shyjukoppayilthiruvoth6568 2 months ago
Very good video. Would recommend to anyone who is new to GCP.
@prabhuduttasahoo7802 4 months ago
Learnt a lot from you. Thank you, sir.
@brjkumar 6 months ago
Good job. Looks like the best video for GCP ELT & other GCP stuff.
@techtrapture 6 months ago
Glad it was helpful!
@balajichakali9293 6 months ago
Thanks is a small word for you, sir 🙏 This is the best explanation I have ever seen on YouTube. It is very helpful to me. I have completed this project end to end and have learnt so many things.
@techtrapture 5 months ago
Glad that it helped you.
@wreckergta5470 5 months ago
Thank you, learned a lot from you sir
@techtrapture 5 months ago
Happy to know. Keep learning brother 🎉
@rishiraj2548 7 months ago
Thanks
@user-ws9xy6db6y 5 months ago
Thanks a lot for such a great explanation. Can you please share which video recording/editing tool you are using?
@ashishvats1515 16 days ago
Hello, sir! Great video. If we need to implement CDC or append new data to a table, do we have to extract the data date-wise and load it to GCS? And how do we append that data to an existing table in BigQuery?
Cloud Composer: extract data from an API and load it to GCS.
Cloud Function: trigger the event to load a new CSV file to BigQuery using Dataflow.
So where do we need to write the logic to append the new data to an existing table in BigQuery?
@Anushri_M29 2 months ago
Hi Vishal, this is a really great video, but it would be very helpful if you could also explain the code that you have written from 6:01.
@NirvikVermaBCE 5 months ago
I am getting stuck on the Airflow code. I think it might be an issue with the filename in the Python code: bash_command='python /home/airflow/gcs/dags/scripts/extract_data_and_push_gcs.py'. I have uploaded extract_data_and_push_gcs.py to the scripts folder under dags. However, is there any way to check the path /home/airflow/gcs/dags/scripts/?
@techtrapture 5 months ago
/home/airflow/gcs/dags maps to your DAGs GCS bucket, so it is the same path.
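To illustrate the point, a minimal sketch of a DAG that runs that script with a BashOperator; the DAG id and schedule are assumptions, only the script path mirrors the comment above:

```python
# Hedged sketch: a Composer DAG that runs a script stored under dags/scripts/.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="cricket_stats_extract",     # placeholder DAG id
    start_date=datetime(2024, 1, 1),
    schedule_interval=None,             # trigger manually while testing
    catchup=False,
) as dag:
    # /home/airflow/gcs/dags is the worker's view of the Composer DAGs bucket,
    # so a file uploaded to gs://<composer-bucket>/dags/scripts/ is visible here.
    extract_and_push = BashOperator(
        task_id="extract_data_and_push_gcs",
        bash_command="python /home/airflow/gcs/dags/scripts/extract_data_and_push_gcs.py",
    )
```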
@venkatatejanatireddi8018 7 months ago
I have been facing issues invoking the Dataflow job while using the default App Engine service account. Could you let me know if you were using a specific service account to work with the Cloud Function?
@techtrapture 7 months ago
No, I am using the same default service account. What error are you getting?
@SwapperTheFirst 5 months ago
Hi Vishal, in this and your other Composer videos you use standard Airflow operators (for example, Python or Bash). Do you know how to install Google Cloud Airflow package for Google cloud specific operators? I've tried to upload the wheel to /plugins bucket, but nothing happens. Composer can't import Google Cloud operators (like pubsub) and DAGs with these operators are listed as broken. Thanks!
@techtrapture 5 months ago
I usually refer to this code sample: airflow.apache.org/docs/apache-airflow-providers-google/stable/operators/cloud/index.html
@SwapperTheFirst 5 months ago
@techtrapture Thanks! But how do I use these operators in Composer? In Airflow I just pip install the package. How do I do this in Composer?
@techtrapture 5 months ago
Oh, okay, I get your doubt now. You have to add it to requirements.txt and keep that in the dags folder. Other options are also available here: cloud.google.com/composer/docs/how-to/using/installing-python-dependencies
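For illustration, a minimal sketch of a DAG that uses a Google-provider operator once apache-airflow-providers-google is available in the Composer environment (installed through one of the options in the linked doc); the project and topic names are placeholders:

```python
# Hedged sketch: using a Google Cloud provider operator (Pub/Sub) in a Composer DAG.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.pubsub import PubSubCreateTopicOperator

with DAG(
    dag_id="pubsub_provider_check",     # placeholder DAG id
    start_date=datetime(2024, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    create_topic = PubSubCreateTopicOperator(
        task_id="create_topic",
        project_id="your-project-id",   # placeholder project
        topic="demo-topic",             # placeholder topic
    )
```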
@SwapperTheFirst 5 months ago
@techtrapture Yes, this is exactly what I needed. I can use both of these options, depending on the DAGs. Great!
@ShigureMuOnline 2 months ago
Nice video. Just one question: why do you create a Dataflow job? You could insert rows using Python, right?
@techtrapture 2 months ago
Yes, I agree, but as a project I want to show the complete orchestration process and use multiple services.
@ShigureMuOnline 2 months ago
@techtrapture Thanks so much for the fast answer. I will watch all your videos.
@sampathgoud8108 4 months ago
I tried the same way as per your video, but I got this error when running the Dataflow job through the template. Could you please help me figure out exactly what mistake I have made? I used the same schema that you used. Error message from worker: org.apache.beam.sdk.util.UserCodeException: java.lang.RuntimeException: Failed to serialize json to table row: 1,Babar Azam,Pakistan
@techtrapture 4 months ago
Are you using the same JSON files?
@sampathgoud8108 4 months ago
@techtrapture Yes. Below is the JSON file:
{
  "BigQuery Schema": [
    { "name": "rank", "type": "STRING" },
    { "name": "name", "type": "STRING" },
    { "name": "country", "type": "STRING" }
  ]
}
@sampathgoud8108 4 months ago
I tried the rank column with both STRING and INTEGER data types. For both I am getting the same issue.
@pankajgurbani1484 4 months ago
@sampathgoud8108 I was getting the same error; it got resolved after I put 'transform' as the JavaScript UDF name in the Optional Parameters while setting up the Dataflow job.
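For context, a sketch of the template parameters that fix refers to; the UDF function name must match the function defined inside the uploaded transform.js, and all paths here are placeholders:

```python
# Hedged sketch: Optional Parameters for the GCS_Text_to_BigQuery template.
# Without javascriptTextTransformFunctionName the raw CSV line
# ("1,Babar Azam,Pakistan") is not converted to JSON and fails to serialize.
parameters = {
    "inputFilePattern": "gs://your-bucket/batsmen_rankings.csv",
    "JSONPath": "gs://your-bucket/bq_schema.json",
    "javascriptTextTransformGcsPath": "gs://your-bucket/transform.js",
    "javascriptTextTransformFunctionName": "transform",  # must match the function name in transform.js
    "outputTable": "your-project:cricket_dataset.batsmen_rankings",
    "bigQueryLoadingTemporaryDirectory": "gs://your-bucket/temp",
}
```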
@TechwithRen-Z 2 months ago
This tutorial is 😩 a waste of time for beginners. He did not show how to connect Python to GCP before storing data in the bucket. There are a lot of missing steps.
@Rajdeep6452 5 months ago
You didn't show how to connect to GCP before storing data in the bucket. You have jumped over a lot of steps, and your video lacks quality. You should also include which dependencies to use. Just running your code and uploading it to GitHub is not everything.