End-To-End Data Engineering Project in 40 Minutes | AWS Cloud | PySpark

36,093 views

Date with Data

A day ago

Explore the world of AWS data engineering with this project. In this playlist we will use services such as S3, Athena, Glue, QuickSight, and more.
Stay tuned. Like, subscribe, and support.
Kaggle link: shorturl.at/qBUX5
Processed Data : drive.google.c...
Project Series : • DE Projects
Snowflake : • Snowflake Tutorial
#dataengineer #project #aws #glue #awss3 #learnbydoing

Comments: 114
@shivam87480 3 months ago
I used AWS Glue thinking it was under the free tier, and it cost me some money after a month because I was unaware of the extra charges. So I request that you please add a note when making the video about which services can cost money and which can be used under the free tier. That would be very helpful for newbies like me.
@hlulaniwinners7076 6 months ago
That was really easy to follow... 100% worked, and I learned so much in 40 minutes 😀😃
@adityatomar9820 5 months ago
I got a bill of 2.80 dollars just from running a Glue ETL job once... I don't know how I'm going to create more projects if they keep billing like this. I can't afford the fees right now. What can I do?
@BOSS-AI-20 2 days ago
@adityatomar9820 Make a new free tier account.
@user-ub9ww9bz4w 5 months ago
That's what I was looking for. Thank you :)
@user-ub9ww9bz4w 5 months ago
Also, you should create a playlist with all the data engineering projects you've already done. It would make them easy to find :)
@ganesh.majety5260 7 months ago
Just watched one video, and you gained a subscriber 🎉. Hoping for more from you 😊
@ravivaddi9532 5 months ago
Amazing content... This is the first practical AWS DE video I've watched, and I am glad I found it. Thank you. Can you please share an automated way of ingesting data into the S3 staging folder, plus a preprocessing demo, followed by an SCD Type 2 implementation on Glue?
@dggh7879 6 months ago
Great project for beginners!!
@TonyRydinger-bq9pk 4 months ago
Great video. Can you please explain the preprocessing part? What exactly did you use to preprocess the datasets: a Python script with pandas, or something else?
@adityatomar9820 5 months ago
Oh God! The AWS UI always overwhelmed and scared me... but you explained everything so beautifully. Thank you so much, man. I finally feel confident that I can learn AWS and build awesome projects. BTW, will AWS charge us for using Athena and Glue, since they don't come under the free trial?
@datewithdata123 5 months ago
Yes. For completing this project, the bill will be less than half a dollar (if you don't run the Glue job a lot).
@sanjeevpandey2753 7 months ago
Nice one, bhai. Very precise and clear explanation.
@datewithdata123 7 months ago
Glad you like it.
@sanjeevpandey2753 7 months ago
May I have your email ID, please?
@datewithdata123 7 months ago
datewithdata1@gmail.com
@swapnilgaikwad9738 7 months ago
Good. Please make another end-to-end AWS data project video.
@avinash7003 7 months ago
Can you do one on S3, Glue, EMR, Lambda, Athena, and Redshift?
@datewithdata123 5 months ago
In progress. It will be released soon.
@ruben3815 7 months ago
Good job, dude!
@kunalk3830 2 months ago
In real-world work, do we have to perform these IAM tasks and so on ourselves, or do we just run Terraform scripts or something similar so that the architecture or cluster spins up? Can you clarify how this works in practice?
@TonyRydinger-bq9pk 4 months ago
Great video. Can you let us know what you used for preprocessing? Was it a Python script with pandas, or something else?
@tulasipanthagani6401 4 months ago
Can you please help? Why is the crawler not running? It is asking for some permission. Which permission do we need to add?
@gnaneshwaripanthagani3515 4 months ago
In AWS Glue, when I am creating the pipeline, the Join transform does not give me the option to select any source key. Can you please help?
@FredRohn 4 months ago
I used infer schema, and that seemed to fix the problem for me :)
@gnaneshwaripanthagani3515 4 months ago
@FredRohn Thank you so much, it works for me.
@mushkarasaiprakash1915 2 months ago
How did you preprocess the data? What did you remove or change while preprocessing it?
@djsamxgaming5732 2 months ago
I don't know why I am not able to see the output in the data warehouse, even though I can see a 100% success rate in the job monitoring window. Could you tell me what the problem might be?
@ajtam05 6 months ago
Would you know why the 'Data preview' on joins may not populate any data, i.e. show 'No data to display'? I did a sanity check, and the albums and artists files (in Excel) do indeed have matching data between artist_id (album) and id (artist). But when I join on those conditions, as you did, it doesn't populate any data. Just to see, I tried right and left joins, and those actually populated data for each respective side (oddly enough). It seems like a glitch, because the script is simple and the join script looks correct. Do you know if the data types are converted, or if something else occurs behind the scenes, when you join in Visual ETL?
@ajtam05 6 months ago
I basically can't do the project because the subsequent nodes require data fed from previous nodes, but there's no data at the first join (album/artist). Really odd.
@datewithdata123 5 months ago
Please check that you have your data in S3.
@himanshusaini011 4 months ago
Yes, we do have the data in S3, but the same issue is popping up for me as well.
@danielpequeno33 25 days ago
I have the same problem. Could either of you solve it? @himanshusaini @ajtam05
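The thread does not confirm a root cause, but the symptom described (an empty inner join while left and right joins return rows) is what any join produces when the key values look identical in a spreadsheet yet are not exactly equal. The sketch below is plain Python, not Glue or Spark code, and uses hypothetical rows with stray whitespace in the key; Glue applies its own parsing and casting rules, so treat this as one possibility to rule out rather than a diagnosis.

```python
def join(left, right, left_key, right_key, how="inner"):
    """Tiny nested-loop join over lists of dicts (illustration only)."""
    rows = []
    for l_row in left:
        matches = [r for r in right if l_row[left_key] == r[right_key]]
        if matches:
            rows.extend({**l_row, **r} for r in matches)
        elif how == "left":
            rows.append(dict(l_row))  # keep the unmatched left row
    return rows


# Hypothetical sample rows: the artist ids carry stray whitespace.
albums = [
    {"album_name": "A1", "artist_id": "ar1"},
    {"album_name": "A2", "artist_id": "ar2"},
]
artists = [
    {"id": "ar1 ", "name": "X"},  # trailing space
    {"id": " ar2", "name": "Y"},  # leading space
]

inner = join(albums, artists, "artist_id", "id")             # no exact matches
left = join(albums, artists, "artist_id", "id", how="left")  # album rows survive

# Normalising the key on one side restores the matches.
artists_clean = [{**a, "id": a["id"].strip()} for a in artists]
fixed = join(albums, artists_clean, "artist_id", "id")

print(len(inner), len(left), len(fixed))  # 0 2 2
```

If the keys really are equal, the "infer schema" suggestion made elsewhere in these comments is the fix that several viewers reported working.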
@eugenia6490 7 months ago
Question, please. At the 26:38 timestamp, you mentioned that the job created multiple blocks. Why are there multiple blocks? Thank you!
@datewithdata123 7 months ago
We created two worker nodes, and since we have very little data, we could see that there were exactly two files in our warehouse table.
@eugenia6490 7 months ago
@datewithdata123 Thank you!
@mwanthidaniel1254 6 months ago
Is S3 a data warehouse or a data lake?
@datewithdata123 6 months ago
S3 is neither a data warehouse nor a data lake in itself; it's an object storage service provided by AWS. It can serve as the storage layer for either, because it can hold large volumes of structured and unstructured data for analytics, processing, and other purposes.
@Divya-gn5lh a month ago
Hey @datewithdata, first of all, I like your project playlist. If you shared the source code with us, it would be helpful. Thanks for the content.
@adityatomar9820 5 months ago
Please also explain how to push these kinds of projects to GitHub.
@nguyentien4711 23 days ago
This procedure doesn't belong on your GitHub. It's just a BI tool, while GitHub is the place to show your coding skills and projects built purely with code from scratch.
@CricketLover-qy9nn 6 months ago
I'm unable to get the track ID from the album and artist join. What might be the reason?
@KomalChavan-ht7wm 4 months ago
Same.
@KomalChavan-ht7wm 4 months ago
Hey, how did you resolve this issue?
@FredRohn 4 months ago
@KomalChavan-ht7wm Use infer schema; that fixed the problem for me.
@FredRohn 4 months ago
Try infer schema; that made it work for me.
@danielpequeno33 25 days ago
Did you find a way to solve it?
@badboy1585 5 months ago
Hello bro, the services you used in this project come under the free tier, right? Or do we have to pay?
@datewithdata123 5 months ago
Some of the services are not under the free tier. For completing this project, the bill will be less than half a dollar (if you don't run the Glue job a lot).
@adityatomar9820 5 months ago
@datewithdata123 I got a 2.80 dollar bill just after running the ETL job once in Glue.
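The gap between "less than half a dollar" and a 2.80 dollar bill is easier to reason about with the arithmetic written out. Glue ETL jobs are billed by DPU-hours, so cost scales with both the number of workers and the runtime. The sketch below is a rough estimator, not an official calculator: the default rate of 0.44 USD per DPU-hour and the one-minute minimum are assumptions that vary by region and Glue version, so check the current pricing page before relying on them.

```python
def glue_job_cost(dpus, runtime_minutes, rate_per_dpu_hour=0.44, minimum_minutes=1):
    """Estimate the cost in USD of one Glue job run.

    rate_per_dpu_hour and minimum_minutes are assumed defaults,
    not values taken from the video.
    """
    billed_minutes = max(runtime_minutes, minimum_minutes)
    return dpus * (billed_minutes / 60) * rate_per_dpu_hour


# Hypothetical runs: worker counts and runtimes are illustrative only.
small_run = glue_job_cost(dpus=2, runtime_minutes=5)
default_run = glue_job_cost(dpus=10, runtime_minutes=5)

print(round(small_run, 2), round(default_run, 2))  # 0.07 0.37
```

Under these assumptions a single short run stays well under a dollar, so a larger bill usually points to more DPU time than expected, for example repeated runs, a higher worker count, or crawlers and interactive features such as data preview, which may also consume billable capacity.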
@udaykirankankanala3635 4 months ago
When I try to save the visual ETL job, it shows the error "createJob: AccessDeniedException". Which policy do we have to add in the root account?
@datewithdata123 4 months ago
iam:PassRole
@udaykirankankanala3635 4 months ago
I am unable to find that policy in the root account. Please help me.
@datewithdata123 4 months ago
Or provide IAM full access.
@FredRohn 4 months ago
@udaykirankankanala3635 Did you solve this issue? I am experiencing the same thing.
@FredRohn 4 months ago
@datewithdata123 How do I do this? I'm having a similar issue.
@Gauravsingh-hx6lw a month ago
When I add the policy for Glue, it's not working. Can you help me?
@AshutoshParashar-u5l a month ago
Assign Glue access to the glue_s3_role that you created, and it will work!
@shivam87480 3 months ago
Can anyone tell me how to showcase this project on GitHub or put it on a resume?
@KomalChavan-ht7wm 4 months ago
When transforming, I'm unable to join the tables on a condition; the data is not being fetched for the columns. Can anybody help me?
@himanshusaini011 4 months ago
Same issue with me.
@ajtam05 6 months ago
I get an iam:PassRole error when trying to attach the role to the project. iam:PassRole looks very confusing, and I'm not sure why no one else is encountering this issue.
@ajtam05 6 months ago
User: arn:aws:iam::905418287400:user/proj is not authorized to perform: iam:PassRole on resource: arn:aws:iam::905418287400:role/glue_access_s3 because no identity-based policy allows the iam:PassRole action
@datewithdata123 6 months ago
At the beginning, while creating the IAM user, please add IAMFullAccess. This happens because the iam:PassRole permission is required for a user to pass a role to a service such as AWS Glue.
@ajtam05 6 months ago
@datewithdata123 OK, I will try that. I tried multiple solutions that involved creating a new policy and attaching it to the user, but no luck. Hope that works. 🙏
@ajtam05 6 months ago
@datewithdata123 Yep, that worked. Thanks for that.
@ajtam05 6 months ago
@datewithdata123 I believe that change has affected the way joins work. Before, I was able to join the album and artist join with the tracks, but now the album and artist join doesn't populate any data. It looks like people have a similar issue when I Google it, but no solutions are provided online. Are you aware of this?
@kumarsumit6117 5 months ago
Could you please help me? After successfully running the Glue pipeline, the data is not stored in the final S3 bucket.
@datewithdata123 5 months ago
Please share a screenshot of your error at datewithdata1@gmail.com
@rahulcp7013 15 days ago
Were you able to resolve this issue? I am facing the same thing.
@SS-gv8kh 3 months ago
@datewithdata123 When I run the Glue job it succeeds, but the output files are not created in S3. Did you or anyone else face a similar issue?
@AshutoshParashar-u5l a month ago
In the visual ETL, are you seeing a green tick on every node? If not, then the ETL process is not complete as designed. Make sure all the nodes are green, then run it. I faced the same error, resolved it this way, and it's working as expected.
@akshaypy4117 4 months ago
The crawler will not run with just S3 full access as shown here, right?
@datewithdata123 4 months ago
You may need to add IAMFullAccess if you are working as an IAM user.
@sidharthv1060 4 months ago
@datewithdata123 I have also added IAMFullAccess to the role glue_access_s3, but the crawler again failed to run.
@VivekYadav-og4lt 4 months ago
@sidharthv1060 I think you need to add the AWSGlueServiceRole policy.
@supriya9047 3 months ago
@sidharthv1060 I am also facing the same issue repeatedly, even after providing all the required access.
@kshitijjoshi2092 2 months ago
@supriya9047 Same.
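The thread leaves the crawler failure unresolved, so the following is only a checklist in code form, assuming the role is the glue_access_s3 role named in these comments. A role that a crawler runs under generally needs two things: a trust policy that lets the Glue service assume it, and permissions for both Glue itself and the S3 data. The sketch builds the trust policy document and lists the AWS managed policies commonly attached; it does not call AWS, and whether a missing piece here explains the failures above is an assumption.

```python
import json

# AWS managed policy ARNs commonly attached to a Glue crawler/job role.
GLUE_SERVICE_ROLE_ARN = "arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole"
S3_FULL_ACCESS_ARN = "arn:aws:iam::aws:policy/AmazonS3FullAccess"


def glue_trust_policy():
    """Trust policy that allows the Glue service to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "glue.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


print(json.dumps(glue_trust_policy(), indent=2))
print("Attach to the role:", GLUE_SERVICE_ROLE_ARN, S3_FULL_ACCESS_ARN)
```

Note that IAMFullAccess on the role, as tried above, grants permission to manage IAM, not to run Glue or read S3, which may be why adding it did not help.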
@rahulteja4849 7 months ago
While joining the tables in visual ETL, I could not add the join condition because I could not find the column names. It is not showing me any columns.
@tokyochannel5684 6 months ago
Solved?
@vichitravirdwivedi 6 months ago
Refresh it multiple times. It happened to me too.
@datewithdata123 6 months ago
This may happen sometimes when you have a slow internet connection, because Glue reads the schema from the data present in S3, so the connection needs to be established.
@himanshusaini011 4 months ago
@vichitravirdwivedi I already did that multiple times, but no output.
@FredRohn 4 months ago
@himanshusaini011 Try using infer schema; all of the fields popped up for me after doing that.
@ishwariupadhyay8122 7 months ago
Can you provide your GitHub link for the data preprocessing?
@datewithdata123 5 months ago
Sorry, I didn't save the code. We used visual ETL, so the code was auto-generated.
@backgrounding4821 3 months ago
Hello, can you please update the Processed Data link?
@datewithdata123 3 months ago
drive.google.com/drive/folders/1PgZQDvw5GnvVQuhV7-MtxIZHnLsZA-Zs?usp=drive_link
@backgrounding4821 3 months ago
@datewithdata123 Thanks! (Y)
@vivekpawar3069 6 months ago
Sir, please attach the code for preprocessing the CSV files.
@HanhNguyen-sp8zo 6 months ago
Is it free?
@datewithdata123 6 months ago
Yes
@yashraj-hz5xo a month ago
gluestudio-service.ap-southeast-2.amazonaws.com] createJob: AccessDeniedException: User: arn:aws:iam::010928209223:user/proj is not authorized to perform: iam:PassRole on resource: arn:aws:iam::010***********:role/glue_access_s3 because no identity-based policy allows the iam:PassRole action
How do I resolve this?
@rahulcp7013 16 days ago
You have to add an inline policy for your IAM user. I got the same issue, and it worked once I added this policy. I wonder why this was not mentioned in the video, or whether it is something new due to recent changes in AWS services.
{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": "iam:PassRole", "Resource": "arn:aws:iam::010***********:role/glue_access_s3" } ] }
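The inline policy above can be generated rather than hand-edited, which avoids typos in the role ARN. The sketch below is a minimal helper, assuming a placeholder account ID (123456789012) and the glue_access_s3 role name from this thread. It also adds an optional iam:PassedToService condition so the role can only be passed to Glue; that condition is a tightening not present in the original comment.

```python
import json


def pass_role_policy(account_id, role_name, service="glue.amazonaws.com"):
    """Build an inline policy letting a user pass one role to one service."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "iam:PassRole",
                "Resource": f"arn:aws:iam::{account_id}:role/{role_name}",
                # Optional restriction (an addition, not from the comment above).
                "Condition": {"StringEquals": {"iam:PassedToService": service}},
            }
        ],
    }


# 123456789012 is a placeholder account ID; substitute your own.
policy = pass_role_policy("123456789012", "glue_access_s3")
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the IAM console as an inline policy on the user. Compared with attaching IAMFullAccess, as suggested elsewhere in the comments, this grants only the single permission the error message asks for.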
@avinash7003 7 months ago
Please upload the Glue script.
@datewithdata123 5 months ago
Sorry, I didn't save the code. We used visual ETL, so the code was auto-generated.