How to use SQL to Query S3 files with AWS Athena | Step by Step Tutorial

  Рет қаралды 165,746

Be A Better Dev

Be A Better Dev

4 жыл бұрын

In this video, I show you how to use AWS Athena to query JSON files located in an s3 bucket. I show you how to set up an Athena Database and Table using AWS Glue's Crawler.
Looking to get hands on experience building on AWS with a REAL project? Check out my course - The AWS Learning Accelerator! courses.beabetterdev.com/cour...
00:16 Introduction to our S3 Dataset
01:57 Specifying an Athena Output Path
02:35 Connecting S3 Data Source to Athena Database
04:08 Using a Glue Crawler to Populate our Database
06:38 Testing our Athena Database with a Simple Query
🎉SUPPORT BE A BETTER DEV🎉
Become a Patron: / beabetterdev
📚 MY RECOMMENDED READING LIST FOR SOFTWARE DEVELOPERS📚
Clean Code - amzn.to/37T7xdP
Clean Architecture - amzn.to/3sCEGCe
Head First Design Patterns - amzn.to/37WXAMy
Domain Driver Design - amzn.to/3aWSW2W
Code Complete - amzn.to/3ksQDrB
The Pragmatic Programmer - amzn.to/3uH4kaQ
Algorithms - amzn.to/3syvyP5
Working Effectively with Legacy Code - amzn.to/3kvMza7
Refactoring - amzn.to/3r6FQ8U
🎙 MY RECORDING EQUIPMENT 🎙
Shure SM58 Microphone - amzn.to/3r5Hrf9
Behringer UM2 Audio Interface - amzn.to/2MuEllM
XLR Cable - amzn.to/3uGyZFx
Acoustic Sound Absorbing Foam Panels - amzn.to/3ktIrY6
Desk Microphone Mount - amzn.to/3qXMVIO
Logitech C920s Webcam - amzn.to/303zGu9
Fujilm XS10 Camera - amzn.to/3uGa30E
Fujifilm XF 35mm F2 Lens - amzn.to/3rentPe
Neewer 2 Piece Studio Lights - amzn.to/3uyoa8p
💻 MY DESKTOP EQUIPMENT 💻
Dell 34 inch Ultrawide Monitor - amzn.to/2NJwph6
Autonomous ErgoChair 2 - bit.ly/2YzomEm
Autonomous SmartDesk 2 Standing Desk - bit.ly/2YzomEm
MX Master 3 Productivity Mouse - amzn.to/3aYwKVZ
Das Keyboard Prime 13 MX Brown Mechanical- amzn.to/3uH6VBF
Veikk A15 Drawing Tablet - amzn.to/3uBRWsN
📚 Resources:
Part 2 - Query Athena from Lambda - • How to Query AWS Athen...
Athena Overview - • What is AWS Athena? Se...
Getting started with AWS: • Introduction to AWS | ...
☁Topics covered include:
AWS Athena
AWS Athena Database Setup
AWS Athena Table Setup
AWS Glue Crawler
AWS Athena with S3 Json Files
AWS Athena SQL Select
🌎 Find me here:
Twitter - / beabetterdevv
Instagram - / beabetterdevv
Patreon - Donations help fund additional content - / beabetterdev
#AWS
#Serverless
#BigData

Пікірлер: 127
@nicomelmo
@nicomelmo Жыл бұрын
Thanks man, you showed my EXACTLY what I needed just with this short video!!
@robindong3802
@robindong3802 2 жыл бұрын
Thank you, good job, simple and clear. love it.
@stutisharma594
@stutisharma594 Жыл бұрын
This is a really great tutorial for beginners with no exposure or access to AWS Consoles but want to learn. Thank you, Sir! :-)
@1upgreenshroon
@1upgreenshroon 3 жыл бұрын
Thanks for posting this. Watching a clear, concise youtube video saved me about 10 hours of reading AWS docs. Much appreciated.
@BeABetterDev
@BeABetterDev 3 жыл бұрын
You're very welcome!
@alpha_ray_burst
@alpha_ray_burst 4 жыл бұрын
This is a FANTASTIC video. So easy, quick, concise. Thank you for making this.
@BeABetterDev
@BeABetterDev 4 жыл бұрын
You're very welcome!
@evanfang6573
@evanfang6573 2 жыл бұрын
Really good demo, easy and clear, thanks.
@warrenb7450
@warrenb7450 3 жыл бұрын
It's wonderful to see a real demo of how Athena and S3 works. Thank you!!
@BeABetterDev
@BeABetterDev 3 жыл бұрын
Hi Peining, glad you found this helpful!
@siddharthsrivastava4491
@siddharthsrivastava4491 Жыл бұрын
Great tutorial buddy!! Thank you very much
@mosesg45
@mosesg45 6 ай бұрын
Awesome video using Athena and Glue..
@KS-tf3ri
@KS-tf3ri 4 жыл бұрын
Love the videos! Thank you for all the help and keep them coming!
@BeABetterDev
@BeABetterDev 4 жыл бұрын
Glad you like them!
@chrismcclure4264
@chrismcclure4264 Жыл бұрын
That was a great tutorial. Thank you
@user-ud5di3bo5w
@user-ud5di3bo5w 11 ай бұрын
Thanks its Very helpful, keep going god bless you mate.
@donamallik2011
@donamallik2011 3 жыл бұрын
Excellent video. I followed it to create table in Athena with Glue. The only thing that stumped me was the Role. I had to refer to Amazon docs to create the Role. I then used it in creating the crawler. Without that the table wasn't getting created.
@datag1199
@datag1199 2 жыл бұрын
This is great! Thank you for posting these awesome videos.
@BeABetterDev
@BeABetterDev 2 жыл бұрын
You're very welcome!
@sagayaraji2143
@sagayaraji2143 2 жыл бұрын
That was so quick and clear explanation
@BeABetterDev
@BeABetterDev 2 жыл бұрын
Glad you enjoyed Sagayaraj!
@skkkks2321
@skkkks2321 2 жыл бұрын
excellent explanation and demo.Great job,keep it up
@BeABetterDev
@BeABetterDev 2 жыл бұрын
Thanks a lot!
@bakozen891
@bakozen891 3 жыл бұрын
Thanks bro it's very useful to begin ;)
@najibmestaoui5024
@najibmestaoui5024 2 жыл бұрын
Thank you for the tutorial and thumbs up for using Firefox
@rohitpatil6150
@rohitpatil6150 3 жыл бұрын
Thank you for details explanation.
@BeABetterDev
@BeABetterDev 3 жыл бұрын
Glad it was helpful!
@zhengyangma1889
@zhengyangma1889 Ай бұрын
much better than the recommended tutorials on my home page
@saccherine
@saccherine 2 жыл бұрын
This is superb
@user-wk5ey3hx7p
@user-wk5ey3hx7p 3 жыл бұрын
Fantastic video for beginners.
@BeABetterDev
@BeABetterDev 3 жыл бұрын
Glad it was helpful!
@irfan4701
@irfan4701 4 жыл бұрын
I really appreciate all the hardwork you do to bring such a wonderful content to us. Salute May be someday you get billions subs
@BeABetterDev
@BeABetterDev 4 жыл бұрын
Thank you so much for your kind words!
@srirajjayaprakash3852
@srirajjayaprakash3852 8 ай бұрын
Good job man
@techTFQ
@techTFQ 3 жыл бұрын
Good video.. thank you..
@BuhlzI
@BuhlzI 3 жыл бұрын
Good stuff. The crawler makes it stupid easy.
@BeABetterDev
@BeABetterDev 3 жыл бұрын
Absolutely! As a lazy developer crawlers really help out :)
@tomaszziss5481
@tomaszziss5481 2 жыл бұрын
Thanks for nice tutorial.
@BeABetterDev
@BeABetterDev 2 жыл бұрын
You're very welcome!
@ihebbibani7122
@ihebbibani7122 3 жыл бұрын
Thanks for this video.I subscribed.Will look forward for your other videos ;)
@BeABetterDev
@BeABetterDev 3 жыл бұрын
Welcome!
@nitam2238
@nitam2238 3 жыл бұрын
Thank you so much. This very helpfull
@BeABetterDev
@BeABetterDev 3 жыл бұрын
You're very welcome Nita!
@krishind99
@krishind99 3 жыл бұрын
Super cool. Is the process same for CSV file too? Can you extend this idea a little more and include RedShift in the process. Also a simple tutorial on EMR with these steps will be a delight. Thanks & keep up the great work
@Kumarthangasamy
@Kumarthangasamy 3 жыл бұрын
Good one...short and sweet...
@BeABetterDev
@BeABetterDev 3 жыл бұрын
Thank you 🙂
@michaelhughes8413
@michaelhughes8413 2 жыл бұрын
Nice tutorial !
@BeABetterDev
@BeABetterDev 2 жыл бұрын
Thank you! Cheers!
@alik.8061
@alik.8061 Жыл бұрын
amazing
@nanwang4765
@nanwang4765 3 жыл бұрын
The video is awesome.
@BeABetterDev
@BeABetterDev 3 жыл бұрын
Thanks nan!
@omkarhaldankar2940
@omkarhaldankar2940 2 жыл бұрын
thanku so much
@BeABetterDev
@BeABetterDev 2 жыл бұрын
You're very welcome Omkar!
@sweatyguysteve8087
@sweatyguysteve8087 3 жыл бұрын
great video
@BeABetterDev
@BeABetterDev 3 жыл бұрын
Thanks Steve!
@hikingcloud
@hikingcloud 3 жыл бұрын
Good explanation!
@BeABetterDev
@BeABetterDev 3 жыл бұрын
Thanks Manish!
@hikingcloud
@hikingcloud 3 жыл бұрын
Hey Daniel, I have use case to capture details of ec2 resources when it’s created like type of instance, tags, size etc for reporting purpose. I can’t find that details in OOTB cloud trail logs.. thinking of writing some custom function to capture those detail whenever an instance is created and stored in s3 and then query through Athena.. does it sound like a good approach?
@Canda-fh4xc
@Canda-fh4xc Жыл бұрын
Thank you for the great tutorial. is the Json Files available for download
@xxy8118
@xxy8118 2 жыл бұрын
hi, can you please have a tutorial on how to add, edit and delete entries as well. thank you so much.
@zwc76
@zwc76 7 ай бұрын
The data structure that you have is called NDJSON (the one without brackets and commas at the end).
@Aearewn
@Aearewn 3 жыл бұрын
Thank you for this channel. It's helped me fill huge gaps in my knowledge about AWS. Quick question: how did you create a .json array file without square brackets?
@BeABetterDev
@BeABetterDev 3 жыл бұрын
Thanks so much for your support! A great website to generate JSON is Mockaroo (google it). After you generate your JSON, just delete the first [ and last ]
@chidambaranathanmanian4183
@chidambaranathanmanian4183 3 жыл бұрын
Hi, Could you Please provide the video on how to query Postgresql data in Athena (Postgres as a data source in Athena)
@duket.8598
@duket.8598 3 жыл бұрын
would/could the next step be to throw that 2nd Query (purchase data > $50) into Quickinsight to visualize the data -- or -- what would be the rationale to leave it in this state/final, curious. Thank you. Aloha
@birdsculptures
@birdsculptures 2 жыл бұрын
thanks. Would this work for nested json files?
@ibmuser13
@ibmuser13 3 жыл бұрын
thanks for the Simplified explanation. Liked and subscribed! What is the "default" database with Glue? I should be able to use the default database with as many s3 buckets I scan? thanks again
@BeABetterDev
@BeABetterDev 3 жыл бұрын
Thanks Bhakta! Not sure what the default one is - i think its just created as part of your account. You can certainly use it as your default database.
@gustavemuhoza4212
@gustavemuhoza4212 2 жыл бұрын
Is it still the case that classifiers work with that particular type of json format? Tried to use s crawler and DataBrew on some deeply nested json and it did not work
@tonyme7426
@tonyme7426 4 жыл бұрын
Nice
@BeABetterDev
@BeABetterDev 4 жыл бұрын
Thanks tony!
@pavansinghkushwa495
@pavansinghkushwa495 3 жыл бұрын
Hello there, I want to extract data from redshift cluster using JDBC end point of it to AWS Athena to run SQL queries.. can we implement this..? if yes can you please make a video on this..?
@vitache1276
@vitache1276 3 жыл бұрын
Waooo perfect
@BeABetterDev
@BeABetterDev 3 жыл бұрын
Thanks 😊
@csmatyi
@csmatyi Жыл бұрын
Hello, do you think I can query an sqlite3 db file that is being stored in an AWS S3 bucket? How would I do that? Thanks.
@thomas-sinkala
@thomas-sinkala Жыл бұрын
Any way I can use Athena to query Minio (S3 compatible storage)?
@debasishdeb0812
@debasishdeb0812 Жыл бұрын
Can we generate the SQL query using python script and how to query the data that are present in csv files that we upload in the S3 bucket
@zuu2051
@zuu2051 3 жыл бұрын
How I can create multiple tables from multiple files (CSV) which contains in the same folder. Please help
@YuanDong
@YuanDong 3 жыл бұрын
This json is a simple flattened one. I wonder how it works with a more complex json schema.
@sachavanweeren9578
@sachavanweeren9578 2 жыл бұрын
Thanks for the great video. Can you share the resources you use as well (e.g. the json file you use here). That makes following along a bit more easy
@BeABetterDev
@BeABetterDev 2 жыл бұрын
Hi Sacha, I checked my disk and unfortunately lost the file used in this demo. However I'm fairly certain I used the online tool Mockaroo to generate the fake data. You can give it a try to emulate. Daniel
@muhammadaliuppal1892
@muhammadaliuppal1892 2 жыл бұрын
when configuring output of crawler, I do not see the default database. What could be the reason for that?
@ganganegi6932
@ganganegi6932 2 жыл бұрын
It would have been great if you had shown the output S3 bucket again after running this query.
@hectornavarrete3350
@hectornavarrete3350 3 жыл бұрын
excellent video, I wonder if after read a text file with athena I can insert into a Oracle database that use (TLS)
@BeABetterDev
@BeABetterDev 3 жыл бұрын
Hi Hector, you definitely can!
@mohamedbousselham2173
@mohamedbousselham2173 3 жыл бұрын
do u have a course abt AWS Hands on on udemy ur skills explaining are the best
@marbas9215
@marbas9215 3 жыл бұрын
Hi, thank you for the video, it was very helpful for me to set up my AWS Athena. I have a question: if the files on S3 that are linked to the Crawler are updated daily, and I set up the crawler to run daily, will Athena update the output daily as well? Or is that another process that I need to schedule? I don't do anything in Athena, i only use it to access the data on S3 through Tableau by connecting it to Athena
@BeABetterDev
@BeABetterDev 3 жыл бұрын
Hi Mar, great question! If your crawler pulls in new data daily, then your Athena cluster should automatically pull this data in during its queries. You can easily recreate this scenario to test it out with a couple of small files. Have a great day!
@joshuanicolas2002
@joshuanicolas2002 Жыл бұрын
Can this be applied if i want to retrieved huge amount of files (jpg)
@royalhorusbernal9629
@royalhorusbernal9629 2 ай бұрын
Amazon s3 and amazon fsx windows file server is the,considering that i want to up an application common with diferents enviroments like DEV,QA and PROD (bd,app, and server) o i need to consider both to begin? and price for year it is very expensive?
@freakinmonkey85
@freakinmonkey85 4 жыл бұрын
No fucking way, I was just looking for a video on this when “boom, AWS Simplified...” Awesome
@BeABetterDev
@BeABetterDev 4 жыл бұрын
Glad I was able to help!
@marjan61717
@marjan61717 2 жыл бұрын
how can have a sample long JSON file to test it ?
@collimarco
@collimarco Жыл бұрын
Does AWS Athena uses S3 SELECT for the queries on S3?
@DiscoverWithHassan
@DiscoverWithHassan 2 жыл бұрын
Will it work with .PDF files?
@sangarajusukanya334
@sangarajusukanya334 3 жыл бұрын
hi can you explain if we delete the file in s3 path but still we need the data in athena table .
@BeABetterDev
@BeABetterDev 3 жыл бұрын
Hi Sanagarju, if you delete the file in the s3 path it will no longer be queryable in Athena.
@stephenpalfreyman4755
@stephenpalfreyman4755 Жыл бұрын
Time to redo this video for the new UI as it has completely changed now
@BeABetterDev
@BeABetterDev Жыл бұрын
Thanks for letting me know!
@saleemibrahim5984
@saleemibrahim5984 4 жыл бұрын
Hey, I do not have the connect to data source option and it's written (No databases or tables found). I do not know what is the problem
@BeABetterDev
@BeABetterDev 4 жыл бұрын
Hi Saleem, Did you accidentally delete the default database?
@richagoyal4681
@richagoyal4681 2 жыл бұрын
if i have added new file in s3 path and want to see the new data using athena query without crawler because crawler already exist then what command i can use in athena so that the data will refresh automatically without creating new crawler of re running existing crawler?
@richagoyal4681
@richagoyal4681 2 жыл бұрын
Please reply if anyone knows
@BeABetterDev
@BeABetterDev 2 жыл бұрын
Hi Richa, If you're just updating the data, you don't need to do anything to have it reflected in your athena query. The next time you query, the data will be present. However, if you change the schema (add a field, remove a field, update field), it won't be present until you re-run the crawler or change the athena table schema. Hope this helps.
@jasonwilson3329
@jasonwilson3329 2 жыл бұрын
Thank you for the video. I have a couple of questions. Does Glue have to continue to run to pick up new files or does it just need to run once to establish the schema? Also I got the point testing the data with json that looks like this { "id": "05b10b2a-7dee-11ec-90d6-0242ac120031", "time": "2021-01-19T18:43:48Z", "source": "testing", "resources": [], "detailType": "Test Event 31", "detail": { "eventid": "05b10b2a-7dee-11ec-90d6-0242ac120031", "eventName": "test_event 31", "claimCheck": false, "eventTime": "2021-01-19T18:43:48Z", "version": "1.0" } } And I receive the error "Row is not a valid JSON Object - JSONException: A JSONObject text must end with '}' at 2 [character 3 line 1]". Is the because of the embedded object "detail". Glue seemed to detect the schema correctly
@jasonwilson3329
@jasonwilson3329 2 жыл бұрын
Nevermind -- I realized it was because my JSON spanned multiple lines. I also realized that I was able to query new files once Glue was set up without running the crawler again. Looking at the S3 results bucket, I see that there "unsaved" result files. Do these persistent until I manually delete them or are they cleared out regularly?
@catherineaugustine7769
@catherineaugustine7769 Жыл бұрын
@@jasonwilson3329 So, can you query using athena here ? My schema is similar to this, one object would span multiple lines. How do I query in this case ?
@divyatirthadas
@divyatirthadas 4 жыл бұрын
Thanks for sharing this, somehow I don't see any data being returned. I checked my json file has the data. What am I missing?
@BeABetterDev
@BeABetterDev 4 жыл бұрын
Hi Deepak, is this via the Athena AWS console? Or are you calling programmatically?
@divyatirthadas
@divyatirthadas 4 жыл бұрын
@@BeABetterDev it's through the console.
@yashshinde108
@yashshinde108 4 жыл бұрын
Same. Zero records returned.
@mohamedifthikhar7821
@mohamedifthikhar7821 4 жыл бұрын
How the Query results will be exported to S3 bucket?
@BeABetterDev
@BeABetterDev 4 жыл бұрын
Hi, this is an automatic feature when using athena. Your query results are stored in the S3 bucket you specify during athena database setup.
@ahmedabdellatif5845
@ahmedabdellatif5845 3 жыл бұрын
What if my JSON files in S3 are not simple formatted like yours? anything i can do to me it work? i have over 10000 json files i need to query. the format { "StartTime":1596475018000, "FileName":"testfile", "Info":{"language":"english"},{"location":"usa"}, . . . }
@BeABetterDev
@BeABetterDev 3 жыл бұрын
Hey there Ahmed, Check out the answer on stackoverflow here: stackoverflow.com/questions/53314808/querying-nested-json-structures-in-aws-athena Hope this helps!
@lidavid9781
@lidavid9781 4 жыл бұрын
Is it possible to share your json file in your this video to let us to practice?
@BeABetterDev
@BeABetterDev 4 жыл бұрын
Hi David, I unfortunately can't find the JSON file I used for this video. However there is a neat tool called 'Mockaroo' that lets you quickly created formatted JSON files filled with content. Check it out here: www.mockaroo.com/ Cheers
@lidavid9781
@lidavid9781 4 жыл бұрын
Thank you for your information. Your video is great and helpful.
@csoutsource
@csoutsource Жыл бұрын
Strange, I do not have the option to "Connect Data Source"
@elahehseyedain9056
@elahehseyedain9056 2 ай бұрын
Me too!😞😧
@vitache1276
@vitache1276 3 жыл бұрын
do you offer trainings
@BeABetterDev
@BeABetterDev 3 жыл бұрын
Hi Vita, not at this time but I am currently have plans for a Lambda training course offering. Stay tuned!
@vitache1276
@vitache1276 3 жыл бұрын
Thank you for the update.
@DeviantDeveloper
@DeviantDeveloper 3 жыл бұрын
Sorry I was totally lost from the beginning. A more detailed step by step process would be useful. Where did you get the path from, what page was that? I can't find it (for example).
@user-vh8iq3fq8j
@user-vh8iq3fq8j 6 ай бұрын
u quickly went too further without tell us about how to create bucket
How to Query AWS Athena from a Lambda Function | Step by Step Tutorial
8:57
Stupid Barry Find Mellstroy in Escape From Prison Challenge
00:29
Garri Creative
Рет қаралды 18 МЛН
Они убрались очень быстро!
00:40
Аришнев
Рет қаралды 3,3 МЛН
How to Query Your DynamoDB Table with SQL using Athena
26:13
Be A Better Dev
Рет қаралды 13 М.
AWS Glue Tutorial for Beginners [FULL COURSE in 45 mins]
41:30
Johnny Chivers
Рет қаралды 245 М.
SQL For AWS Athena [FULL COURSE IN 40mins]
40:07
Johnny Chivers
Рет қаралды 16 М.
AWS S3 Tutorial For Beginners
27:18
Be A Better Dev
Рет қаралды 214 М.