Python Web Scraping Tutorial: scraping dynamic JavaScript/AJAX websites with BeautifulSoup

30,656 views

Red Eyed Coder Club

Days ago

This Python web scraping tutorial covers scraping dynamic websites, where the content is rendered by JavaScript.
I used the Steam Store as the example because it is a heavily JavaScript/AJAX-driven website with dynamic content.
To scrape the Steam Store I used only the Python Requests and BeautifulSoup (bs4) libraries, and then exported the scraped data to a CSV file.
The tutorial is a detailed explanation, aimed at absolute beginners, of how to scrape JavaScript-driven pages and websites with Python and the BeautifulSoup library.
To install BeautifulSoup, Requests and lxml:
pip install bs4 requests lxml
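With those packages installed, the basic fetch-and-parse step might look like the sketch below. This is not the code from the video (the source is only available via Patreon). The name get_html() comes from the timecodes; parse_links(), the User-Agent value, and the sample HTML are hypothetical and chosen only for illustration.

```python
import requests
from bs4 import BeautifulSoup


def get_html(url, params=None):
    """Perform a GET request and return the response body as text."""
    # The User-Agent value is an assumption; some sites reject the default one.
    headers = {"User-Agent": "Mozilla/5.0 (tutorial sketch)"}
    response = requests.get(url, params=params, headers=headers, timeout=10)
    response.raise_for_status()  # raise an exception on 4xx/5xx responses
    return response.text


def parse_links(html, parser="html.parser"):
    """Return (text, href) pairs for every <a> tag that has an href."""
    # "lxml" can be passed as the parser if it is installed.
    soup = BeautifulSoup(html, parser)
    return [
        (a.get_text(strip=True), a["href"])
        for a in soup.find_all("a", href=True)
    ]


if __name__ == "__main__":
    # Inline sample HTML, so the sketch runs without any network access.
    sample = '<div><a href="/app/1">Game One</a><a>no link</a></div>'
    print(parse_links(sample))
```

The built-in "html.parser" is used as the default only so the sketch has no extra dependency; passing parser="lxml" should behave the same on well-formed pages.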
Follow me @:
Telegram: t.me/red_eyed_coder_club
Twitter: / codereyed
Facebook: redeyedcoderclub
======================================
📎️ The SOURCE CODE is available via Patreon:
/ steam-store-with-35670113
======================================
Timecodes:
00:00 - Beginning.
01:09 - Preliminary research (what to scrape)
03:15 - Creating a function that performs GET requests to Steam Store
06:01 - Server response research: which URL should be passed to the get_html() function
09:24 - The scraping plan
09:43 - Getting all Steam Store games with Python Requests, and BeautifulSoup. Scraping pagination.
12:40 - The algorithm of scraping all pages using the pagination GET requests
16:35 - Scraping the data from a single page of games
25:30 - Scraping the data of all games on each page, including the data from the hover window
38:40 - Writing Scraped data to a CSV file
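The pagination and CSV steps in the timecodes can be sketched roughly as follows. This is an assumed reconstruction, not the code from the video: fetch_page is a hypothetical callable standing in for a function that requests one batch of results at a given offset and returns a list of row dictionaries, and the offset step of 50 and the field names are placeholders.

```python
import csv
import io


def scrape_all_pages(fetch_page, step=50, max_pages=1000):
    """Collect rows from consecutive pages until a page comes back empty.

    fetch_page(start) is a hypothetical callable: it takes an offset and
    returns a list of dicts (one per game), or an empty list when done.
    """
    rows = []
    start = 0
    for _ in range(max_pages):  # hard cap so the loop can never run forever
        page_rows = fetch_page(start)
        if not page_rows:
            break  # an empty page means there is nothing left to scrape
        rows.extend(page_rows)
        start += step
    return rows


def write_csv(rows, fileobj, fieldnames):
    """Write the collected rows to an open text file object as CSV."""
    writer = csv.DictWriter(fileobj, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    # Fake data source with 120 items, used instead of real HTTP requests.
    catalog = [{"title": f"Game {i}", "price": "0"} for i in range(120)]

    def fake_fetch(start):
        return catalog[start:start + 50]

    games = scrape_all_pages(fake_fetch)
    buffer = io.StringIO()
    write_csv(games, buffer, fieldnames=["title", "price"])
    print(len(games), "rows written")
```

When writing to a real file instead of a StringIO buffer, the csv module documentation recommends opening it with newline="" to avoid blank lines on some platforms.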
✴️✴️✴️ May also be useful ✴️✴️✴️
Python tutorial: Namespaces and Scopes - • Python tutorial #7: Py...
Python Regular Expressions tutorial - • Regex Python Tutorial:...
Python tutorial: handling exceptions - • Python tutorial #14: P...
How to read and write CSV - • Python CSV tutorial: H...
✴️✴️✴️ Web Scraping course ✴️✴️✴️
is available via Patreon here:
/ red_eyed_coder_club
or its landing:
red-eyed-coder-club.github.io...
✴️✴️✴️ PLAYLISTS ✴️✴️✴️
🔹Django 3 Tutorial: Blog Engine
• Python Django Tutorial...
🔹Kivy Tutorial: Coppa Project
• Python Kivy tutorial #...
🔹Telegram Bot with Python (CoinMarketCap)
• Python Telegram Bot Tu...
🔹Python Web Scraping
• Python Ebay Scraping T...
➥➥➥ SUBSCRIBE FOR MORE VIDEOS ➥➥➥
Red Eyed Coder Club is the best place to learn Python programming and Django:
Subscribe ⇢ / @redeyedcoderclub
#python #pythonwebscraping #beautifulsoup #bs4 #redeyedcoderclub #webscrapingpython #beautifulsouptutorial

Comments: 84
@EnglishRain 4 years ago
Another FANTASTIC topic, amazing! I absolutely love the niche topics you select, thank you so much for sharing your good knowledge my friend.
@RedEyedCoderClub 4 years ago
Thank you very much!
@bingchenliu1854 3 years ago
That is exactly what I was searching for! Thank you, man!
@RedEyedCoderClub 3 years ago
Thanks for watching!
@abrammarba 6 months ago
This is great! Thank you! 😃
@ticTHEhero 3 years ago
That was exactly what I was looking for, thanks man.
@RedEyedCoderClub 2 years ago
Thanks for watching
@igorbetkier856 2 years ago
Such a great tutorial! Thank you for that!
@RedEyedCoderClub 2 years ago
Thanks for watching, and for the comment!
@rustamakhmullaev5697 4 years ago
Very useful lesson, thanks for your work!
@user-nt1uf4gl1i 3 years ago
Finally, I have found you! Thanks for the videos.
@RedEyedCoderClub 3 years ago
:)
@JoJoSoGood 3 years ago
Best video ever... I will follow your channel from now on.
@RedEyedCoderClub 3 years ago
Thank you!
@youngjordan5619 3 years ago
Awesome. I always had problems with infinite scroll and used Selenium. Now I know how to do it with bs4 thanks to you, cheers :)
@RedEyedCoderClub 3 years ago
Glad you like my video! Thanks for watching!
@RedEyedCoderClub 2 years ago
What video should I make next? Any suggestions? *Write to me in the comments!* Follow me @: Telegram: t.me/red_eyed_coder_club Twitter: twitter.com/CoderEyed Facebook: fb.me/redeyedcoderclub Help the channel grow! Please Like the video, Comment, SHARE & Subscribe!
@amrhamza9831 3 years ago
Thank you a lot, this was really helpful to me. Thanks again.
@RedEyedCoderClub 3 years ago
Thanks for watching!
@KekikAkademi 4 years ago
This trick is awesome!
@KekikAkademi 4 years ago
Please make more crawling and scraping tricks without Scrapy, Selenium, etc., for PyQt5 GUI projects and Telegram bot projects :)
@user-qz9dk1uj2k 4 years ago
Good job. Thanks for the video. I clicked like.
@RedEyedCoderClub 4 years ago
Thank you!
@tazimrahbar7882 3 years ago
Great explanation, sir.
@RedEyedCoderClub 3 years ago
Thank you!
@shortcuts9005 2 years ago
Brilliance
@RedEyedCoderClub 2 years ago
Thank you very much!
@JackWQ 4 years ago
Hi, thanks for this, but the website I'm working with uses the POST method instead of GET as the request method, so I'm not able to replicate what you are doing by scraping the IDs first and copying them into URLs. The page just keeps loading and eventually says page not found. Is there a way to bypass this?
@RedEyedCoderClub 4 years ago
Did you try to scrape Steam?
@noelcovarrubias7490 3 years ago
I need to scrape data from Walmart, which is all in JavaScript. I'm going to watch and try this tomorrow, hopefully it works!
@RedEyedCoderClub 3 years ago
Thanks for watching!
@akram42 4 years ago
Awesome
@RedEyedCoderClub 2 years ago
Thanks for the comment
@MrYoklmn 4 years ago
Thanks a lot!) Are you planning a series of lessons on Scrapy? And a second question: could you make a lesson on building a self-populating aggregator (news, products, etc.) in Django, so that the site parses and fills itself? I'm trying to implement this with Django and Scrapy, but I have a problem launching the parser from Django without blocking the process. I ended up bolting on Celery, but that has its own difficulties (it raises a reactor error). Or should I not write in Russian on this channel?
@RedEyedCoderClub 2 years ago
Thanks for the comment!
@duckthishandle 3 years ago
Very, very good video on this topic. The way you explain things helps in understanding the whole process behind getting the data! I am trying to access the data on various sites, but sometimes I get an error message that I "do not have the auth token" or "access denied!". How can I bypass those?
@RedEyedCoderClub 3 years ago
Thank you. Access can be denied for many reasons, and it's hard to say anything definite without seeing the site.
@Shajirr_ a year ago
Tried to use this method with Reddit comment search and it doesn't work - the requests it sends are POST requests. So no conveniently available URL on them which you can use. The requests themselves are JSON objects.
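When the browser's network tab shows a POST request with a JSON body instead of a GET URL, the same idea can sometimes still apply: replay the request with the same payload. The sketch below only builds such a request without sending it. The URL and payload keys are hypothetical placeholders, not Reddit's actual API, and real sites may also require specific headers, cookies, or tokens.

```python
import json

import requests


def build_json_post(url, payload):
    """Prepare (but do not send) a POST request with a JSON body."""
    request = requests.Request("POST", url, json=payload)
    return request.prepare()


if __name__ == "__main__":
    # Placeholder endpoint and payload, as one might copy them by hand
    # from the request shown in the browser's network tab.
    prepared = build_json_post(
        "https://example.com/api/search",
        {"query": "python", "page": 1},
    )
    print(prepared.method, prepared.headers["Content-Type"])
    print(json.loads(prepared.body))
    # To actually send it: requests.Session().send(prepared, timeout=10)
```

Passing json=payload lets Requests serialize the body and set the Content-Type header, which is usually what such endpoints expect.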
@joeking9859 2 years ago
Excellent, the best video on XHR (GETs) that I have seen. Great work. Could you do a video on XHR (POSTs) please?
@RedEyedCoderClub 2 years ago
Ok, thanks for your suggestion. POST requests often require CSRF tokens, and it can be quite tricky, or even barely possible, to bypass this protection.
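Where a site protects its forms with a CSRF token embedded in the page, one common approach is to load the page first, read the token, and send it back with the POST in the same session. The sketch below covers only the token-reading step and assumes the token sits in a hidden input field. The field name csrf_token and the sample HTML are hypothetical; many sites use different names, headers, or cookies instead.

```python
from bs4 import BeautifulSoup


def extract_csrf_token(html, field_name="csrf_token"):
    """Return the value of the input named field_name, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("input", attrs={"name": field_name})
    return tag.get("value") if tag else None


if __name__ == "__main__":
    # Inline sample form; a real page would be fetched with a GET first.
    page = '<form><input type="hidden" name="csrf_token" value="abc123"></form>'
    print(extract_csrf_token(page))
```

In practice the GET and the following POST would go through one requests.Session so that the cookies tied to the token are sent back as well.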
@joeking9859 2 years ago
@RedEyedCoderClub Thank you for your response. OK, I will not try to go down that rabbit hole.
@joeking9859 2 years ago
Do you see most sites going to this method to protect their sites from being scraped?
@RedEyedCoderClub 2 years ago
Most sites? Not sure. We can always use Selenium or Pyppeteer, for example.
@joeking9859 2 years ago
@RedEyedCoderClub Why would Selenium or Pyppeteer be better?
@sassydesi7913 3 years ago
This is great! How would you scrape something like teamblind.com? Looks like they have infinite scroll and their payload is encrypted for every call. How would I go about getting historical post data from this website?
@RedEyedCoderClub 3 years ago
I'll look at it. Thanks for your comment!
@ThEwAvEsHaPa 2 years ago
Great video, really well explained. Please can you make a video showing how to log in to a website with Requests sessions and OAuth?
@RedEyedCoderClub 2 years ago
Thank you. I'll think about your suggestion. Do you have a site as an example?
@ThEwAvEsHaPa 2 years ago
@RedEyedCoderClub Thanks. I don't really have a specific site in mind. I have just noticed that a few sites I tried to scrape are using OAuth, and I'm not sure how to get around it with just Requests.
@RedEyedCoderClub 2 years ago
Ok, I'll think about it
@ThEwAvEsHaPa 2 years ago
@RedEyedCoderClub Thanks bro, keep up the great work
@akram42 4 years ago
Can you host this script online, make it run 24/7, and send the data to a MySQL database? That would be amazing.
@RedEyedCoderClub 2 years ago
You can use cron to do it
@silvermir84 4 years ago
The while loop doesn't stop at 800... What did I do wrong? The else: break doesn't work at 15:47.
@RedEyedCoderClub 4 years ago
How can I know what you did wrong? Check the conditions for breaking the loop.
@user-bj7rl8zd4o 4 years ago
He interrupted the loop by himself
@avinashmahendran6067 2 years ago
Did you get the solution for this error...
@mrpontmercy8906 4 years ago
Hmm. At the very first step, it finds only 28 links, and then returns an empty list.
@RedEyedCoderClub 2 years ago
Thanks for the comment
@EnglishRain 4 years ago
I have a challenge for you: 😜 Can you log in to WhatsApp Web using the Requests library without manually scanning the QR code and without using Selenium? I achieved it using a saved profile in Selenium, but I'm just curious whether you can do it using the Requests library. Thanks!
@RedEyedCoderClub 4 years ago
Interesting idea. But I'm afraid WhatsApp could ban my phone number. They really don't like our "style". I'll think about your suggestion, it's interesting.
@EnglishRain 4 years ago
@RedEyedCoderClub Haha yes, I understand. No worries, let it be, I was just thinking aloud. :)
@adrianka9405 4 years ago
def main():
    all_pages = []
    start = 1
    url = f'www.otodom.pl/sprzedaz/mieszkanie/warszawa/?page={start}'
    while True:
        page = get_index_data(get_page(url))
        if page:
            all_pages.extend(page)
            start += 1
            url = f'www.otodom.pl/sprzedaz/mieszkanie/warszawa/?page={start}'
        else:
            break
    for url in page:
        data_set = get_detail_data(get_page(url))
    print( all_pages )
This is part of my code where I tried to get detailed info from many pages on the website, but it doesn't work. Do you have any idea why?
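One possible reading of the snippet above: when the while loop ends, page is the empty result that triggered the break, so a loop over page visits nothing, and looping over all_pages is probably what was intended. The URLs also lack a scheme such as https://, which Requests rejects (though the comment system may simply have stripped it). A sketch under those assumptions, with get_index_data and get_detail_data passed in as simplified stand-ins for the commenter's functions (here they take a URL directly), and max_pages added as a hypothetical safety cap:

```python
def collect_details(get_index_data, get_detail_data, base_url, max_pages=500):
    """Gather detail links from every index page, then scrape each link."""
    all_links = []
    for page_number in range(1, max_pages + 1):
        links = get_index_data(f"{base_url}?page={page_number}")
        if not links:
            break  # an empty index page marks the end of the listing
        all_links.extend(links)
    # Iterate over the accumulated links, not over the last (empty) page.
    return [get_detail_data(link) for link in all_links]


if __name__ == "__main__":
    # Fake site with two index pages; real code would fetch and parse HTML.
    fake_index = {
        "https://example.com/list?page=1": ["/a", "/b"],
        "https://example.com/list?page=2": ["/c"],
    }
    details = collect_details(
        lambda url: fake_index.get(url, []),
        lambda link: {"url": link},
        "https://example.com/list",
    )
    print(details)
```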
@RedEyedCoderClub 2 years ago
Thanks for the comment!
@Shajirr_ a year ago
This search returned 779 results when the video was released. Now, it returns 4927 results. Just to put into perspective how much garbage is being shovelled onto the platform.
@user-yq4dn3gj5p 3 years ago
Hi, is this Oleg Molchanov?
@RedEyedCoderClub 2 years ago
Yep, it's him
@egormakhlaev4866 4 years ago
Molchanov, is that you?
@user-wv9vk8io1y 3 years ago
The very same
@RedEyedCoderClub 2 years ago
Yep, it's him
@sriramkasu5286 4 years ago
Sir, I need help.
@sriramkasu5286 4 years ago
This video is good, but what if I want to scrape data from a website after logging in and get the details shown in that logged-in account? Plain HTML requests won't work because the logged-in page cannot be requested directly.
@RedEyedCoderClub 4 years ago
kzfaq.info/get/bejne/rbOWaq9705bPZIk.html
@sriramkasu5286 4 years ago
@RedEyedCoderClub Thanks
@postyvlogs 3 years ago
Please provide the source code without Patreon
@RedEyedCoderClub 2 years ago
Thanks for the comment. The project is very simple, there is no need for the source code at all.
@anikahmed7456 3 years ago
Please make a video on this website: abc.austintexas.gov/web/permit/public-search-other?reset=true Search by Property, Select Sub Type: any, Date: any, Submit. On this website the URL doesn't change when the data loads. I tried many times but couldn't succeed. It also has JavaScript pagination links, javascript:reloadperm[pagination number], which change randomly. Please make a video 🙏🙏🙏
@RedEyedCoderClub 2 years ago
Thanks for the comment!