Beautifulsoup vs Selenium vs Scrapy - Which Tool for Web Scraping?

74,087 views

John Watson Rooney

Days ago

Let's talk about scraping and which tool you should use for your web scraping projects in 2021 - BeautifulSoup, Scrapy or Selenium? When picking a tool for a web scraping project, these are the 3 main options that come up, so learning when to use each one is an important skill. I go through what I think the top line is for each, and give some insight into the pros and cons and what each is best suited for.
BS4 Tips: • 5 Things You Might Not...
Scrapy for beginners: • Scrapy for Beginners -...
Selenium scraping: • How to SCRAPE DYNAMIC ...
Support me:
Amazon US: amzn.to/3cdvjEr
Amazon UK: amzn.to/3iMRtOW
Digital Ocean: m.do.co/c/c7c90f161ff6
Disclaimer: These are affiliate links and as an Amazon Associate I earn from qualifying purchases

Comments: 101
@xpathservice2179 3 years ago
3:28 you got the wrong title there... I guess it should be SELENIUM not SCRAPY
@JohnWatsonRooney 3 years ago
Oh man, you’re right. Thanks for pointing that out.
@celerystalk390 3 years ago
The best video comparing web scraping tools hands down!! Thank you for another extremely useful video John!
@rrahll 3 years ago
Really interesting theme! Thank you for your tutorials and good luck! Really useful content!
@Kralnor 1 year ago
Thanks for a great rundown of the options available for web scraping in Python. There were a few that I was not familiar with.
@omidasadi2264 2 years ago
The gist of the tech in web crawling, nicely and easily summarized. Thanks, my friend.
@rudisygo6804 2 years ago
Mr John, sir. I would really like to thank you for the service you have given us online. YOU ARE TRULY VALUED. THANK YOU FROM SOUTH AFRICA. You seriously have become a role model and your teaching style is awesome!
@JohnWatsonRooney 2 years ago
Thank you very much! Hello to South Africa 🇿🇦
@harshgupta1999 2 years ago
Best explainer video on YouTube about this topic.
@RichPortah 3 years ago
Another A+ video... After I watch your videos I find a website to scrape just for fun.
@ProjectXH 10 months ago
Greatly informative, thank you.
@mattmovesmountains1443 3 years ago
Also, have you thought about a Patreon? Your videos are consistently helpful - I'd join at around $5/mo.
@NivAwesome 3 years ago
Thanks for the information! I found another method which is not very efficient but worked for me on a small dynamic website page. I ran Selenium in the background, sent keys (Ctrl+A, Ctrl+C), then stored pyperclip.paste() in a variable. Then I used the re module on the string to extract the information I needed. I also used the split method on the newlines to convert the string into a list of strings.
@JohnWatsonRooney 3 years ago
Cool idea - if it works for you then that’s great!
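The parsing half of the approach described above (splitting the copied text on newlines, then applying the re module) might look roughly like the sketch below. The sample text and the dollar-amount pattern are invented for illustration; in the comment, the string would come from pyperclip.paste() after Selenium copies the page.

```python
import re

# Stand-in for the string returned by pyperclip.paste() in the comment above.
# This sample text is invented for illustration.
page_text = "Widget A\nPrice: $19.99\nWidget B\nPrice: $5.00\n"

# Split on newlines to get a list of strings, dropping empty entries.
lines = [line for line in page_text.split("\n") if line]

# Use a regular expression on the whole string to pull out every dollar amount.
prices = [float(p) for p in re.findall(r"\$(\d+\.\d{2})", page_text)]

print(lines)
print(prices)
```

The pattern would need adjusting to whatever the copied page text actually contains.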
@dreamtolive4564 3 years ago
Really good and informative
@ericxls93 2 years ago
So useful! Thank you.
@AmirAnsari-wh6gt 2 years ago
You are the god of scraping :) Thank you, sir.
@hubertcombomarketing2693 3 years ago
Hi John. Thank you for your great work. Could you please make a short video about parsing HTML tables with colspan inside?
@travis.gooden 2 years ago
Wish I had watched this before choosing Selenium for a scraping project. Really feel you hit the nail on the head. Great video!
@ayeshavlogsfun 1 year ago
Why? Is Selenium not the best?
@travis.gooden 1 year ago
@@ayeshavlogsfun Selenium is really good for automating tasks such as browsing, interacting with elements on a page, etc. I had more trouble setting it up purely to scrape results, as that isn't its primary focus.
@ayeshavlogsfun 1 year ago
@@travis.gooden Which module are you using for scraping now?
@Berenduinelbardo 3 years ago
Thanks for your videos. I can understand you very well, thank you for taking care of your pronunciation, I am Spanish and we do not have Spanish-speaking channels as good as yours. Keep it up.
@JohnWatsonRooney 3 years ago
Thank you!
@alexgarov9164 10 months ago
If you find one, please share! Hahaha, cheers.
@Actanonverba01 1 year ago
Very useful for beginners
@affanahmed9318 1 year ago
Most of the time I use Selenium to get where I want, then make soup of the page with BeautifulSoup and extract the specific tag info. Afterwards I use pandas to save the list data in a DataFrame and export it as a CSV or Excel file.
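The extract-then-export part of that workflow can be sketched as follows. To keep the example runnable without installing anything, it swaps in the standard library's html.parser and csv modules in place of BeautifulSoup and pandas; the page source string is an invented stand-in for what Selenium's driver.page_source would return, and TagTextCollector is a hypothetical helper name.

```python
import csv
import io
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Collect the text inside every occurrence of one tag (hypothetical helper)."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.inside = False
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.inside = True
            self.texts.append("")

    def handle_endtag(self, tag):
        if tag == self.tag:
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.texts[-1] += data


# Invented stand-in for driver.page_source after Selenium has loaded the page.
page_source = "<ul><li>Alpha</li><li>Beta</li></ul><p>ignored</p>"

collector = TagTextCollector("li")
collector.feed(page_source)

# Write one row per extracted item; a real script would write to a file instead.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["item"])
writer.writerows([text.strip()] for text in collector.texts)
csv_text = buffer.getvalue()
print(csv_text)
```

With BeautifulSoup and pandas installed, the collector and the csv writer would each collapse to a line or two, which is presumably why the commenter prefers them.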
@ankushgaur9367 3 years ago
Helpful. Subscribed!
@e4rohan 3 years ago
Great video, thanks for the summary.
@JohnWatsonRooney 3 years ago
Glad it was helpful!
@omkarpatil9234 3 months ago
Scrapy for me is second to none. It's a total beast if you are able to use it to its full potential.
@JohnWatsonRooney 3 months ago
Yes, I agree. I have come around to it and am using it a lot more now.
@omkarpatil9234 3 months ago
@@JohnWatsonRooney So now I am finding that BeautifulSoup is really better than Scrapy for extracting text and parsing HTML, which also confirms what you said in your video! Right now for my task I am using BeautifulSoup (for parsing) and Scrapy (for running a headless browser), which works fine, but I am curious whether you know any easier techniques for parsing HTML with Scrapy, especially to get text. Please let me know if you have any thoughts. Thank you!
@hudsona4004 2 years ago
You mentioned that selenium sends information about itself to websites being scraped, so that websites could detect that selenium is being used. I'm curious if you know more about this and any workarounds?
@cnsnipon 1 year ago
I liked the very didactic explanation.
@GelsYT 2 years ago
I have been using requests along with bs4. I had heard about Scrapy, and I agree it isn't good for beginners; I was a beginner at the time and it really was daunting. But now I think it's Scrapy time.
@JohnWatsonRooney 2 years ago
It really is very powerful when you understand it!
@thulasirao9139 3 years ago
I was confused about which one I should choose for scraping. Thank you.
@YamekDrope 3 years ago
Can we get more information on the method mentioned at the end? How can we simulate requests?
@lautarob 2 years ago
Another very educational video of yours. Thanks! Question: what would be the best method to scrape a site that has a huge table spread across many pages, so that (for example) only 30 rows are shown on the first page (1-30), then the next 30 on the 2nd page (31-60), and so on, all the way up to 10,000 rows or more? Can I move to the next page using Scrapy or BS4, or do I need Selenium for that purpose?
@JohnWatsonRooney 2 years ago
That depends on the way the page is getting the data. I suspect it’s via Ajax - check out my newer video on scraping JavaScript tables that might help you
@MirGlobalAcademy 1 year ago
Facing the same problem
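For the paginated-table question in this thread, if the page number turns out to be a plain query parameter (an assumption; as the reply notes, it depends on how the site loads its data), the page URLs can be generated up front and fetched without a browser. The base URL and the `page` parameter name below are hypothetical; the 30 rows per page and 10,000 rows come from the question.

```python
from math import ceil
from urllib.parse import urlencode

ROWS_PER_PAGE = 30  # from the question: 30 rows shown per page


def page_urls(base_url, total_rows, rows_per_page=ROWS_PER_PAGE):
    """Yield one URL per page needed to cover total_rows."""
    pages = ceil(total_rows / rows_per_page)
    for page in range(1, pages + 1):
        # "page" is a hypothetical parameter name; check the real site's URLs.
        yield f"{base_url}?{urlencode({'page': page})}"


urls = list(page_urls("https://example.com/table", 10_000))
print(len(urls))
print(urls[0])
print(urls[-1])
```

Each URL could then be requested and parsed in turn. If the table is filled in by AJAX instead, the same loop would target the underlying data endpoint rather than the HTML page.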
@jjeffery129 2 years ago
It seems easier to use Selenium to scrape Google Maps by searching different zip codes for gas prices, but it’s too slow. Can Scrapy interact with the website, like searching different inputs, or is it better to just use the Google API?
@ivy4372 2 years ago
Thanks for all the content! I have a question, and it would be very helpful if you could help: I have to scrape a dynamic website. When I scroll down, more objects load onto the page (always 50 new ones). When I look in the "developers" tools of my browser, I find the data I need in the "XHRs" folder, and with every scroll for 50 new objects there is a new file called "730" containing those 50 objects in JSON format. I need all the 730 files. Do you know how to scrape them?
@JohnWatsonRooney 2 years ago
Sure, check out this video I did; it covers how to get that information: Always Check for the Hidden API when Web Scraping kzfaq.info/get/bejne/etekn7Vh3pbXpaM.html
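The situation described in this thread (each scroll triggers an XHR that returns 50 more objects as JSON) can often be reproduced by calling that endpoint in a loop until it returns nothing. This is a minimal sketch under stated assumptions: that the endpoint accepts an offset and returns a JSON list, which the real site may not do. The names `collect_batches` and `fake_fetch` are hypothetical, and `fake_fetch` stands in for a real HTTP request so the example runs offline.

```python
import json
from typing import Callable, Iterator


def collect_batches(fetch_batch: Callable[[int], str], batch_size: int = 50) -> Iterator[dict]:
    """Request successive batches until one comes back empty."""
    offset = 0
    while True:
        items = json.loads(fetch_batch(offset))
        if not items:
            break
        yield from items
        offset += batch_size


def fake_fetch(offset: int) -> str:
    """Stand-in for the real XHR call: serves 120 invented objects, 50 at a time."""
    data = [{"id": i} for i in range(120)]
    return json.dumps(data[offset:offset + 50])


objects = list(collect_batches(fake_fetch))
print(len(objects))
```

In a real script, `fake_fetch` would be replaced by a function that requests the URL seen in the browser's network tab and returns the response body.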
@manishdaga 2 years ago
Great video
@the1gofer 3 years ago
Thanks!
@md.sharifulislamshanto2103 3 years ago
What do you think about ParseHub? Is it good enough to use professionally?
@renatosardinhalopes6073 2 years ago
Wait, isn't the last method just requests with Beautiful Soup? What do you mean when you compare Beautiful Soup against Scrapy, then?
@mohitsharmagarg 1 year ago
What would you recommend for creating a scraping tool?
@vitorrochatech 8 months ago
Hello John, thanks for the video! I have two questions: What is the best tool for scraping a website with a login or authentication? And when the website uses an API with authentication, what can I use?
@JohnWatsonRooney 8 months ago
The easiest option is to use Selenium or Playwright, but it can be done with requests too - you’d need to find the login endpoint and send the credentials over.
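The "find the login endpoint and send the credentials" route mentioned in the reply above could look something like this. The reply names requests; this sketch swaps in the standard library's urllib so it runs without installs, and it only builds the request rather than sending it. The login URL and the `username`/`password` field names are hypothetical and would have to be read from the real site's login form or network tab.

```python
from http.cookiejar import CookieJar
from urllib.parse import urlencode
from urllib.request import HTTPCookieProcessor, Request, build_opener


def build_login_request(login_url, username, password):
    """Build a form-encoded POST carrying the credentials."""
    # Field names are assumptions; real sites often use different ones
    # and may also require a CSRF token.
    body = urlencode({"username": username, "password": password}).encode("utf-8")
    return Request(login_url, data=body, method="POST")


# An opener with a cookie jar keeps the session cookie for later requests.
cookies = CookieJar()
opener = build_opener(HTTPCookieProcessor(cookies))

req = build_login_request("https://example.com/login", "user", "secret")
print(req.get_method(), req.full_url)
# opener.open(req) would send the login; later opener.open(...) calls
# would then reuse whatever cookies the site set.
```

With requests, the equivalent is a Session object and a single post call, which is likely what the reply has in mind.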
@ahmedmando9502 1 year ago
Thanks, bro.
@Klausi-uq4xq 3 years ago
It depends on the site and what info I need. At the moment I prefer Scrapy.
@adamchurchwell4019 2 years ago
Do you have any videos on scrape masking?
@abidhossain5527 2 years ago
Can you make a video about the recent scrapy-playwright bug with implementing the scrapy-playwright settings, and about some books or resources for learning Scrapy?
@nachoeigu 2 years ago
There are some things I can't understand. What if I have to scrape a website that uses JavaScript (if I make a request, I receive only part of the page's content)? Is the only solution to use Selenium, or can you handle it with Scrapy without a problem?
@JohnWatsonRooney 2 years ago
You’ll need to use something to render the JavaScript out for you, and return all the page data as HTML to parse. That thing is a browser - but in some cases a smaller, lighter version of a browser that runs headless (we don’t see it). That’s like Splash or Puppeteer (requests-html uses the Python port, pyppeteer), or it could be Selenium, which we can control.
@user-mc9ws5gx9o 3 years ago
Hi John, I missed you lately because I got busy. I want to wish you a Happy New Year. And I want to ask how I can benefit financially from website scraping. Regards, Waleed
@JohnWatsonRooney 3 years ago
Happy new year to you too. To start I’d say try to get some paid work scraping data that people need, then try to build something useful with data you scrape and charge for the service
@sachinshavi2243 3 years ago
Sir, I'm getting an error while running a Scrapy project. The error is: Scrapy 2.4.1 - no active project. Unknown command: crawl. Use "Scrapy" to see available commands.
@FishingWorlds 3 years ago
I was trying to get things done for the last 20 minutes with BeautifulSoup, but I have to press an accept button on wozwaardeloket.nl and the site is made in JSP. So that means BeautifulSoup will not be able to post form data to one page and then post more form data to the new page, right?
@Kig_Ama 2 years ago
Great.
@DashaZakella1001 2 years ago
What tool can I use to get past websites that block me when they detect that I'm using automation tools?
@lurkagurka 3 years ago
So using Scrapy won't get you blocked the way using Selenium does?
@ShivamSharma-if1oh 1 year ago
I want to scrape data from an infinite-scroll website. Which library should I choose?
@Hyuts 1 year ago
Great video thank you. Thoughts on AutoHotkey?
@JohnWatsonRooney 1 year ago
I've never used it, sorry!
@ahmedchokri3240 2 years ago
Hey bro, I need some help. I'm working on a project, part of which is getting some data from Instagram and putting it into my web app; of course it must always be up to date. In this case I think Selenium is slow, but I need it to connect to the Instagram account, and I also need HTTP requests. So please advise me.
@sachinshavi2243 3 years ago
Sir, how do I take paragraphs one by one?
@d8nnii_ 3 years ago
How do I scrape from reebonz.com? They added a layer of protection from a vendor (which I can’t remember) that renders their site almost impossible to scrape.
@JohnWatsonRooney 3 years ago
Look for the API the site is making requests to (Network tab of Inspect Element) and start making them yourself from your code.
@proribrajokproribrajok7789 2 years ago
Is Node.js scraping like Python Scrapy? Which one performs better?
@JohnWatsonRooney 2 years ago
The principles are the same: you make a request and receive data. I don’t really know JavaScript well enough to comment - but as far as I’m concerned you can’t go wrong with Scrapy - it’s built specifically to scrape data, after all.
@-__--__aaaa 3 years ago
Bro, what if there is a CAPTCHA v2/v3 on some website? How do I get rid of that?
@-__--__aaaa 3 years ago
@Neo No dude, I need to do it manually.
@depenz 2 years ago
requests + bs4 VS scrapy?
@darqlite6780 2 years ago
You mentioned that if you needed to click a button or input in a field, then Selenium could be what you're after. Does that mean that you _can't_ accomplish that with, say, Scrapy and some addons?
@JohnWatsonRooney 2 years ago
Well, yes you could - depending on what it is. You can write Lua scripts for Splash that can simulate that. Or, if you can find a way around having to actually click something, like getting the data elsewhere or finding the URL that the data comes from, you could get around it. There are some libraries that allow some control over these things, but they are all based around a browser somehow, like MechanicalSoup.
@arjay_2002 3 years ago
Please do a tutorial on your command terminal setup on Windows.
@JohnWatsonRooney 3 years ago
Sure, I could do a setup video
@arjay_2002 3 years ago
@@JohnWatsonRooney Thank you!! It looks really neat.
@mattmovesmountains1443 3 years ago
Do you have a video that goes over the best scraper/tool for websites that have a constantly changing text element? Stock prices being the most well known example of this. I'm making something to scrape a "freefall auction" (price drops until someone buys one, or until it hits a predetermined low) and gather the lowest prices reached for multiple auction lots. I love using requests-html but it seems that it only captures the initial state of a rendered page, rather than any updates that occur once loaded. My guess is do basic gathering info with requests-html, then grab prices with selenium, which is my current approach, but wanted to check with the expert!
@JohnWatsonRooney 3 years ago
Sounds like a cool project - if you want to email me the site I can have a look and see what I think? Email on my KZfaq main channel page
@mattmovesmountains1443 3 years ago
@@JohnWatsonRooney Will do - thanks!
@NivAwesome 3 years ago
@@JohnWatsonRooney May I email you about a similar thing as well? I found it difficult with scripts: I couldn't reach the page source with Python (I think they rejected me because it's headless) and couldn't render it with requests-html...
@0xmatriksh 3 years ago
What about creating a video on this?
@ss877S 10 months ago
@@JohnWatsonRooney Sir, can we use Beautiful Soup for web scraping stock prices? Please help.
@higheringai68 2 years ago
Who might be using IBM Watson Discovery?
@helovesdata8483 2 years ago
Overall I think one should take the time to learn Scrapy if you need to web scrape for a job. It will be worth it in the long run. What do you think? (as I move over to your Scrapy for Beginners video) LOL
@JohnWatsonRooney 2 years ago
I think once you have the principles down, or if you are already proficient with Python, then definitely learn Scrapy
@alihusham1560 3 years ago
So Scrapy is the best.
@mastersdubai4729 2 years ago
Very interesting video. Please advise how I can scrape the following and followers profiles on Instagram.
@sachinshavi2243 3 years ago
How do I get unformatted data into formatted data?
@roberttuttle4284 10 months ago
I am new to scraping. If you have a dynamic website that requires you to input dates or numbers and click on buttons, what else besides Selenium works? Does Beautiful Soup work? Very interested.