What I'd Add FIRST To a new Scrapy Project

  33,398 views

John Watson Rooney

Days ago

In my last Scrapy video we created a basic project from scratch but found some limitations. In this episode we go through how to use the Item and ItemLoader classes in Scrapy to make our project better. An Item class lets us define the fields for our data in items.py, and the ItemLoader helps us clean the data before loading it into the item, ready for use.
Scrapy p1: Scrapy for Beginners -...
-------------------------------------
Disclaimer: These are affiliate links, and as an Amazon Associate I earn from qualifying purchases.
-------------------------------------
Digital Ocean (Cloud Servers, Affiliate Link) - m.do.co/c/c7c9...
Sound like me:
microphone amzn.to/36TbaAW
mic arm amzn.to/33NJI5v
audio interface amzn.to/2FlnfU0
-------------------------------------
Video like me:
webcam amzn.to/2SJHopS
camera amzn.to/3iVIJol
studio lights amzn.to/3aBpKik
small lights amzn.to/2GN7INg
-------------------------------------
PC Stuff:
case: amzn.to/3dEz6Jw
psu: amzn.to/3kc7SfB
cpu: amzn.to/2ILxGSh
mobo: amzn.to/3lWmxw4
ram: amzn.to/31muxPc
gfx card amzn.to/2SKYraW
27" monitor amzn.to/2GAH4r9
24" monitor (vertical) amzn.to/3jIFamt
dual monitor arm amzn.to/3lyFS6s
mouse amzn.to/2SH1ssK
keyboard amzn.to/2SKrjQA

Comments: 65
@janekstern 1 year ago
Your videos helped me understand scrapy more than any other resource, ty!
@linuxinstalled 3 years ago
I wish this video had more exposure. I greatly appreciate that you took the time to put this series together. Being able to see these examples of the various mechanics behind scrapy has been hugely helpful. Thank you again.
@JohnWatsonRooney 3 years ago
Glad you enjoyed it! Thank you
@davyroger3773 3 years ago
Thanks! The documentation did not go into enough depth and I'm glad someone made a comprehensive video on it.
@victormaia4192 3 years ago
Great tutorial! Very easy to follow, had no problems. About the typos: I'm the worst typer ever, but Tabnine always saves my life.
@shihlun5291 1 year ago
Thanks for the tutorial. Watching it gave me a better understanding of the Scrapy ItemLoader docs.
@woldemarkiev 2 years ago
Great tutorial!! It really helps to understand
@amineboutaghou4714 3 years ago
Another great video ! Very well done John 👏🏼
@gwulfwud 2 years ago
Thank you! I watched the previous video and then this, and it felt like I know so much about scrapy already. Really really good videos. Keep it up!
@codewithnacho 3 years ago
Awesome vid! It answered my questions with Item Loaders. Docs were confusing me haha
@JohnWatsonRooney 3 years ago
I know! The docs are good but also, not so good haha
@carinafelnecan7802 1 year ago
Thank you, I learned a lot from this video:)
@cosmicblack 2 years ago
Great video. Thanks!!!
@yangvictor5349 1 year ago
thank you for sharing
@hendrikfeddersen6768 3 years ago
Thanks a lot. The videos are very clear. Would you mind explaining, in one of your next videos, the correct folder structure of a Scrapy project: which file goes where, and why?
@milank9857 2 years ago
Great explanation as always, really helpful tutorial
@nadyamoscow2461 3 years ago
Thanks a lot, what you do is amazing.
@vidproli4231 3 years ago
Great tutorial, it explains exactly what I was looking for, thank you
@vitalij09 3 years ago
Thanks man!
@NatureLover02005 3 years ago
Excellent!!!
@sheikhakbar2067 3 years ago
Thanks a lot, that was very helpful.
@ferilukmansyah3037 3 years ago
Thanks for the best tutorial
@alfakih7247 1 year ago
More Scrapy blogs, please
@JnWayn 2 years ago
Nice to know what the competition is. I got a wisdom tooth. Is it possible with Scrapy to mark a checkbox, then click a button to get to the next page?
@justinames5439 2 years ago
As the others have said, thanks for your time and effort, a great help. The links connecting to Amazon (e.g. the lighting link) are dead, and you might want to update them. On another front, have you added a video on caching? All in all, really well done, and, again, thanks. jA
@JohnWatsonRooney 2 years ago
Thanks, one of the issues with a lot of the scrapers I wrote is that they don't always age well! I haven't actually done anything on caching yet, no. I'll add it to my list
@abukaium2106 3 years ago
Great video. I wish you'd make a video on using proxies with Scrapy
@MohAmuza 2 years ago
I scraped a product and some items don't have some of the data, so the result is a NoneType, which means None. In items.py I created a function that checks whether the value is None and returns something else instead:
    def check_gift(value):
        if value is None:
            return "No gift"
        else:
            return value
but it doesn't work. Where is the problem?
@leleemagnu6831 3 years ago
John, another great video. In the title the first word should read Scrapy, or the video won't come up in a search. Let me wish you a well-deserved, fantastic Christmas! e
@JohnWatsonRooney 3 years ago
Oh wow I didn’t notice! Thank you for pointing that out, I’ve changed it. Happy Christmas to you too!
@GordonShamway1984 3 years ago
Super
@mandela_byron 2 years ago
Hello John, could you do a video on how to host Scrapy scripts?
@JohnWatsonRooney 2 years ago
Hi! Yes, I've been wanting to cover this for a while. Unfortunately Scrapyd doesn't work with the latest version of Scrapy, so the best alternative I could come up with was hosting the spider on a Linux server and using a cron job to run it every X hours. Would that be of interest?
@mandela_byron 2 years ago
@JohnWatsonRooney Sounds great. Looking forward to that. I've been struggling with how best to host my scraping scripts, and I know there are some among us who face the same challenge. Thanks, your efforts are much appreciated
@TheWhoIsTom 3 years ago
Nice tutorial!! It would be nice if you showed how to store the data from THIS code (ItemLoader) in MongoDB. :)
@JohnWatsonRooney 3 years ago
Thanks! Sure, I’m going to extend this project to cover more of Scrapy’s features, including pipelines and databases
@TheWhoIsTom 3 years ago
@JohnWatsonRooney Awesome. Thanks a lot :)
@alexportugal3986 1 year ago
Hi, I just don't quite get why you use the ItemLoader part and all of that when you can do it within the parse function. It seems to me that it gets more complicated to get the same result. Surely there is something I am missing.
@maheshsharma-zq2uc 2 years ago
Can you make a project with Scrapy to extract stock information along with historical data?
@abdulcute 3 years ago
Best video on Scrapy and the best explanation. @John Watson Rooney and others: I have one question about ItemLoader. How do we extract the data if an element holds more than one value? (E.g. if an element has two cell numbers, the ItemLoader picks only the first number, not the second one.) As I learned from your previous video, we use getall().
@karthikkarthik100 1 year ago
Thanks for the informative video. Can't we just write if next_page: instead of if next_page is not None?
@thewheeldeal8439 2 years ago
This is a great video thanks! Question: Can scrapy save item objects to pickle binary files? If so, how? I just find it really convenient to save my scraped data into pickled objects that can be used quickly in other files, but I can't find any doc on that for scrapy...
@user8ZAKC1X6KC 2 years ago
I am having an issue where it seems like fetch(req) is going a bit too fast, so it's only catching part of the page. Is there a way to slow it down? I can find how to do it when the crawler is running, but not when scraping in the shell. Thoughts?
@alessandr2 3 years ago
Thanks for the tutorial!! One question: what part of the new code prevents the error from appearing if there is no price info?? Thanks in advance!!!
@kevin_daang 2 years ago
If I wanted to include when a whisky bottle was sold out, how would I do it with the item loader?
@Daviuliano 1 year ago
Super nice. However, I am struggling to understand how that would work with a dynamic website where I send a GET request that returns data in JSON format. I work around it a bit and convert it to a dictionary, but I can't seem to get it to return an item… any ideas that can help me?
@JohnWatsonRooney 1 year ago
I think you'd still need to parse through the JSON and then load it into the item loader and item, it's been a while since I've done that though so not 100% sure sorry
@Daviuliano 1 year ago
@JohnWatsonRooney Thank you… I managed to do it now. I had to yield them all individually, but it's working 👍🏼
@fatihkarakus6189 1 year ago
@JohnWatsonRooney When I import items I get an error like this: "attempted relative import with no known parent package". How can I solve this error?
@Scuurpro 2 years ago
How would I change a stock item in the item loader? It only returns "In Stock", or " " when things are out of stock. Would I create a function with a value and an if/else statement?
@KhalilYasser 3 years ago
Amazing tutorial. Thank you very much. Can you share the code as usual?
@JohnWatsonRooney 3 years ago
Yes, sure I've updated my repo here: github.com/jhnwr/whiskyspider
@dokanplugincustomization1587 2 years ago
Awesome playlist, but I have one question: products that are sold out don't give us any data in their price field. I tried to put in an alternative value, something like you did in the previous video using a try/except block, but I failed to do so. Please guide me.
@dokanplugincustomization1587 2 years ago
Sold out products are only giving the output of name and link only
@salimbo4577 3 years ago
Thank you so much. Is there a way I can scrape audio data, like sound data?
@Abdul_Rafay_Pal 1 year ago
What would you recommend? Splash or Playwright?
@JohnWatsonRooney 1 year ago
Playwright is my go-to now
@Abdul_Rafay_Pal 1 year ago
@JohnWatsonRooney Thank you very much🥰
@fatihkarakus6189 1 year ago
When I import items I get an error like this: "attempted relative import with no known parent package". How can I solve this error?
@ShahidulsPerspective 2 years ago
How do I save the URL of the extracted page when using ItemLoader?
@ShahidulsPerspective 2 years ago
I got it. It's: l.add_value("url", response.url)
@isabelsilva-wf8vg 2 years ago
How do I use this with XPath? I tried it exactly like this, but it didn't work: {l.add_xpath( ' title ' , ' .//h1[@class="product__title"]')}
@dcevansuk 3 years ago
Another excellent video!!! I have one question: this works with the parent URL data, but is there a way to also use ItemLoader() with the data scraped from the associated child URL, to end up with one combined yield l.load_item()? It could be an interesting video.