Web Scraping with GPT-4 Vision AI + Puppeteer is Mind-Blowingly EASY!

  25,371 views

ByteGrad

1 day ago

👉 Thanks to video sponsor Bright Data! Get $10 free credit: brdta.com/bytegrad1
👉 NEW React & Next.js Course: bytegrad.com/courses/professi...
Hi, I'm Wesley. I'm a brand ambassador for Kinde (paid sponsorship). Add authentication to your app with Kinde fast: bit.ly/3QOe1Bh
👉 Professional JavaScript Course: bytegrad.com/courses/professi...
👉 Professional CSS Course: bytegrad.com/courses/professi...
👉 Web development roadmap 2024 & 2025: email.bytegrad.com
👉 Email newsletter (BIG update soon): email.bytegrad.com
👉 Discord: all my courses have a private Discord where I actively participate
⏱️ Timestamps:
00:00 Intro
00:48 ChatGPT for HTML
02:01 OpenAI API
04:08 Puppeteer
04:28 Avoid restrictions (Bright Data)
05:11 Get HTML with Puppeteer + Proxy
08:35 Processing HTML
10:00 Give HTML to OpenAI
12:19 ChatGPT for scraper code
14:41 Vision API
19:40 Vision API pricing
21:12 Vision vs Text pricing
22:52 Future of web scraping (AI + Bright Data)
#webdevelopment #programming #coding #reactjs #nextjs

Comments: 35
@ByteGrad 6 days ago
Hi, my latest course is out now (Professional React & Next.js): bytegrad.com/courses/professional-react-nextjs -- I'm very proud of this course, my best work! I'm also a brand ambassador for Kinde (paid sponsorship). Check out Kinde for authentication and more bit.ly/3QOe1Bh
@hxxzxtf 2 months ago
🎯 Key Takeaways for quick navigation:
00:00 🌐 Web scraping has been revolutionized by AI, particularly with the latest Vision AI model, making data extraction more efficient.
01:07 💻 Manually copying HTML and using Chat GPT for extraction is one method, but OpenAI's API offers programmable solutions for scalability.
02:16 🔄 Using Puppeteer with Bright Data's scraping browser helps circumvent website restrictions and rate limiting during scraping.
05:33 🖥️ Puppeteer allows for easy scraping of HTML content, but there's a need to manage and clean up the extracted data before analysis.
08:35 💡 Extracting only necessary data from HTML can optimize costs when using OpenAI's models for analysis.
12:17 💰 Text-based scraping methods can be cost-effective, but they require ongoing maintenance due to HTML structure changes.
14:49 📸 Utilizing OpenAI's GPT-4 Vision API enables data extraction from screenshots, potentially offering a more robust solution for complex web scraping tasks.
17:52 🖼️ Using base64 encoding allows passing images to models, enhancing data processing capabilities.
18:49 💸 Consider cost-effectiveness when choosing between complex HTML-based or text-based approaches for web scraping.
19:58 🎚️ Adjusting image resolution can significantly decrease token usage in web scraping, but it may increase the likelihood of errors.
20:53 🖼️🔄 Balance image resolution and price when utilizing Vision API for web scraping, as higher resolution images incur higher costs.
21:19 🧹 Clean up HTML before web scraping to reduce token usage and ensure accuracy in results.
22:57 🤖 Explore advanced features of AI tools, such as identifying clickable elements, to enhance web scraping automation.
Made with HARPA AI
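The 08:35 and 21:19 takeaways (send the model only the HTML it needs) could look roughly like the sketch below. It is not the code from the video: the function name `stripHtmlNoise` and the sample markup are made up for illustration, and the regex approach is an assumption that only works for simple pages. A real HTML parser would be more robust.

```typescript
// Rough sketch: shrink raw HTML before sending it to a language model.
// stripHtmlNoise is a hypothetical helper, not part of Puppeteer or the OpenAI SDK.
function stripHtmlNoise(html: string): string {
  return html
    // Drop HTML comments.
    .replace(/<!--[\s\S]*?-->/g, "")
    // Drop elements whose content rarely carries scrapeable data.
    .replace(/<(script|style|svg|noscript)\b[^>]*>[\s\S]*?<\/\1>/gi, "")
    // Collapse runs of whitespace, which add tokens without adding meaning.
    .replace(/\s+/g, " ")
    .trim();
}

// Made-up sample input for illustration.
const rawHtml =
  "<div> <script>var a = 1;</script><!-- note --><p>Price:   $10</p>\n<style>p{color:red}</style></div>";
const cleanedHtml = stripHtmlNoise(rawHtml);
console.log(cleanedHtml);
```

In a Puppeteer script, the input would typically be the string returned by `page.content()`; the cleaned string is what gets placed in the prompt.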
@sidouglas 2 months ago
This is such a timely video. I'm doing something similar to resurrect a website from the Wayback Machine.
@dmitriydorogonov7918 2 days ago
Perfect video, thanks
@nomicgaming5730 2 months ago
thank you a lot ♥
@imranhrafi 2 months ago
It's interesting, but what if I want pagination? I will still need to select the "next" button the old way. Is there any other way of doing the pagination?
@reidevanson181 2 months ago
What an amazing video. It's so niche but so useful.
@ByteGrad 2 months ago
Glad you liked it
@felipeblin8616 1 day ago
Great video. One question, though: what about hallucination? How can we be sure the model is not doing it?
@hellokevin_133 2 months ago
Hey man, mind if I ask what programming languages you know other than JavaScript/TS?
@juliushernandez9855 1 month ago
Can you create a video on how to deploy Puppeteer and Next.js to Vercel?
@Garejoor 2 months ago
Can CrewAI do this as well?
@hishamazmy8189 1 month ago
amazing
@RobShocks 1 month ago
Have you thought about or tried using a local model to scrape? It would save all the costs.
@subhranshudas8862 2 months ago
How do you handle paginated data?
@binhtruongdac2861 2 months ago
You just need to use the URL with the page number in the query params, then run a for loop to request multiple HTML pages.
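As a minimal sketch of the approach described in this reply, the loop below builds one URL per page. The parameter name `page`, the base URL, and the helper name `buildPageUrls` are all assumptions for illustration; real sites may use a different parameter (for example an offset) or no URL-based pagination at all.

```typescript
// Hypothetical helper: build one URL per page by setting a page-number query parameter.
function buildPageUrls(baseUrl: string, lastPage: number, param = "page"): string[] {
  const urls: string[] = [];
  for (let page = 1; page <= lastPage; page++) {
    const url = new URL(baseUrl);
    // set() overwrites an existing value, so a base URL that already has ?page= is fine.
    url.searchParams.set(param, String(page));
    urls.push(url.toString());
  }
  return urls;
}

// Placeholder base URL; existing query params such as sort are preserved.
const pageUrls = buildPageUrls("https://example.com/products?sort=price", 3);
console.log(pageUrls);
```

Each URL can then be loaded in turn (for example with Puppeteer's `page.goto`) and its HTML or screenshot passed to the model as in the video.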
@user-vo8kk8dv8l 2 months ago
elegant
@dmytroocheretianyi7577 1 month ago
Perhaps it will be cheaper on Claude.
@LifeTrekchannel 1 month ago
How to do this using Braina AI? Braina can run GPT-4 Vision.
@Lars16 2 months ago
This is a great video. But the problem with scraping has hardly ever been parsing the HTML or maintaining the parsers. The biggest problem is efficiently accessing websites that actively try to block you by gating their content behind a login or CAPTCHAs. Then comes IP blocking (or worse, data obfuscation) if you scrape their website at high volume.
@binhtruongdac2861 2 months ago
That's why you need something like Bright Data. Yes, it's not free, unfortunately.
@karenapatch1952 4 days ago
Octoparse can deal with this, and it's free. No thanks
@laihan4469 10 days ago
How does a full-stack dev work with AI?
@amadeuszg1491 2 months ago
I am interested in creating a price comparison website featuring approximately 10-20 shops, each offering around 10,000 similar products. Unfortunately, these shops do not provide APIs for direct access to their data. What would be the most efficient approach to setting up such a website while keeping maintenance costs reasonable?
@Braincompiler 2 months ago
Do it like the other comparison sites: provide an upload for CSV, XML and so on, or provide an API yourself so their shop systems can push the data ;) Crawling the shops yourself is the last option, and could be done with XPath and similar tools.
@amadeuszg1491 2 months ago
@Braincompiler Yes, but in this case the store needs to send me the CSV or XML file with their products. What if they don't?
@Braincompiler 2 months ago
@amadeuszg1491 Yes, of course. If your comparison site has a benefit for them, be sure they will.
@abhisycvirat 2 months ago
I did this 6 years ago: I scraped each website and compared the prices using the SKU.
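Comparing prices by SKU, as this reply describes, could be sketched as follows once each shop's products have been scraped into records. The `Offer` shape, the shop names, the SKUs, and the prices are invented for illustration, and the sketch assumes every shop exposes the same SKU for the same product, which is often not true in practice.

```typescript
// Hypothetical record shape for one scraped product listing.
interface Offer {
  shop: string;
  sku: string;
  price: number;
}

// Keep the lowest-priced offer seen for each SKU.
function cheapestBySku(offers: Offer[]): Map<string, Offer> {
  const cheapest = new Map<string, Offer>();
  for (const offer of offers) {
    const current = cheapest.get(offer.sku);
    if (current === undefined || offer.price < current.price) {
      cheapest.set(offer.sku, offer);
    }
  }
  return cheapest;
}

// Made-up sample data standing in for scraped results.
const offers: Offer[] = [
  { shop: "Shop A", sku: "X1", price: 10 },
  { shop: "Shop B", sku: "X1", price: 8.5 },
  { shop: "Shop A", sku: "Y2", price: 20 },
];
const cheapestOffers = cheapestBySku(offers);
console.log(cheapestOffers.get("X1"));
```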
@amitjangra6454 1 month ago
I am scraping (dropping HTML) with Python code and Selenium (approx. 60,000 articles), then creating vector embeddings for Llama 3 and asking it to write an article for me.
@richerite 17 days ago
Do you have a GitHub link? What did you mean by "write article"?
@5minutes106 8 days ago
Were you able to scrape 60,000 articles without getting your IP address blocked? That's impressive if you did.
@OnlyUseMeEquip 1 hour ago
@5minutes106 Obviously not, you just rotate proxies.
@UserAliyev 2 months ago
First
@semyaza555 2 months ago
2nd