I Made a FAST Search Engine

  126,836 views

conaticus

A day ago

Get $15 free credits with BrightData: brdta.com/conaticus1
BrightData YouTube Channel: @BrightData
TF-IDF Blog Post: janav.wordpress.com/2013/10/2...
Lemmatization Word Lists: github.com/michmech/lemmatiza...
Crawler Repository: github.com/conaticus/search-e...
API Repository: github.com/conaticus/search-e...
Client Repository: github.com/conaticus/search-e...
Discord: / discord
Github: github.com/conaticus
Twitter: / conaticus
Join this channel to get access to perks:
/ @conaticus
0:00 Intro
0:20 BrightData
2:10 Inverse Term Frequency & Indexing
6:41 Page Ranking & Lemmatization

Comments: 169
@conaticus a month ago
Start building awesome projects with $15 free credits using BrightData today: brdta.com/conaticus1
@AWIRE_onpc a month ago
no
@xulaxwtf a month ago
no
@aryanszone4963 a month ago
no
@lifeofme702 a month ago
I don't know what this guy said, and I was still mind-blown by all the effort this guy puts in
@conaticus a month ago
Thanks so much 🙏 It would not be possible without your support
@jaymarksum6542 a month ago
I’m impressed, can’t wait to see you build a multithreaded web server in assembly
@da40au40 a month ago
Why do I find it super funny 😅😅😅.
@ArthursHD a month ago
@@da40au40 Me too :D
@DanskeCrimeRiderTV a month ago
it's not impressive. Of course querying a few hundred or even a hundred thousand web pages isn't as complicated or slow a task as querying trillions of webpages.
@KibitoAkuya a month ago
@@DanskeCrimeRiderTV google also wastes time deciding whether or not you are allowed to see certain sites
@DanskeCrimeRiderTV a month ago
@@KibitoAkuya what does that have to do with anything? Google is still faster at querying trillions of results than this.
@asm_x86 a month ago
That's really impressive, I can't even figure out how to run it.
@ZuperPotato a month ago
Nice username
@conaticus a month ago
Just added some instructions to the READMEs if you're interested :)
@asm_x86 a month ago
@@conaticus thanks, I'll do that
@coderx8634 a month ago
Love your content. You and your quality have really improved. Keep it up ❤
@conaticus a month ago
Thanks so much, your support means a lot ♥
@greensporevalley a month ago
SERBIA MENTIONED 🎉🎉🎉
@europa_the_last_battle a month ago
Now waiting for Russia 🥰
@RealMephres a month ago
@@europa_the_last_battle >goes to comments >sees meme comment >looks at replies >only a LARPer replied lol
@MAXHASS-ph5ib a month ago
@@RealMephres this aint 4chan nga
@jawadmansoor6064 a month ago
that name rings a bell, maybe from some kind of Serbian movie?
@RealMephres a month ago
@@MAXHASS-ph5ib tell that to the LARPer dawg
@ccost a month ago
7:40 flashing those questionable websites in a sponsored video is quite the move
@twitchizle a month ago
You scared of porn?
@coderan5029 21 days ago
This is basically what we learned in my big data class, but we used map-reduce to do the TF-IDF calculations, so it's impressive you figured this out on your own
@rafaelpereiracoias1047 a month ago
Nice video and nice code, keep up the good work!
@ExpandedCuber a month ago
Let's go another conaticus video
@foqsi_ a month ago
Love this dude and his video projects
@conaticus a month ago
🙏
@MySachincool 10 days ago
Subscribed & notifications on :) you deserve more recognition bruh
@GermanTimecrafter a month ago
such a cool video! i love the way how you explain what you are doing :) random question but what is your editor font?
@conaticus a month ago
Appreciate it :) I'm using JetBrains Mono, it's free to download
@polyshrub a month ago
This is very impressive, what was the size of the database when indexing is finished? Seems like it would be quite big
@turb0004 a month ago
Please finish your file explorer in rust fully, because the idea of it is awesome. Love your videos, content is very engaging 🎉
@R_Y_Z_E_N 23 days ago
Google also does the same but with distributed computing to reduce the overall time. Just scale the database horizontally and mimic Google's approach
@6IGNITION9 a month ago
filter out JS for another 10x bandwidth savings alternatively use an adblocker. (can puppeteer do that? It's just chromium right?)
@devinlauderdale9635 a month ago
The problem is this approach is susceptible to SEO spamming/invisible SEO keywords
@conaticus a month ago
Yeah for sure, realistically it should be moderated based on user interaction as well
@iritesh a month ago
Awesome effort ✨
@SG-kn2jl a month ago
Why did you choose TF-IDF instead of word2vec or any context aware model?
@skorp5677 a month ago
+1 Would like to know
@madalenaferreira3018 22 days ago
great video, gave me ptsd from my information retrieval class though
@stayhappy-forever a month ago
thats insane, hows this only at 12k views
@alexmoses3215 8 days ago
Programming 🤝 martincitopants…match made in heaven
@Nerdimo a month ago
Impressive, seriously!
@80sVectorz a month ago
3:07 Best pronunciation of Euclidean I have ever heard :P
@CrazyDiamondo a month ago
Where?
@80sVectorz a month ago
@@CrazyDiamondo I added a timestamp
@a6gitti a month ago
Supa dope. I would like to use this search engine of yours
@allenfpascua a month ago
Super good editing 🫡🫡🫡🫡
@conaticus a month ago
Would not be possible without your breathtaking animations 😄
@dreamsofcode a month ago
🔥🔥🔥
@jugurtha292 a month ago
very nice, built something similar for my info retrieval class. we have to use okapi bm25 formula for the ranking but overall very similar. scrape, tokenize, parse, inverted index, rank
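For readers following the pipeline listed in this comment (tokenize, inverted index, rank), here is a minimal Python sketch of the inverted-index step. It is an illustration only and not the code from the video's repositories: the function names `build_inverted_index` and `search`, the sample documents, and the whitespace tokenizer are all assumptions made for the example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Hypothetical tokenizer: lowercase and split on whitespace.
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every query term."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Copy the first posting set so the index itself is never mutated.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Made-up sample pages, keyed by a short id.
docs = {
    "a": "Rust is fast",
    "b": "Rust is safe",
    "c": "search engines rank pages",
}
index = build_inverted_index(docs)
```

With this index, `search(index, "rust fast")` only has to intersect two small sets instead of scanning every page, which is why the lookup stays quick as the number of pages grows. Ranking the matches (with TF-IDF or BM25) would be a separate step on top.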
@lonelybookworm a month ago
Well of course it is very fast, it only has like 200 websites
@errplane_ a month ago
oh my fuck i saw this on your github last night
@jsalsman a month ago
I believe it's "inverted indexing", as inverse indexing is something else.
@yorailevi6747 a month ago
how much did you pay for the web scraping service in total?
@maksymilianglowacki1409 a month ago
is this engine online, or would it be able to go online for other users so others could also enjoy it? or was it just a peek or something you made cuz you were bored or smth
@user-xl2om2up2x a month ago
W ad plug, it's 100% relevant and actually necessary to fulfill the premise of this vid.
@gaimnbro9337 a month ago
Nice job :D
@MortonMcCastle a month ago
Good! The world needs a new Google Search, one that's more like how it was in the 2000s.
@thekwoka4707 a month ago
How much did the scraping cost if it wasn't free?
@ethanstewart1011 21 days ago
How did you manage to get a node.js memory leak??
@joenutt1232 a month ago
Create your own database engine for shits and giggles
@conaticus a month ago
B+Trees 💀
@mahrezjanati3426 a month ago
first time watching a vid of yours ... i have one question : why are you vibrating ??
@-rate6326 a month ago
Cause he is vibrator
@animeworld4775 a month ago
what are the things that i should know or learn to create projects like these
@GONDWANA-de4od a month ago
HTML for website creation, CSS for page design, JavaScript for making the website dynamic and for the backend, SQL for indexing, Rust for fast backend services
@gammongaming9081 26 days ago
yk what would be funny? making the slowest search engine possible without like halting the program for a set time, just with maths
@carlitosdummy a month ago
i love this channel
@datainsight1724 a month ago
Next time use the Common Crawl dataset ;)
@HyperCodec a month ago
Bro managed to memleak in js
@larry_berry a month ago
Lol. Got notif after clicking the video.
@gamedirection_us a month ago
🍎 👀 .. Apple being like "when will it be ready?".
@gopallohar5534 15 days ago
ain't see rust there!
@Raven-fu1zz a month ago
Remember, never return an over 18 site without an over 18 word in the search request
@SlimyFrog123 a month ago
Now make your own email system to go along with it. 😉
@lazarusNoob a month ago
You should host it
@playtatus1758 a month ago
how do you edit your vids
@conaticus a month ago
Allen uses Adobe After Effects for the amazing animations - I just use DaVinci to cut things up 😁
@playtatus1758 a month ago
@@conaticus ok thx
@Serhii_Volchetskyi 29 days ago
🔥🔥🔥 I was looking for that algorithm and didn't know its name.
@fangg194 a month ago
you seem ok
@monotonedevelopment a month ago
If only windows file explorer could do the same
@SandWire 26 days ago
For this we have a thing named Everything :)
@binpersonal a month ago
"some fucking genius" lmao
@daemonkisure2952 a month ago
how can i install this search engine?
@conaticus a month ago
Instructions are on the Github repos :)
@humanontheinternet6510 9 days ago
Auto solve captcha you say🧐
@a224kkk 28 days ago
Nice, you re-invented the lucene library
@TheRealMangoDev a month ago
good vid
@AquaQuokka a month ago
Rewrite your genetic code in Rust.
@pyyrr a month ago
i would rather be bug free so i will pass
@_DarkLiquid a month ago
discord clone when
@ALTERRAa8 a month ago
6:08 nahhhhhhhhhhh whats bro even searching 💀💀💀💀
@v037_ a month ago
I found a worthy opponent
@thescratchguy428 a month ago
at a desert
@trolIface_ 29 days ago
hub 🎉🎉
@igrb 23 days ago
nice
@Xanmattauri a month ago
@google acquire this man
@Faeest a month ago
why do disallow and user-agent matter? can't you just scrape everything?
@skorp5677 a month ago
You can but it might be illegal
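To make the `Disallow` and `User-agent` question concrete, here is a small Python sketch of how a crawler can check a robots.txt file before fetching a page, using the standard library's `urllib.robotparser`. It is only an illustration and not the crawler from the video: the robots.txt content, the `example.com` URLs, the `example-crawler` user agent, and the `allowed` helper are all made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every crawler is asked to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
# parse() takes the file's lines, so no network request is needed here.
parser.parse(ROBOTS_TXT.splitlines())


def allowed(url, user_agent="example-crawler"):
    """Return True if the parsed rules let this user agent fetch the URL."""
    return parser.can_fetch(user_agent, url)
```

A polite crawler would call something like `allowed(url)` before each request and skip any page where it returns `False`. The `User-agent` line decides which crawlers a rule group applies to, and each `Disallow` line names a path prefix those crawlers are asked not to fetch.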
@sleepybraincells a month ago
Why is there Rust in the thumbnail? This was written in Javascript
@conaticus a month ago
Used Rust for the API and TF-IDF matching - decided not to keep in much of the footage for that as it was already explained in the animations
@neologicalgamer3437 a month ago
Bro sounds like WilburSoot
@J0Y22 a month ago
shockedd
@Miluum 29 days ago
1:06 automatically solve captchas? i knew these things exist just to waste our time and energy
@Tech_Code127-76 a month ago
Good
@Ayymoss a month ago
MAKE LONGER VIDEOS
@Macellaio94 a month ago
Liked and subbed
@danielisop3182 25 days ago
What did u mean by the websites u shouldn’t have searched
@dylhack a month ago
da goat
@juniordevmedia a month ago
what TF is IDF ?!!
@neofox2526 a month ago
idk man but watching it makes me feel smart
@jamesbarret4240 a month ago
Term frequency (how often a given word shows up in a specific document) times inverse document frequency (a weight that gets smaller the more documents the word shows up in). The wikipedia article is pretty good: en.wikipedia.org/wiki/Tf-idf
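As a rough illustration of the textbook formulation, the Python sketch below scores each term as tf × idf, where tf is the term's share of the words in one document and idf is log(N / df), with N the number of documents and df the number of documents containing the term. This is one common variant among several and is not taken from the video's code: the `tf_idf` function name, the whitespace tokenizer, and the sample documents are assumptions made for the example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document in docs."""
    # Hypothetical tokenizer: lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            # tf = share of this document's words; idf = log(N / df).
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Made-up sample documents.
docs = ["rust is fast", "rust is safe", "search engines rank pages"]
scores = tf_idf(docs)
```

In this sample, "fast" appears in only one document while "rust" appears in two, so "fast" gets the higher weight in the first document. A word that appeared in every document would get a weight of zero, which is how very common words end up ignored.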
@chiroyce a month ago
What are the consequences of scraping sites you aren't allowed to?
@conaticus a month ago
Probably not much on its own as long as you're not violating copyright - however it is courteous not to scrape sites forbidden by the robots.txt
@trollinqu a month ago
wastes their resources and yours
@monkshee a month ago
damn
@user-fj5ts6sz1f a month ago
rust is a real badass❤❤
@susannerudolph8469 a month ago
then brightdata makes captchas useless
@educacionespecialchannel3756 26 days ago
Captcha's effectiveness has been in question for quite some time now.
@AhmedMahmoud-ec4kz 9 days ago
Great video 😊 FYI: bright data is an Israeli company 😮
@_sohom a month ago
Make a better version of VSCode.
@konstantinsotov6251 a month ago
we had a hackathon where we basically had to implement TF/IDF - also a search engine of a sort, but for files. we did the interface in python and all mathematics processing in C++. It would have been a fun experience if not for the time limit. we struggled really hard, on test data our solution worked faster by an order or two than most other participants, but... we somehow failed on the exam data. we failed fucking IO. and won nothing. I fucking hate hackathons since then. fuck IDF. also maybe this happened because i had written 75% of the code, while 4 other members did almost nothing. It was (their) responsibility to handle IO, and mine to handle mathematics and processing. I hate working in teams. I know noone cares but i might as well just burst out all of the rage I have towards that experience. once again, fuck team work, fuck hackathons, fuck my teammates, fuck everything and everyone
@skorp5677 a month ago
skill issue
@konstantinsotov6251 a month ago
@@skorp5677 exactly
@kavinbharathi a month ago
Not to be the 🤓☝️ guy, but "Jana Vembunarayanan" is pronounced 'Ja' as in 'Jarvis' and 'na' as usual. Just fyi
@conaticus a month ago
Thank you, I'll do this if I ever pronounce it again 😂
@ph03n1x_dev a month ago
You made a search engine for porn?! Thats disgusting... is it on GitHub?! 👀
@conaticus a month ago
All open source and ready to play around with 😂
@latrapa918 a month ago
105
@lukamajcenic1172 a month ago
This is just an ad for BrightData. Compared to previous videos very low effort.
@deadshadow759 29 days ago
this result dont make any sense xha... very fast
@vrljk a month ago
SRBIJAAAAAA
@planktonfun1 a month ago
Still not fast and scalable enough. The result is not even relevant, you made bing not google
@LaugeHeiberg a month ago
wow really? Im also surprised one single guy didnt manage to make a product rivaling Google
@avi7278 4 days ago
You need to learn how to sync up your audio and video.
@DanskeCrimeRiderTV a month ago
how is this impressive? Of course it's gonna be faster. You aren't querying billions or even trillions of web pages unlike Google? So this search engine isn't even faster than Google...
@conaticus a month ago
It wasn't meant to be impressive it was meant to be informative and entertaining 👍
@DanskeCrimeRiderTV a month ago
@@conaticus your thumbnail implies it is faster than Google. And I believe the original title did too.
@FaZekiller-qe3uf 27 days ago
Disappointing
@FeTetra a month ago
⬛🟧 http scrape?????