Searching Duplicate Files with C

  Views: 35,559

Tsoding Daily


References:
- The Russian Book About Computer Forensics www.ozon.ru/product/forenzika...
- Crypto Algorithms Repo: github.com/B-Con/crypto-algor...
- STB Hash Table: github.com/nothings/stb/blob/...
- The Source Code: github.com/tsoding/dedup
Support:
- Patreon: / tsoding
- Twitch Subscription: / tsoding
- Streamlabs Donations: streamlabs.com/tsoding/tip
Feel free to use this video to make highlights and upload them to KZfaq (also please put the link to this channel in the description)

Comments: 53
@klarnorbert 2 years ago
Finally a good beginner project :D Thanks!
@alimousavi2676 2 years ago
Could you suggest more C beginner projects like this?
@john.dough. 2 years ago
tbh, I didn't realize you are bilingual until you recommended a book in Russian. Your English is very good.
@happywednesday6741 2 years ago
The accent is obvious, John... but yes, your English has native-level pronunciation, and it is formal in style too, so it is enjoyable to listen to, not just understandable.
@AndrieMC 9 months ago
rite?
@soniablanche5672 2 years ago
I really like the fact that you describe every line of code you write; it saves us non-C programmers the time of reading the documentation online. Also, I spotted that strcmp && strcmp error the moment you wrote it; I knew it was gonna loop infinitely lmao.
@iorekby 2 years ago
And you didn't stop him? What's wrong with you? 😁 For neurodivergent people: This was a joke
@gabe3538 1 year ago
1:12:30 don't let it get to you tsoding! There might not have been anyone watching at the time, but I'm watching now and I love your streams and your explanations about solving the problems are great! Thanks for the efforts you put into your videos :)
@wouter11234 2 years ago
I don't know why, but this was so fun to watch. With long videos like this, especially if someone has a strong accent, I usually get bored after 15 mins, but this was just too good. Keep up the great work, and I hope you get some viewers to ask "what do you guys think" next stream 😂
@viktormikhaltsevich7400 2 years ago
No luck finding this online. Your i3 status bar is hilarious! Took me a while to notice.
@tiborgrun6963 2 years ago
1:13:05 Yes, let's do it this way.
@axalius572 2 years ago
Instead of concatenating the strings, you could use openat, fdopendir and dirfd.
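A minimal sketch of the suggestion above, assuming a POSIX.1-2008 system. `count_regular_files` is a hypothetical helper written for illustration, not code from the video or the dedup repository. Each child directory is opened relative to its parent's descriptor, so no path strings are ever concatenated:

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: counts regular files under the directory `name`,
 * resolved relative to the open directory descriptor `parent_fd`
 * (pass AT_FDCWD to resolve relative to the current directory).
 * Returns -1 if the top-level directory cannot be opened. */
long count_regular_files(int parent_fd, const char *name)
{
    int fd = openat(parent_fd, name, O_RDONLY | O_DIRECTORY);
    if (fd < 0) return -1;

    DIR *dir = fdopendir(fd);   /* on success, dir takes ownership of fd */
    if (dir == NULL) {
        close(fd);
        return -1;
    }

    long count = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        /* Skip "." and ".."; note the ||, not &&. */
        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
            continue;

        struct stat st;
        /* Stat the entry relative to this directory without following symlinks. */
        if (fstatat(dirfd(dir), ent->d_name, &st, AT_SYMLINK_NOFOLLOW) != 0)
            continue;

        if (S_ISDIR(st.st_mode)) {
            long sub = count_regular_files(dirfd(dir), ent->d_name);
            if (sub > 0) count += sub;
        } else if (S_ISREG(st.st_mode)) {
            count += 1;
        }
    }

    closedir(dir);              /* also closes fd */
    return count;
}
```

Calling `count_regular_files(AT_FDCWD, ".")` would count the files below the current directory. One trade-off of this sketch: it keeps one descriptor open per nesting level, so very deep trees could hit the process's open-file limit.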
@EONorlander 2 years ago
Great video!
@Lemon_Inspector 1 year ago
Top of the stack to you, good sir.
@fumanchu332 2 years ago
You may want to double check the fread inside the loop in main.c. Looks like it still has swapped arguments.
@TsodingDaily 2 years ago
Good catch! Thank you very much! Feel free to submit a Pull Request for this one! :) UPD. Actually, the fix is super trivial, I can do it myself. Thanks for the heads up anyway!
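For readers wondering why the argument order of `fread` matters, here is a small sketch. It is not the code from main.c, which is not reproduced on this page; `count_bytes` is a hypothetical helper that only counts bytes where a deduplicator would feed each chunk to its hash function:

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: returns the number of bytes in `path` by reading it
 * in fixed-size chunks, or -1 on error. */
long count_bytes(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL) return -1;

    unsigned char buffer[4096];
    long total = 0;
    size_t n;

    /* size = 1, nmemb = sizeof(buffer): the return value is a byte count,
     * so a short final chunk is still reported correctly.
     * With the arguments swapped, fread(buffer, sizeof(buffer), 1, f)
     * returns 0 for any chunk shorter than the buffer, so the loop would
     * end without processing the tail of the file. */
    while ((n = fread(buffer, 1, sizeof(buffer), f)) > 0) {
        total += (long)n;
    }

    int failed = ferror(f);
    fclose(f);
    return failed ? -1 : total;
}
```

`fread` returns the number of complete elements read, not bytes, which is why an element size of 1 is the convenient choice when hashing arbitrary files.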
@hocky-ham324-zg8zc 2 years ago
So essentially you went from using depth first search recursively, to the stack-based approach?
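A rough sketch of what a stack-based depth-first traversal can look like, assuming a POSIX system. This is not the RECDIR implementation from the video; `walk_files`, `file_visitor`, `join_path` and `count_visitor` are names invented for this example. Pending directories are kept on a heap-allocated stack that grows with `realloc`, so nesting depth is limited by available memory and not by the call stack:

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical visitor type, called once per regular file. */
typedef void (*file_visitor)(const char *path, void *user);

/* Example visitor: counts files through a long* passed as user data. */
void count_visitor(const char *path, void *user)
{
    (void)path;
    ++*(long *)user;
}

/* Joins dir and name with '/' into a newly malloc'ed string (NULL on failure). */
static char *join_path(const char *dir, const char *name)
{
    size_t len = strlen(dir) + 1 + strlen(name) + 1;
    char *out = malloc(len);
    if (out != NULL) snprintf(out, len, "%s/%s", dir, name);
    return out;
}

/* Iterative depth-first walk. Returns 0 on success, -1 on allocation failure. */
int walk_files(const char *root, file_visitor visit, void *user)
{
    size_t cap = 16, top = 0;
    char **stack = malloc(cap * sizeof(*stack));
    if (stack == NULL) return -1;
    stack[top] = strdup(root);
    if (stack[top] == NULL) { free(stack); return -1; }
    top++;

    int rc = 0;
    while (top > 0) {
        char *dir_path = stack[--top];               /* pop */
        /* After a failure, keep popping only to free what is left. */
        DIR *dir = (rc == 0) ? opendir(dir_path) : NULL;
        struct dirent *ent;
        while (dir != NULL && (ent = readdir(dir)) != NULL) {
            if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
                continue;
            char *child = join_path(dir_path, ent->d_name);
            if (child == NULL) { rc = -1; break; }

            struct stat st;
            int ok = lstat(child, &st) == 0;         /* do not follow symlinks */
            if (ok && S_ISDIR(st.st_mode)) {
                if (top == cap) {
                    char **grown = realloc(stack, 2 * cap * sizeof(*stack));
                    if (grown == NULL) { free(child); rc = -1; break; }
                    stack = grown;
                    cap *= 2;
                }
                stack[top++] = child;                /* push; stack owns child */
                continue;
            }
            if (ok && S_ISREG(st.st_mode)) visit(child, user);
            free(child);
        }
        if (dir != NULL) closedir(dir);
        free(dir_path);
    }
    free(stack);
    return rc;
}
```

With `long n = 0; walk_files(".", count_visitor, &n);` the counter ends up holding the number of regular files below the current directory. Unlike the recursive version, each directory is fully read before any of its subdirectories is opened, so only one `DIR` handle is open at a time.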
@xesemesa12345678 7 months ago
A noob question: do you use malloc for RECDIR because the heap can handle bigger arrays, in this case one of stack_size elements? So if you had not used malloc, RECDIR would go on the stack and you might get a stack overflow. Please correct me if I am wrong.
@cheebadigga4092 10 months ago
AFAIK the file name limitation on *nix systems is 255 + null termination, but the path can have as many characters as your file system allows (I think it's the value of 32 bits - 65-something - for Ext4, but I could be wrong). In contrast, on Windows/NTFS, the whole path can only be 255 + null termination long.
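The limits discussed above depend on the operating system and the file system, so one option is to ask at run time. A small sketch assuming a POSIX system; the helper names are made up for this example:

```c
#include <stdio.h>
#include <unistd.h>

/* Returns the maximum file name length in bytes (excluding the terminating
 * null) for the file system containing `path`, or -1 if there is no fixed
 * limit or the query fails. */
long name_max_for(const char *path)
{
    return pathconf(path, _PC_NAME_MAX);
}

/* Same query for the maximum length of a relative path name. */
long path_max_for(const char *path)
{
    return pathconf(path, _PC_PATH_MAX);
}

/* Prints both limits for one directory. */
void print_limits(const char *path)
{
    printf("%s: NAME_MAX=%ld PATH_MAX=%ld\n",
           path, name_max_for(path), path_max_for(path));
}
```

Calling `print_limits(".")` shows the values for the current directory. Both limits are measured in bytes, and the result may differ between mount points on the same machine.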
@anon_y_mousse 2 years ago
This is cool. It's like if you added hashing to updatedb and incorporated diff into locate so you could search for only differences. This needs to be a tool in findutils. I would've said coreutils, but they split the find tools and discontinued coreutils. edit: I just checked and found that my system has sha{1,224,256,384,512}sum, in /bin and with links to those from /usr/bin as well. I think I'll give it a try with bash scripting.
@Vulto166 2 years ago
Reminder to my future self: don't let the C library confuse you about bytes and elements!
@awesomeguy11000 8 months ago
One potential optimization is to sort all files by size first and only compute the hashes for files that are the same size. Another optimization is to take the potential duplicates (files of the same size) and compare only the first kilobyte or so of data as a second pass to remove false positives; this should also reduce the number of files that need to be read completely.
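The first optimization above can be sketched as follows. This is an illustration only; the `file_info` record and `mark_hash_candidates` are hypothetical and do not come from the dedup source. After sorting by size, a file needs hashing only if one of its neighbors has the same size:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical record for one scanned file. */
typedef struct {
    const char *path;
    long long size;        /* size in bytes, e.g. taken from stat() */
    bool needs_hash;       /* set when another file has the same size */
} file_info;

/* qsort comparator: ascending by size, written to avoid overflow. */
static int compare_by_size(const void *a, const void *b)
{
    const file_info *fa = a;
    const file_info *fb = b;
    return (fa->size > fb->size) - (fa->size < fb->size);
}

/* Sorts files by size and flags those that share a size with another file.
 * Returns how many files still need hashing. */
size_t mark_hash_candidates(file_info *files, size_t count)
{
    if (count > 1) qsort(files, count, sizeof(*files), compare_by_size);

    size_t candidates = 0;
    for (size_t i = 0; i < count; ++i) {
        bool same_as_prev = i > 0 && files[i - 1].size == files[i].size;
        bool same_as_next = i + 1 < count && files[i + 1].size == files[i].size;
        files[i].needs_hash = same_as_prev || same_as_next;
        if (files[i].needs_hash) candidates++;
    }
    return candidates;
}
```

Files with a unique size can never be duplicates, so they are skipped without being opened. The second pass from the comment (comparing the first kilobyte) would then run only on the flagged entries.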
@HenrikScheel_ 2 years ago
The book you are linking to is written by Shelupanov Alexander Alexandrovich and Smolina Anna Ravilievna. Is either of them you?
@venkateshhariharan4341 2 years ago
great project
@greob 2 years ago
Next level: implement a perceptual diff to dedupe image files. ;) 14:40 the limit is actually 255 bytes, not just chars, which makes a big difference when dealing with non-ASCII text (specifically multi-byte encodings such as UTF-8).
@WilderPoo 2 years ago
Aye, you can do a lot with Unicode by just treating it as a byte stream rather than characters. Makes things very simple.
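To make the bytes-versus-characters point concrete, here is a small sketch that assumes file names are valid UTF-8. Both helpers are hypothetical names for this example. `strlen` counts bytes, which is the quantity a byte-based limit cares about, while the number of code points can be much smaller:

```c
#include <stddef.h>
#include <string.h>

/* Counts Unicode code points in a UTF-8 string by skipping continuation
 * bytes (those of the form 10xxxxxx). Assumes the input is valid UTF-8. */
size_t utf8_code_points(const char *s)
{
    size_t count = 0;
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80) count++;
    }
    return count;
}

/* Returns 1 if `name` fits a byte-based limit such as 255, else 0. */
int name_fits(const char *name, size_t max_bytes)
{
    return strlen(name) <= max_bytes;
}
```

A six-letter Cyrillic name, for instance, takes 12 bytes in UTF-8, so a 255-byte limit leaves room for only 127 such two-byte characters. Treating names as opaque byte strings, as the reply suggests, means the program never has to decode them at all.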
@fedordmitry 1 year ago
Hello Tsoding! I'm very excited about your videos. They helped my brain think in other ways :) P.S. Just in case it's not fixed: kzfaq.info/get/bejne/mNZzfclnnMuVnH0.html you forgot to fix line 42+3 to use the correct order of the arguments. Thank you for your content once again!
@eboubaker3722 2 years ago
1:13:00 oh boy you made my eyes hydrated :((
@didyoustealmyfood8729 2 years ago
yooo why do you have 10 Gb of porn folder lmaoooooooooooo fucking legend you are.
@anon_y_mousse 2 years ago
I know right, who has that small of a collection.
@rogo7330 2 years ago
Right :)
@elirannissani914 2 years ago
What is this code editor called?
@uninhm 2 years ago
Emacs
@kacperfilipek8461 2 years ago
How do you type that fast?
@m.h.7121 2 years ago
Years of practice to build your freakin muscle memory, I guess?
@danilo2735 2 years ago
He's an alien
@lolcat69 5 months ago
12:56 azozin moan
@hard2borrow428 2 years ago
I'm a pretty slow reader also... Definitely not… Russian
@hard2borrow428 2 years ago
haw beeg is the dayda base? Hoooooooooge dayta bayse.
@KillerMZE 2 years ago
Why not use C++? You could program the whole thing in 20 minutes with only standard library functions.
@foggy7595 2 years ago
Why did he not just use Java? Given that he types faster than my terminal emulator prints, he could get it done in Java in half that time. /s
@experiment0003 2 years ago
it'll run faster in C.
@hxccz 2 years ago
why not just delete all the porn?
@avananana 2 years ago
Why not just do it in Python, where you can probably do it in a one-liner? Geez, man.
@notanenglishperson9865 2 years ago
@@foggy7595 Why not use brainf*ck? The number of possible characters is less overwhelming.
@timoxa_dev 2 years ago
20:00 .gitignore got banned :C
@Tijan1 2 years ago
In Python: print('hello world')