Views: 57,969
Previous Episodes: Data Mining in C
References:
- Source Code: github.com/tsoding/data-minin...
- Less is More: Parameter-Free Text Classification with Gzip - arxiv.org/abs/2212.09410
- Wikipedia - K-nearest neighbors - en.wikipedia.org/wiki/K-neare...
- Wikipedia - Kolmogorov complexity - en.wikipedia.org/wiki/Kolmogo...
- github.com/tknishh/Text-Class...
Chapters:
- 0:00:00 - Announcement
- 0:00:37 - Intro
- 0:01:39 - kNN Classifier
- 0:03:39 - The Paper
- 0:06:25 - Kolmogorov's Complexity
- 0:11:13 - Connection between Compression and AI
- 0:11:55 - Plans for the Session
- 0:13:39 - Getting AG News Dataset
- 0:18:14 - "Hello, World"
- 0:19:04 - Assessing Quality of Data
- 0:21:02 - Reading Data into Memory
- 0:23:10 - Parsing Hack
- 0:24:28 - CSV lib for C
- 0:26:01 - Splitting by Lines
- 0:29:36 - Current nob_read_entire_file() implementation
- 0:30:51 - mmap
- 0:32:34 - Improving nob_read_entire_file()
- 0:41:31 - Concatenation of Files
- 0:42:26 - Compilation Errors
- 0:42:38 - Alternatives to C
- 0:43:32 - Assessing the Performance
- 0:45:47 - Parsing
- 0:49:30 - Command Line Args
- 0:51:11 - zlib
- 0:54:05 - Implementing deflate_sv()
- 1:00:51 - Testing deflate_sv()
- 1:03:49 - Bug
- 1:06:28 - Skipping header
- 1:06:59 - Compression ratio
- 1:09:42 - Compression performance
- 1:12:19 - Collecting Samples
- 1:15:49 - Accepting test samples file
- 1:16:24 - Class names
- 1:16:45 - Searching for non-existing website
- 1:18:02 - Using actual data in the wild
- 1:18:34 - Modern Internet
- 1:19:19 - AG News is Antonio Gulli's corpus of news articles
- 1:19:29 - TBD: