Data Mining in C

38,376 views

Tsoding Daily

Days ago

More Episodes: • Data Mining in C
References:
- Source Code: github.com/tsoding/data-minin...
- Wikipedia - K-means Clustering - en.wikipedia.org/wiki/K-means...
- Less is More: Parameter-Free Text Classification with Gzip - arxiv.org/abs/2212.09410
- archive.ics.uci.edu/dataset/2...
Support:
- BTC: bc1qj820dmeazpeq5pjn89mlh9lhws7ghs9v34x9v9
- Servers: zap-hosting.com/en/shop/donat...
Chapters:
- 0:00:00 - Announcement
- 0:00:37 - Intro
- 0:06:16 - Hello, World
- 0:10:11 - Dataset
- 0:11:13 - Raylib
- 0:16:10 - Displaying Samples
- 0:20:27 - Projecting Axes
- 0:21:41 - project_sample_to_screen()
- 0:28:51 - Resizable Window
- 0:30:12 - It's easier than CSS
- 0:31:14 - Random Samples
- 0:32:51 - Vector Based Generation
- 0:34:00 - How to Gatekeep
- 0:34:55 - Generating Cluster
- 0:42:11 - "Mouse" Dataset
- 0:43:48 - Weird Distribution
- 0:46:52 - K means
- 0:52:33 - Colors
- 0:59:28 - Clustering
- 1:05:23 - Coloring Clusters
- 1:07:57 - Factoring out Operations
- 1:11:10 - update_means()
- 1:17:22 - Playing with Clustering
- 1:19:25 - The Leaf Dataset
- 1:23:15 - My PDF Reader
- 1:23:44 - Inspecting Data like a Scientist
- 1:24:47 - Studying Available Attributes
- 1:26:09 - Parsing CSV
- 1:31:27 - enum Leaf_Attr
- 1:34:08 - Picking 2 Attributes
- 1:37:25 - Parsing Floats
- 1:39:00 - Printing Extracted Points
- 1:40:36 - Summary of the CSV Parsing
- 1:41:21 - Axes Boundaries
- 1:49:20 - Disabling Random Data
- 1:50:18 - Trying to Add Padding
- 1:51:55 - Clustering Leaves
- 1:58:20 - Outro
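The K-means chapters revolve around two alternating steps: assign every sample to its nearest mean, then move each mean to the centroid of its cluster (Lloyd's algorithm). Below is a minimal C sketch of that loop for 2D samples. It is not the stream's source code; `Sample2`, `kmeans_assign()` and `kmeans_update_means()` are hypothetical names, and keeping an empty cluster's old mean is one assumed policy among several.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

// A 2D sample, enough for clustering two picked attributes.
typedef struct {
    float x, y;
} Sample2;

static float dist2(Sample2 a, Sample2 b)
{
    float dx = a.x - b.x, dy = a.y - b.y;
    return dx*dx + dy*dy; // squared distance: same ordering, no sqrt needed
}

// Assignment step: label every sample with the index of its nearest mean.
// Returns how many labels changed, so the caller can stop when it hits 0.
size_t kmeans_assign(const Sample2 *samples, size_t n,
                     const Sample2 *means, size_t k, size_t *labels)
{
    assert(k > 0);
    size_t changed = 0;
    for (size_t i = 0; i < n; ++i) {
        size_t best = 0;
        float best_d = FLT_MAX;
        for (size_t j = 0; j < k; ++j) {
            float d = dist2(samples[i], means[j]);
            if (d < best_d) { best_d = d; best = j; }
        }
        if (labels[i] != best) { labels[i] = best; ++changed; }
    }
    return changed;
}

// Update step: move every mean to the centroid of the samples labeled with it.
// A cluster that ended up empty keeps its previous mean in this sketch.
void kmeans_update_means(const Sample2 *samples, size_t n,
                         const size_t *labels, Sample2 *means, size_t k)
{
    for (size_t j = 0; j < k; ++j) {
        float sx = 0.0f, sy = 0.0f;
        size_t count = 0;
        for (size_t i = 0; i < n; ++i) {
            if (labels[i] != j) continue;
            sx += samples[i].x;
            sy += samples[i].y;
            ++count;
        }
        if (count > 0) {
            means[j].x = sx / (float)count;
            means[j].y = sy / (float)count;
        }
    }
}
```

To use it, initialize `labels` to an out-of-range value such as `k` so the first assignment counts every sample as changed, then alternate the two calls until `kmeans_assign()` returns 0.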

Comments: 91
@TsodingDaily · 4 months ago
Happy New Year everybody!
@redstonetutorials5493 · 4 months ago
Happy new year
@albtein · 4 months ago
Happy New Year!
@good4710 · 4 months ago
Happy new year
@jpmoboat4914 · 4 months ago
Happy new year! Hope you have a good one :)
@kingofbithynia449 · 4 months ago
Happy new year
@dionsolang7296 · 4 months ago
The samples are denser in the center because you generated them by randomizing the magnitude and then the angle. With a uniformly random magnitude, the probability that a sample lies within r*mag of the center grows linearly with mag. However, the area in which those samples can be placed, pi*(r*mag)**2, grows with the square of r*mag. Thus, the greater mag is, the lower the density. The Bertrand paradox illustrates this phenomenon really well.
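The density argument in the comment above can be made precise. Write the distance from the center as ρ = r·mag, with mag uniform on [0, 1] as the comment describes; ρ, s and f_ρ (the probability density of ρ) are notation introduced here for illustration.

```latex
\begin{aligned}
P(\rho \le s) &= P\left(\mathit{mag} \le \frac{s}{r}\right) = \frac{s}{r}, \qquad 0 \le s \le r \\
\text{uniform disc requires: } P(\rho \le s) &= \frac{\pi s^2}{\pi r^2} = \frac{s^2}{r^2} \\
\text{expected fraction per unit area at radius } s &= \frac{f_\rho(s)}{2\pi s} = \frac{1/r}{2\pi s} = \frac{1}{2\pi r s} \\
\rho = r\sqrt{\mathit{mag}}: \quad P(\rho \le s) &= P\left(\mathit{mag} \le \frac{s^2}{r^2}\right) = \frac{s^2}{r^2},
\qquad \frac{f_\rho(s)}{2\pi s} = \frac{2s/r^2}{2\pi s} = \frac{1}{\pi r^2}
\end{aligned}
```

The 1/s factor is the pile-up at the center; replacing mag with its square root, the `sqrtf` fix suggested further down, makes the density per unit area constant.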
@fivefoottwelve2789 · 4 months ago
What would be a better way to create an even distribution? By generating randomly in a square and then throwing out points that fall outside the circle?
@dev688 · 4 months ago
@fivefoottwelve2789 Maybe tweaking the random function and hard-coding some percentage to generate certain random values. I don't know any better way. \_(`-`)_/
@dionsolang7296 · 4 months ago
@fivefoottwelve2789 If you want the density to be equal across all regions of the circle, that is a great way to do it.
@carvas18 · 4 months ago
@fivefoottwelve2789 A quick and dirty way is to use `r * sqrtf(mag)` instead of `r * mag` (where r is the fixed radius and mag ranges from 0 to 1).
@denoww9261 · 4 months ago
@fivefoottwelve2789 That'd probably be my approach. I remember a Sebastian Lague video where he generated points randomly in a circle, ran into this same issue, and fixed it by using that method instead.
@aetherialKilix · 4 months ago
I love watching these VODs while writing code myself, although your projects are consistently more interesting than mine.
@thisguyisnotable · 4 months ago
Same here!
@shivashankar28 · 4 months ago
Man, I love your videos in C. I am slowly starting to love C, and my passion for C keeps growing thanks to you. Thanks a lot, Tsoding! I hope one day I will send minor patches to the Linux kernel.
@oscardeits4709 · 4 months ago
If you replace the commas with nulls, you can use the C APIs directly without the temporary buffer. That way the CSV is actually a sequence of null-terminated strings. You do need to keep track of the newline and replace that with a null as well.
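A minimal sketch of the in-place approach from this comment: commas and line endings are overwritten with `'\0'`, so each field becomes a null-terminated string that `strtof` can read directly without a temporary buffer. `csv_split_row()` and `csv_field_to_float()` are hypothetical names, and quoted fields (which may themselves contain commas) are deliberately not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

// Splits the CSV row starting at `row` in place: each ',' and the line ending
// ("\n" or "\r\n") is overwritten with '\0', and the start of each field is
// stored in `fields`. Fields beyond `max_fields` are dropped. Quoted fields are
// NOT supported. `*next` receives the start of the following row, or NULL when
// the buffer's terminating '\0' ends this row.
size_t csv_split_row(char *row, char **fields, size_t max_fields, char **next)
{
    assert(max_fields > 0);
    size_t count = 0;
    fields[count++] = row;
    for (char *p = row; ; ++p) {
        if (*p == '\0') {
            *next = NULL;
            return count;
        }
        if (*p == '\n') {
            *p = '\0';
            if (p > row && p[-1] == '\r') p[-1] = '\0';
            *next = p + 1;
            return count;
        }
        if (*p == ',') {
            *p = '\0';
            if (count < max_fields) fields[count++] = p + 1;
        }
    }
}

// Parses a whole field as a float. Returns 0 on success, -1 if the field is
// empty, not a number, or has trailing characters.
int csv_field_to_float(const char *field, float *out)
{
    char *end;
    *out = strtof(field, &end);
    return (end != field && *end == '\0') ? 0 : -1;
}
```

To walk a whole file loaded into a mutable buffer, call `csv_split_row()` repeatedly with the returned `next` pointer until it is `NULL` or points at the final `'\0'`.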
@bestformspielt · 4 months ago
I really enjoyed watching this! This was one of the coolest videos you've done. Not that the others weren't good, but this one stood out.
@Vicente75480 · 4 months ago
As seen in a video by mathemanic called "the numerical simulation is not as easy as you think", the phenomenon where the clusters are denser at the center can be fixed by setting the magnitude to the square root of a random variable between 0 and 1. That is because (as others have pointed out) the area of a circle does not grow linearly as you increase the radius; it grows with the square of the radius.
@postmodernist1848 · 4 months ago
Yeah, the samples actually are denser in the center: a point at a larger magnitude is just as likely as one at a shorter magnitude, so the same share of points lands on a much larger circumference, where they end up much sparser. It's a generate_cluster() problem, not a rand() problem. You could generate points in a square 2 radii wide and 2 radii high and keep only the points within the radius to get a "uniform" distribution.
@spacewad8745 · 4 months ago
Happy new year, folks. Great way to start the new year with a Tsoding video 🎉
@monootaku6350 · 4 months ago
"It looks like a machine learning algorithm": because it is. It's an algorithm from the "unsupervised learning" group, which is used to process and aggregate data when there is no right answer or goal. It's used to cluster, compress, and aggregate data, and stuff like that. P.S. Happy New Year!
@user-fr1no2ir3r · 3 months ago
This is better than other data mining tutorials that teach how to use a library instead of doing the actual thing.
@koktszfung · 4 months ago
Lloyd's algorithm can be used to create a mesh called a centroidal Voronoi tessellation. I once used it to generate a mesh on a sphere with non-uniform density. That would be pretty cool to make, and it basically uses the same algorithm as the one you implemented.
@yaksher · 4 months ago
@42:49 You say "Disney lawsuit incoming," but the mouse is in the public domain now. You're safe.
@prokras8609 · 4 months ago
Wait, really? Since when?
@anon_y_mousse · 4 months ago
@prokras8609 January 1st of this year.
@ABuffSeagull · 4 months ago
The original Steamboat Willie mouse went into the public domain at the start of 2024.
@CuriousCyclist · 4 months ago
I love your videos, buddy. Keep doing what you are doing. You teach well.
@SiiKiiN · 4 months ago
You could visualize the high-dimensional data by running PCA to reduce the dimensions. In your case you can do PCA with 2 components, and what you would obtain is a 2-dimensional vector whose 2 values have the largest "explained variance". This basically means that those 2 directions (linear combinations of the original features) account for more of the variance in the data than any other 2 directions. You would be able to do clustering in the high dimension and just display the result using PCA.
@RandomGeometryDashStuff · 4 months ago
44:04 Maybe it's denser in the center for the same reason as when you take sticks of the same length and place them with their ends at one point (4 sticks look like +, 5 sticks look like *): the center of the whole thing is dense (the biggest wood/air ratio by volume).
@labsendeyshent · 4 months ago
Pog! New zozin video just dropped
@apppples · 4 months ago
Can I recommend Poisson disc sampling for the initial means? Bridson's algorithm is not hard to implement, and it will give you better means spaced a minimum distance apart. Also, randomly sampling a radius and angle from a spot will not uniformly sample a circle, because the area of a circle is proportional to the square of the radius. So take the sqrt of your random float when placing points to get a more uniform sampling of points.
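Bridson's algorithm grows samples outward from an active list and uses a background grid to make neighbor checks cheap. The simpler relative sketched below is plain "dart throwing": propose uniform points in a bounding box and reject any proposal closer than `min_dist` to an already accepted one. It is a swapped-in, slower technique (not Bridson's), and `poisson_disc_means()` and `Vec2` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    float x, y;
} Vec2;

// Uniform float in [lo, hi] from the C standard library generator.
static float rand_range(float lo, float hi)
{
    return lo + (hi - lo) * ((float)rand() / (float)RAND_MAX);
}

// Naive dart-throwing Poisson-disc sampling (NOT Bridson's algorithm):
// propose uniform points in the box [lo, hi] and keep a proposal only if it is
// at least min_dist away from every point kept so far. Stops after k points or
// max_attempts proposals, whichever comes first; returns how many were placed.
size_t poisson_disc_means(Vec2 lo, Vec2 hi, float min_dist, size_t k,
                          int max_attempts, Vec2 *means)
{
    assert(min_dist >= 0.0f);
    size_t placed = 0;
    for (int attempt = 0; attempt < max_attempts && placed < k; ++attempt) {
        Vec2 p = { rand_range(lo.x, hi.x), rand_range(lo.y, hi.y) };
        int far_enough = 1;
        for (size_t j = 0; j < placed && far_enough; ++j) {
            float dx = p.x - means[j].x, dy = p.y - means[j].y;
            if (dx*dx + dy*dy < min_dist*min_dist) far_enough = 0;
        }
        if (far_enough) means[placed++] = p;
    }
    return placed;
}
```

Because it gives up after `max_attempts` proposals, it can return fewer than `k` means when `min_dist` is too large for the box; a caller would then have to shrink `min_dist` or fall back to plain random means.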
@ecosta · 3 months ago
It is so weird how CSV is meant to be a simple standard, but devs tend to make it complicated by introducing a bazillion libraries to parse it. I love to see a simple parser without dependencies like that.
@stintaa · 4 months ago
Loved this stream
@toshevislombek · 3 months ago
46:00 The center is denser because the tracks near the center are shorter, but the points are distributed equally per track, so the shorter track length makes them denser around the center. If you use a normal space like a square and cut out the dots outside the circle, you get an equal distribution on screen. I liked the video))
@NoneNone-ly6xz · 4 months ago
Why doesn't this video show up in my subscriptions page? It showed up on the homepage but not on the subscriptions page. Am I the only one experiencing this?
@Scriabinfan593 · 4 months ago
Do you have any resources you'd recommend for learning how to make build systems for projects?
@mazenmohsen3423 · 4 months ago
Tsoding, how do you get ideas for your projects? It's interesting to me how you don't run out of ideas.
@pemrograman-cepat3393 · 4 months ago
This is what I am looking for😮
@SeishukuS12 · 4 months ago
Mickey Mouse goes into the public domain, and this is what people do with it? 🤔 lol
@JamesSjaalman · 4 months ago
Simple data set := Iris sepal/petal. (3 clusters, 4 dimensions)
@user-pi7mg9hn3j · 4 months ago
happy new year!!
@viacheslavprokopev8192 · 4 months ago
But the paper uses the k-nearest neighbours algorithm, not k-means.
@TsodingDaily · 4 months ago
Ah shi, another stream incoming then
@forayer · 4 months ago
@TsodingDaily We are not complaining! 😊
@viacheslavprokopev8192 · 4 months ago
@TsodingDaily K-means is harder and more interesting anyway. K-nearest neighbours is just some dot products and sorting.
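As the comment says, k-nearest neighbours needs little more than distances and a sort. A minimal sketch for labeled 2D points follows; squared Euclidean distance (the dot product of the difference vector with itself) stands in for the gzip-based normalized compression distance that the paper pairs with kNN. `Labeled`, `knn_classify()` and the smallest-label tie-break are assumptions of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    float x, y;
    int label; // class index in [0, num_labels)
} Labeled;

typedef struct {
    float d2;  // squared distance to the query
    int label;
} Neighbor;

static int cmp_neighbor(const void *a, const void *b)
{
    float da = ((const Neighbor *)a)->d2, db = ((const Neighbor *)b)->d2;
    return (da > db) - (da < db);
}

// Classifies (qx, qy) by majority vote among its k nearest training samples.
// Ties go to the smallest label. Returns -1 if allocation fails.
int knn_classify(const Labeled *train, size_t n, float qx, float qy,
                 size_t k, int num_labels)
{
    assert(n > 0 && k > 0 && num_labels > 0);
    Neighbor *nb = malloc(n * sizeof *nb);
    int *votes = calloc((size_t)num_labels, sizeof *votes);
    if (nb == NULL || votes == NULL) {
        free(nb);
        free(votes);
        return -1;
    }
    for (size_t i = 0; i < n; ++i) {
        assert(train[i].label >= 0 && train[i].label < num_labels);
        float dx = train[i].x - qx, dy = train[i].y - qy;
        nb[i] = (Neighbor){ dx*dx + dy*dy, train[i].label }; // difference dotted with itself
    }
    qsort(nb, n, sizeof *nb, cmp_neighbor);
    if (k > n) k = n;
    for (size_t i = 0; i < k; ++i) votes[nb[i].label]++;
    int best = 0;
    for (int l = 1; l < num_labels; ++l) {
        if (votes[l] > votes[best]) best = l;
    }
    free(nb);
    free(votes);
    return best;
}
```

Sorting all n distances costs O(n log n) per query; selecting only the k smallest would be cheaper, but the full `qsort` keeps the sketch short.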
@sanjaux · 4 months ago
NOB_GO_REBUILD_URSELF Technology™
@JnillCorreia · 4 months ago
Is some kind of functional style possible in C?
@albtein · 4 months ago
Do you have any tips for people who really want to learn C? I don't see many good courses out there, or even tutorials with development patterns.
@bebre_2288 · 4 months ago
There aren't really any patterns for C; it is a language whose purpose is to be as simple as possible. If you want to learn some "tricks", you really should watch his videos on C. They are a really good source. (And btw, he learned it all by himself, so he cannot really suggest anything except programming and reading other people's code.)
@user-tk5gj2cz5q · 4 months ago
To start with, check out Jacob Sorber. He answers many questions for beginners.
@Stroopwafe1 · 4 months ago
Start a project and write the code; you can't really learn by just watching people do something. If you work on a project, you'll look up why stuff doesn't work and how to do stuff in C, and slowly but surely, you'll get better and better. The best tip when learning to do anything new is literally just to start doing it.
@VojtaJavora · 4 months ago
Tsoding, do you know why you are using CLITERAL?
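On the CLITERAL question: raylib.h defines `CLITERAL(type)` as `(type)` when compiled as C, turning `CLITERAL(Color){ ... }` into a C99 compound literal, and as plain `type` under C++, where `Color{ ... }` is aggregate initialization. That way the same color macros compile in both languages. The sketch below reproduces the pattern with hypothetical names (`MY_CLITERAL`, `MyColor`, `MY_RED`, `my_fade()`) rather than quoting raylib's header.

```c
#include <assert.h>

typedef struct {
    unsigned char r, g, b, a;
} MyColor;

// Same idea as raylib's CLITERAL: a compound literal in C, aggregate
// initialization in C++ (where C99 compound literals are not standard).
#if defined(__cplusplus)
    #define MY_CLITERAL(type) type
#else
    #define MY_CLITERAL(type) (type)
#endif

// A color constant usable as an expression in either language.
#define MY_RED MY_CLITERAL(MyColor){ 230, 41, 55, 255 }

// Example consumer taking a color by value, so MY_RED can be passed inline.
static MyColor my_fade(MyColor c, float alpha)
{
    assert(alpha >= 0.0f && alpha <= 1.0f);
    c.a = (unsigned char)(c.a * alpha);
    return c;
}
```

Without the macro, a header written as `(MyColor){ ... }` would not be portable C++, and one written as `MyColor{ ... }` would not compile as C.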
@hubstrangers3450 · 4 months ago
Thank you... same... peaceful 2024!!!
@obsidianhead · 4 months ago
Is a group of React developers a degenerate cluster?
@nevokrien95 · 4 months ago
Gzip is all u need could be made in Python in like 20 minutes.... with 2 packages and like 20 dependencies
@anthonygg_ · 4 months ago
How about that guys
@semihkaplan · 4 months ago
He uses makefiles for everything but C
@ed9w2in6 · 4 months ago
i see Emacs I like
@KimberlyWilliamsch · 4 months ago
Who is Tsoding Daily, and where does he come from?
@__gadonk__ · 4 months ago
He's a totally not crazy person who wrote a compiler in PHP that compiles C code into Python.... He's from Novosibirsk (if I recall correctly), Russia.
@biniyam106 · 4 months ago
first
@gargleblasta · 4 months ago
Your uni truly must have been shit if they didn't even go over k-means clustering
@SiiKiiN · 4 months ago
Exactly, but unfortunately it's pretty common in the "computer science" major, which will often forgo many of the computational techniques that are more "applied". I took 4 discrete math classes and not one on data-centered algorithms.
@Stroopwafe1 · 4 months ago
@SiiKiiN I went to a university of applied sciences and I also didn't have anything about this. But then again, I didn't take the data-oriented semesters.
@user-ux2kk5vp7m · 4 months ago
Most universities don't cover k-means clustering
@gargleblasta · 4 months ago
@user-ux2kk5vp7m Then they are not worth the title 'University'
@gargleblasta · 4 months ago
@Stroopwafe1 That feels so odd to me