The fastest matrix multiplication algorithm

279,170 views

Dr. Trefor Bazett


1 day ago

Keep exploring at ► brilliant.org/TreforBazett. Get started for free, and hurry: the first 200 people get 20% off an annual premium subscription.
0:00 Multiplying Matrices the standard way
2:05 The Strassen Method for 2x2 Matrices
3:52 Large matrices via induction
7:25 The history and the future
10:19 brilliant.org/TreforBazett
In this video we explore how to multiply very large matrices as computationally efficiently as possible. The standard algorithm from linear algebra uses n^3 multiplications to multiply n x n matrices. But can we do better? The Strassen algorithm improved this to about n^2.8, first in the 2x2 case, and then we can prove via induction that it works in general. This bound has seen a range of further improvements over the last 50 years, inching closer, but still far away, to the theoretical limit of n^2.
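For concreteness, here is a sketch (in Python, not taken from the video) of the seven Strassen products for the 2x2 case:

```python
def strassen_2x2(A, B):
    """Multiply two 2x2 matrices (nested lists) with 7 multiplications.

    The entries may themselves be numbers or blocks, which is what
    makes the recursive extension to large matrices work.
    """
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B

    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)

    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4,           m1 - m2 + m3 + m6]]
```

Because the entries can themselves be matrices, applying this recursively to the blocks of a 2^n x 2^n matrix yields the roughly n^2.8 algorithm discussed at 3:52.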
Further Reading:
Going into the details of the laser method (what happened after the Strassen algorithm I showed): theoryofcomputing.org/article...
The AlphaTensor AI paper: www.nature.com/articles/s4158...
A nice summary from Quanta: www.quantamagazine.org/mathem...
Check out my MATH MERCH line in collaboration with Beautiful Equations
►beautifulequations.net/pages/...
COURSE PLAYLISTS:
►DISCRETE MATH: • Discrete Math (Full Co...
►LINEAR ALGEBRA: • Linear Algebra (Full C...
►CALCULUS I: • Calculus I (Limits, De...
► CALCULUS II: • Calculus II (Integrati...
►MULTIVARIABLE CALCULUS (Calc III): • Calculus III: Multivar...
►VECTOR CALCULUS (Calc IV) • Calculus IV: Vector Ca...
►DIFFERENTIAL EQUATIONS: • Ordinary Differential ...
►LAPLACE TRANSFORM: • Laplace Transforms and...
►GAME THEORY: • Game Theory
OTHER PLAYLISTS:
► Learning Math Series
• 5 Tips To Make Math Pr...
►Cool Math Series:
• Cool Math Series
BECOME A MEMBER:
►Join: / @drtrefor
MATH BOOKS I LOVE (affiliate link):
► www.amazon.com/shop/treforbazett
SOCIALS:
►Twitter (math based): / treforbazett
►Instagram (photography based): / treforphotography

Comments: 225
@planaritytheory
@planaritytheory 1 year ago
It's important to note that, although additions _are_ considered cheaper than multiplications, what we _aren't_ doing is making a tradeoff between the two operations (accepting many more additions to save a few multiplications). You didn't explicitly say that's what's going on, but someone could get the impression from this video that that's the deal---I myself used to think that. Instead, the number of additions, multiplications, and total number of operations are all lower for Strassen's algorithm, and it's because the number of "multiplications" at each step is counting the number of recursive calls that are made.
@DrTrefor
@DrTrefor 1 year ago
Thanks for the clarification. Indeed, I left the induction proof for additions as an "exercise" so this got glossed over.
@samueljehanno
@samueljehanno 1 year ago
69th
@gregorymorse8423
@gregorymorse8423 1 year ago
Considering that addition is O(n) and multiplication is O(n log n) in the best case, given a two's-complement number system, treating multiplication and addition as like operations is completely incorrect, especially as the values in the matrix grow larger. So it's not just the recursion that is relevant.
@destiny_02
@destiny_02 1 year ago
If I'm not wrong, multiplication is cheaper than addition in the case of floating-point numbers on a modern CPU.
@gregorymorse8423
@gregorymorse8423 1 year ago
@Jotaro Kujo just because it is simpler to implement floating-point multiplication doesn't make it cheaper. It's not faster even if addition requires resources to shift the smaller argument and the result. There is still an integer multiply or add in the middle. My point was only for unbounded big integers as well. Treating addition and multiplication as like ops when using a finite bit width like 32 or 64, be they integers or floats, does make sense, even if their clock cycles are slightly different. The story just gets more complicated given that a fused multiply-add (FMA) is used in modern hardware to improve accuracy and speed, and it applies to complex-number and matrix multiplication very specifically, in fact. But unbounded data types must treat addition and multiplication separately.
@Mattes_H
@Mattes_H 1 year ago
I think it should be noted that all of these faster algorithms are so-called galactic algorithms, meaning the constant that is hidden by the phrase "order of" is so big they become useless for any practical application. In practice the challenge is more about efficient and parallel implementation. In fact GEMM, the subroutine most often used for matrix-matrix multiplication, usually implements either the naive or a Strassen-based algorithm but uses all the hardware tricks available on modern processors to get its high performance.
@maxrush206
@maxrush206 1 year ago
nice icon lol
@vaff69420
@vaff69420 1 year ago
I don't exactly understand your comment. Why aren't library developers implementing these algorithms if they are simply faster than the naive or other slower algorithms?
@pablote325
@pablote325 1 year ago
@@vaff69420 Because of the constants. Take for example the last algorithm, which does O(n^2.37285) multiplications. What this means is that the number of multiplications is roughly going to be Mult = A * n^2.37285 + B, for some A and B, usually unknown; that's what the big-O notation means. So like the previous comment said, these weird galactic algorithms always have HUGE constants, so it's not even practical to use one to multiply a 64x64 matrix (I'm just putting numbers here, but that's usually what happens; the real benefit of using these algorithms is when n is literally an astronomical number xd)
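To put rough numbers on this, here is a sketch of where the crossover sits; the constants used below are made up for illustration, since the real ones for these algorithms are unknown or astronomically large:

```python
import math

def crossover(c_fast, exp_fast=2.37286, c_naive=1.0, exp_naive=3.0):
    """Smallest n where c_fast * n^exp_fast beats c_naive * n^exp_naive.

    Solving c_fast * n^e1 < c_naive * n^e2 for n gives
    n > (c_fast / c_naive) ** (1 / (e2 - e1)).
    """
    return math.ceil((c_fast / c_naive) ** (1.0 / (exp_naive - exp_fast)))

# Even a modest hidden constant of 100 pushes the break-even size
# past n ~ 1500; galactic algorithms hide constants far larger.
print(crossover(100.0))
```

With equal constants the crossover is immediate, but every extra factor of 100 in the hidden constant multiplies the break-even matrix size by roughly 100^(1/0.63).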
@porschepanamera92
@porschepanamera92 1 year ago
@@pablote325 ... and in that case it might be much faster but still extremely slow using current hardware?
@luckyw4ss4bi
@luckyw4ss4bi 1 year ago
@@porschepanamera92 I think the idea he's hinting at in general here is that the size of matrices you'd have to be multiplying to gain a performance boost from the current fastest algorithm is too large for modern computers to handle, anyway.
@cbunix23
@cbunix23 1 year ago
I wrote a cache-friendly matrix multiplication algorithm that improves performance by 40% when matrices are very large; basically, the multiplications and additions are the same, but the order is chosen in a way that respects cache memory.
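The kind of reordering described here can be sketched as follows; this is a guess at the idea (the classic i-k-j loop interchange), not the commenter's actual code, and the real payoff shows up in compiled languages, but the access pattern is the point:

```python
def matmul_ikj(A, B):
    """Naive O(n^3) multiply with the i-k-j loop order.

    With row-major storage, the inner j-loop walks B[k] and C[i]
    sequentially, so consecutive accesses hit the same cache lines;
    the textbook i-j-k order strides down a column of B instead.
    """
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            a = A[i][k]
            row_b = B[k]
            row_c = C[i]
            for j in range(n):
                row_c[j] += a * row_b[j]
    return C
```

Same multiplications, same additions, just done in an order that keeps memory accesses contiguous.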
@magicflour
@magicflour 1 year ago
I had a class where a whole third of it was dedicated to these linear algebra algorithms. The most I learned really was "optimize the hardware and optimize with hardware" to get good results lol. This was a computer engineering class so it made sense.
@DrTrefor
@DrTrefor 1 year ago
Ha, yes that has been my sense too even as mathematicians have fun with asymptotic limits:D
@marcotroster8247
@marcotroster8247 9 months ago
Finally someone said it so that I don't need to comment it myself 😂
@michaelsaba6769
@michaelsaba6769 1 year ago
Just read about this algorithm in CLRS. It's a perfect teaching tool to show why constant factors might not matter in asymptotic notations, but how they often do matter when working with recurrences. Like in this case, where 7 multiplications are preferred over 8, even if we're only multiplying the runtime by a constant factor. Strassen's yields a Theta(n^log(7)) time complexity, while the standard divide-and-conquer approach yields Theta(n^3) time.
@frfr1022
@frfr1022 1 year ago
It reminds me of this algorithm for natural-number multiplication, which is theoretically a lot faster than any other algorithm we know of, but only becomes worth using at ridiculously huge numbers. I can't remember what it's called, but I think it was mentioned in 3b1b's latest video on convolutions.
@DrTrefor
@DrTrefor 1 year ago
If I recall correctly, some of the ideas of large number multiplications are related to ideas from this very Strassen algorithm.
@adamel-sawaf4045
@adamel-sawaf4045 1 year ago
@@DrTrefor Strassen also worked on a multiplication algorithm for numbers (the Schönhage-Strassen algorithm), so that would make sense!
@oyaoya2468
@oyaoya2468 1 year ago
Karatsuba algorithm, is this what you mean? Because it utilizes the same technique as the Strassen algorithm.
@alphalunamare
@alphalunamare 1 year ago
Fascinating! It brings back old memories. In 1977 I was mulling over the problem and managed, over a few days, to create an algorithm for 3x3 multiplications requiring only 24 multiplications and 66 additions. It reduced to Strassen's method in the 2x2 case but was no improvement, as that would have required 21 multiplications. I did this longhand over yards of line-printer paper. Unfortunately a Canadian published an algorithm with 23 multiplications and 96 additions. It pissed me off no end, especially as it had no information regarding its derivation. (If I could trade 1 multiplication for 29 additions I could improve, but I left it on the shelf.) Sulking, I guess. Over the years I wondered if it was just a novelty; then I realised, in an object-oriented sense, that the savings are manifestly more than for just simple arithmetic multiplications. An intriguing thought is whether the intermediate Pn multiplications of 'objects' have a real meaning in the physics of the model under analysis that could perhaps shed some light? Perhaps in the geometric-algebra treatment of spacetime?
@ThunderAppeal
@ThunderAppeal 1 year ago
No one cares.
@alphalunamare
@alphalunamare 1 year ago
@@ThunderAppeal lol says who?
@planaritytheory
@planaritytheory 1 year ago
kzfaq.info/get/bejne/q5ipiLan1q6UnJc.html
@boltez6507
@boltez6507 6 months ago
@@ThunderAppeal I care
@kruksog
@kruksog 1 year ago
I remember learning the Karatsuba algorithm for multiplication. It is also computationally cheaper than the standard way of multiplying numbers. It had a very similar feel to this. Cool stuff. Thanks!
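For comparison, here is a minimal sketch of Karatsuba's trick, which replaces 4 recursive multiplies with 3 in the same spirit as Strassen's 7-instead-of-8:

```python
def karatsuba(x, y):
    """Multiply non-negative integers with 3 recursive multiplies
    instead of 4, mirroring Strassen's 7-instead-of-8 trick."""
    if x < 10 or y < 10:  # base case: a single-digit operand
        return x * y
    m = max(len(str(x)), len(str(y))) // 2
    high_x, low_x = divmod(x, 10 ** m)
    high_y, low_y = divmod(y, 10 ** m)
    z0 = karatsuba(low_x, low_y)
    z2 = karatsuba(high_x, high_y)
    # One multiply recovers both cross terms: z1 = hx*ly + lx*hy
    z1 = karatsuba(low_x + high_x, low_y + high_y) - z0 - z2
    return z2 * 10 ** (2 * m) + z1 * 10 ** m + z0
```

The "multiply the sums, then subtract the pieces you already have" step is the same algebraic move that gives Strassen his seventh product.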
@adamel-sawaf4045
@adamel-sawaf4045 1 year ago
This also reminded me of Karatsuba! I learned it in my algorithms class. I implemented it in a C++ program for extra credit.
@discreet_boson
@discreet_boson 1 year ago
Please make more computational math videos like this! Programming documentation never explains the math like you do
@DrTrefor
@DrTrefor 1 year ago
I'd love to!
@eti-iniER
@eti-iniER 1 year ago
As a first year CS student, I'm completely blown away by the fact that humans know and understand concepts like this. Amazing video, btw. Cue the mild depression
@L4ZY2137
@L4ZY2137 1 year ago
There is a thing called a "hierarchical matrix" that allows you to change a matrix into a more "compact" form in which many operations like inversion or multiplication can be computed in almost O(n) time. If I remember correctly, ignoring the constant (k) that controls the approximation, the complexity is about O(n * (log n)^2).
@wcdeich4
@wcdeich4 1 year ago
Interesting! I previously knew some algorithms could do matrix multiplication faster for large sparse matrices, but those algorithms are actually slower for large dense matrices.
@mohammadhomsee8640
@mohammadhomsee8640 1 year ago
That's a great video, I really love to hear your lessons. Good job professor!
@rezamiau
@rezamiau 1 year ago
Absolutely brilliant! One of the major problems that engineers like me are dealing with is the time-consuming eigenvalue calculation, because of the need for increased precision due to ill-conditioning of a matrix. Please let us know if there is a good solution to it. Thank you so much.
@cmilkau
@cmilkau 1 year ago
One important question with this algorithm: how many additions does the recursive algorithm use? Because in the end, we want a faster algorithm, not just a reduction in elementary multiplications. It's also a bit counterintuitive that this strategy *reduces* the total number of additions asymptotically.
@prashantsingh6370
@prashantsingh6370 1 year ago
As someone who absolutely loved solving matrices & determinants problems in higher secondary mathematics, way back in 2011-12, a huge thumbs up for the excellent video. 👍
@DrTrefor
@DrTrefor 1 year ago
Thanks for sharing!
@freshrockpapa-e7799
@freshrockpapa-e7799 5 months ago
enjoying computations doesn't make you special in any way, there was no reason to mention that
@adamel-sawaf4045
@adamel-sawaf4045 1 year ago
Great vid! Just as informative as it needed to be, while sparking inspiration to learn more. One thing I think you should’ve mentioned was how much the space complexity grows for the trade-off in speed.
@ThunderAppeal
@ThunderAppeal 1 year ago
No one cares.
@airsquid8532
@airsquid8532 1 year ago
The amount of power I wield is unrivaled with this knowledge, thank you for yet another interesting video full of passion!!
@laurenpinschannels
@laurenpinschannels 1 year ago
I'm curious what you're researching!
@ddognine
@ddognine 1 year ago
Interesting video of an area of active research of which I was unaware. I would be interested in a similar video covering the state of the art in matrix inversion which I believe is much more problematic for large matrices than multiplication and is typically the first operation before any multiplication.
@quarkonium3795
@quarkonium3795 1 year ago
One thing to keep in mind is that most of these improvements from the Strassen algorithm are theoretical. Most only improve on the Strassen algorithm for extremely large matrices-so large that these algorithms will probably never be practical
@guigazalu
@guigazalu 1 year ago
While watching, I asked myself if you would cite the AlphaTensor article and results, as I read it in the recent months. It turned out you did! It is a very cool article, even though it's a "heavy reading" for me.
@gooball2005
@gooball2005 1 year ago
I do enjoy a good video on complexity theory! Also, your thumbnail game is on point!
@gzhang207
@gzhang207 1 year ago
One specialization of matrix multiplication is the matrix multiplication with a vector (or filter) that is widely used in convolution based ML. Winograd convolution seems to follow the same optimization idea.
@yutubl
@yutubl 1 year ago
Thanks for this very interesting information.
@johnniefujita
@johnniefujita 1 year ago
Single operations are not really what is expensive and drives up algorithm complexity, the "O"s; complexity actually relates directly to nested computations. Usually we want an algorithm to be linear or linearithmic. Polynomial can be tractable, but never exponential or factorial; those cannot scale. Also, algorithms vary in their worst-, average-, and best-case scenarios across different implementations, and sometimes, as with sorting algorithms, the best choice can depend on how well organized the data already is.
@lebdesmath2510
@lebdesmath2510 1 year ago
nice vid, keep up the good work
@DrTrefor
@DrTrefor 1 year ago
Thanks, will do!
@christossymA3A2
@christossymA3A2 1 year ago
Man, it feels almost illegal that such great knowledge is available for free.
@DrTrefor
@DrTrefor 1 year ago
ha!
@garrettkajmowicz
@garrettkajmowicz 1 year ago
One area I found missing from your video was in showing that using block matrices got us the correct result. That is, that treating a 4x4 matrix as a 2x2 matrix of 2x2 matrices would, when multiplied, get us the same result as we would otherwise expect.
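That block-multiplication fact is easy to spot-check numerically. A sketch for the 4x4 case (the helper names here are illustrative, not from the video):

```python
def mat_mul(A, B):
    """Plain O(n^3) matrix multiply on nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_add(A, B):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def block(M, i, j):
    """2x2 sub-block (i, j) of a 4x4 matrix."""
    return [row[2 * j:2 * j + 2] for row in M[2 * i:2 * i + 2]]

def block_mul_4x4(A, B):
    """Multiply 4x4 matrices by treating them as 2x2 matrices of 2x2
    blocks: C_ij = A_i0 * B_0j + A_i1 * B_1j, just like scalar entries."""
    C = [[0] * 4 for _ in range(4)]
    for i in range(2):
        for j in range(2):
            Cij = mat_add(mat_mul(block(A, i, 0), block(B, 0, j)),
                          mat_mul(block(A, i, 1), block(B, 1, j)))
            for r in range(2):
                for c in range(2):
                    C[2 * i + r][2 * j + c] = Cij[r][c]
    return C
```

Comparing `block_mul_4x4` against the plain multiply on a few matrices makes the blocking step of the induction believable, though of course a numerical check is not the proof itself.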
@kristas-not
@kristas-not 1 year ago
ok, you officially kick ass in my book.
@noot3316
@noot3316 4 months ago
I just finished my first term of Engineering at UVic (you were an amazing calc prof by the way!) and I can FINALLY understand how cool this video is!
@DrTrefor
@DrTrefor 4 months ago
Haha that’s awesome, thank you!
@taotaotan5671
@taotaotan5671 1 year ago
Very very interesting topic! Thanks for sharing this, Trefor! Wondering if the algorithm applies to a more general case of matrix multiplication, e.g. (m x n) * (n x p) Another thing I found interesting is, after some research, matrix inversion and matrix multiplication have the same time complexity. The optimization method you mentioned in this video also applies to matrix inversion. That contradicts my impression that matrix inversion causes more headaches than multiplication.
@DrTrefor
@DrTrefor 1 year ago
For non-square matrices you can just imagine they are square with extra zeros so the same upper bound applies here.
@streampunksheep
@streampunksheep 1 year ago
ok this is cool. it makes me want to study matrices again
@changjeffreysinto3872
@changjeffreysinto3872 6 months ago
Solution to 7:17: Let f(n) be the number of scalar additions in the matrix product of 2^n x 2^n matrices. Since we calculate the products of 7 smaller matrices of size 2^(n-1) x 2^(n-1), the recursive calls contribute 7f(n-1) operations. We also calculate sums and differences of these 2^(n-1) x 2^(n-1) blocks 18 times in total, each costing (2^(n-1))^2 = 4^(n-1) scalar additions, i.e. 18 * 4^(n-1) overall. Therefore the recurrence is f(n) = 7f(n-1) + (9/2) * 4^n, which is a first-order linear difference equation. Substituting f(n) = 7^n * f_n gives f_n = f_(n-1) + (9/2) (4/7)^n, so f_n = (9/2) * sum_{k=1..n} (4/7)^k, which is less than (9/2) * sum_{k=1..inf} (4/7)^k = (9/2) * (4/3) = 6. Finally, f(n) = 7^n * f_n < 6 * 7^n, so the additions are O(7^n) = O(N^log2(7)) for N = 2^n, the same order as the multiplications.
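The recurrence can be checked mechanically; a sketch assuming 18 block additions per level and f(0) = 0:

```python
def strassen_adds(n):
    """Scalar additions/subtractions used by Strassen on a 2^n x 2^n
    matrix: 7 recursive calls plus 18 block additions of size 2^(n-1)."""
    if n == 0:
        return 0
    return 7 * strassen_adds(n - 1) + 18 * 4 ** (n - 1)

# Closed form of the recurrence: 6 * (7^n - 4^n), which is O(7^n),
# i.e. O(N^log2(7)) for N = 2^n -- the same order as the multiplications.
for n in range(10):
    assert strassen_adds(n) == 6 * (7 ** n - 4 ** n)
```

The closed form makes the point of the exercise explicit: the additions do not change the exponent.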
@MrHaggyy
@MrHaggyy 1 year ago
A huge deal in practical matrix multiplication is taking the naive approach: look for redundant operations, look for dependencies in the computation, group up independent operations, and parallelize them. A 3x3 matrix on a GPU would schedule every one of those 27 multiplies on one 32-core chunk. Then you would copy the data into another 32-core chunk doing the addition. You could also do the additions of three matrix multiplies in one execution of 32 additions. That's the reason Strassen is such a big deal: 8 multiplies and 4 additions fit beautifully in 32-core chunks. And once you have set up your execution pipeline you can feed it any matrix. If a rank is odd, the CPU will make it even by adding zeros before passing it to the GPU.
@dr.rahulgupta7573
@dr.rahulgupta7573 1 year ago
Excellent presentation 👌
@DrTrefor
@DrTrefor 1 year ago
Thanks a lot!
@gofaonepaul
@gofaonepaul 1 year ago
This is so cool. ❤
@flexeos
@flexeos 1 year ago
I am not sure that the pure optimization of operations is very relevant. The limiting factor is often moving the data in and out of memory. What you have to optimize is how many operations you can do for each load.
@Miguel_Noether
@Miguel_Noether 1 year ago
Same amount of memory and speed of allocation, doing faster operations is going to be the most efficient way to optimize everything else (that's all the hype of quantum computing)
@pyropulseIXXI
@pyropulseIXXI 1 year ago
I know the *proof by induction* works absolutely, but the 'step' of _finding_ the formula in question is often skipped. I know it is not strictly necessary for the 'formal proof,' but it is still very nice to get this info via intuitive 'derivations.'
@dvir-ross
@dvir-ross 1 year ago
According to Wikipedia, there is already an improvement from 2022.
@DrTrefor
@DrTrefor 1 year ago
Oh nice! Good to be obsolete already!
@realdragon
@realdragon 1 year ago
Speaking from a hardware perspective: if possible, try to avoid jumping between columns. Even a 2D array is really one very long row, so jumping between columns means jumping across huge parts of the array, while fetching the next number in the same row is very fast.
@Yupppi
@Yupppi 6 months ago
Since this is specifically computation-related, it might be worth talking about not only the algorithm for calculating sub-matrices, but cache memory (its layout and use): how transposing the matrix gives better performance due to the elements being laid out in memory more efficiently, and how the relevant ones stay in the cache over the time you need them, which avoids cache misses. Furthermore, copying the matrices (to a local buffer) interestingly enough helps as well. And lastly, splitting the work across multiple threads. All of which leads to such a leap in performance that you can go from, for example, 2000 ms to 20 ms execution time. I feel like n^3 and n^2.4 aren't quite as concrete demonstrations compared to that. But of course Intel has a Basic Linear Algebra Subprograms implementation that does it in roughly 2 ms. These are, in my understanding, matrices of size 1000.
@Oscar1618033
@Oscar1618033 1 year ago
I have the feeling that such algorithms might involve some kind of matrix log and fourier transforms. Could it be?
@ericphantri96734
@ericphantri96734 1 year ago
Use divider into 2×2 or 3×3 the do all the way to use the grid point inverter and addition to perform by electronic across the board mean reducing to the determinant of each small matrix then move them together into smaller matrix with all the determinant and repeat until final determinant each is only single number then multiple ply out. And find the angle between the two number by trigonometric value then we had final vector so if n = 50 numbers each matrix then use the 2× 2 because even so far total operations about 50×49 estimate 2450 calculation by electronic of 20 mhz then it take 0.00000085 second and if use slower CPU like 1 mhz then it take 0.005 second for 50×50
@DiabloQFDB
@DiabloQFDB 1 year ago
I need to research and see if these methods can be implemented with AVX.
@user-yk7kq3rf7z
@user-yk7kq3rf7z 10 months ago
Congratulations on 100pi subscribers! :)
@DrTrefor
@DrTrefor 10 months ago
Oh cool, thank you!
@notgabby604
@notgabby604 1 year ago
The Walsh-Hadamard transform is kind of the fastest dense matrix multiply you can do. It's a fixed thing, but you can somewhat unfix it with cheap operations like prior sign-flipping of the input information, or prior permutations, etc. The number of multiplies required then is zero.
@D_VAULTZ
@D_VAULTZ 1 year ago
Does this only apply to large ints? I'm wondering if there is an improvement available for much smaller values, such as 8-bit integers.
@jonatanlind5408
@jonatanlind5408 1 year ago
These algorithms only tend to make sense for large ints. The algorithms presented in this video were developed to speed up computation; your 8-bit example can already be calculated incredibly quickly, and even if you have a large number of calculations, it is possible to precompute every single 8-bit multiply and simply look up the answer when you need it. I am unaware of any algorithm that can beat such a method.
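The lookup-table idea sketched in code (a full 256x256 product table; the names here are illustrative):

```python
# Precompute every product of two 8-bit values once (256 * 256 entries);
# afterwards each "multiplication" is a single table lookup.
TABLE = [[a * b for b in range(256)] for a in range(256)]

def mul8(a, b):
    """Multiply two unsigned 8-bit integers by table lookup."""
    return TABLE[a][b]
```

Whether the lookup actually beats a hardware multiply depends on the machine (a cache miss can cost more than a multiply), but it shows why asymptotic tricks are irrelevant at this size.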
@jebarijihed
@jebarijihed 1 year ago
This reminds me a lot of the FFT and DFT algorithms!
@swagatochatterjee7104
@swagatochatterjee7104 1 year ago
Hey, the AlphaTensor record at 9:50 has been beaten by two researchers at the University of Linz, who showed that you can multiply 5x5 matrices in 95 multiplications instead of the 96 the paper showed.
@jcsjcs2
@jcsjcs2 1 year ago
The reason why you don't care about additions is not that additions are "much faster" than multiplications. An FPU will usually take the same amount of time for either, and will probably even feature combined "multiply and accumulate" instructions. The reason is that the number of operations is of order n^3 either way; therefore, adding both together is still of order n^3.
@jimwinchester339
@jimwinchester339 1 year ago
Interesting! I was class of 1980, so I learned only of the Strassen algorithm.
@Moinsdeuxcat
@Moinsdeuxcat 1 year ago
8:40 THIS HAS TO BECOME A SUMMONING SALT VIDEO
@jaimeduncan6167
@jaimeduncan6167 1 year ago
If one is working with not-so-big numbers (in terms of the number of relevant digits), like 64-bit floating-point numbers, modern computers will produce multiplications in the same time as additions for this kind of problem (fully pipelined); in fact, many will produce Ax + y at 1 per cycle per pipeline. That means additions can't be ignored.
@ultimatedude5686
@ultimatedude5686 1 year ago
I think the reason reducing multiplications is so important is that multiplying matrices is way more computationally expensive than adding them. That means when you’re working with block matrices, you want the minimum number of multiplications of the components; extra additions are fine and won’t really affect the asymptotic complexity.
@lunalildragon241
@lunalildragon241 1 year ago
Well, calculations in themselves don't really take long; the biggest problem is the memory read time you incur if you read and write more data than the CPU can store. Meaning, for operations that huge, you have to take care to split the data into chunks that fit in that area without having to re-read.
@aliuncu2815
@aliuncu2815 1 year ago
You forgot to mention that not a week after the AlphaTensor group's paper, Kauers and Moosbauer improved their results on 5x5 matrices.
@rob876
@rob876 6 months ago
Here's a better version of Strassen's algorithm, due to Winograd, for A.B = C:
a = A21 + A22
b = a - A11
c = B12 - B11
d = B22 - c
s = A11.B11
t = a.c
u = s + b.d
v = u + (A11 - A21).(B22 - B12)
C11 = s + A12.B21
C12 = u + t + (A12 - b).B22
C21 = v - A22.(d - B21)
C22 = v + t
This has 7 multiplications and 15 additions/subtractions.
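These formulas can be verified directly on the 2x2 scalar case; a sketch:

```python
def winograd_2x2(A, B):
    """Winograd's variant of Strassen: 7 multiplications and only
    15 additions/subtractions (Strassen's original form uses 18)."""
    (A11, A12), (A21, A22) = A
    (B11, B12), (B21, B22) = B
    a = A21 + A22
    b = a - A11
    c = B12 - B11
    d = B22 - c
    s = A11 * B11
    t = a * c
    u = s + b * d
    v = u + (A11 - A21) * (B22 - B12)
    return [[s + A12 * B21,        u + t + (A12 - b) * B22],
            [v - A22 * (d - B21),  v + t]]
```

Like Strassen's version, the entries can be blocks rather than scalars, so the same recursion applies.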
@U.Inferno
@U.Inferno 1 year ago
"This doesn't sound like a big improvement." Me, a computer scientist: "No. This does."
@GreatJO
@GreatJO 1 year ago
Is SIMD faster?
@soonts
@soonts 1 year ago
Modern computers don't necessarily do additions much faster than multiplications. A computer I use to watch your video has AMD Zen 3 CPU. For 32-bit floating point numbers, on my computer the performance is absolutely equal. Specifically, both vaddps (adds 8 numbers) and vmulps (multiplies 8 numbers) AVX instructions have 3 cycles latency, and 0.5 cycles throughput.
@pierreabbat6157
@pierreabbat6157 1 year ago
What happens to the accuracy of computation with these matrix multiplication algorithms? I use the standard n³ algorithm, with pairwise addition for accuracy.
@calvindang7291
@calvindang7291 1 year ago
It's assumed that you want an exact result (and so won't get hit by rounding errors) when doing this - if you only need approximate answers and so are using an approximate data type, that's something else entirely.
@Dawg1561
@Dawg1561 1 year ago
Didn't know Dr. Strange left sorcery and started teaching mathematics. Btw, this was a great explanation.
@wilurbean
@wilurbean 1 year ago
I feel like I've made it in my physics education when that slightly extended decimal exponential got me excited
@DrTrefor
@DrTrefor 1 year ago
lol
@wilurbean
@wilurbean 1 year ago
@@DrTrefor fr tho, I went back to school at 30. Started at a tiny community college in rural northern Michigan a few years ago and got accepted to a top-tier engineering physics program (#1 in the US in my particular sub-field). My current physics classes are Modern Physics and lab, where we're doing a lot of modeling/data analysis with big matrices, and E&M based around Griffiths' Electrodynamics. Wouldn't have passed Vector Calc or Linear Algebra without your videos. It's coming full circle now with Modern and Linear Algebra, and E&M with the vector calc. Ty!
@lucavogels
@lucavogels 1 year ago
Can you also explain the Alman-Williams algorithm please! 🤓
@deeveshjain4246
@deeveshjain4246 2 months ago
is this the same thing as batch matrix multiplication?
@dougpark1025
@dougpark1025 1 year ago
An interesting set of algorithms. However, as it turns out getting the numbers from memory into the CPU is a bottleneck that is easy to forget. If you can arrange to get data to load into the cache of the processor in contiguous blocks there is a great deal of performance gain to be had over just pulling the numbers from memory without any consideration. This was a technique used decades ago to do matrix multiplies efficiently when computer memory wasn't large enough to hold all of the matrices. Blocks would be loaded from disk, multiplied and then blocks written back out to disk. Same idea, but memory is a lot faster than writing to disk. Today we usually don't run into anything that big, but the concept still works. Next step, enter the multiple core processor. You can take the same idea and subdivide the work of multiplication into parts that run independently on separate cores. In the end you can approach an N times speed up where N is number of cores. (Subject to Amdahl's law of course). The nice thing about the matrix multiply is it falls into the category of being embarrassingly parallel. Which is to say, it doesn't require much in the way synchronization constructs. I have not tried any of these faster algorithms in a parallel implementation. It might be interesting to benchmark that sort of implementation.
@frankyin8509
@frankyin8509 1 year ago
Love his T-shirt, 👕 topology meme🎉
@rumasc9854
@rumasc9854 1 year ago
dude, I was waiting for the cool dance that the thumbnail portrayed was about to happen
@Petch85
@Petch85 1 year ago
What if you have a matrix x vector, are there similar tricks?
@dickybannister5192
@dickybannister5192 1 year ago
a nice work-through. for some heavy lifting, there is a nice new video from IAS called "On Matrix Multiplication and Polynomial Identity Testing" on here. I didn't get all of it, but I'm fairly sure there are some nice concepts in there which could be explained: the uses for MM are various after all. but PIT always fascinates me as a subject.
@jimjacobs2817
@jimjacobs2817 1 year ago
Is there a quicker way to calculate a 4 x 4 matrix product? That is, is there a similar approach to the speed-up for 2 x 2 matrices?
@iwersonsch5131
@iwersonsch5131 1 year ago
At 5:32, if I look at matrices of size n=2k+1 for k approaching infinity, the complexity with this naive extension approaches (2n-1)^log2(7) = (n-0.5)^log2(49) > (n-1)^5, doesn't it? Of course there are ways to reconcile this but this argument appears to be insufficient for the claim
@ivanxdxd
@ivanxdxd 1 year ago
Which algorithm is implemented in MATLAB?
@DrTrefor
@DrTrefor 1 year ago
The normal one by default I would presume. It takes truly gigantic matrices for this kind of asymptotic difference to make a measurable advantage.
@saitougin7210
@saitougin7210 1 year ago
7:05 "using some log rules": starting from the base-change formula,
(log_b c) / (log_b a) = (ln c) / (ln a)
(log_b c) * (ln a) = (log_b a) * (ln c)
ln[a^(log_b c)] = ln[c^(log_b a)]
a^(log_b c) = c^(log_b a)
q.e.d. You're welcome. Hm, maybe someone should add this formula to Wikipedia or something. It might come in handy, if one just knew it.
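A quick numeric spot-check of the identity:

```python
import math

# Numeric spot-check of a^(log_b c) = c^(log_b a), the identity that
# turns 7^(log2 n) into n^(log2 7) in the video.
for a, b, c in [(7.0, 2.0, 100.0), (3.0, 10.0, 42.0), (2.5, 4.0, 9.0)]:
    assert math.isclose(a ** math.log(c, b), c ** math.log(a, b))
```

Not a proof, of course, just a sanity check of the derivation above.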
@holyshit922
@holyshit922 1 year ago
Standard multiplication may not be the fastest, but it is still not bad because it has polynomial growth. Compare it with algorithms like calculating a determinant by cofactor expansion, or by summing products over all permutations, which have factorial growth. If we want to calculate the coefficients of the characteristic polynomial by expanding the determinant via minors to avoid symbolic computation, we get exponential growth. 2:16 and here I catch you: we accelerate the algorithm at the cost of memory, and both RAM and disk space (or paper and ink) cost dollars. Was it worth it? The acceleration isn't significant, and you won't be able to accelerate the algorithm to O(n^2).
@jimnewton4534
@jimnewton4534 1 year ago
I'm surprised that if the number of rows is not a power of 2, you can simply add rows and columns of zeros. I'd always thought I needed to adjoin rows and columns of zeros but with 1's on the main diagonal.
@artus9022
@artus9022 1 year ago
But what about SIMD? Isn't this n²? How would the algorithms perform in this case?
@abdulshabazz8597
@abdulshabazz8597 1 month ago
I would think the computational lower bound is less than O(N^2) if any digit repeats in more than one quadrant...
@joshualickey3426
@joshualickey3426 1 year ago
I think I found a mistake at 2:44. In the bottom lower right it might be p1-p2+p3+p6.
@codethinki
@codethinki 1 year ago
title: fastest matrix multiplication algorithm. Video explains: The first improvement of matrix multiplication 😆. Oh and on top of clickbait a fat sponsorship. I love quality content
@cantkeepitin
@cantkeepitin 1 year ago
What about rounding errors?
@jessstuart7495
@jessstuart7495 1 year ago
Do modern CPU's detect when you are trying to multiply by zero and return a result (zero) faster than a multiplying two non-zero numbers?
@DrTrefor
@DrTrefor 1 year ago
Yes I believe so, but likely depends on exact algorithm one uses.
@jonatanlind5408 · 1 year ago
Yes, speed of computation depends on the data itself. Floating-point numbers actually have 5 distinct data representations they can assume, each of which must be handled separately: zero, denormal, normal, infinity, and NaN. I believe computations involving denormal numbers are usually slower than those on normal numbers or zero, and significantly so.
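A quick way to see the zero/subnormal/normal split for 64-bit doubles from Python (whether subnormals are actually slower is hardware-dependent; the thresholds below are standard IEEE 754 values):

```python
import sys

# Smallest positive *normal* double, roughly 2.2e-308.
smallest_normal = sys.float_info.min

# 5e-324 is the smallest positive subnormal ("denormal") double;
# subnormals fill the gap between 0 and the smallest normal number.
subnormal = 5e-324

assert 0.0 < subnormal < smallest_normal
```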
@chxyike · 1 year ago
You can improve from n^3 to n^2.807, which is not a big gain, but the complexity of the algorithm increases a lot. The gain is small with this algorithm, in my opinion.
@MrHamof · 1 year ago
Say you have a matrix that's 100x100. With n^3, you need 1,000,000 multiplications to solve it. With n^2.807 you need about 411,150 multiplications, and with the current record holder of n^2.3728596 we're down to about 55,683. I don't know enough to speak to the real-world performance difference, but it definitely has a huge impact on the number of multiplications needed for large matrix multiplication. Depending on how much harder implementation is, I would agree that it's probably not worth it for smaller matrices; like most of these things, the difference between O(n^3), O(n^2), and O(n) isn't that big if n is a small number.
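Those numbers are easy to reproduce. Treating n^exponent as the multiplication count is only an asymptotic approximation (the constant factors are ignored), but it shows the scale of the gap; the helper name is mine:

```python
def approx_mult_count(n, exponent):
    """Rough number of scalar multiplications for an n x n matrix product,
    ignoring constant factors."""
    return round(n ** exponent)

naive    = approx_mult_count(100, 3)          # 1,000,000
strassen = approx_mult_count(100, 2.807)      # ~411,000
record   = approx_mult_count(100, 2.3728596)  # ~56,000
```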
@chxyike · 1 year ago
@MrHamof You're right. It makes sense for large matrices.
@ankurraj6560 · 1 year ago
Can we somehow use the concept of a matrix as a linear transformation of a space to simplify our matrix multiplication problems? 🤔🤔
@DrTrefor · 1 year ago
Well, composition of linear transformations is just matrix multiplication, so they are definitely related.
@David-gu8hv · 1 year ago
No wonder donuts and coffee go so well together...
@dmitrykazakov2829 · 1 year ago
This misses the question of accuracy. Additions (and subtractions) suffer precision loss in floating-point arithmetic. If you have accumulating additions in your algorithm, you risk catastrophic precision loss for some inputs. So shuffling a multiplication into a bunch of additions might not be as bright an idea as it appears. In general, floating-point operations are not associative, so all optimization methods based on reshuffling operations, like this one, must be applied with great care.
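The non-associativity point is easy to demonstrate; the values below are the standard IEEE 754 double results:

```python
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # 0.6000000000000001
right = a + (b + c)  # 0.6

# Reordering the same three additions changes the rounded result.
assert left != right
```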
@telotawa · 1 year ago
9:09 but you can do integer multiplication in O(n log n), though.
@drwizard2875 · 1 year ago
Any advancement on the non-commutative behavior of matrix multiplication?
@jawadshinwari7365 · 1 year ago
Hi sir! Please, sir, could you explain this algorithm with a worked example? 🙏🙏🙏
@alekthpariso8606 · 1 year ago
Still trying to figure out where I'm gonna need this.
@DrTrefor · 1 year ago
haha probably never tbh:D
@mathematicia · 1 year ago
This could be an easier method, but learning, memorizing, and using it in practical scenarios is really painful. As far as raw calculation is concerned it could be useful somewhere, but when we do any kind of analysis with matrices, it can't replace ordinary multiplication.
@gnomeba12 · 1 year ago
Ok now show us the best algorithm for SVD!
@DrTrefor · 1 year ago
Ha, actually I've been wanting to do an SVD video.
@Rin-qj7zt · 1 year ago
Apparently the link for Brilliant is still active, because it gave me the discount lol.
@DrTrefor · 1 year ago
Awesome! Hope you enjoy:)
@farfa2937 · 1 year ago
If you hardcode a matrix, then you can get the result in constant time, and you'll be right as long as you only multiply matrices whose product is your chosen one.
@tissuepaper9962 · 1 year ago
This is the well-known "divine intervention" algorithm which is also employed in "miracle sort" and the "birthday attack" on hashing algorithms.
@TheViperZed · 1 year ago
Multiplication and addition having different impacts from a computational-complexity standpoint is incorrect from the perspective of modern hardware. If you're not doing this on embedded hardware, the multiply, int or fp, is going to take one cycle, sometimes even just fractional cycles; most often it's a fractional-cycle multiply-accumulate doing both the multiply and the add. Computational complexity really needs to be combined with an analysis of memory throughput and memory footprint to be a viable analytical strategy these days; on its own it's an anachronism from a time when the main performance bound wasn't I/O throughput into the CPU. It's really worth mentioning these facts if you're going to approach a topic from a computational-complexity standpoint.
@dmitryincog7455 · 1 year ago
Is there a matrix multiplication library that uses these algorithms? And if not, what would that mean?
@MrTyty527 · 1 year ago
I can come up with an O(1) algorithm for this, but the result might be less accurate!
@dabbler1 · 1 year ago
Correction: it is p1 + p3 - p2 + p6, not p1 + p2 - p3 + p6.
@Petch85 · 1 year ago
What if I know most of the numbers in my matrix are just 0? (A sparse matrix.)
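For sparse matrices a different representation beats any dense algorithm: store only the nonzeros and skip everything else. A toy dict-of-keys sketch (the names and format are my own; real code would use a library such as SciPy's sparse module):

```python
from collections import defaultdict

def sparse_matmul(A, B):
    """Multiply matrices stored as {(row, col): value} dicts of nonzeros.
    Work scales with the number of nonzero pairings, not with n^3."""
    # Index B by row so each nonzero of A finds its partners directly.
    rows_of_B = defaultdict(list)
    for (k, j), v in B.items():
        rows_of_B[k].append((j, v))
    C = defaultdict(float)
    for (i, k), a in A.items():
        for j, b in rows_of_B[k]:
            C[(i, j)] += a * b
    return dict(C)
```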
@SupinePandora · 1 year ago
GPUs are multiplying matrices every moment; unfortunately, changing matrix multiplication from "mat4 * mat4" to "..." would lose SIMD and the built-in matrix multiplication instructions. (Applicable to both CPU and GPU.)
@GradientAscent_ · 1 year ago
Can you show a proof of n^2 being the theoretical minimum?
@L2M2K2 · 1 year ago
Depends on what you mean by the minimum. The trivial proof: there are n^2 unique elements in the input, and all of them are used at least once to compute the output. As such, one cannot possibly compute all n^2 output elements with fewer than n^2 operations in total (beating this would require a fixed number of operations per input element to generate all n^2 output elements, a very tough task). Note, this does not say anything about the ability to get anywhere near this limit of n^2 operations; it is simply a strict lower limit one cannot possibly beat. (Improving the theoretical lower limits, as in a proof that the problem must be harder than the trivial n^2 limit here, is its own field of mathematics. And once the theoretical lower limit and the best known algorithm meet, we complete a problem. Well, kind of; one can still improve the practical applications a lot. And not all problems are even completable.)
@iwersonsch5131 · 1 year ago
"Asymptotically" could mean "asymptotically as the number of trials with 3x3 matrices approaches infinity", but I think I get the idea.