1. Introduction and Matrix Multiplication

Рет қаралды 191,489

Күн бұрын

MIT 6.172 Performance Engineering of Software Systems, Fall 2018
Instructor: Charles Leiserson
View the complete course: ocw.mit.edu/6-172F18
KZfaq Playlist: • MIT 6.172 Performance ...
Professor Leiserson introduces 6.172 Performance Engineering of Software Systems. The class examines an example of code optimization using matrix multiplication and discusses the differences between programming languages Python, Java, and C.
License: Creative Commons BY-NC-SA
More information at ocw.mit.edu/terms
More courses at ocw.mit.edu

Пікірлер: 95

@leixun 3 жыл бұрын

*My takeaways:* *1. Why do we study performance engineering **0:57* 1.1 Why we can't improve the frequency of hardware anymore 10:00, because static power consumption dominates the power consumption in modern hardware, and it is hard to reduce it. 1.2 Parallelism is used to increase performance 11:49, but the performance is no longer free since we have to write programs that can run in parallel. *2. An example of performance engineering: Matrix multiplication **15:35* 2.1 Hyperthreading can improve performance a little bit, but it makes performance measurement difficult 17:17 2.2 Python code running time: *~6 hours* 19:38 2.2 Java code running time: *~46 minutes* 23:44 2.3 C code running time: *~19 minutes* 24:23 2.4 Why Python is slow 25:30, because Python is interpreted, C is compiled and Java is in between. 2.5 Further improvement on the C code. - Change the order of loops: *~3 minutes* running time, because cache locality is exploited 29:50, and we can use measurement tool $ cachegrind 33:54 *-- In C, a matrix are stored row by row in memory, therefore access data column by column can exploit cache locality.* *-- In C[i][j] += A[i][k] * B[k][j], we want to maximise the access to the column index which are [j], [k] and [j], and because there are two [j], put [j] in the inner loop could maximise cache locality.* *-- We also want to minimise the access to row index because it can break cache locality, and since there are two[i] and one [k], put [i] in the outer loop could maximise cache locality.* *-- Put [k] in the middle loop.* - Change the compiler flag: *~54s* running time 34:59 - Use more cores: *~3s* running time 36:24, rule of thumb is parallel outer loops rather than inner loops. - Manage cache for data reuse, tiled matrix multiplication with parallelism: *~1.74s* running time 40:10 - Tiling for two-level cache, recursive matrix multiplication with parallelism: *~94s* running time 45:45, because the overhead of recursion is large, but we can coarsening the recursion to get *~1.3s* running time 49:00 - Using vector hardware, SIMD with compiler: *~0.7s* running time 51:35 - Using vector instruction directly: *~0.4s* running time 55:55 *3. Matrix multiplication is a good example, but we generally won't see such magnitude of improvement in other cases **58:58*

@vishnusingh4118 3 жыл бұрын

You're God's gift to humankind. Thanks a ton for this Lei Xun! Much appreciated!

@leixun 3 жыл бұрын

@@vishnusingh4118 You are welcome

@Kingpin.7 3 жыл бұрын

Can anyone reply if all these videos are to be watched in order or are they all independent of eachother?

@leixun 3 жыл бұрын

@@Kingpin.7 Mostly in order

@user-mr-m12312 3 жыл бұрын

Great takeaways!

@hengchunsong301 4 жыл бұрын

From 6 hrs in python down to less than half a second. Indeed impressive optimization!

@takasurazeem 4 жыл бұрын

And you said that exactly 6 months back from today.

@jiaming5269 4 жыл бұрын

spoiler warning zzz

@jiaming5269 4 жыл бұрын

spoiler warning zzz

@awsumpersun321 3 жыл бұрын

most of it attributed to parallelization and language though (trivial changes)

@mytech6779 2 жыл бұрын

@@awsumpersun321 Only trivial an an absurdly high level abstraction. Just try to implement either. If you take those two out as not everything can be parrallel(or alternately the task is already parallel), and if you are not totally daft(regarding high demand computation) and assumed a compiled language from the start; then you are still looking at a 92x improvement. Now consider super compute machines that consume $10'000 per day and many scientific processes that need many days or even months of runtime. Then even 2% is a substantial savings, let alone cutting it to 1/92 (~1%) of the original.

@josephgibson2548 4 жыл бұрын

Thank you MIT! Once again a great video and lecture.

@windmaple 4 жыл бұрын

This is a truly amazing class.

@BinuVargheseamd Жыл бұрын

One of the best classes I have attended. Absolutely fabulous. Thank you for the upload.

@OTadashi162 2 жыл бұрын

I have not been expecting it to be a performance optimization class (i didn't read the description before, i was looking for game dev related stuff), but i have found very interesting. I never had this kind of advanced concept before about how computers REALLY works.

@MIT60346 4 жыл бұрын

Glad to see Prof Leiserson again. Btw, CLRS is always awesome!

@user-ns6jz8je9b 2 жыл бұрын

hha invoker

@kythrathesuntamer9715 2 жыл бұрын

This is incredible. Thank you so much for this course MIT.

@ajayrajan8882 4 жыл бұрын

The material taught here is gold. Zero dislikes, just how it's supposed to be!

@merlinarthur4661 2 жыл бұрын

I think Cuz of your comment it got 2 dislikes🥲🥲

@ajayrajan8882 2 жыл бұрын

@@merlinarthur4661 plausible

@larry_the 2 жыл бұрын

still no dislikes a year later

@mytech6779 2 жыл бұрын

@@larry_the KZfaq turned off dislikes a couple months ago, its rainbows and unicorns forever....

@sarantsogtmunkhbaatar6974 Жыл бұрын

u fucking jinxed it

@takshpatel8109 Жыл бұрын

I really appreciate this open source courses

@gabrielagalindo6339 3 жыл бұрын

Thanks MIT, amazing lecture

@veramentegina 4 жыл бұрын

what a great lecture!!

@mandcali7613 10 ай бұрын

Wow! Thank you… a lot of the question marks that I’ve encountered before was answered here…

@npe_exception 3 жыл бұрын

2:59 Changed my whole view of SE

@Originalimoc 3 жыл бұрын

38:16 because possible racing condition? since the use of +=

@FernandoJavierSosa 4 жыл бұрын

Thanks for sharing this excelent lesson!

@bob-the-constructors9912 2 жыл бұрын

Thank you MIT

@thehardikraheja 2 жыл бұрын

As to why we cannot cross 40% peak, is the reason memory accesses? Since the whole matrix of such size cannot be cached, I am curious if there are more reasons as well

@user-hn4sf7cn1r 3 жыл бұрын

It's a cool lecture!

@chessimprovement8804 2 жыл бұрын

Great resource.

@pravinjoshi8978 4 жыл бұрын

Thanks MIT!!! keep up good work...

@seunghwacho5889 4 жыл бұрын

Thank you for sharing great lessons.

@RonMartinJr 4 жыл бұрын

That was amazing

@FreedomIsNeverEverFree 2 жыл бұрын

how long does a numpy matrix multiplication take?

@estervojkollari4264 Жыл бұрын

Amazing lecture! Why does it go only up to only 41.6% of peak?

@MysterySemicolon 2 жыл бұрын

Great class. I had to laugh at how much just two years have given over to specs as he describes the AWS large machine as a "honkin' big machine". Enterprise VD environments are really pushing what constitutes a honkin' big machine upward.

@mytech6779 2 жыл бұрын

There were bigger AWS options even in 2018, the point is that most folks are using less than 1% of it and think they are really cooking or that they just need a faster machine. Seeing what 1980s programmers could push through an 8bit console with 2k of memory (like the NES) with various optimization tricks, now those machines had a good utilization ratio.(Yes, they were simpler than modern workstation architectures and thus easier to optimize.)

@Er.Sunil.Pedgaonkar 7 ай бұрын

A course in System Software and Programming is must for CS / CpE majors

@Adit86 4 жыл бұрын

@14:30 - 2000 "OH OHS" = oughts

@hanw544 4 жыл бұрын

This guy wrote Intro to Algos, OMG😱

@tear728 2 жыл бұрын

Wow that's legendary

@Soulixs 3 жыл бұрын

Thankyou

@manfredbogner9799 5 күн бұрын

Sehr gut

@MathNerdGamer 2 жыл бұрын

Was this video unlisted until just 30-ish minutes ago? I saw the title and was excited to see another updated 6.172, but then I noticed that the thumbnail showed that I'd already watched it before.

@mitocw 2 жыл бұрын

Good catch! Yes it was. We're glad you got to enjoy it anyways :D

@asdfghjkl36958 2 жыл бұрын

Can someone explain why the k loop can't be parallelized?

@thehardikraheja 2 жыл бұрын

Because it's the A[i][k] * B[k][j] so k is common between A and B and needs to be executed sequentially for obtaining the result C, this is my understanding of the reason

@stephenodogwu7359 4 ай бұрын

I went through this very engaging lecture and thought I understood it. Maybe I'm missing something, because I ran the matrix multiplication on my PC and my results came back very fast. Without optimization, C performed the operation in just 0.03 seconds. Please am I getting something wrong as to why it took longer to perform in this lecture

@laurianangelescu5928 2 жыл бұрын

What is the source/paper/study behind the sldies @ 14:00?

@mitocw 2 жыл бұрын

Visit ocw.mit.edu/6-172F18 for the course materials. Best wishes on your studies!

@regulus8518 4 жыл бұрын

what is the pescribed reading for this course ?.

@mitocw 4 жыл бұрын

6.172 has no required readings. There is a list of supplemental readings on MIT OpenCourseWare at: ocw.mit.edu/6-172F18. Best wishes on your studies!

@maloxi1472 4 жыл бұрын

Thanks @@mitocw

@Kingpin.7 3 жыл бұрын

Can anyone reply if all these videos are to be watched in order or are they all independent of eachother?

@mytech6779 2 жыл бұрын

It is recorded class lectures, as it clearly says in the description. They are a series, not independent.

@21fad24 2 жыл бұрын

what was changed in this re-upload?

@mitocw 2 жыл бұрын

This was not a re-upload. We just forgot to make this lecture public. Everything else was public, except this one lecture. 🤦‍♂️

@SFXFF 2 жыл бұрын

Chalkboard is so cool

@diego44551 4 жыл бұрын

Do someone knows some book of software? I wanna study an other engineering but I would like to learn software too.

@serkardis292 2 жыл бұрын

These optimizations are amazing, python code isn't great though. I managed to write a pure python version that runs in 300s(using pypy). No optimizations other than loop reordering. Still slow, but it's just 5 loc that were written in 3 minutes.

@miguelescutia5556 Жыл бұрын

You used concepts from his lesson anyway.

@serkardis292 Жыл бұрын

@@miguelescutia5556 yes, it's an amazing lecture. Just wanted to highlight that barely optimized python is less than 2x slower than C at the same level of optimization.

@tolga1503 2 жыл бұрын

VAHAB BEYAAZ AHMET ÇAKAAR

@This.Object 7 ай бұрын

Did he jus say "well i took this 1.3hr class to prove you don't need performance, just write your god damn code in such a way that it makes some fucking sense" 😂

@xiaoxiashirleywu2799 2 жыл бұрын

check

@abpccpba 2 жыл бұрын

Why not try Forth programing language; supper fast.

@aaronrenfroe5199 4 жыл бұрын

what is absolutely bonkers is python's numpy.matmul holds it's own up until #9 on my 5 year old macbook pro, and it looks like, C = np.matmul(A,B). Lol. So yes, plain python is slow but it can be fast. Good Lecture.

@sauravtiwary8513 3 жыл бұрын

@Aaron, The core logic of NumPy that actually does the multiplication is written in C and not Python. github.com/numpy/numpy/tree/master/numpy/core/src

@mytech6779 2 жыл бұрын

@@sauravtiwary8513 Yep, can't get around the laws of physics. (Same algorithms pre-compiled vs interpreted in real time.)

@JP-vg8vl 2 жыл бұрын

7:00 so michael jackson is a computer scientist huh

@DistortedV12 2 жыл бұрын

Bro why does this have so many views??

@p0187 Жыл бұрын

program vs software vs application 🤔

@Pabloparsil Ай бұрын

Web monkeys mentioned

@rileydavidjesus Жыл бұрын

Run it in Rust though

@eagledove9 3 жыл бұрын

If I wanted to make an Amish-like computer, using very simple and primitive parts, and if my goal was to produce all of it in the United States instead of in China, it might make sense to build computers and appliances that had less power, less memory, a frugal approach to programming. You might make a goal of building a computer that doesn't use certain kinds of rare earth elements or something, using only elements that could be mined locally, so you would give up a lot of the advanced, modern hardware.

@mytech6779 2 жыл бұрын

You can make computers with water and mechanical valves. So what?