AVX Explained - Performance and Syntax Analysis

17,477 views

Proceu Tech

2 years ago

Advanced Vector Extensions, a.k.a. AVX, is an extension to the x86 instruction set architecture that widens SIMD execution within the CPU core itself to 256-bit registers (512-bit with AVX-512)!
Also, if you're reading this far - I've got an i7-11700k review coming!
Have a Great Day!
- Proceu
#AVX #Intel #AlderLake

Comments: 55
@parad0x1cal83 2 years ago
With this level of content, it's a matter of time before your channel blows up! Thank you for the explanation!
@salvageddoor 2 years ago
Damn this should've got more views... I was just searching for AVX offset feature on KZfaq and this video just came by. I'm not familiar with any kind of image processing in C++, I'm more into embedded stuff but your content has served me quite a lot of interesting knowledge! Keep it up and your channel will blow up really soon!
@LegendLength 10 months ago
First hit for me for avx2. Great video too.
@ItsAkile 2 years ago
This video has been in my browser for about a month+, finally watched it. pretty dank video, thanks brother I'm still getting into the groove
@ProceuTech 2 years ago
Glad you enjoyed! It’s admittedly a pretty niche programming concept.
@ItsAkile 2 years ago
@@ProceuTech That it is. I had it on the list of things I don't fully understand.
@anshumandhuliya 1 year ago
Very nice and gentle introduction to the topic :)
@anonymouscommentator 2 years ago
Amazing video! I was interested in what AVX512 (and AVX2 in general) actually are, and I found your great video explaining more than I hoped for!
@ohmygosh6176 23 days ago
Update: any AMD Zen 4 and up has AVX512 support. The game Helldivers 2 uses AVX.
@treelibrarian7618 10 months ago
I thought it would be worth noting that just because AVX512 instructions work on 16 floats in one instruction doesn't make them faster than AVX2 instructions in practice. As far as I know, in desktop and laptop CPUs the AVX512 instructions are limited to a single execution port, whereas AVX2 instructions can execute on 2 ports simultaneously (duplication of the fast add and FMA capabilities), and for simpler 256-bit vector ALU functions like and, xor, blend and integer arithmetic there are 3 ports they can execute through, for 3 instructions per clock. The biggest benefit of the AVX512 instruction set seems to be versatility, with selective operation on partial vectors via the k-registers. I believe Sapphire Rapids server and workstation CPUs have 2 AVX512 execution ports, though. Zen 4 does AVX512 instructions in 2 clocks, putting each half through the same 256-bit pipeline in turn.
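The "selective operation on partial vectors via the k-registers" mentioned above can be sketched roughly as follows. This is only an illustration, not code from the video: `add_in_place` is a hypothetical helper, and the AVX-512 path is compiled only when the compiler defines `__AVX512F__` (for example with `-mavx512f`); otherwise a scalar loop does the same work. The mask lets the final, partial group of fewer than 16 floats go through the same 512-bit instructions without reading or writing past the end of the data.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

// Hypothetical helper: a[i] += b[i] for vectors of any (equal) length.
void add_in_place(std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    const std::size_t n = a.size();
    std::size_t i = 0;
#if defined(__AVX512F__)
    // Full groups of 16 floats.
    for (; i + 16 <= n; i += 16) {
        __m512 va = _mm512_loadu_ps(a.data() + i);
        __m512 vb = _mm512_loadu_ps(b.data() + i);
        _mm512_storeu_ps(a.data() + i, _mm512_add_ps(va, vb));
    }
    if (i < n) {
        // One mask bit per remaining element (1 to 15 bits set).
        const __mmask16 k = static_cast<__mmask16>((1u << (n - i)) - 1u);
        __m512 va = _mm512_maskz_loadu_ps(k, a.data() + i);  // masked-off lanes read as 0
        __m512 vb = _mm512_maskz_loadu_ps(k, b.data() + i);
        _mm512_mask_storeu_ps(a.data() + i, k, _mm512_add_ps(va, vb));  // only masked lanes written
        i = n;
    }
#endif
    // Scalar fallback: does all the work when AVX-512 is not enabled at compile time.
    for (; i < n; ++i) a[i] += b[i];
}
```

Calling `add_in_place(a, b)` on two 21-element vectors would use one full 512-bit step plus one masked step for the last 5 elements when AVX-512 is enabled.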
@dagoberttrump9290 3 months ago
What happens if you align the SIMD-processed vector to cache-line boundaries?
@charlieike8414 3 months ago
I'm taking C++ in college and we're currently learning about arrays. Only looked this video up after an LTT video where they turned AVX off in BIOS to mess up the pc. Fate brought me here to maybe ignite a deeper passion for programming.
@ProceuTech 3 months ago
The other AVX video I made more recently is a much better video than this one if you want better info. Appreciate the support, though!
@salvageddoor 2 years ago
Just one small question: How can you return the array ret[16] in the function linear::vector_add()? It's a local variable so how can it be returned? Or am I missing something that is possible in C++?
@ProceuTech 2 years ago
Let me do some coding real quickly and do some tests. I’ll get back to you in a few minutes!
@ProceuTech 2 years ago
Ok so I just reran the function in order to see what was actually going on in the array. Turns out it wasn’t returning proper values! Thanks for catching that! I’m so used to working with vectors (which can be returned), and don’t have as much experience with arrays. Sorry for the confusion!
@salvageddoor 2 years ago
@@ProceuTech Thanks for clearing up my doubt! At least I know that it is possible to return a local vector in C++.
@ProceuTech 2 years ago
Vectors can still be processed using AVX as well. You just have to use “_mm512_set_ps(i[0], i[1], etc., i[15]);”, which takes up more space in your program but offers identical performance!
@vytah 2 years ago
@@ProceuTech With std::vectors, you can just use i.data(), which is the pointer to the internal array. As for returning AVX values, you can just return __m512 directly, or populate a std::vector via data() and return it.
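A minimal sketch of the approach described in this thread: load straight from `data()`, write into a `std::vector` through `data()`, and return that vector by value instead of a pointer to a local array. The name `vector_add` echoes the function discussed above, but this body is an assumption, not the code from the video. It uses 256-bit AVX when the compiler defines `__AVX__` (for example with `-mavx`) and plain scalar code otherwise.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#if defined(__AVX__)
#include <immintrin.h>
#endif

// Hypothetical helper: element-wise sum of two equally sized vectors.
std::vector<float> vector_add(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    std::vector<float> ret(a.size());  // owns its storage, so returning it by value is safe
    std::size_t i = 0;
#if defined(__AVX__)
    // 8 floats per step; the "u" variants accept unaligned pointers.
    for (; i + 8 <= a.size(); i += 8) {
        __m256 va = _mm256_loadu_ps(a.data() + i);
        __m256 vb = _mm256_loadu_ps(b.data() + i);
        _mm256_storeu_ps(ret.data() + i, _mm256_add_ps(va, vb));
    }
#endif
    // Scalar tail, or the whole job when AVX is not enabled at compile time.
    for (; i < a.size(); ++i) ret[i] = a[i] + b[i];
    return ret;
}
```

Unlike a local `float ret[16]`, whose storage ends with the function call, the returned `std::vector` is moved (or its copy elided) out to the caller, so `auto sum = vector_add(a, b);` is well defined.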
@vinstontan9502 1 year ago
Excellent video! Effectively explains AVX
@KristianDjukic 11 months ago
Thanks for the excellent video!
@Quancept 1 year ago
Very underrated video!
@opoxious1592 5 months ago
To this day, I have never seen a real benefit in a game that needed AVX instructions. A good example is Cyberpunk 2077. In the very beginning it would only run on CPUs with AVX support, and a few months later the game was also made to run without AVX support. There is not a single bit of difference with or without AVX regarding graphics or performance in fps. It's a good thing that more and more games do not require AVX anymore, because it asks for more resources and energy from your system without any visible gain in performance.
@RoboticusMusic 9 months ago
I came here because I vaguely remember someone mentioning something that can cause a CPU to overheat insanely fast. Is there something else that can overheat a CPU even faster, or was this it?
@sean8102 17 days ago
Well, AVX is very demanding, so the CPU uses a lot of power when executing AVX-heavy code. And of course more power = more heat. Burn-in apps like Prime95, I believe, use or have the option to use AVX/AVX512 during the burn-in test to really push the CPU as hard as possible. As for causing a CPU to overheat: no, it should not do that if you have a stable setup.
@realforest 2 years ago
Your explanation at the end was very helpful! Me: "Why the hell would I ever use AVX instructions?" AVX: "Umm, you can skip an extra loop to traverse a vector, giving you a lot of performance if you do a lot of vector arithmetic!"
@LegendLength 10 months ago
How important is volatile when coding with AVX?
@treelibrarian7618 10 months ago
No more than normal. Volatile is for when something (like another thread) might modify the memory of a variable without the knowledge of the current thread, so the compiler should treat it as a volatile (subject to unpredictable change) value and re-read it whenever it needs to use it, rather than assume its value stays the same just because the current code hasn't changed it. This prevents certain compiler optimisations that would assume the value is unchanging. AVX memory reads and writes happen in a single cycle like normal register reads and writes, so there's no real difference. It should also be noted that there are no "locked" versions of AVX instructions, so if you are trying to operate on vector data with multiple threads, you should work out some other way to prevent race conditions, like data segmentation or mutexes (preferably with lock elision, since the hardware memory synchronization involved in locks/mutexes is quite slow).
@naveediqbal5600 2 years ago
Is there a way to remove AVX instructions from a game?
@ProceuTech 2 years ago
Some implementations have a toggle where you can switch between AVX and “Non-AVX” algorithms. Not all of them have this though :(
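One way such a toggle could be built is runtime dispatch: ship both a scalar and an AVX version of a routine and pick one when the program runs. The sketch below is an assumption about how this might look, not how any particular game does it. The names `add_scalar`, `add_avx`, `add` and the `use_avx` flag are hypothetical; `__builtin_cpu_supports` and the `target` attribute are GCC/Clang features on x86, so everything AVX-specific is guarded and other platforms fall back to the scalar path.

```cpp
#include <cstddef>
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define HAVE_AVX_DISPATCH 1
#endif

// Plain scalar version: runs on any CPU.
void add_scalar(const float* a, const float* b, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
}

#if defined(HAVE_AVX_DISPATCH)
// The target attribute lets this one function use AVX even when the rest of
// the program is compiled without -mavx.
__attribute__((target("avx")))
void add_avx(const float* a, const float* b, float* out, std::size_t n) {
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        _mm256_storeu_ps(out + i,
                         _mm256_add_ps(_mm256_loadu_ps(a + i), _mm256_loadu_ps(b + i)));
    }
    for (; i < n; ++i) out[i] = a[i] + b[i];  // leftover elements
}
#endif

// Hypothetical switch: use_avx plays the role of the user-facing toggle.
void add(const float* a, const float* b, float* out, std::size_t n, bool use_avx) {
#if defined(HAVE_AVX_DISPATCH)
    // Only take the AVX path if the user wants it and the CPU reports support.
    if (use_avx && __builtin_cpu_supports("avx")) {
        add_avx(a, b, out, n);
        return;
    }
#endif
    (void)use_avx;  // unused on platforms without the AVX path
    add_scalar(a, b, out, n);
}
```

With this layout the program still launches on a CPU without AVX, because the AVX instructions are only reached after the support check passes.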
@mkvalor 2 years ago
I know, I'm adding to this comment section nearly two years later BUT... AVX-512 was almost certainly more than 77.5% faster than scalar. The values for the arrays were read "cold" from RAM for the AVX-512 function call, but the memory reads for that operation placed those values in the L1 data cache for the scalar loop. Benchmarking is HARD!
@ProceuTech 2 years ago
Is there an explanation as to why?
@mkvalor 1 year ago
@@ProceuTech The first program to load a file from disk pays a time penalty for the disk I/O operations; however, the OS then keeps as much of that file in the system RAM as possible and some of the file even resides within the fast cache of the CPU itself. The next program you run which needs to read that file will retrieve the data very quickly from the CPU cache and system RAM. So that second program doesn't pay the same time penalty for disk I/O operations.
@lupsik1 1 year ago
@@mkvalor I have a problem understanding what you mean by the loading from disk. When the program is loaded those values are going straight into RAM. When the variables got initialised they get pushed onto the stack. When the AVX function is called all that happens is the address of a gets copied into the RAX register, and the address of b gets copied into RDX. The exact same thing happens when we run the linear function. Are you suggesting that the page containing this tiny program gets unloaded mid-execution?
@treelibrarian7618 10 months ago
For sure there's a lot wrong with the test. First, the input data is unchanging, so the compiler should optimize out the loop entirely, or maybe just the memory reads. But this would also almost completely invalidate your argument about caching, which would be valid if the test actually had a significant volume of data and was storing the result somewhere. As someone already noted, though, the compiler may well have used vector instructions for the simple loop as well (more likely with clang, I think), giving the somewhat poor showing of a 70% speedup. It would all depend on compiler flags for optimization level and target architecture. If it didn't optimize everything well, then there may instead be a whole lot of overhead from the function calls and extra memory reads/writes involved. If I were to write assembler code to do what is presented in this test (on multiple data), it wouldn't take 80 µs on a 5 GHz CPU. As far as I know these CPUs are capable of 2 reads, 1 add and 1 write per clock, even at 512 bits, so the whole process should take < 1 µs with AVX512 instructions. Even with scalar instructions (which still execute on the vector ALU, just through 1 channel) it should have happened in 15 µs, slowed from 8 µs only by the scalar reads of memory. To get a more reliable result, probably a significant chunk of data and > 1,000,000 iterations would be needed, and likely hundreds of repetitions of the whole process to account for variations in CPU load, clock frequency (the OS usually keeps clocks low until something starts happening, but it takes a few ms for it to respond), interrupting operations, etc. And check the disassembly to be sure of what is being executed.
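The benchmarking advice in this thread (warm the caches first, use a sizeable buffer, repeat many times, keep the result in use) might be sketched like this. It is a rough harness under stated assumptions, not the test from the video: `add_arrays` and `best_time_us` are hypothetical names, and the `volatile` sink only makes one result per run observable. An optimizer may still transform the loop, so checking the disassembly remains the real safeguard.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical kernel under test: plain element-wise add.
void add_arrays(const float* a, const float* b, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
}

// Returns the fastest of `reps` timed runs, in microseconds.
double best_time_us(const std::vector<float>& a, const std::vector<float>& b,
                    std::vector<float>& out, int reps) {
    assert(!out.empty() && a.size() == out.size() && b.size() == out.size());
    using clock = std::chrono::steady_clock;

    // Warm-up run: every timed run then starts from the same cache state,
    // and the CPU has a chance to raise its clock before timing begins.
    add_arrays(a.data(), b.data(), out.data(), a.size());

    double best = 1e300;
    volatile float sink = 0.0f;  // one result per run is written here so it stays observable
    for (int r = 0; r < reps; ++r) {
        const auto t0 = clock::now();
        add_arrays(a.data(), b.data(), out.data(), a.size());
        const auto t1 = clock::now();
        sink = out[static_cast<std::size_t>(r) % out.size()];
        best = std::min(best, std::chrono::duration<double, std::micro>(t1 - t0).count());
    }
    (void)sink;
    return best;
}
```

Timing the scalar and the AVX kernel through the same harness, each with its own warm-up, avoids the situation described above where whichever function runs second benefits from data the first one already pulled into cache.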
@ProceuTech 10 months ago
@treelibrarian7618 I made an updated video with this information in it; the tests done in this video were flawed
@MrMonkeyZMemeZ 2 years ago
I too am a fan of AVX
@ponchobob 3 months ago
@7:33 Did I miss something? Returning pointers to local variables is unsafe and leads to unpredictable behavior of the program.
@ProceuTech 3 months ago
Yeah, this video is honestly not great. Check out my more recent AVX video if you want to look into the syntax more deeply! I explain it way better and without any of these goofy mistakes on my end.
@Antagon666 2 years ago
You should make sure your memory is aligned to 32 bytes (64 bytes for 512-bit vectors) when using the aligned load function. Also, chances are the non-AVX version got auto-vectorized by the compiler to use AVX/AVX2.
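The alignment point can be illustrated with a short sketch. The aligned 256-bit intrinsics `_mm256_load_ps` and `_mm256_store_ps` expect addresses on a 32-byte boundary, while `_mm256_loadu_ps` and `_mm256_storeu_ps` accept any address. The helpers `is_aligned_32` and `add8` below are hypothetical names for illustration, and the AVX path is compiled only when `__AVX__` is defined.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__AVX__)
#include <immintrin.h>
#endif

// Hypothetical helper: true when p sits on a 32-byte boundary.
bool is_aligned_32(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 32 == 0;
}

// Adds 8 floats from a and b into out.
void add8(const float* a, const float* b, float* out) {
#if defined(__AVX__)
    if (is_aligned_32(a) && is_aligned_32(b) && is_aligned_32(out)) {
        // Aligned variants: only valid because all three pointers were checked.
        _mm256_store_ps(out, _mm256_add_ps(_mm256_load_ps(a), _mm256_load_ps(b)));
    } else {
        // Unaligned variants work for any address.
        _mm256_storeu_ps(out, _mm256_add_ps(_mm256_loadu_ps(a), _mm256_loadu_ps(b)));
    }
#else
    for (std::size_t i = 0; i < 8; ++i) out[i] = a[i] + b[i];  // scalar fallback
#endif
}
```

Declaring the buffers as `alignas(32) float a[8];` is one way to guarantee the aligned path is taken; a plain `float a[8];` carries no such guarantee.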
@juanme555 2 years ago
i7 10700F vs 11700F: which one is better at AVX512?
@ProceuTech 2 years ago
The 11700F! The 10700F only features AVX2!
@juanme555 2 years ago
@@ProceuTech Is the 11700F the same as the 11700? I know the F doesn't have an iGPU, but does the iGPU help with AVX512?
@ProceuTech 2 years ago
No, AVX-512 units are in the CPU cores themselves!
@subbastionbastion2167 3 months ago
Sorry, but it sounds like CUDA would be way faster, and you can have thousands of threads running at the same time on larger chunks of data.
@ProceuTech 3 months ago
I've also got a video exploring CUDA and its syntax. It's much better put together, in my opinion, than this video! :)
@MrRayopt 1 month ago
Where is the beginner tutorial? This makes no sense.
@ProceuTech 1 month ago
The more recent video I made about AVX512 (linked in the first few seconds of the video) explains the concept and program much better.
@SystemCrasher113 3 months ago
I still have no clue what avx does after you explained it in detail. 😂 Don't worry though, it's me, not you.
@ProceuTech 3 months ago
I have another AVX video that goes more in depth as to what the instruction set entails, as well as a better guide on programming for it. Might be worth a watch if you’re confused!
@youtubeshadowbannedmylasta2629 1 year ago
AVX hinders performance. It can take things that used to work, and putting AVX into programs (even decades-old ones) makes it so they no longer even launch.
@sean8102 17 days ago
As for AVX hindering performance: I'm not a programmer. My only guess is the "AVX offset" that a lot of motherboards have, which downclocks the CPU by some amount when using AVX (though I'm pretty sure that can be turned off on most motherboards). As for AVX being a problem because of compatibility, I guess so, if you have a really old CPU. On the latest Steam hardware survey (June 2024), 97% of Steam users have a PC that supports AVX. From what I understand, Intel and AMD started shipping CPUs with AVX support in 2011.
@NihalSingh-ld2en 1 year ago
These videos are a distraction.
@sean8102 17 days ago
Then don't watch them?