X86 Needs To Die

323,222 views

ThePrimeTime

26 days ago

Recorded live on twitch, GET IN
Article
hackaday.com/2024/03/21/why-x...
Guest
/ cmuratori
My Stream
/ theprimeagen
Best Way To Support Me
Become a backend engineer. It's my favorite site
boot.dev/?promo=PRIMEYT
This is also the best way to support me: support yourself by becoming a better backend engineer.
MY MAIN YT CHANNEL: Has well edited engineering videos
/ theprimeagen
Discord
/ discord
Have something for me to read or react to?: / theprimeagenreact
Kinesis Advantage 360: bit.ly/Prime-Kinesis
Hey, I am sponsored by Turso, an edge database. I think they are pretty neat. Give them a try for free, and if you want you can get a decent amount off (the free tier is the best (better than PlanetScale or any other))
turso.tech/deeznuts

Comments: 1,600
@raidensama1511 24 days ago
@ThePrimeTime this was S-tier material! Please have Casey back.
@saturdaysequalsyouth 24 days ago
What are the tiers?
@follantic 24 days ago
SABCDEF
@CaseJams 24 days ago
True professional
@chickenonaraft508 24 days ago
I second this
@admiral_hoshi3298 24 days ago
TLDR: More complex does not mean slower.
@christianremboldt1557 24 days ago
Yeah, pretty much. The most complex solutions had to be found to make stuff faster.
@julkiewicz 24 days ago
If anything it's TLDR: the article is wrong because the author doesn't know what they are talking about.
@bits360wastaken 24 days ago
@@julkiewicz Did you read the article? It was about how ancient, rarely used instructions, and the sheer bulk of instructions overall, take up valuable space and increase complexity. The only time "fast" was mentioned was them saying speed was their only priority.
@henry_tsai 24 days ago
@@bits360wastaken But modern chip designers also know that, and they allocate only the bare minimum die space and power to those obsolete instructions. Deleting those instructions won't yield any visible change.
@LiveErrors 23 days ago
I think AMD's 3D V-Cache shows that
@jonathangraham5179 24 days ago
Professor of computer science here. Nice work. I loved Casey's exposition, and I think he is being too conservative in his criticism. The idea that fewer instructions is better is an argument from 1980s RISC proponents, excusable in 1980, but today we know that's simply not true. If fewer instructions were always better, we would have seen the RISC architectures people actually used back in the day, such as the PowerPC, stay largely static. That never happened: the instruction sets and transistor counts increased on the PPC from generation to generation.

The original author also misses the fact that even if we are sacrificing die space to implement instructions, you can't just consider the consumption of die space "bad". Implementing something on die can result in a huge performance increase. When the x86 ISA was extended to add AES operations (first... second? generation i7s?), the result was a 10x improvement in performance. Given the massive use of AES, who in their right mind would consider that a poor use of die space?

Also, while I don't know for certain the etymology of "die" in CPU development, I suspect it's attributable to the use of the term in machining, where it can mean any purpose-made tool; a letter from a type case in an old-fashioned printing press would be called a "die". Some of those were eventually made using photolithographic processes.
@litium1337 24 days ago
Die is just inherited from the manufacturing process of chopping something into many smaller pieces, or "dicing": a large piece of silicon gets diced, and the smaller pieces of silicon become dies. There is no big mystery. Same as dicing an onion.
@excitedbox5705 24 days ago
I think it is called a die because you dice a wafer into many dies, not because of the printing "pattern" die.
@shableep 24 days ago
I think the "problem" is more complex these days than raw performance. When it comes to making portable workstations, raw performance is a factor, but it is not the only factor. I think ARM is winning because of a balance between performance and performance per watt. The smaller instruction set allows more efficiency gains, and thanks to modern 21st-century programming toolchains and compilers, the disadvantages of a smaller instruction set are not nearly as much of a cost.
@hctiBelttiL 24 days ago
@@excitedbox5705 Or maybe because when you print the dies you are casting a metaphorical die (the singular of dice) that determines how many individual units (cores) of that die are error-free at the targeted frequency? I'm probably overthinking it.
@adrian_b42 24 days ago
Having thousands of rarely used instructions on die, multiplied by the number of cores (dozens today), can be wasteful. There are good uses, like your AES example, and bad use cases. The rare instructions can be avoided by compilers and replaced with decent equivalents. Decoding consumes transistors and power, and thermal throttling is a fact. Where does "die" come from? Remember the times when circuits were printed on a board with the traces covered in paint, and then the board was treated with iron chloride (FeCl3) to remove the uncovered copper (etching)? Now it is done with EUV light, but it started with dies 50-60 years ago.
@cubbucca 24 days ago
just got talked out of buying a Washer Dryer Combo
@squishy-tomato 24 days ago
Nah, it uses twice the space, it's more expensive, and you still need to move the clothes from washer to dryer yourself. Besides, who's doing more than one load of laundry a day?
@XDarkGreyX 23 days ago
@@squishy-tomato Is that a single person or someone in a two-person household talking?
@Cadaverine1990 23 days ago
@@XDarkGreyX Two-person? Have kids...
@cadekachelmeier7251 23 days ago
They're pretty great since you don't have to bother moving the clothes half way through. So you can throw a load in before bed or whenever. The main thing is that the drum for a dryer is about twice as big as a washer for a given capacity. So you can easily add too many clothes for it to dry well.
@squishy-tomato 23 days ago
@@XDarkGreyX doesn't matter, doing one load a day is not a big deal
@tenisviejos 24 days ago
You know a person is really smart when they can break down complex concepts to other people. The pipeline explanation was *chef's kiss*
@XDarkGreyX 23 days ago
Chat went on about the transfer between the machines, which I noticed too, but... should he have addressed that when it comes to the hardware pipeline?
@proceduralism376 23 days ago
@@XDarkGreyX The transfer would basically be instant; it's just a bunch of clocked latches that separate each stage.
@ApplesOfEpicness 21 days ago
The laundry machine analogy is like the standard go-to for explaining pipelining. The buffet analogy also works.
@BrunodeSouzaLino 21 days ago
Learning how to teach is a skill which has nothing to do with the knowledge you're teaching.
@TheVoiceofTheProphetElizer 19 days ago
@@BrunodeSouzaLino I feel as if millions of tenured researchers with teaching loads cried out, then were suddenly silenced. Perfect way to sum it up. If only the vast majority of people that taught realized it was so much more than verbally repeating something to a room full of 20 somethings.
@pbentesio 24 days ago
Casey Muratori is on a short list of people who motivate me to keep learning. It is inspiring to see people this knowledgeable about the subjects I love.
@mjthebest7294 22 days ago
Him and Jon Blow are my top ones.
@nowaymyname 24 days ago
As someone who is currently learning x86 ASM at college right now, I feel like I've learned more from Casey in one hour than I have all semester. Please bring him back, awesome content! Full-time content creator Prime has so far not disappointed.
@OpenGL4ever 22 days ago
You might also search for "The Intel 80386, part 1: Introduction Raymond Chen" and read part 1 to n.
@RetroPcCupboard 11 days ago
They actually make you study x86 ASM at college these days? Back when I was at university they covered ASM for a simple microprocessor (I forget which), but not the x86 architecture. That was in the late 1990s. ASM is irrelevant for most software developers these days: compilers will typically produce more optimised machine code than you can write manually in ASM, unless you really know what you are doing and there is a specific case the compiler handles badly (or can't handle). ASM is useful for teaching you the inner workings of a CPU, though.
@OpenGL4ever 11 days ago
@@RetroPcCupboard Knowing assembler helps you understand what the compiler produces and how high-level languages work. I think this is very valuable knowledge, it's like Latin for languages.
@RetroPcCupboard 11 days ago
@OpenGL4ever Sure, if you are interested in that. I think most developers these days don't really care how the compiler works or even what the inner workings of a CPU are. I actually find it fascinating: despite the fact that I have been a software dev for 25 years, I am only now learning x86 assembly. I have an old Pentium MMX PC that I am using for the purpose. I realise that I could do it on a modern PC, but I feel that a slower PC makes more sense for seeing the impact of ASM vs older compilers.
@Pootie_Tang 10 days ago
@@RetroPcCupboard Man, some of us study computer engineering; how could we not study x86 ASM if we study how to develop said processors? =)
@kaizen8808 24 days ago
I am ancient too. Programming professionally since 1988 :D
@joshuatye1027 24 days ago
Congrats
@nezbrun872 24 days ago
My first paid programming job was in 1979, but I wrote my first program in Algol on paper tape in 1976. I really like this guy because he speaks my language and calls out the downsides and very real practical impact of today's fashionable sacred-cow practices.
@veritypickle8471 24 days ago
ty for your service
@huso7796 24 days ago
Oh cool, like, honestly: what were you programming? How was the debugging process without fancy IDEs? What was it like to describe your job to other people? Could you elaborate, if you don't mind, on how different it was compared to today's way of software development?
@cylian91 24 days ago
As old as Turbo C! (Anyone know a good decompiler for DOS?)
@tamertamertamer4874 24 days ago
I'm dyslexic. So I have a dyslexic dude reading for me lmaooo.
@cat-.- 24 days ago
I'm not dyslexic, but I'd like to think I am to justify me reading very little
@andrewdunbar828 23 days ago
Man, I'm not dyslexic but I'm still the slowest reader I've ever met.
@ark_knight 23 days ago
If it helps, you can think of it as "pipelining". If you were reading it yourself, that's all you'd be doing. But since you are hearing him read it, you can go do another task. Cue multithreading. (Or get entertainment out of it while still learning.)
@XDarkGreyX 23 days ago
I can read fast, but just as in school, I may need to read a sentence 10 times, even at low speed, to even just barely get it.
@tamertamertamer4874 23 days ago
@@ark_knight lmaoooo true
@channel11121 24 days ago
Casey was so disappointed when he didn't understand why little-endian was better, and also didn't care enough to understand it.
@andrewdunbar828 23 days ago
I think it only occurs to us when we've done assembly programming, or bit-banging level C programming.
@YaroslavFedevych 23 days ago
@@andrewdunbar828 Or speak German; numerals from 21 to 99 are little-endian there. Or when we add/subtract/multiply on paper: somehow it's easier to do little-endian.
@andrewdunbar828 23 days ago
@@YaroslavFedevych Yes, I know! In English, numbers are big-endian, which can make writing functions that do things like decimal conversion, ASCII conversion, or adding thousands separators a bit unintuitive, or at least more tricky. I started assembly on the Z80, which was little-endian, but lost the feel for it so much after moving to the big-endian m68k that little-endian never felt natural again.
@realmarsastro 23 days ago
@@YaroslavFedevych Gimme 99 of them Luftballons, amirite? In Norway we can actually pronounce the numbers 21-99 in both big-endian and little-endian order. The little-endian variant is more common with older people; it's unfortunately dying out.
@arnesl929 23 days ago
Yeah, even I understood it; I was a bit surprised by the lack of enthusiasm.
@sdwone 24 days ago
So glad people like Casey are still out there fighting the Good Fight! Because the way things are going, computers and software development in general will get so complicated that only an elite few will truly understand it all. And those elite few will have unprecedented power! So yes, I'm not saying that all developers need a degree in all this low-level stuff... but the more of us who know, even roughly, how a computer actually works, the better!!!
@XDarkGreyX 23 days ago
More and more people use hammers, but fewer and fewer know how to build or even understand them? It applies to countless fields, but would that be a valid metaphor?
@sdwone 23 days ago
@@XDarkGreyX Yeah... That metaphor sounds totally reasonable to me! 👍🏼 Particularly in this industry.
@aliasjon8320 24 days ago
Are we also going to get an "x86 doesn't need to die" with Prime's face photoshopped onto Mercy from Overwatch as the thumbnail?
@MrHaggyy 24 days ago
XD The mirrored Mercy from the upcoming season would fit great.
@technomancer75 23 days ago
While riding on a fake horse ;)
@XDarkGreyX 23 days ago
@@technomancer75 It should be a cow he is riding. He owned a cow once, or maybe still does.
@hoeding 24 days ago
Washer / Dryer metaphor for pipelining nailed it.
@Dongdot123 21 days ago
Damn right, we just understood it so easily with that explanation.
@ylstorage7085 21 days ago
Ford's assembly line could have served even better.
@Aberusugi 20 days ago
Yeah, I finally have a way to describe the concept to other people. Very helpful.
@markteague8889 19 days ago
The fast-food drive-thru is a pretty good one!
@_somerandomguyontheinternet_ 19 days ago
Yup! Using that from now on!
@haraldfielker4635 24 days ago
20 years ago - same talk :) The solution in ~2000 was "Intel Itanium" 😂😂
@stepank1 24 days ago
Yeah, people blaming Intel/AMD for "bloated" x86 and "tech debt", as if those decisions were not made solely to please the customers, are pretty incorrect.
@AlecThilenius 23 days ago
I had this thought too. Intel tried to fix these issues in Itanium; it's now nicknamed "Itanic" because no one wanted to recompile their code to run on it. I'm not at all a fan of Intel, having worked there for two years back in college many moons ago, but you can't solely blame them for the x86 legacy.
@stevesether 22 days ago
@@AlecThilenius As I recall, Itanium never achieved the promised performance gains even if you DID recompile your code. In the Linux world, Debian alone supports four different CPU families: x86, MIPS, ARM, and PowerPC. Recompiling isn't really a problem, and I believe Itanium was supported. The servers themselves were freaking EXPENSIVE. I've honestly never seen one, used one, or worked anywhere that had one.
@Blackfatrat 18 days ago
X86S will probably be the actual solution for Intel and AMD. It can run older programs just fine, with no problems at all for any 64-bit x86 code, and it just strips out the unused parts. Intel published it and I hope they move forward with the idea.
@timothygibney159 11 days ago
@@stevesether You underestimate how much of modern business runs on technical debt from legacy DOS and Windows 95 software. Compiling .deb files for PowerPC or ARM won't help run accounting's old Oracle macros, written in VB5 back in 1998, that only work in Excel.
@ketchrahalvard8134 24 days ago
As a chip designer I would like to point out that when any article like this comes up about dropping x86, what they really mean is dropping the x87 floating-point extensions (the ones that use a stack architecture and run in 80-bit precision mode). This is specifically what the new Intel spec is aimed at killing. For those of you interested in why, just think about how you would do register renaming when your register numbers are all stack-based.
@OpenGL4ever 22 days ago
Then I have a question. I've also heard that Intel would like to throw out some things. But if a CPU has, say, 8 cores, would it be possible to throw those things away on 7 cores and keep them on just one? Especially since the old stuff from the DOS era was never written for multicore CPUs anyway; that old software would only need the one full-fledged core.
@stevesether 22 days ago
@@OpenGL4ever I had a similar thought. If you suddenly take away the x87 FP stack, what software is going to break without being recompiled? It might make a lot more sense to just de-emphasize these old instructions and keep them working, but not as performant.
@asm_nop 22 days ago
​@@stevesether I don't know what Intel's proposed solution is, but I imagine they have a way to hook those instructions at execution time and deal with them. Since they're very old instructions, they have the benefit of only occurring in very old code. Sure, you could use a ridiculously complex decoder to convert them, but you could also do something crazy like raise a flag to the operating system and flag the code to be decompiled and rebuilt into equivalent compatible instructions by an OS process, and link it back into the original executable. The first run might be slow, but the second time would be real fast.
@Folsomdsf2 21 days ago
Yah, unfortunately the article author and even the commentators have reasons to not really be... honest, so to speak.
@giornikitop5373 21 days ago
Makes sense; those are very old and I don't believe they've been used since the Pentium era, if ever. x86 compatibility goes a long way, but I guess it's on the safe side; MMX/SSE were being used instead. There is also some other legacy stuff that can be removed safely. As for the renaming, isn't Intel already using, for lack of a better term, indexed locations for the registers? Maybe you can shed some light here, because I really don't understand exactly what they do, if that holds any truth.
@Maisonier 10 days ago
You should get an e-ink device for reading. I like the Boox Note Air 3C (an Android e-ink tablet), but there's also the reMarkable, Supernote, and Papyr.
@Nirsi 24 days ago
"I'm ancient" yeah, sure. "I was professionally programming since 1995"... well, you've been programming longer than I've been alive; you've earned that title.
@darekmistrz4364 22 days ago
He looks like he was born in 1988 at most. He must have been programming at the age of 7! No wonder he is a genius
@yt45204 22 days ago
Wait until you get a load of us early 70s Gen-Xers who learned to program assembly in the early 80s, and went on to program things like Battlefield, Minecraft, Skype, etc. Oh, and Linux.
@PennsyltuckyPhil 21 days ago
I thought the definition of ancient was knowing of EBCDIC and having an IBM 370 gold card within arm's reach.
@TheVoiceofTheProphetElizer 19 days ago
Unless you're solving math problems using vacuum tubes, I'm not sure what exactly you all are defining "ancient" as.
@PennsyltuckyPhil 19 days ago
@@TheVoiceofTheProphetElizer Well I guess we could go back to where a program's bug is a moth caught in a relay ...
@TurtleKwitty 24 days ago
I forgot those transparent boards exist for a sec, so when he started writing in the air I was so confused and amazed XD
@jewlouds 24 days ago
I was more impressed that he was writing backwards.
@tehwibe 24 days ago
@@jewlouds Nah, the camera is flipped horizontally
@XDarkGreyX 23 days ago
The scroll-up got me
@nahkh1 23 days ago
For those who are curious, the tool used for cutting a specific shape out of a material is called a die: en.m.wikipedia.org/wiki/Die_(manufacturing). I'm assuming the cut pieces (e.g. CPUs) took on the name of the tool used to cut them over time. I'm also pretty sure they don't use literal dies anymore to cut the individual chips out of the silicon wafer.
@blarghblargh 22 days ago
I didn't find anything authoritative on this, but I did find a few results referring to the process of cutting up a larger item into square pieces as "dicing". The results of that are "dice"; an individual item is a "die", and I found results saying that was the etymology for integrated-circuit dies. CPUs aren't stamped/cast/extruded and never have been, which is why I'm not sure the linked Wikipedia die concept applies. But I can't fully argue either way. What I can agree with is that it is related to the manufacturing process; we can be pretty sure of that.
@nahkh1 21 days ago
@@blarghblargh the specific use of die I mean from the article is a stamping die. It's basically a matching set of "knives" with a complex geometry. The dies are pressed together with the material in between, and while I doubt that's how silicon chips are cut these days I can believe that's how it would've been done in earlier days.
@autarchex 20 days ago
@@nahkh1 Integrated circuits are batch manufactured in a grid pattern on a wafer of (most commonly) pure crystalline silicon. The individual product pieces are separated from the wafer ("singulated" or "diced") using a saw, which converts the wafer into a large number of identical small parts collectively called "dies" or sometimes "dice" (older usage) and one of these is called a "die," and this term in a semiconductor industry context always refers to the product and not the tool. There are a few other rarely used singulation techniques other than a saw, like laser cutters, waterjet cutters, even particle beams that can cleave the wafer by precisely imparting millions of crystal defects in a line. As far as I know we never used stamping die-cut (where "die" means tool, not the product) techniques though; silicon crystal is quite hard and brittle, and the wafers chip and shatter more resembling thin discs of glass than thin discs of metal. Nonetheless, I'm sure the terms share a common history. There are other contexts too, where die and dice refer to the product and not the tool that made them - for example, playing dice.
@mansquatch2260 10 days ago
I looked it up on wikipedia. It's called a die, because: " Typically, integrated circuits are produced in large batches on a single wafer of electronic-grade silicon (EGS) or other semiconductor (such as GaAs) through processes such as photolithography. The wafer is cut (diced) into many pieces, each containing one copy of the circuit. Each of these pieces is called a die."
@Cmanorange 24 days ago
39:35 casual magic
@TheJukeJuke2 24 days ago
I saw you drop a Wheel of Time reference in a video earlier today and now I have to watch you daily.
@Maxible 23 days ago
This video was exceptional! Loved diving into the weeds. Also, kudos to your guest for having that board setup. Super helpful and so awesome!
@robgrainger5314 24 days ago
Wonderful exposition by Casey there. I had a decent understanding of x86 architecture, but still managed to learn something, which is always a pleasure. P.S. Please remember to put links in your videos.
@jaysistar2711 24 days ago
Actually, you have quite a few of us "ancients". I'm not even sure when I learned C, but 1996 is when I switched from DOS to Linux. I got a Windows machine because people seemed to have them, so I had to recompile/test on it.
@PixelThorn 24 days ago
What made you switch to Linux that early?
@idhindsight 24 days ago
@@PixelThorn Not OP but similarly old: when Win95 came out, I felt it "hid" the true OS and hated it. When I heard about Linux in '99/2000, I was an instant convert.
@jaysistar2711 23 days ago
@@PixelThorn Windows 95
@darekmistrz4364 22 days ago
@@idhindsight Hiding isn't bad; it's just a different client use case. That helped Microsoft gain the popularity it has today.
@idhindsight 22 days ago
@@darekmistrz4364 ehh I didn’t say it was objectively bad. It sure as hell isn’t for me (and most developers)
@FallenStarFeatures 24 days ago
In reality, all generations of x86 and x64 assembly language have been emulated in microcode since the advent of the Pentium Pro (P6). The underlying CPU hardware contains banks of interchangeable 32-bit and 64-bit registers, along with RISC primitive instructions that operate on them. Both x86 and x64 assembly instructions are parsed by a hardware interpreter that converts them into streams of microcode instructions, which are speculatively executed in parallel by multiple internal execution units. This is the actual Intel "machine code", and it is not possible to program with it manually. Human assembly language programmers can only use the hardware-interpreted x86 and x64 instructions; the underlying Intel microcode is locked inside the CPU. Decoding x86 and x64 assembly language can actually run faster than an ARM CPU executing manually programmed RISC code. That's because Intel assembly language is more compact than RISC machine code and can thus be loaded more quickly from memory, which is often the limiting factor in code execution speed. Under the hood, both Intel and ARM CPUs are highly optimized RISC machines. The difference is that ARM assembly code is executed directly by the CPU, while Intel assembly code is virtualized and emulated by internal microcode.
@astrixx 24 days ago
That was a pretty useful explanation.
@theexplosionist2019 24 days ago
I read they use ~70-bit registers to store flags + GPRs.
@adrian_b42 24 days ago
You are right. The only complaints about CISC in x86 are the variable length of the instructions, which makes decoding more complex, and the rarely used instructions carried over the decades.
@mitrus4 23 days ago
Are you sure about your last point? With ARM you can start decoding instructions in advance, because all of them are 4 bytes each, so you don't need to wait for decoding to learn the address of the next one. With x86, by contrast, the variable length creates exactly that dependency and doesn't allow it. I think that can outweigh the increased count of simple instructions, especially if there is no code bloat and hardware instruction prefetching does its job well.
@FallenStarFeatures 23 days ago
@@mitrus4 Program memory is loaded from RAM into processor caches, which on Intel chips are divided into interchangeable 64-byte cache lines. The CPU instruction decoder has no need to calculate the RAM address of the next instruction; it relies on the program counter to automatically keep its local instruction cache full. If that instruction cache ever underflows, the instruction decoder has to wait until the cache is refilled from RAM. However, CPU execution units and data load/store units can continue to process previously decoded instruction micro-ops, which proceeds speculatively in parallel with instruction decoding. In practice, it is far more common for execution to stall due to logical or algorithmic dependencies than for speculative execution to outrun instruction decoding. With ARM CPUs, each machine code instruction is four bytes long. On Intel CPUs, the most common instructions take just one byte, though complex instructions can be up to fifteen bytes long. On average, Intel machine code tends to be about half the size of ARM code.
@revenevan11 23 days ago
Thank you for covering this and for having Casey on! I'm fascinated by both the history and technical minutiae of processors; too many people take the fact that we've "tricked rocks into thinking using electricity" for granted when there's such complexity within the hardware itself. In particular, I'm fascinated by the era during which ARM was designed, because they had more affordable and compact personal computers on which they could design the chips of the future. The processor industry bootstrapped itself on a macroscopic scale!
@ScottGrunwald 22 days ago
Amazing video!! Suggestion: Prime and Casey do a long-form video on all the details of ARM vs x64/x86. I would love to learn more about the differences between the two and why current ARM chips like the M3 Max are just as performant as top Intel mobile chips yet insanely more efficient. I know about the node differences, but that can't be all of it. Is the reduced instruction set really giving them that much of an edge? This video was amazing and I learned a shit ton, but I have more questions. It sounds like decoding performance might be a huge factor when comparing the two.
@Dom-zy1qy 24 days ago
At the washer-dryer vs washer/dryer-combo analogy, I thought he was going to say the combo was faster, because I always forget to swap the laundry to the dryer when it finishes... so it ends up taking like 2 hours extra. It was a good analogy though. I actually didn't even know about micro-ops before this, but it makes a lot of sense.
@darekmistrz4364 22 days ago
I was also going to say that I forget the laundry, or that you need a dedicated person sitting next to the washer so the load doesn't get forgotten between washer and dryer.
@kamikaz1k 24 days ago
This was so freaking helpful for understanding. Prime, thanks for forming your relationship with Casey; Casey, you should definitely piggyback off more creators' reach to share your wisdom. This is a win-win-win arrangement. 👏
@MrDivinePotato 11 days ago
Great chat, I love this kind of geeking out! I don't understand it fully, but I feel like I got a slightly clearer picture from this.
@BilalHeuser1 17 days ago
When I started to write assembly language for my TRS-80, which was Z80-based, I also learned about the Intel 8080, which is what the 8088/8086 CPUs were based on. Much of the register structure was duplicated on the Z80, but Zilog added extended instructions. Knowing that makes understanding the 8086 and subsequent generations that much easier.
@piotrj333 24 days ago
This is a garbage article. First, x86 has internally been a RISC architecture for a long time: CISC instructions become RISC micro-ops inside the CPU, and the fact that we don't use the RISC core directly only costs us 5%, at most 10%. That is essentially the entire cost of x86. Second, AMD Ryzen laptop chips in energy-efficient configurations can compete with Apple M processors, at least in work done per joule. There are some problems with idle draw, but those can be attributed to Windows and to the SoCs themselves (Apple solders RAM really close to the CPU), not to the architecture. Third, Spectre and Meltdown affected x86, ARM, and even IBM's POWER architecture.
@abbe9641
@abbe9641 24 күн бұрын
Yeah, chuck a good mobile Ryzen processor onto Linux instead and the difference I've heard is night and day, with battery life improvements in the tens of percent
@gigitrix
@gigitrix 24 күн бұрын
Don't disagree but that 5/10% on something that could be delegated to compilers in a post-moore's law world is more relevant than you might think
@neoqueto
@neoqueto 24 күн бұрын
Modern ARM, x86, and even RISC-V ISAs are pretty much all nearly identical. They can all do the same things, and the notion that ARM is a "less complex" architecture because "ARM is RISC and x86 is CISC" should only be made fun of. However, we are at the point where we have to scrape the bottom of the barrel for minuscule gains.
@TheSulross
@TheSulross 24 күн бұрын
Yeah, I don't want the main RAM as a permanent, unchangeable fixture of my computer the way Apple does things now. Now, it's true that one could produce a lower-transistor-count x86 if you stripped out all of its legacy stuff and only implemented the instructions used by modern software and operating systems. That would be an interesting project (and I'm suggesting cleaving more than just the CPU booting up into 8086 real mode). After all, any modern CPU can, per its performance, more than adequately emulate vintage CPUs of the '70s, '80s, up to the mid '90s, for those who want to play their fav retro games. Retro computer emulators on ARM, like the Pi, prove this every day. So x86 doesn't need the transistor real estate that is dedicated to supporting vestiges from 40 years ago. Get rid of that fat to lower the power draw. And aren't there too many redundant SIMD instruction sets? Why not trim that down to what has been in vogue in, say, the last decade?
@neoqueto
@neoqueto 24 күн бұрын
@@TheSulross especially with EFI being so commonplace, a lineup of "stripped down" x86 parts could maybe be viable... but then again they have hundreds of engineers wracking their brains all day long about performance and power efficiency improvements so it's not like they haven't thought of the same ideas as us smoothbrains have (and there's probably reasons they concluded they're stupid).
@coolworx
@coolworx 24 күн бұрын
I'm not anywhere near a programmer. I'm just an arborist who gets up at 5am and works in the orchards of the Okanagan. But don't get me wrong, I love the job I fell into. I'm a fruit tree doctor, and I've saved many a patient! But in the evenings I'm a hobbyist hack, building personal projects because it's fun. I fkking love this channel. And this episode was outstanding. Prime was asking a lot of questions I was having, and the presenter was excellent. Oh, I'm sure the information will never be put to direct use by myself. But it's good to know how the things in your life actually work. I certainly have a better understanding of the term _microcode_
@XDarkGreyX
@XDarkGreyX 23 күн бұрын
Huh, really interesting that you are here. Kudos. *clap clap*
@loo_9
@loo_9 23 күн бұрын
most of programming is techniques only an engineer truly needs. but computer science is a beautiful subset of math that anyone can appreciate. as an arborist you are likely implementing the same optimization techniques that CPUs use without thinking about them: multitasking, task overlap, task reordering, pipelining, etc. it's just a definitive way to think about the world
@coolworx
@coolworx 23 күн бұрын
@@loo_9 Like I said, I've been building side projects for a while. One of them is a nifty Node.js and NeDB tree-notes app that lets me keep track of progress and search for any keywords or time intervals. Now I want to get some charts going to show trends. I have almost 10 years of data that I've been gathering, including daily weather conditions. So... ya. I tree-surgeon by day, and code at night. And the best part is, I have no deadlines, no HR directors, no tiresome teammates... ;-)
@lritzdorf
@lritzdorf 22 күн бұрын
> But it's good to know how the things in your life actually work. This. This is the reason I enjoy computing, from both a hardware and software perspective. Yes, the tech is cool, but what I truly love is understanding what the heck these magic boxes of lightning and sand actually _do._
@allmycircuits8850
@allmycircuits8850 21 күн бұрын
As an arborist, you're officially a HACKER :)
@romanstingler435
@romanstingler435 12 күн бұрын
The term "die" for a silicon chip originates from the process of manufacturing these chips. Here's the connection:
Starting material: Integrated circuits are typically fabricated on wafers, which are thin slices of electronic-grade silicon (EGS).
Circuit formation: The desired circuit patterns are formed on the wafer surface using various techniques like photolithography and etching. This creates a network of tiny transistors, wires, and other components that make up the integrated circuit.
Dicing: Once the circuits are formed on the wafer, it's cut into individual pieces, each containing a single copy of the circuit. These individual pieces are called dice (plural: dice or dies).
The term "die" comes from the process of dicing, which is similar to how dice are cut from a larger block of material. The wafer is essentially diced into smaller functional units, hence the name "die."
Generated by Gemini
@mateusgodoy5060
@mateusgodoy5060 10 күн бұрын
I consider this video a gift. Thank you, Prime and Casey. Also the synergy between you two was amazing, almost like you had everything planned in advance! Phenomenal job!
@miroslavhoudek7085
@miroslavhoudek7085 23 күн бұрын
I worked on this rocket launcher once, and the on-board software was compiled both for the ARM chip embedded in the rocket and for Intel PC development workstations for easier testing. So we had little-endian code and big-endian code produced. And we were doing bitwise arithmetic and network transfers that had to work in both environments, all of that in Ada 2012. I'm still confused to this day.
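For anyone following along, the mismatch described above is easy to reproduce with Python's `struct` module. This is just a toy illustration of little- vs. big-endian byte layouts, not the actual Ada code from the project:

```python
import struct

value = 0x12345678

le = struct.pack("<I", value)  # little-endian: least-significant byte first
be = struct.pack(">I", value)  # big-endian ("network order"): most-significant byte first

print(le.hex())   # → 78563412
print(be.hex())   # → 12345678

# Reading bytes back with the wrong endianness silently yields a different number:
wrong = struct.unpack("<I", be)[0]
print(hex(wrong))  # → 0x78563412
```

This is why code doing bitwise arithmetic over raw network buffers has to pin down byte order explicitly on every platform it targets.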
@OurSpaceshipEarth
@OurSpaceshipEarth 14 күн бұрын
NASA loves those PC-on-a-chip ECC RAM setups (x3 machines, in case of subspace-trekking bit-flip disagreement)
@andrewdunbar828
@andrewdunbar828 23 күн бұрын
Even in the 8 bit days plenty of us used to program in machine code, simply because we didn't have assemblers. We had documentation of the bitfields of the instructions and addressing modes and wrote BASIC programs that put all the bytes we figured out on paper into memory.
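For the curious, here's a rough sketch of that workflow in Python instead of BASIC. The opcode bytes are the standard documented 8080/Z80 encodings, but the tiny routine and the toy "RAM" are made up for illustration:

```python
# Hand-assembled Z80/8080 routine: LD A,2 / LD B,3 / ADD A,B / RET.
# Back then, each opcode byte was looked up in the CPU manual and
# figured out on paper before being typed in.
program = [
    0x3E, 0x02,  # LD A, 2   (load immediate into the accumulator)
    0x06, 0x03,  # LD B, 3   (load immediate into register B)
    0x80,        # ADD A, B  (A = A + B)
    0xC9,        # RET       (return to caller)
]

# The BASIC equivalent was a loop of POKE statements writing these bytes
# into RAM, followed by a CALL/USR to jump to them.
memory = bytearray(8)  # toy "RAM"
memory[:len(program)] = program
print(memory.hex())    # → 3e02060380c90000
```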
@635574
@635574 15 күн бұрын
This sounds like medieval torture of computers. I wasn't aware someone could even program in machine code. Big copium.
@cloakbackground8641
@cloakbackground8641 24 күн бұрын
I've wondered from time to time if it'd be easier to just write μops directly and peel back the abstraction, but Casey explaining it as _compression_ made it suddenly make sense: CPUs are much more limited by data transfer than processing.
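The "compression" framing can be sketched with a toy decoder. These are not real x86 µop encodings, just an illustrative model: one compact architectural instruction fans out into several internal operations, so the compact form is what travels through memory and the instruction cache.

```python
# Toy model of CISC-to-µop "decompression" (invented encodings, not real x86).
DECODE_TABLE = {
    # A read-modify-write on memory expands to three internal ops:
    "add [rbx], rax": ["load tmp, [rbx]", "add tmp, rax", "store [rbx], tmp"],
    # A register-to-register add is already a single op:
    "add rcx, rdx": ["add rcx, rdx"],
}

def decode(instruction):
    """Expand one architectural instruction into its µop sequence."""
    return DECODE_TABLE[instruction]

uops = decode("add [rbx], rax")
print(len(uops))  # → 3: one fetched instruction became three internal ops
```

Writing µops directly would mean fetching three "instructions" worth of bytes instead of one, which is exactly the bandwidth cost the compressed encoding avoids.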
@warpspeedscp
@warpspeedscp 23 күн бұрын
You'd be going back to itanium and ps3 cell processor level if you did that, haha
@DavidRodrigues
@DavidRodrigues 22 күн бұрын
Great show! Knowledge, curiosity and energy always trump. Thank you.
@jaysistar2711
@jaysistar2711 24 күн бұрын
Von Neumann architecture basically means that code and data can be in the same address space. Harvard architecture, like an 8051, has code and data in 2 separate address spaces.
@thewhitefalcon8539
@thewhitefalcon8539 24 күн бұрын
And Modified Von Neumann means they have separate caches.
@catcatcatcatcatcatcatcatcatca
@catcatcatcatcatcatcatcatcatca 24 күн бұрын
To me this was taught with emphasis on the different buses: the instruction bus and data bus can be accessed simultaneously, and can have different widths. For example, 14-bit PIC controllers have a 14-bit instruction bus but an 8-bit data bus. Any modern machine needs the ability to write instructions while running and execute them later: I don't want to rewrite the ROM of my laptop every time I install something. And the kernel needs some way to give userspace code access to the CPU (the alternative would be virtualisation, I think). Either way, the relevant concepts of privilege are so far removed from the hardware that the distinction between Von Neumann and Harvard architectures isn't as meaningful as their original definitions.
@ern0plus4
@ern0plus4 24 күн бұрын
The core of von Neumann's idea is not about address spaces, but about the representation of the program: if it's represented as data, we need no extra mechanism to handle it (load, save, execute). The execution unit (CPU) already has to access memory, loading and storing data, so if we represent the program as data it's not a big deal; we can use the common mechanisms for it. I don't know how computers looked before this idea; AFAIK, programming was done by physically re-wiring the whole computer for the specific task (aka program). Storing the program in memory is a brutally significant improvement: need a longer program? just add memory; bug in the program? just modify some bytes - etc.
@Eugensson
@Eugensson 24 күн бұрын
There's also Mill architecture
@MrHaggyy
@MrHaggyy 24 күн бұрын
@ern0plus4 in the really old days, when computers were mostly mechanical devices like the Enigma machine, you had mechanisms for "instructions" and another medium like paper cards for "data". When they started moving from mechanical to electrical, they still handled instructions differently from data. It wasn't until the first silicon transistors that you made and handled both on the same material. Today it's a bit odd, as with chiplets we go back to using different processes on the same wafer, so we somehow got different materials again. But basically no CPU today distinguishes between code and data. It might come back if people really do care about security, though: Harvard architectures are far more robust by design against injection attacks.
@freyja5800
@freyja5800 23 күн бұрын
One thing about the conclusion that stood out to me is that the article makes out-of-order, speculative execution & superscalar seem like bad things, while they are things you actively want in your compute hardware. Like, to make things faster you can either a) do the same stuff, but faster, or b) do more stuff in the same time (i.e. faster clock speeds vs. parallel computation). And while there definitely are situations where the first is the only way (dependent operations etc.), if you have the choice the second is always preferable, because the energy required for running in parallel at a lower speed is much more favourable, to the point where it is useful to have dedicated low-speed hardware for certain tasks (accelerators, in particular GPUs, are a prime example of that).
@Sythemn
@Sythemn Күн бұрын
I was worried you guys weren't going to get to the decode complexity. Glad you got to that. Die space, electricity, engineering time, etc. do put x86 at a disadvantage to an instruction set designed with hindsight such as RISC-V. But I think the much, MUCH better argument for why RISC-V will be and should be the standard in the future is the "anyone can make one of these" aspect: competition on the HPC side, vendors targeting niche products, and a generally larger ecosystem of choices and solutions. I don't see a world where computing doesn't end up better for the end user with RISC-V as the standard. Also, I agree with the RISC-V designers: vectorized code is just soooooo much more scalable than SIMD is, and x86 went all in on SIMD.
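The scalability point about vectors vs. SIMD refers to RISC-V's vector-length-agnostic "strip-mining" style, where each loop iteration asks the hardware how many elements it may process. Below is a rough Python model of that control flow; the `vlen_max` parameter stands in for the machine's vector length, and none of this is real RVV code:

```python
def vector_add(a, b, vlen_max=8):
    """Strip-mined loop in the RISC-V Vector style: each iteration asks the
    'hardware' for a vector length vl (like vsetvli), processes vl elements,
    and advances. The same code runs unchanged on any vector width."""
    out = []
    i = 0
    while i < len(a):
        vl = min(vlen_max, len(a) - i)  # hardware picks how many fit this pass
        out.extend(x + y for x, y in zip(a[i:i + vl], b[i:i + vl]))
        i += vl
    return out

print(vector_add([1, 2, 3, 4, 5], [10, 20, 30, 40, 50], vlen_max=4))
# → [11, 22, 33, 44, 55]
```

Fixed-width SIMD, by contrast, bakes the width into the instruction (SSE = 128 bits, AVX = 256, ...), so widening the hardware means rewriting or recompiling the loop.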
@BenTzionZuckier
@BenTzionZuckier 24 күн бұрын
(optimizing compiled language argument) People write for LLVM using their language, LLVM targets the "API" of the CPU, which is the instruction set like x86-64, the instruction set is implemented in microcode, and the CPU can only "do" certain actual things at the end of the day. Optimizations at every layer; it's turtles all the way down until the actual CPU architecture.
@darekmistrz4364
@darekmistrz4364 22 күн бұрын
And then you need to remember that all a computer is doing is juggling memory: from disk to RAM, from RAM to L3 cache, from L3 to L2, from L2 to L1, from L1 to the CPU, and all the way back to disk.
@craiggazimbi
@craiggazimbi 24 күн бұрын
The Name 🚀
@AregGhazaryan
@AregGhazaryan 24 күн бұрын
fireship?
@timseguine2
@timseguine2 23 күн бұрын
One thing I think the author of the article doesn't seem to get, is that if you follow the arguments they actually make to their logical conclusions, you don't end up with RISC-V or ARM, you'd more likely end up reinventing Itanium.
@mrbigberd
@mrbigberd 21 күн бұрын
You can't go that far because the halting problem gets in the way. By the last iteration, Itanium was a bog-standard architecture that basically ignored the whole VLIW aspect entirely.
@SimonBuchanNz
@SimonBuchanNz 20 күн бұрын
Just encode the processor control lines directly! Pay no attention to how you actually read the instructions...
@mrbigberd
@mrbigberd 20 күн бұрын
@@SimonBuchanNz the ISAs don’t just vary in syntax, but in semantics and in guarantees. These guarantees must be fulfilled.
@acheleg
@acheleg 9 күн бұрын
Conroe
@colinmaharaj50
@colinmaharaj50 17 күн бұрын
I was there live. I started my tech program in 1988 with the 8085. I got a job as a telecoms tech, but they had an 8086 XT, and I bought an 80286 and started GW-BASIC / Turbo Pascal, then Turbo C, then C++. I am still with C++ 30 years later. I also did assembler. The telecoms section went bust, but only long after I moved to IT. I wish I were still on the telecoms side doing PABX stuff. I was the first non-manager / non-executive staff to get a PC and laptop at TSTT.
@mezza205
@mezza205 4 күн бұрын
Dude, great content, sub'd. Really wish I had what it takes to be in the industry; this makes so much sense.
@drooplug
@drooplug 24 күн бұрын
Ok. The glass board was an awesome surprise. I've seen it before, but it's still cool. Someone needs to tell me how he scrolled the previous writing up.
@c0dy42
@c0dy42 24 күн бұрын
It's probably just a massive piece of glass that can move up or down with the help of motors
@APaleDot
@APaleDot 22 күн бұрын
@@c0dy42 I believe it's a sheet of plastic that's rolled up on the top and bottom.
@Muskar2
@Muskar2 22 күн бұрын
I take his course and he uses it all the time. I'm not exactly sure how he does it but probably a foot pedal. We saw the full height of the board in this video, so he has to wipe it when he needs to use more space than this.
@c0dy42
@c0dy42 22 күн бұрын
@@APaleDot I thought that at first as well, but I think we would be able to see that, because then there would have to be a big piece of acrylic or glass behind it so he has a solid surface to write on.
@nicostein9875
@nicostein9875 23 күн бұрын
Having a good teacher successfully explain something novel to you feels like watching a magician.
@koijoijoe
@koijoijoe 23 күн бұрын
Love to see a good view count on a 1hr-plus video! Hope you are feeling strong at your full-time start!
@von_nobody
@von_nobody 23 күн бұрын
On decoding: in some tight loops it doesn't matter, as the whole code block fits in the CPU cache and only needs to be decoded once. If 90% of your code sits in a couple of hot spots, then the whole decode penalty is 10 times smaller and won't matter much. It only matters for linear code, where each instruction is hit once and it's a long time before the program goes back to that line.
@ristekostadinov2820
@ristekostadinov2820 24 күн бұрын
The person who wrote the article about RISC-V taking a piece of x86 forgets to mention that no company making RISC-V processors uses it vanilla. SiFive not only design the SoC architecture, they develop extensions (more complex instructions) to solve the problems they need solved. Maybe tiny microcontrollers use vanilla RISC-V, which is what makes them cost 20 cents (besides the free ISA), but for high-performance computing they do similar stuff to ARM/x86.
@johnyepthomi892
@johnyepthomi892 24 күн бұрын
Wow... I like this format. Very engaging and inviting.
@AbeDillon
@AbeDillon 17 күн бұрын
This is a great video. I agree with everything said. One thing I will say is that the RISC-V instruction set is designed quite elegantly for simple hardware implementation. I don't know how it compares to ARM, but you're right that the draw of RISC-V is the $0 licensing fee.
@williamskiba6786
@williamskiba6786 15 күн бұрын
re: the time frame when real mode (vs. protected mode) became a thing: it was with the introduction of the 80286 (Feb 1982). Until then, the Osborne/McGraw-Hill "The 8086 Book" published in 1980 was my primary reference. With the 80286, it was relegated to my pile of "once-relevant but now valuable mainly for historical reasons" books. (Before the interwebz, we all used these multi-hundred-page bound things called "books.")
@theondono
@theondono 23 күн бұрын
The guys at the Oxide Computer podcast (ep. 1) covered the reason for little endian with Jeff Rothschild, and apparently that choice had to do with earlier CPUs having 1-bit ALUs. This meant that you had to "stream" the bytes into the ALU, and little endian helped simplify the carry logic.
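A small Python model of that idea (an assumption-laden toy, not actual hardware): when the bytes arrive low-order first, as little-endian storage naturally provides, a narrow adder can emit each result byte immediately, with the carry only ever rippling forward.

```python
def serial_add(a_bytes, b_bytes):
    """Add two equal-length little-endian byte strings one byte at a time,
    the way a narrow ALU would stream them: low byte first, carry forward."""
    carry, out = 0, []
    for a, b in zip(a_bytes, b_bytes):
        s = a + b + carry
        out.append(s & 0xFF)   # emit this result byte right away
        carry = s >> 8         # carry ripples into the next (higher) byte
    return bytes(out), carry

# 0x01FF + 0x0001, both stored little-endian (low byte first):
result, carry = serial_add(b"\xFF\x01", b"\x01\x00")
print(result.hex(), carry)  # → 0002 0  (i.e. 0x0200 little-endian, no final carry)
```

With big-endian storage the high bytes would arrive first, so the adder would have to buffer everything before it could resolve any carries.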
@warpspeedscp
@warpspeedscp 23 күн бұрын
Its nice that LE still ends up being useful when things are scaled up
@cybermuse6917
@cybermuse6917 24 күн бұрын
Suggestion for the reason they call it a die: when coins were minted, you would use a die to stamp a particular pattern into the metal being formed. I imagine it's merely a reference to the pattern being used.
@OurSpaceshipEarth
@OurSpaceshipEarth 14 күн бұрын
'Cept with more UV light and less heavy hammer-head banging :)
@obkf-too
@obkf-too 24 күн бұрын
This takes me back to university when we studied the 8086. The teacher gave us a sheet of the original assembly instructions and asked us to memorize the basic ones. We did some coding in a DOSBox emulator; the exam was really fun XD (I was at the top of the class). Since then we moved to C, then Java, ..... and now, writing high-level languages, I've forgotten most of it.
@CookieBeaker
@CookieBeaker 24 күн бұрын
Loved this! I know you can't go hyper-technical every day, but please sprinkle it in every so often! This was highly educational and, on top of that, reduced the misinformation the article (intentional or not) shared.
@nezbrun872
@nezbrun872 24 күн бұрын
8008 and 8080 8 bit registers are A, B, C, D, E, H & L, plus 16 bit SP and PC. H & L are commonly combined to make a 16 bit memory pointer. On the 8080, the B & C, D & E as well as the H & L register pairs can be combined, allowing up to three 16 bit registers, typically for memory pointers, especially HL, and to a lesser extent DE and BC due to the non-orthogonal instruction set. Furthermore, on the 8080, you can also do in-place 16 bit increments & decrements on these register pairs, but results don't affect the flags. HL can also be used as a 16 bit accumulator for 16 bit addition.
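A toy Python model of those register pairs (the register values here are arbitrary; this just mirrors the pairing arithmetic described above):

```python
# Toy 8080-style register file: 8-bit registers that pair into 16-bit ones.
regs = {"H": 0x12, "L": 0x34, "B": 0x00, "C": 0x00}

def pair(hi, lo):
    """Combine two 8-bit registers into one 16-bit value (e.g. HL)."""
    return (regs[hi] << 8) | regs[lo]

def set_pair(hi, lo, value):
    """Split a 16-bit value back into its two 8-bit halves."""
    regs[hi], regs[lo] = (value >> 8) & 0xFF, value & 0xFF

hl = pair("H", "L")
print(hex(hl))  # → 0x1234: H is the high byte, L the low byte

# Like INX H on the 8080: a 16-bit in-place increment of the pair.
set_pair("H", "L", (hl + 1) & 0xFFFF)
print(hex(pair("H", "L")))  # → 0x1235
```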
@X.A.N.A..
@X.A.N.A.. 23 күн бұрын
GameBoy?
@jaysistar2711
@jaysistar2711 23 күн бұрын
@@X.A.N.A.. No, but a few arcade machines. The GameBoy uses a Sharp chip related to the Zilog Z80 (and the 8080), but not compatible, just as the 8086 is related to the 8008 but not compatible. The book called "Nailing Jelly to a Tree" is a good Z80 starting point.
@tconiam
@tconiam 20 күн бұрын
I was looking for this comment! The special-purpose uses of the registers and the limited addressing modes are the 8080 legacy issue. Compared to the Motorola 68000 series' consistent registers and addressing modes, programming the 68K was a dream compared to Intel. Sadly they couldn't keep up in performance and lost out to x86.
@PaulPetersVids
@PaulPetersVids 24 күн бұрын
Wow, maybe a top 5 video on the channel. This was amazing.
@darekmistrz4364
@darekmistrz4364 22 күн бұрын
What are other 4?
@PaulPetersVids
@PaulPetersVids 22 күн бұрын
@@darekmistrz4364 lol idk.
@lesterdarke
@lesterdarke 23 күн бұрын
I would be interested for you to have this guy come on and talk more about the differences between ARM vs x86. In particular, what actually makes ARM more energy efficient: everyone always makes it sound like it's down to CISC vs RISC, but this video makes it sound like that may not be the case.
@retronoby
@retronoby 23 күн бұрын
There’s perhaps also a similar problem with the newest instructions (newer than the oldest supported CPU for a given software). Most software don’t use them and detecting the supported instructions at runtime isn’t always practical. The legacy instructions are there because software is a very expensive thing to produce and corporations need to at least recoup the costs of old software before they either buy or produce new software. I wish Hackaday would focus more on DIY hacks like they used to, and less on opinion articles. Thank you for the great and in depth explanations, it was very useful.
@dan_loup
@dan_loup 14 күн бұрын
If you kill x86, computers will turn into the same stupid thing that Android is. Imagine having to crack your computer like a console just to run Linux on it, or even a "clean" version of Windows. It's a quite horrifying future.
@Bokto1
@Bokto1 9 күн бұрын
This is a valid concern, but on the positive side, we survived the UEFI transition, and I'd argue it mattered more to the IBM PC ecosystem than the ISA
@dan_loup
@dan_loup 9 күн бұрын
@@Bokto1 It's about the complete package indeed. And UEFI was probably only "survivable" because it had to compete with the good ol' MBR in terms of not being a locked-down hell.
@truegemuese
@truegemuese 5 күн бұрын
While not knowing that much about the topic, I feel the main reason you can't install something else on a phone or console is that the hardware and the UEFI (or whatever tf they use on those) are meant to be sold as one single piece. I don't think that has a chance of coming to PCs and servers, as both have to be customizable.
@dan_loup
@dan_loup 5 күн бұрын
@@truegemuese It would be a bit harder to come up with a system that allows it to be locked down to the OS and still allow you to change the hardware, but it's sadly not impossible.
@treelibrarian7618
@treelibrarian7618 24 күн бұрын
I used to be in the "x86 is bad" camp a few years ago, so I think I get where the author is coming from: it is simply ignorance. I have since learned to love x86 with all its quirks and features: legacy register names, weird complex string instructions, and a huge number of vector instructions. It was simply that the sheer scale of the instruction set was intimidating for me when I was first starting with assembler, and ARM32 seemed so much more logical. But once I started looking at ARM64 and saw that some of the best features of ARM32 were missing (presumably because they don't fit well with pipelined, superscalar, out-of-order processing), plus the ARM vector instructions seemed to be a complete mess, I started looking seriously at x86_64 and realizing the practical logic of it, and the value of all the work that has gone into analyzing the future needs and wants of the computing world in the effort to provide for them.

And on the note of instruction decode bandwidth: while re-encoding the instruction set could be beneficial to reduce the size of the decoders, all modern x86 CPUs have some kind of decoded µop cache that allows loop code (most of the instructions executed in all apps and OSs are in loops) to be repeated without having to re-decode the instructions, so it becomes a non-issue, one which could be much more effort to fix than anyone wants.
@mrbigberd
@mrbigberd 21 күн бұрын
x86 is objectively bad both for hardware and for designers/users. x86 decode takes too much energy. 12 ALU instructions make up 89% of all x86 code. Despite this, average x86 instruction length is 4.25 bytes which is LONGER than pretty much any RISC ISA you can name. A study looking at Haswell a few years ago showed that when processing ALU instructions (remember, just 12 of those are 89% of all code), the decoder used over 20% of total core power. ARM's A715 went from 4 to 5 decoders while reducing decoder size by 75% simply by removing ARM32 support and ARM32 is still WAY easier to decode than x86. x86 has memory ordering that is far stricter than necessary which disallows optimizations. x86 instructions are bloated with all kinds of weird exceptions. A register is still special as the accumulator. A+D are used for MUL/DIV results (except 8-bit which use just A). 2-register only means you have extra MOV when you don't want to overwrite a register. 16 GPRs (more like 12) also bloats instructions. Then there's really weird stuff like the parity flag that definitely doesn't do what you'd expect with larger values. Most important though are the designer and dev costs. The ISA bloat flows throughout the entire chip design. This makes component design and testing take way longer and cost way more. If x86 were the only option, we wouldn't be seeing the startups that we see with RISC-V. Likewise, the complexity of x86 means that even most low-level programmers don't understand it past maybe a handful of instructions and the idea of actually understanding the binary format is basically inconceivable. Something like RISC-V is understandable to these programmers. This becomes even more important and obvious with SIMD where compilers suck so much. x86 SIMD is a minefield of some 17 different extensions (if you count AVX-512 as just one extension) and none of them are easy to use. 
In contrast, RISC-V vectors can be understood by normal coders which means they will be much more willing to use those vectors which should result in faster code for consumers. I'd also note that most universities have switched or are switching to teach RISC-V, so there's reams of programmers entering the market who already know the ISA basics (meanwhile, there's no hope that any school is going to fit x86 into undergrad coursework).
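The parity-flag quirk mentioned above is easy to model. This is a small sketch of the documented PF behaviour: the flag is set when the low byte of the result has an even number of 1 bits, and everything above bit 7 is ignored.

```python
def parity_flag(value):
    """x86 PF semantics: set when the LOW byte of the result has an even
    number of 1 bits; any bits above bit 7 are ignored entirely."""
    return bin(value & 0xFF).count("1") % 2 == 0

print(parity_flag(0b0011))  # → True: two set bits in the low byte (even)
print(parity_flag(0b0111))  # → False: three set bits (odd)
print(parity_flag(0x300))   # → True: both set bits sit above bit 7, so PF sees 0x00
```

That last case is the surprise: for a 16- or 64-bit result, PF says nothing about the parity of the value as a whole.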
@bionicseaserpent
@bionicseaserpent 17 күн бұрын
@@mrbigberd x86 has been backwards compatible since 1981. Point invalid.
@thesenamesaretaken
@thesenamesaretaken 17 күн бұрын
@@bionicseaserpent if your point is that changing x86 would break compatibility then yeah, I don't think anybody would deny that. But the rising adoption of ARM/RISC-V does the same anyway (as does changing to a newer OS or hardware, for that matter).
@treelibrarian7618
@treelibrarian7618 16 күн бұрын
@@mrbigberd Firstly, thank you for your response - it makes me happy that someone took the time to read my words, even if we disagree. As I expressed, everything you're saying I could have been saying myself ten years ago. I too was a true believer in the church of RISC since I first used my friend's Acorn Archimedes back in 1992 (I presume you know ARM used to mean Acorn RISC Machine...), but that was before I actually started using x86_64+AVX-512 extensively at an assembler level. Sadly, my large yellow-feathered friend, much of what you say points me to put you in the "biased by inexperience/outdated college tutors' opinions" category as well, but I'm willing to entertain the ideas you've presented for a while, for the sake of consideration... Firstly, where did you get that "12 ALU instructions make up 89% of code" statistic, and which instructions are they? Does this mean 89% of ALU instructions (ports 0, 1, 5, 6), or are memory instructions (ports 2, 3, 4, 7, 8, 9) included? Are branch instructions included? Are we counting encoded instructions or µops? When was the study done, and on what codebase? It should be noted that there are many excellent instructions in x86 that are simply inaccessible to the vast majority of programmers due to their operations not having a C/C++/"basically any language that does normal math" invocation. There's +-*/%^~ and that's about it; no room for tzcnt, lzcnt, pdep, pext, and many other more interesting bitwise operations, except where the compiler/interpreter can discern that their use might be appropriate, or one must resort to compiler intrinsic functions, which aren't portable, and discovering them is even more work than learning asm... Haswell is getting Very old by this point... I can't imagine there are many 4th-gen CPUs still keeping up with even the basics of using the internet. My 3rd-gen i7 laptop died a year ago, and before that it was reallllly struggling even with youtube...
The re-ordering restrictions imposed by strictness of memory ordering is offset by speculative fetch execution, with backtracking if a speculative fetch subsequently is found to overlap with a forthcoming write. Whilst this process is not free, I do not feel I would want to have the risk of a subsequent iteration of a loop reading memory that has yet to be written by a previous iteration that hasn't completed yet, and processing based on incorrect data. The silicon for backtracking is required also for speculative execution of branches, so the cost is minimal. It occurs only occasionally so the performance hit is also minimal, and having everything always work as expected is invaluable, so I don't see your point. I'm not sure what you mean by "weird exceptions", unless you're referring to the 16-bit and 32-bit mode weirdness, but that is practically irrelevant nowadays. while the MUL and DIV instructions do have fixed register assignments, it's usually easy to work around without needing extra mov's, and they provide an option for 128-bit operands not available elsewhere. There's also IMUL which has 2-operand and 2-operand+immediate versions, and although it's intended for signed integer use, the result is identical for unsigned use except for the flags. And then there's MULX that is a 3-operand 64-bit multiply that doesn't set flags. Take your pick. the DIV instruction is quite rarely used, since dividing is one of the slowest operations in a CPU, but getting both the result and remainder in separate registers is very handy when you do (arm on the other hand has no modulo output at all, it has to be done by multiplying up the result of the divide and subtracting - though it also might be possible to get it more directly from a common multiply-based optimization of the divide operation). 
Of the 16 GPRs, 15 can be used for basically anything, except the mul and div mentioned before, and the 2-operand variable shift instructions that can only use cl (the low byte of rcx) as the shift amount. There are 3-operand mulx and sarx/shrx/shlx instructions that provide full register flexibility. The only GPR whose usage is special is the stack pointer, because it affects the stack engine that keeps track of call-return addresses. It's not the upper 8 registers that require the extra byte, it's using 64 bits, since the byte is also needed for the first 8 registers if you're doing 64-bit operations. Parity only checks the low byte of the register, but it's a legacy operation that only really applies to UART serial communication -- so what?

I cannot speak to the cost to Intel or AMD of supporting the legacy instruction set, but it doesn't "flow through the whole design". Legacy instructions are decoded into modern µops for execution. There's a bit of extra ROM that has the µops for the legacy operation modes that no-one really uses any more. They look at what is being used, make that fast, and support the rest by turning it into more of what is being used.

But what I can speak to is that the instruction set isn't that hard to understand. The trick is to filter out the noise and focus on one section at a time. The Intel manual doesn't help with that since it is alphabetical rather than sectionized, but the process of sorting through that yourself is actually very helpful in getting an overview of the whole. I personally separate it into the following sections:

1 Ignorables:
- legacy 16- and 32-bit instructions that are no longer relevant or even encodable in 64-bit code
- x87 FPU instructions. SSE2 is minimum spec for x86_64 and everyone uses it instead
- stuff to do with the MMU and security levels that's only relevant to writers of OSs and HW drivers

2 Basics:
- the most basic GPR instructions, mostly used for address calculation and branch decisions

3 Advanced:
- fancy bitwise and math-related GPR instructions that make some interesting things possible

4 Basic Vector:
- read vector from memory, do math, write vector to memory

5 Advanced Vector:
- all the different ways of permuting/filtering/gather/scatter etc.
- fancy bitwise operations (ternlog, multishift, doubleshift, popcnt...), K-masks etc.

6 Optionals:
- instructions that are only relevant to specific but valuable ($B) use-cases, like the AES encode/decode instructions, fp16 instructions and tile instructions. Know they're there; look them up if you find yourself engaging in their use-case.

I got fairly fluent in using 2, 3, 4 & 5 in about 4 months. It's not that hard. With the Intel SDM & architectures optimization manual, and Agner Fog's instruction_tables.pdf, I got to the point where I could write code that makes full use of the hardware in about 6 months. I agree that it's a bind having to detect which instructions are available before starting work, but at least the progression has been linear. If you have AVX-512 you can basically guarantee that you'll have everything that came out before it: FMA, BMI2, BMI, AVX2, AVX, SSE4, 3, 2, 1... whereas RISC-V's plug-in approach means that all you can rely on is the most basic base set of instructions, equivalent to group 2 above. The x86 ISA is not finished; it is a work in progress and always will be. Intel have been feeling their way forward into new ways of doing compute with the AVX series, providing facilities, seeing what people would like to do with them, and iterating on what they discover.
With the most recent level of avx512 it seems almost complete (I can imagine a couple of instructions to make it even better but it's possible to do without), you can vectorise almost anything if you are willing to change how you think about it - and this is the biggest issue, that people struggle to change how they think, to let go of old data structures, algorithms, and ways of thinking that just don't translate into the new paradigm. And this is, in my opinion, the biggest issue with the avx512 instruction set: that old programming languages simply don't have the features to properly make use of the full capability of vector processing. Interestingly, it's functional language features like higher order functions that are closest to being able to make use of them, since they hide the details of iterating the dataset and can convert the given lamda to appropriate vector instructions. Expecting a C++ compiler to vectorize anything but the most basic loops is unreasonable. That universities are stopping teaching x86 is not a comment on x86, but on Universities. It's probably more to do with that it's easier to teach than that it's actually valuable to their students futures. RISC-V's simplicity is mostly a result of idealism, whereas x86's complexity is mostly a result of practical application.
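For illustration, the "most basic loop" case that compilers do reliably handle looks something like this (a sketch; assuming gcc/clang at -O3, where a loop of this shape typically compiles to packed SIMD multiply/add):

```c
#include <assert.h>
#include <stddef.h>

/* The kind of loop a C compiler will reliably auto-vectorize:
   independent iterations, restrict-qualified pointers (no aliasing),
   straight-line arithmetic. */
void mul_add(float *restrict out, const float *restrict a,
             const float *restrict b, size_t n) {
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] * b[i] + a[i];
}

/* Small driver so the sketch can be checked. */
float mul_add_probe(void) {
    float a[4] = {1, 2, 3, 4}, b[4] = {2, 2, 2, 2}, out[4];
    mul_add(out, a, b, 4);
    return out[3]; /* 4*2 + 4 = 12 */
}
```

Anything with cross-iteration dependencies, pointer aliasing the compiler can't rule out, or data-dependent control flow quickly falls off this happy path.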
@mrbigberd
@mrbigberd 16 күн бұрын
@@treelibrarian7618 oscarlab.github.io/papers/instrpop-systor19.pdf That paper examines all the binaries of the entire Ubuntu 16.xx repo, so it's hardly a tiny study (though it doesn't examine dynamic instructions; looking into that yields similar results). I don't have the Haswell study handy, but the fundamental situation isn't dramatically different today. Core size grows MUCH slower than Moore's Law (otherwise we'd still have just one core and it would use all those billions of transistors).
You simply cannot make the case that strict ordering is better because of speculation. With weak memory ordering, you can do away with all that speculation circuitry entirely (and still have the option of including it if it boosts performance, but you'll still need less hardware). Put simply, there is nothing to lose and everything to gain from weaker default ordering. I'd also note that speculation is limited and will necessarily introduce false positives. This is especially interesting for JITed code, where the JIT can potentially examine the runtime code and skip fences entirely for certain pathways.
x86 manuals are several thousand pages with all kinds of weirdness throughout them, if you take a look. It is considered extremely non-controversial to say that the ISA is weird. MUL and DIV are in tension with the A register. For example, say I have a loop that goes over an array and multiplies each number. I really want the loop counter stored in the accumulator register so I can use the fast, one-byte increment instructions, but if I do that, I must add a MOV every iteration so it doesn't get overwritten by the MUL. I'm quite sure there's another approach that does some other stuff, but that simply proves the point about weirdness.
As to GPRs: A is special as the accumulator. B is special as the base pointer for the segment register (more weirdness there). C is special as the counter register for shifts. D is special for MUL/DIV. SP and BP are special for stack instructions. SI/DI are special as index registers. I was being generous saying that 12 registers were actually general purpose. The response will be "but that's old stuff nobody uses", but that just proves the point about weirdness AND the point about inefficiency, as all this stuff is guaranteed to the programmer by the ISA and winds up complicating hardware designs.
Legacy isn't just about instructions. The PF (parity) flag that I mentioned is a very simple example. It adds another bit to EVERY SINGLE REGISTER. It also adds another pathway through the entire CPU design. This stuff isn't generic. ARM is making this exact argument right now in Qualcomm v. ARM: they stated that the Nuvia (now Oryon) core is so fundamentally tailored to the ARM64 ISA that no part of it is non-ARM. Qualcomm made a big push to completely change RISC-V to basically become ARM64, because all that stuff DOES go through the entire pipeline and they'd be left redesigning most of the core otherwise.
"Easy to learn" is relative. 4 months is a VERY long time when you consider that I could teach you all of RISC-V GCV in a week or so. I'm also 100% sure that after 4 months you didn't have a clue about all the weird tradeoffs that matter. There are reams of completely unintuitive places where there are multiple ways to do something. In the best case, you must learn which ones are footguns. In the worst case, you must learn when one is better and when another is better instead. Once again, RISC-V solves this issue because there's only ONE way to do common things and you can be completely sure it'll be well optimized by the CPU.
Along those lines, the extension argument about RISC-V doesn't make any sense for a few reasons. The first and most obvious one is profiles: github.com/riscv/riscv-profiles/blob/main/profiles.adoc If you implement RVA22S64, I know EXACTLY which extensions to target. This is the same as targeting i686 vs amd64, except the RISC-V stuff is better specified and actually gets decent updates. You mention "AVX-512 implies FMA", but does it imply FMA3 or FMA4? Which AVX-512 extensions? NOTHING supports all of them. Even if we disregard profiles, checking for extension support is easy. It's baked into your compiler and is already done in EVERY major ISA you can name.
x86 is mostly finished, if for no other reason than it's running out of encoding space that uses fewer than 6-7 bytes, making new instructions massively inefficient. RISC-V was beating x86 in code density 9 years ago, with compressed (C) instructions making the average instruction just 3 bytes long. Since then, new instructions like bit manipulation have further increased the code density lead. I-caches aren't really growing in size and x86 instructions can't really improve. This is another major advantage that proves the "every ISA is the same" argument false. en.wikipedia.org/wiki/Turing_tarpit All Turing machines are equivalent in capabilities, but are self-evidently not the same in reality.
Ascalon is an 8-wide architecture that Jim Keller expects to compete with Zen 5. The entire Tenstorrent company is 280-something employees. Many of those employees aren't engineers, and many of the engineers are working on smaller RISC-V cores or their AI accelerators. In reality, it's more like 100 engineers and QA at Tenstorrent making a chip that requires thousands of engineers at AMD or Intel. This isn't just being a startup; it's because RISC-V is designed for this purpose.
We know all about headspace in the programming world. You can only hold a handful of things in your mind at one time. The more complexity, the more bugs and the slower the development progresses. This is why the first 80% happens so much more quickly than the last 20%. We reached the limits of assembly very quickly and moved on to FORTRAN, then C. We hit the limits of those and moved to the OOP craze with languages like Java. Even that wasn't high enough, so we saw Python, Ruby, JS, and others bloom. Those are reaching their limits and we're now seeing a push into functional paradigms that trade a bit more efficiency for the ability to write even larger projects in a sane amount of time.
Simplicity in the RISC-V ISA wasn't just about teaching. There's a lot of research going on about how to increase the IPC of chips, because we've maxed out realistic clock speeds in silicon (even the theoretical limits of silicon are around 10GHz, IIRC). You MUST go a lot wider, but going wider gets you into crosscutting concerns which immediately trash your chip development pace. All the decisions to reduce the interface with the programmer also help reduce the need for these speedbumps. Even more importantly, the ISA was explicitly designed to reduce instruction side effects and, by proxy, keep the ISA separate from the uarch as much as possible, so decisions you make in the uarch don't bleed over to the consumer.
RISC-V is interesting in that it is both easier than x86 and also practical. Qualcomm shipped 1B RISC-V cores in 2023, and they aren't even the biggest RISC-V company and most of their chips are still ARM. Nvidia and Western Digital have shipped billions more. RISC-V has basically done a complete takeover of FPGAs, because you not only get free high-quality cores, but an entire high-quality software ecosystem to match. This is all without mentioning China or India, which are making absolutely massive investments into the ISA, as are companies like Google. Even Intel and AMD paid for seats at the consortium table, with Intel manufacturing a SiFive core combined with all Intel's proprietary IP in an attempt to get more non-Intel chips made at their fabs. I fully expect to see either Intel or AMD update their core to allow RISC-V execution (this is easier than going the other way, courtesy of RISC-V trying to isolate the ISA from the uarch).
I'd encourage you to take some time to learn RISC-V, then compare. I think you'll be shocked at how much easier it is, and that 4 months would render you an expert rather than simply knowing the basics.
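On "it's baked into your compiler": GCC and Clang expose a CPUID-backed builtin for exactly this (compiler-specific and x86-only, so treat this sketch as an assumption rather than portable C):

```c
#include <assert.h>

/* __builtin_cpu_supports is a GCC/Clang builtin that checks
   CPUID-reported features at runtime; the feature name must be a
   string literal. Not standard C. */
int has_avx2(void) { return __builtin_cpu_supports("avx2") ? 1 : 0; }
int has_sse2(void) { return __builtin_cpu_supports("sse2") ? 1 : 0; }

/* SSE2 is the x86_64 baseline, so it should always report available;
   AVX2 depends on the machine. */
int ext_probe(void) {
    int avx2 = has_avx2();
    return (avx2 == 0 || avx2 == 1) && has_sse2() == 1;
}
```

Distros and compilers do the same thing under the hood for the x86_64-v2/v3/v4 baselines and for RISC-V profile targets.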
@maximebeaudoin4013
@maximebeaudoin4013 8 күн бұрын
Amazing content! Please have him back on. His explanations are great!
@Waitwhat469
@Waitwhat469 23 күн бұрын
28:30 I feel like that is a great example, but also, in the actual laundry world, my rebuttal is that a dedicated device that washes and one that dries, each taking up the same space as a combo Washer/Dryer, means the combo would be 50% slower in total throughput. So all I am saying is that we need to increase our WD core counts to dual core so we can finally wash/dry at a 50% increase!
@Dungeonseeker1uk
@Dungeonseeker1uk 24 күн бұрын
Fun Fact: Intel used Pentium over 586 in a stupid attempt to stop the clone market. They couldn't trademark a number, so they used Pentium and added the MMX extension, which was also trademarked. OFC AMD and VIA just used 586; they didn't get MMX till much later, when it was basically redundant. AMD created 3DNow! as a response when they adopted the K6 moniker (K6 was 686).
@kahnzo
@kahnzo 24 күн бұрын
I had completely forgotten that fact, but that's right!
@craigpeacock1903
@craigpeacock1903 23 күн бұрын
Ah, the K6... the first cpu I bought myself was the K6-2 600...
@betag24cn
@betag24cn 22 күн бұрын
AFAIK, AMD still has those instruction sets from the K6 era
@OpenGL4ever
@OpenGL4ever 22 күн бұрын
@@betag24cn No, 3DNow! is obsolete and got removed from newer AMD CPUs. There was an agreement about the SSE family of instruction sets, thus 3DNow! was no longer needed, and it was incompatible with SSE.
@betag24cn
@betag24cn 22 күн бұрын
@@OpenGL4ever now that youmention it, i have not checjed that in many years, probably you are right
@jaysistar2711
@jaysistar2711 24 күн бұрын
In the original design, AX (16-bit) is AH (8-bit) and AL (8-bit). That's the `A`ccumulator. AX has overflow into the `D`ata register (DX, which also has DH and DL) for MUL and DIV instructions. There is a `B`ase index register and a `C`ount register. The other 4 are Stack Pointer (SP), Stack Base Pointer (BP), Source Index (SI), and Destination Index (DI). Some instructions only work with certain registers in assembly (very CISC). EAX is 32-bit. In x86_64, RAX and the other Rxx registers are 64-bit, and those original 8 registers have general functionality.
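A rough C sketch of that aliasing (illustrative helpers only, not real register access): AX is 16 bits wide, with AH as its high byte and AL as its low byte.

```c
#include <assert.h>
#include <stdint.h>

/* AH = high byte of AX, AL = low byte of AX. */
uint8_t ah_of(uint16_t ax) { return (uint8_t)(ax >> 8); }
uint8_t al_of(uint16_t ax) { return (uint8_t)(ax & 0xFF); }

int alias_probe(void) {
    uint16_t ax = 0x1234;
    return ah_of(ax) == 0x12 && al_of(ax) == 0x34;
}
```

Writing AL or AH on a real CPU modifies part of AX (and RAX) in place, which is exactly the kind of sub-register aliasing that complicates register renaming.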
@thewhitefalcon8539
@thewhitefalcon8539 24 күн бұрын
64 is RAX
@plaintext7288
@plaintext7288 24 күн бұрын
Assembly level debugging and optimization are black magic
@virno69420
@virno69420 24 күн бұрын
Most calling conventions use RAX for the return value; the naming conventions are just legacy, it's not actually used as an accumulator. RBX is not the base pointer, that's RBP; there is no RSB register. What instructions only work with certain registers? I believe this just isn't true, at least for the general-purpose 16. Maybe SSE, AVX, or XMM registers are instruction-specific, idk, we haven't got that far in my class yet.
@teodorkostov365
@teodorkostov365 23 күн бұрын
@@virno69420 Base in RBX doesn't mean stack base, it means memory base, which is what it was originally intended for. And some instructions do use the registers with their original intention, like div, rep, movs, etc...
@jaysistar2711
@jaysistar2711 23 күн бұрын
@@thewhitefalcon8539 I edited to clear some things up, and fixed that typo.
@redpillsatori3020
@redpillsatori3020 23 күн бұрын
38:30 - Chat Gippity on micro-operation ports:
In the context of computer architecture, particularly with regard to superscalar processors and instruction execution, a micro-operation port, often denoted with labels like 1*, refers to a specific functional unit within the processor's execution pipeline. Each micro-operation port represents a pathway or channel through which certain types of instructions or operations can flow within the processor. These ports are associated with different execution units or functional blocks within the processor, such as arithmetic logic units (ALUs), floating-point units (FPUs), load-store units, and so on.
The 1* notation typically signifies the primary execution port or primary ALU port in a superscalar processor. Superscalar processors are capable of executing multiple instructions simultaneously by dispatching them to different execution units or ports, exploiting instruction-level parallelism. When an instruction is decoded and dispatched for execution, it may be directed to a specific micro-operation port based on the type of operation it performs and the availability of resources within the processor. For example, arithmetic operations might be directed to the ALU port, while memory operations might be directed to the load-store unit port.
The presence of multiple micro-operation ports, including 1* ports, allows the processor to execute multiple instructions concurrently, improving overall throughput and performance. In the context of the discussion about Apple Silicon M-series chips and x86 processors, micro-operation ports play a role in the design and architecture of the respective processors. The number and capabilities of micro-operation ports are key factors in determining the processor's ability to execute instructions efficiently and exploit parallelism effectively.
@sjzara
@sjzara 16 күн бұрын
Great discussion. The problem as I understand it is that x86 decoding demands more complex decoders (especially with variable-length instructions), and that means more active circuitry, which means more power and so more heat. It's not about the instruction set itself, and it's not about pipelining or out-of-order or speculative execution or scaling. It's about power use and heat generation.
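A toy model of that decode cost (purely illustrative, nothing like a real decoder): with fixed-width instructions, the k-th instruction's start address is known up front, so a wide decoder can attack many instructions in parallel; with variable lengths, each start depends on every earlier length.

```c
#include <assert.h>
#include <stddef.h>

/* Fixed 4-byte instructions: instruction k starts at 4*k, immediately. */
size_t fixed_start(size_t k) { return 4 * k; }

/* Variable lengths (x86: 1..15 bytes): the start of instruction k is a
   serial sum over all prior lengths -- a dependency chain a wide
   decoder must speculate on and verify, which costs circuitry/power. */
size_t variable_start(const unsigned char *len, size_t k) {
    size_t off = 0;
    for (size_t i = 0; i < k; i++) /* depends on all prior lengths */
        off += len[i];
    return off;
}

int decode_probe(void) {
    const unsigned char len[4] = {1, 3, 5, 2};
    return variable_start(len, 3) == 9 && fixed_start(3) == 12;
}
```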
@Maescool
@Maescool 24 күн бұрын
Looking forward on you doing the Ben Eater Project
@fernansd
@fernansd 24 күн бұрын
It was amazing to have someone with such deep knowledge. Definitely learned useful things I wouldn't have known otherwise. Keep it up with the great guests!
@justadude1495
@justadude1495 19 күн бұрын
I have no previous knowledge on CPUs nor am I normally interested in them but THIS VIDEO was, as said in the beginning of the video, A THING (what is this thing? I don't know what you call this thing but it was amazing, that's what it was) :D There is no explanation why I watched it and I probably only took away 10% from what others might but it was still such a productive and vivid exchange that I had to watch it!
@Oler-yx7xj
@Oler-yx7xj 24 күн бұрын
I just have been poking around with NES assembly stuff (in Vim btw) and Prime releases a video on a low-level topic. Nice!
@ttcc5273
@ttcc5273 24 күн бұрын
A die is the pattern used to cast the final product… from shop class. Metal working… “Tool and Die Corp.” Edit “Dies are only those tools that functionally change the shape of the metal. Dies are typically the female components of a larger tool or press.” Edit 2: a die in chip making is the “master.” Like vinyl records were pressed from a master, chips are etched from an image of the die.
@demmidemmi
@demmidemmi 24 күн бұрын
Yes and they used to make them using lithography before they went with photo-lithography so then there was another literal die.
@infastin3795
@infastin3795 24 күн бұрын
Even Intel themselves tried to get rid of that legacy but didn't succeed.
@radfordmcawesome7947
@radfordmcawesome7947 24 күн бұрын
are you talking about itanium or something else?
@infastin3795
@infastin3795 24 күн бұрын
@@radfordmcawesome7947 yes. Intel has also recently proposed a new X86S architecture.
@betag24cn
@betag24cn 22 күн бұрын
It was mostly Microsoft's fault really; Microsoft Windows is a legacy of code, nothing modern in it, not since Windows 2000 basically. With the ARM Windows project that should deliver some devices this year, perhaps that finally happens, without Intel it seems
@colton2432
@colton2432 20 күн бұрын
I feel like the article's argument at 54:35 is that if you have a smaller set of instructions, it is easier to optimize the out-of-order processing that is done by the scheduler. However, as Casey mentioned right beforehand, these schedulers need to be incredibly performant, and therefore the algorithms used to identify dependencies are nowhere near as complicated as the author feels they should be.
@SvenHeidemann-uo2yl
@SvenHeidemann-uo2yl 23 күн бұрын
I'm programming a GB Color game in asm, and watched some computers-explained RISC-V videos. Didn't expect that it would enable me to follow this CPU dive. Very informative, thanks.
@CrassSpektakel
@CrassSpektakel 21 күн бұрын
Today there are no RISC or CISC CPUs anymore outside the lowest performance class. All middle- to upper-class CPUs are microcode, which usually means peephole-translating platform code into a VLIW-ish internal microcode inside the CPU. For the Centaur C6 you could actually use x86 and MIPS code by simply exchanging the translation layer inside the CPU, which itself was microcode. People once suggested (Itanium?) simply using the internal VLIW code outside the CPU, without the layer of platform code, but VLIW code is HUGE: a simple "increase register A1" can easily use 256 bits. It doesn't make sense; it uses tons of RAM, cache and bandwidth on the bus. So if you have to use a compact intermediate code, why not use the one that's already widely used anyway? Therefore it basically doesn't matter whether your CPU is ARM, RISC-V or AMD64.
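As a purely schematic illustration of that translation layer (real micro-op encodings look nothing like this): a CISC-style memory-destination add cracked into RISC-like load/op/store micro-ops.

```c
#include <assert.h>

/* Decoders turn compact platform code into internal operations; here a
   toy "add [addr], reg" becomes three micro-ops. Schematic only. */
typedef enum { UOP_LOAD, UOP_ADD, UOP_STORE } UopKind;

int crack_add_mem_reg(UopKind out[3]) {
    out[0] = UOP_LOAD;  /* tmp    <- [addr]    */
    out[1] = UOP_ADD;   /* tmp    <- tmp + reg */
    out[2] = UOP_STORE; /* [addr] <- tmp       */
    return 3;           /* number of micro-ops emitted */
}

int crack_probe(void) {
    UopKind u[3];
    return crack_add_mem_reg(u) == 3 && u[0] == UOP_LOAD && u[2] == UOP_STORE;
}
```

The compact x86 encoding is the "intermediate code" in this view; the wide internal form never leaves the chip.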
@quas-r
@quas-r 24 күн бұрын
Two phenomenal teachers talking about what they passionately love and explaining it to us is pure pure gold.
@spencercurtis86
@spencercurtis86 7 күн бұрын
I had no clue what any of this was. I also watched the entirety and learned stuff without learning stuff. This was incredibly entertaining and digestible. Great video!!
@raizdesamauma8607
@raizdesamauma8607 24 күн бұрын
Those are really great explanations on such complex low level stuff! I've learned a lot watching this, thank you Casey! Love this kind of video, Prime, keep it up
@darlokt51
@darlokt51 24 күн бұрын
Great talk! The article is kinda brain-dead, to be honest. The Chips and Cheese article captures it way better, and thanks to ARM lighting a fire under x86, x86S and AVX10 are coming. A bit on the RISC/CISC architecture part: in general, architectures are converging; the idea of RISC vs CISC from a hardware perspective is truly stupid. RISC has turned a lot CISC and CISC learned from RISC. For the AI folks, you can see an ISA as tokenization for your chips. The frontend and backend are nowadays mostly decoupled, and a CISC-like architecture is generally better for branch prediction and prefetching; as such, ARM has become more CISC with every version, and x86 now has to remove its legacy 8-, 16-bit etc. support to make the decoder fresh and new again, which is coming.
@UltimatePerfection
@UltimatePerfection 24 күн бұрын
Unfortunately that's not going to happen, because of all the business software and games that require x86. So any "replacement" would need to be backwards-compatible with x86, at which point it would just be x86 with extra steps, so why bother?
@jaysistar2711
@jaysistar2711 23 күн бұрын
I don't think so. The Apple Macintosh started on the M68000, then switched to PowerPC with support (an OS built-in emulator) for M68000 apps, then switched to x86, which still had both PowerPC and M68000 support, then switched to x86_64, which you may think was an easy jump but probably required quite a bit of code-page management in the OS, and then switched to ARM. Although they've dropped some of that emulation along the way, you can still get it, and things can still work as they always have. I think Windows and Linux could do the same. With QEMU in Docker, containers already do some emulation with buildx, so it's not too far of a jump.
@UltimatePerfection
@UltimatePerfection 22 күн бұрын
@@jaysistar2711 I assure you that gamers unwilling to take the performance hit associated with emulation will stop the switch dead in its tracks. With the Mac, the thing is that a) it's a single machine by a single company, so people literally have no choice but to switch if they want to use the newest and best (debatable...) products, not thousands of machines by thousands of companies, some even being DIY affairs, and b) the Mac never really did games very well. Try running Cyberpunk 2077 on a Mac. You can't. Or at least, you can't run it well.
@dragonproductions236
@dragonproductions236 22 күн бұрын
@@jaysistar2711 The only reason why Apple can do ARM is because it's a horrible monopoly, and people just complained about the very real issues the switch caused. You're basically saying "The company store can switch and deprecate currency every year, why can't the government?" The answer is that it only exists due to artificial reasons (you being horribly in debt to the coal company, or either being too dumb to leave Apple or unable to because they've grabbed your work software in their malformed tentacles).
@jaysistar2711
@jaysistar2711 21 күн бұрын
@@UltimatePerfection I agree that Apple products are overpriced versions of inferior hardware, but I disagree about a performance hit being required if the PC switches to another CPU, because I don't know of any native version of x86_64 at this point; the whole x86_64 platform is emulated on both AMD and Intel chips. Also, REDengine (Cyberpunk 2077) is a cross-platform engine. It can run on the Switch (ARM), but doesn't, because GPU performance and capability (ray tracing, API support, etc.) matter more for games than which CPU ISA they use. I've ported hundreds of games commercially, and I can tell you that Apple's mistake was making Metal the only option moving forward. That's just more work, and, as a game engine dev, we're an overworked bunch as it is.
@UltimatePerfection
@UltimatePerfection 21 күн бұрын
@@jaysistar2711 Yeah... x86_64 being emulated on an x86_64 processor... Nice try, but no dice.
@InterDimensionalOwl
@InterDimensionalOwl 21 күн бұрын
Thanks for taking the time to make and share.
@rookandpawn
@rookandpawn 17 күн бұрын
I am astonished at how clear and concise your guest speaker was today. As someone who grew up with assembly and had to learn CISC, and also worked on VGA assembly graphics, this is outstanding content ❤❤❤
@ThePlayerOfGames
@ThePlayerOfGames 24 күн бұрын
x86_64-v2 as standard when?
@LouisDuran
@LouisDuran 24 күн бұрын
soon
@williamhinz9614
@williamhinz9614 24 күн бұрын
X86_64ex*
@deth3021
@deth3021 24 күн бұрын
Unlikely. Intel has tried twice to change the x86 platform: Itanium and the Pentium 4. Both failed, so anything they do will have to be backwards compatible.
@Kaznovx
@Kaznovx 24 күн бұрын
In 2020. However, x86_64-v2 is a codename for CPUs with at least SSE4.2 and SSSE3 (which were already supported by 2008 CPUs). For comparison, x86_64-v3 means roughly support for AVX1 and AVX2 (CPUs from 2014 and later), and x86_64-v4 adds support for AVX-512, but that one is a can of worms
@deth3021
@deth3021 24 күн бұрын
@Kaznovx OK you meant that. But that is an unrelated topic, as it is about common baselines.
@swdev245
@swdev245 24 күн бұрын
I was surprised that Prime seemed to be ignorant about the lower level (like virtual memory) and the x86 legacy stuff. Isn't at least the former part of every computer science degree?
@rusi6219
@rusi6219 24 күн бұрын
He could have forgot it happens
@sheikhshakilakhtar1865
@sheikhshakilakhtar1865 24 күн бұрын
Not in every institute. Nowadays, lower level stuff are studied more by electrical engineers than computer scientists. Also, people forget.
@chupasaurus
@chupasaurus 24 күн бұрын
IIRC virtual memory explanation isn't a part of courses required for BSc in CS🙃
@monsterhunter445
@monsterhunter445 24 күн бұрын
@@bbourbaki You don't need to be a low-level guy unless you're doing embedded, and even then there is still abstraction
@sheikhshakilakhtar1865
@sheikhshakilakhtar1865 24 күн бұрын
@@bbourbaki Do you know what fixed parameter tractability is?
@chaitanyakumar3809
@chaitanyakumar3809 23 күн бұрын
Loved this discussion! If others are keen on more in-depth related discussions, I would highly recommend the following two articles by David Chisnall: "There's No Such Thing as a General-purpose Processor: And the belief in such a device is harmful" (2014) and "How to Design an ISA" (2024). "How to Design an ISA" goes into the impact of having a relatively complex instruction set like x86's, the resulting complexity in decoder logic etc., and why x86 doesn't excel in the mobile market while ARM does. I think it would be peak if Prime and Casey could cover one of these in a stream
@xarisfil58
@xarisfil58 24 күн бұрын
The program counter not being mentioned made me sad. But the part where micro-ops and instruction pipelining were mentioned is fine. RISC-V and ARM even have 16-bit (Thumb/compressed) instructions. If the Tomasulo algorithm (out-of-order) had been mentioned, it would have been awesome. VLIW and superscalar are on the opposite side, as mentioned
@davidspagnolo4870
@davidspagnolo4870 24 күн бұрын
Yeah, and we all need to use metric too.
@Eren_Yeager_is_the_GOAT
@Eren_Yeager_is_the_GOAT 24 күн бұрын
i hate it that i only have 2 options when i want to buy an x86 CPU
@joseoncrack
@joseoncrack 24 күн бұрын
Yes but it's better than no option at all.🙃
@betag24cn
@betag24cn 22 күн бұрын
When we had VIA, it wasn't really an option. Right now you have two, but I wouldn't touch anything from Intel
@betag24cn
@betag24cn 22 күн бұрын
@@joseoncrack Well, with Apple there is no option and people are happy. On Android smartphones you don't choose the CPU, you choose the whole device. It is nice to have options, but sometimes things do work like that
@boptillyouflop
@boptillyouflop 22 күн бұрын
Only 2 options for x86 CPUs comes down to the US government being feckless and derelict in its duty to break up monopolies (well, duopolies here, but you get the gist). They let Intel/AMD use the x86 patent pool to completely own x86, to the detriment of everyone else. Citizens United lets big companies bribe politicians, and this is how we got here.
@joseoncrack
@joseoncrack 22 күн бұрын
@@betag24cn It's good to have options. But yes, obsessing over it doesn't change a thing either: like, here, many people seem to absolutely despise the x86 architecture (often without even really knowing what it is now), but it just works well enough (at least in the desktop and server markets), and you'll be hard-pressed to find anything with the same performance in this price range. In terms of performance, one exception (really an exception for now) is the Ampere CPUs (128-core, ARM-based), and those are still niche products, very expensive. The day there are alternatives to x86 for the same kinds of applications, with as much performance and at the same price, but with more "modern" architectures, people will eventually switch en masse. But that's just not the case yet. It is for mobile devices, and has been for years, though, which is a market Intel has always struggled with.
@102728
@102728 20 күн бұрын
I'm a complete noob but everything I see about low level stuff, I immediately fall in love with it. Loved Casey's passion and knowledge, do get him back on for some more collabs!
@CrassSpektakel
@CrassSpektakel 20 күн бұрын
@cmuratori is perfectly right, but I would like to extend: the only bottleneck is a couple of x86 code blocks not working well for parallelism. But that is a compiler problem most of the time: if your compiler is smart, it will avoid emitting such blocks of dependent code. Oh, and that is exactly what very early RISC and VLIW CPUs did all the time back in the 1980s/1990s: depend on the compiler to deconflict your code. And now the sucker punch: RISC utterly depends on a smart compiler. CISC and superscalar microcode... not so much. Having sub-par compilers was the main reason Intel's Itanium never performed as promised (Itanium is an odd mix of RISC and VLIW, just like the much older AMD 29000, and is pretty much the worst case for "needing a very smart compiler").
@user-eg6nq7qt8c
@user-eg6nq7qt8c 24 күн бұрын
"I'm too stupid to understand, let's move on". I relate
@supdawg7811
@supdawg7811 24 күн бұрын
Lil Endian is what they know me as in the streets
@adrianmoisa2281
@adrianmoisa2281 22 күн бұрын
Top lecture. Where can I find more of his videos about CPUs? Cheers!
@MasterHigure
@MasterHigure 23 күн бұрын
I would personally love a deep-dive into the original 8086 instruction and microinstruction set. The only thing I know about that part of computer science is what I learned from watching Ben Eater construct an 8-bit breadboard computer, and it has been a while. Also, the highest authority on the subject, namely Chat-jippity, said that a "die" is called so because it's a small, usually square, piece of semiconductor material on which electronic circuits are fabricated. The term "die" originates from the singular form of "dice," reflecting the small, discrete nature of these semiconductor pieces. Thus it is LLM'ed. Thus it shall be.