Comments
@roymerkel8008 7 hours ago
So, watching the series so far, nice work on the use of derivatives, I didn't consider that when trying to noodle out the problem myself. Question I have is what we can do about those square roots and squares, as those might be pretty slow themselves. I'm at the start, so I'm hoping this gets addressed in this video, though that may have to wait for part 4.
@roymerkel8008 7 hours ago
OK, this week appears to be about the range of the constants (though knowing the sign of these, I think, is going to open up some optimizations later, ditto with knowing the ranges on the standard and skew ellipses you also did this episode -- I'm just not sure what yet.)
@pcretroprogrammer2656 6 hours ago
We will definitely look at some fast code for the square roots in a later video. They are relatively slow, but fortunately we only need 8 of them per ellipse, which in the grand scheme of things doesn't turn out to be much.
@MonochromeWench 22 hours ago
Nice of IBM to let you disable DRAM refresh and do things like this. Can it be done on other contemporary systems? A once-per-frame refresh would be a big improvement over the default refresh, which can happen at any time during a frame. Even if you do it during frames, you can cycle-count your custom DRAM refresh and exit it if you need to do something at that time. It would be tricky to time your CGA register writes using DRAM refresh code, but it might be doable if there was no other choice. The DRAM chips seem very forgiving, though, so it is probably more trouble than it's worth. Testing temperature dependence might be an idea: blast the chips with very warm/hot air and see if they decay faster. A hair dryer on low heat would probably be good enough and shouldn't get hot enough to melt things.
@pcretroprogrammer2656 16 hours ago
I believe it is possible to change DRAM refresh on more recent systems. But of course it is done differently. I have thought about timing the DRAM refresh at the normal rate (or close to it), and people have definitely done that. We may end up needing to do something similar for some effects, because the PIC cannot be put exactly in sync with the CGA card: they start up at random times relative to one another, and the dividers they use have a common factor in the number of cycles per division. The hot air gun is a nice idea which I must admit I didn't think of.
@fradd182 2 days ago
Nice. Any chance of using this method for drawing a partial ellipse (elliptic arc between 2 angles)?
@pcretroprogrammer2656 1 day ago
Yeah I haven't figured out a nice way to do that. Obviously you'd have to drop the reflections, and then it's probably just a bunch of cases to decide which arcs to draw and which to partially draw and which to not draw. Then do some computations for the irregular end points. It's doable, but I see no clean way.
@fradd182 15 hours ago
@pcretroprogrammer2656 Yeah, the reflections are definitely out. I do have a few ideas for the algorithm, but I'm sure somebody already did it.
@danielkowalski7527 2 days ago
so how often? ^^
@andrewdunbar828 6 days ago
b is related to both skew and rotation
@pcretroprogrammer2656 5 days ago
That seems about right. Apart from F, they all have different behaviours in different situations. I only showed some examples. But you get an idea pretty quickly from playing around with them.
@andrewdunbar828 5 days ago
@pcretroprogrammer2656 It reminds me of how when you're trying to work out 2D rotation, at least back in the 8-bit days when there was bugger all computer graphics resources, it was easy to skew on the X or Y axis and it was apparent that getting the two skew ratios might lead to rotation, but the answer turned out to be sin and cos, which do provide that missing magic ratio where each affects the angle parameter in a different way and you need both to get what you want. Something reminded me of this again when I was trying to learn affine/perspective mappings, where one parameter seemed to relate to two things in the results, but I never managed to fully grok that stuff (-:
@pcretroprogrammer2656 7 days ago
Watch out for a typo on my slides. I have - Ex in one place that should obviously be - Ey.
@ChrisJackson-js8rd 8 days ago
Careful in these tests that the length of time between refreshes and the length of time the system has been running (and therefore the temp) don't correlate in any systematic way. Not that it would change the results, but if you did want to quantify the time to corruption more precisely you would have to incorporate both temp and time between refreshes into your analysis. Very nice video. I loved the systematic and logical approach you took to the question :)
@pcretroprogrammer2656 8 days ago
Yes, I think one would have to characterise the variation with temperature before attempting to understand the time to corruption.
@georgegonzalez2476 14 days ago
The refresh doesn't touch every memory location. It only has to touch the row or column addresses.
@DerIchBinDa 13 days ago
Only the row; the column does not play any role in refresh.
@Torbjorn.Lindgren 14 days ago
It's my understanding that the refresh period is bound not by how long it takes to decay in isolation (what you test), but by how long it takes to decay when memory rows around it are accessed. I don't know how big of an effect this is on chips this old; it can be quite pronounced on newer chips, but they're many orders of magnitude denser. You may find that as you use the memory the retention time creeps down, so a healthy safety margin might be in order.

Even with modern (ultra-dense) chips it's often possible to get away with refreshing far less often than the official numbers in practice. "Overclockers" often max out the register allocated for this in the memory controller, which translates to a refresh interval something like 5-20 times longer than the official one, which is IIRC spec'd at 85C. At least for some memory it's documented to require refresh four times as often at 125C ("military") as at 85C ("commercial"), so there's definitely a temperature component - perhaps a doubling every 20C? I've never seen any manufacturer project this to lower temperatures, but it sounds possible that it might hold - it's known you can store memory content almost indefinitely at cryogenic temperatures.

I.e., reading a row ALSO depletes nearby rows "a bit" and writes also "leak" somewhat into nearby rows - this is the basic idea behind the RowHammer attacks on recent DDR memory (unfortunately with DDR4/5 things are getting so cramped that NO reasonable refresh interval might be safe and other remedial actions have to be taken).

I do know that how long memory lasts without refreshing can be extremely variable depending on brand and model of memory chips. There are examples of 8-bit micros where some will survive a few seconds while others don't survive a brief flick (200ms?) of the power switch.

This "accelerated decay" can be hard to profile since rows may not be laid out how you think; often both row and address lines are routed based on the "easiest" path rather than A0/A1/... since it doesn't matter. For the memory you show I guess you could try to hammer the "next" physical row by incrementing A8 to A15 (given the 8+8 setup) and trying reads and writes (inverse bit value?) - that's "only" 16 (8 address lines, read/write) sampling tests (or 18 for a 256kbit chip). As you mention it's already slow and this would work as a multiplier, but you can use your existing results as a guide to narrow it down and find out whether it has an appreciable effect on your XT's memory.
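To put rough numbers on the temperature guess in the comment above, here is a minimal sketch. It assumes the commenter's tentative rule (required refresh rate doubles for every 20C above an 85C reference); the function name and parameters are hypothetical, and nothing here comes from a datasheet.

```python
def refresh_rate_multiplier(temp_c, ref_temp_c=85.0, doubling_step_c=20.0):
    """Relative refresh rate needed at temp_c, assuming (as a guess, not a
    datasheet figure) that the requirement doubles every doubling_step_c
    degrees above the reference temperature."""
    return 2.0 ** ((temp_c - ref_temp_c) / doubling_step_c)


if __name__ == "__main__":
    for temp in (25, 45, 65, 85, 105, 125):
        print(f"{temp:>4} C: {refresh_rate_multiplier(temp):.3f}x the 85 C rate")
```

Under this assumption 125C needs four times the 85C rate, matching the "military" versus "commercial" figure quoted in the comment. Whether the rule really extrapolates down to room temperature is exactly the open question the commenter raises.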
@pcretroprogrammer2656 14 days ago
Interesting. I forgot to mention in the video that someone had suggested after watching the previous video that it may depend on things like how many 1's are in the row or how many 1's are nearby and so on. Some experiments along these lines would be interesting, though as you point out, not necessarily indicative of how things go on the whole.
@Vegemeister1 10 days ago
Hah, yeah, this video got recommended to me and I ran down to the comments to yell about Rowhammer! Targeted Row Refresh! The KZfaq algorithm is magic sometimes. There's also a part in recent DRAM specs (I don't remember if it came in DDR4 or DDR5) where there's a temperature threshold above which refresh frequency is doubled.
@Vegemeister1 10 days ago
See also www.csl.cornell.edu/~martinez/doc/isca13-mukundan.pdf
@JJFX- 6 days ago
Yeah one of the 'easiest' ways to improve memory performance is still maxing out the refresh interval (tREFI) and/or speeding up the cycle times (tRFC) as much as possible. On good DDR5 chips we rarely see issues with the interval cranked up to ~16 microseconds or so and DDR4 often handled it too. Interestingly, I recall having to be more careful in the DDR3 days. This can be one of the scarier changes though because errors don't always show up in testing as you'd imagine. You could test it for a week straight but if the environment warms up enough a few months later you could end up with problems. Memory has really become one of the final frontiers of traditional overclocking now that CPUs and GPUs are pushed so close to the edge out of the box. Even profiled memory kits are often so badly tuned that squeezing another 20-30% performance out of cheap kits is still fairly common. I expect this to change and be less relevant as dynamic refresh features are actually implemented and we see CPU cache sizes increase.
@sandman9601 15 days ago
We used to do a fun trick in our lab back around the DDR2 days. Write a pattern to memory up in an area DOS doesn't use. Then power off the system, remove the DIMM, pass it around if you'd like, and put it back in. Depending on how long you took, you could see various amounts of the 1's in the pattern decay to 0's.
@Heckatomba 14 days ago
Ever tried to use cold spray? Not my idea: back in 2009 security researchers released a paper where they used cold spray to extend the time before the data in DRAM decayed. (2009, Cold boot attack)
@sandman9601 14 days ago
@Heckatomba We did try that, and it worked. Cold definitely reduces leakage.
@JJFX- 6 days ago
Out of curiosity, did you notice a consistent pattern for which bits seemed to degrade the fastest?
@sandman9601 6 days ago
@JJFX- Didn't really look, but nothing stood out.
@volodumurkalunyak4651 16 days ago
Setting tREFI to 262k clock cycles on DDR5 is way more interesting. 2 Gbit / 32 banks / 8192 bytes per row (maximum) = minimum of 8192 rows. Refresh takes at most 1 row in each bank at a time -> 262k * 8192 rows / (5200 MT/s) = 0.413 s for the whole memory to refresh.
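The last step of that arithmetic can be checked with a small sketch. It takes the comment's figures as given, including its choice to count tREFI cycles at 5200 million per second, and assumes "262k" means 2**18; the function name is hypothetical.

```python
def full_refresh_time_s(trefi_cycles, rows_per_bank, cycles_per_second):
    """Time to walk every row once if one row per bank is refreshed
    every trefi_cycles cycles."""
    return trefi_cycles * rows_per_bank / cycles_per_second


# Figures taken from the comment; 262_144 (2**18) is an assumed reading of "262k".
sweep = full_refresh_time_s(262_144, 8192, 5200e6)
print(f"{sweep:.3f} s")
```

If the cycles were instead counted at the 2600 MHz memory clock rather than the 5200 MT/s transfer rate, the same formula would give twice the time.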
@Roxor128 18 days ago
Downside of trying to include error-correction back in the 1980s is the number of extra bits you need. For parity, it's just one extra bit, and therefore one extra chip. If you want to protect 8 bits of data, you need 13 bits for ECC, needing 62.5% more chips. Really making use of ECC came later when memory was being accessed in larger blocks from chips that would read out multiple bits at a time. With 9 chips of 8 bits each, you can do a 72-bit code with 64 bits of data, which can correct one error and detect two. Though this isn't using the code to its full capacity, and neither was the 8-bit example from earlier. That 72-bit code is just a truncated version of a 128-bit one, but that wouldn't have a nice power-of-two number of data bits in it (120). The 13-bit code is truncated from a 16-bit one, which would have 11 data bits. It took me a while between finding out about Hamming Codes and figuring out how the 72/64 one used for ECC memory would actually work. It's basically just calculating the error-correction bits with bits 0-71 normally, and acting as if bits 72-127 are always zero, and as that last range of bits would all be data bits, it doesn't need to bother storing any of them.
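The bit counts quoted above (13 bits for 8 data bits, 72 for 64, and the untruncated 16/11 and 128/120 codes) can be reproduced with a short sketch. It assumes the usual extended Hamming (SECDED) construction: the smallest r with 2**r >= m + r + 1, plus one overall parity bit. The function name is hypothetical.

```python
def secded_check_bits(data_bits):
    """Check bits for an extended Hamming code (single-error-correcting,
    double-error-detecting) protecting data_bits bits of data."""
    r = 0
    # Plain Hamming code: r check bits must be able to name every bit
    # position (data + check) plus the "no error" case.
    while (1 << r) < data_bits + r + 1:
        r += 1
    # One extra overall parity bit adds double-error detection.
    return r + 1


if __name__ == "__main__":
    for m in (8, 11, 64, 120):
        c = secded_check_bits(m)
        print(f"{m:>3} data bits + {c} check bits = {m + c:>3} total "
              f"({100 * c / m:.1f}% overhead)")
```

For 8 data bits this gives 5 check bits, which is the 62.5% overhead mentioned in the comment; for 64 data bits it is 8 check bits, or 12.5%.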
@pcretroprogrammer2656 18 days ago
Ah, very interesting. That's a nice trick. I've used ECC and once a long time ago read about how the idea basically worked, but never looked at it in that much detail.
@Roxor128 18 days ago
@pcretroprogrammer2656 What got my head around it was a combination of 3Blue1Brown's videos about it, plus a lot of fiddling around in Logisim Evolution implementing it. I went as far as 16 data bits with a 24-bit communications channel (truncated from the 32-bit code, with 2 bits unused), but just finding that you can truncate a code and have it still work was what finally got my head around things.
@IExSet 19 days ago
Wow, you uncover super topics! I never get tired of liking your videos!
@josephlunderville3195 20 days ago
After all my speculation it's incredibly gratifying to see the subsequent thorough testing. I'm happy you felt compelled to go down the rabbit hole and thanks for taking us with you!
@pvc988 20 days ago
I don't know about DRAM, but when I was working with SDR SDRAM on an FPGA, memory contents easily survived reconfigurations, which take a couple of seconds. There are no refresh cycles or memory accesses during that time. Every pin is in the HiZ state.
@adriansdigitalbasement 21 days ago
I'll be testing :-) You don't need to refresh the whole chip by the way. Just the part you're using. So no issues only refreshing 64k on a 256k bank. In fact you can use 256kbit chips in place of 64kbit chips and address line A8 is just not used.
@adriansdigitalbasement 21 days ago
Also, for visual fun, why not copy the RAM under test to the VGA framebuffer in 640*200 mode so you can see the pattern it decays to?
@pcretroprogrammer2656 20 days ago
I think the issue with only refreshing 64k is the way you'd do that. Basically you'd just refresh 128 rows (or 256 rows, depending on the kind of chips you have). This would mean that 128 bytes out of every 256 is refreshed. DOS itself would not only be using those bytes. (Consecutive rows correspond to consecutive bytes.) So really, you have to refresh all the rows in the chip, unless you are using memory in a fairly weird way.
@pcretroprogrammer2656 20 days ago
@adriansdigitalbasement That's a nice idea!
@tighematt 21 days ago
Those results seem odd, surely the decay can’t always be identical yet you wait 10 times the same duration and only see 1 error. Even on the larger test the error stays almost constant after the first wait? If you use a slightly longer wait or more iterations do you see more random behaviour?
@pcretroprogrammer2656 21 days ago
Remember that reading the values out to check them has the effect of refreshing them. So it is basically the same experiment run 10 times. I'd expect the results to stabilise at some point, with all the bits that are going to fail in a given interval of time eventually failing, and all the ones that can hold their contents for that long never failing. That was one of the conclusions of the video. I'm sure as it heats up things would be different, but a couple of minutes is not enough heating for it to show up in the results.
@tighematt 20 days ago
Thanks, yes I understood that. The results just seemed too consistent? It’s certainly piqued my interest! It would be interesting to try to log which byte or ideally bit fails each time - perhaps that is different. Just seems odd that the ram would decay identically every time, but maybe that is just how it is? I’ll try your code on my XT later on. Thanks for interesting video.
@pcretroprogrammer2656 20 days ago
@tighematt I guess there could be a bug in my code somewhere. It is pretty rough code. I'll be interested in what you find, and I hope the video spurs some interesting follow-ons from various people, even if it does just turn out that I made a silly mistake somewhere.
@tighematt 20 days ago
I ran your code on my XT, it has a V20 so I had to tweak ITERS a little….but I got the exact same result as you! I checked the utility I wrote some years ago to reduce ram refresh speed for performance. It sets period to 14ms, I went as far as I could go without getting parity errors. So seems odd that is so different!
@pcretroprogrammer2656 19 days ago
@tighematt It's also odd that I always got parity errors when turning NMI back on. I'm not sure what accounts for the difference, other than possible bugs in my very rough code. Given that I can't think of anything else, I'm just supposing that some bits fail very quickly, but most bits take a long time. I guess we need more data (and more careful code). I like Adrian Black's idea of copying the data to the screen memory in mode 6 so we can actually see the decay after each interval.
@robertapengelly 23 days ago
What changes will need to be made for Mode 6 (640x200) if any?
@pcretroprogrammer2656 23 days ago
Only a few. In mode 6 there is only one colour and 8 pixels per byte. That means if you want to deal with each pixel individually, instead of in pairs, you have to do the arithmetic slightly differently to convert the x-coordinate to a byte location (divide by 8 instead of 4 and do single bit shifts rather than by a multiple of two bits when putting the bits in the right place in the byte). An improvement over mode 4 is you don't need to deal with multiple colours, so there's no need to store a colour. Just set the appropriate bit and you are done.
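The difference described above can be sketched in Python rather than assembly. This is only an illustration of the arithmetic, not the code from the video: it assumes the standard CGA layout (80 bytes per scanline, odd scanlines stored 8192 bytes after the even ones, leftmost pixel in the most significant bits), and the function names are hypothetical.

```python
BYTES_PER_ROW = 80        # both mode 4 and mode 6 use 80 bytes per scanline
ODD_BANK_OFFSET = 8192    # odd scanlines live in the second 8 KB bank


def mode4_pixel(x, y):
    """Mode 4 (320x200, 2 bits per pixel): byte offset and left-shift
    needed to position a 2-bit colour within that byte."""
    offset = (y & 1) * ODD_BANK_OFFSET + (y >> 1) * BYTES_PER_ROW + (x >> 2)
    shift = 6 - 2 * (x & 3)   # shifts go in multiples of two bits
    return offset, shift


def mode6_pixel(x, y):
    """Mode 6 (640x200, 1 bit per pixel): byte offset and the single-bit
    mask to OR into that byte."""
    offset = (y & 1) * ODD_BANK_OFFSET + (y >> 1) * BYTES_PER_ROW + (x >> 3)
    mask = 0x80 >> (x & 7)    # single-bit shifts, no colour to store
    return offset, mask


if __name__ == "__main__":
    print(mode4_pixel(319, 199))  # last pixel in mode 4
    print(mode6_pixel(639, 199))  # last pixel in mode 6
```

The row part of the offset is identical in both modes; only the x handling changes (divide by 8 instead of 4, and a one-bit mask instead of a two-bit colour field).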
@robertapengelly 23 days ago
@pcretroprogrammer2656 I'm still a little confused as to the logic cause I can't get it to work. Does the following stay the same?

;; Offset += 4 * 8192 if y odd.
xor di, di
shr ax
rcr di

Also, does anything need changing in:

;; Add 4 * 80 * (y / 2) to offset
xchg ah, al
add di, ax
shr ax
shr ax
add di, ax

I'm guessing it does cause you said that it's 8 pixels per byte but I can't figure out what part needs altering. I can't figure the following out either:

;; Compute x mod 4 and divide di by 4.
add di, cx
and cl, 3
shr di
shr di

I've tried adding extra "shr" instructions but it doesn't work, I removed "add" instructions and it still doesn't work.
@robertapengelly 23 days ago
@pcretroprogrammer2656 My replies keep disappearing so let's try without code and links. If possible could you make a point3.asm in your PCRetroProgrammer repo on GitHub targeting mode 6, as I tried a couple of different things based on your reply but I still can't figure it out.
@andrewdunbar828 1 month ago
If you were to use square roots, what would be the input range? Would it depend on the screen resolution? If so, a lookup table is an option.
@pcretroprogrammer2656 1 month ago
Unfortunately the square roots would be of rational values, not integers, so lookup is probably not practical. But maybe there is a way to make it work somehow. Definitely worth thinking about.
@andrewdunbar828 1 month ago
@pcretroprogrammer2656 There might be a version of the Carmack fast square root trick that will work?
@pcretroprogrammer2656 1 month ago
@andrewdunbar828 For floating point, certainly.
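For readers who haven't seen it, the floating-point trick being referred to is usually the "fast inverse square root" known from the Quake III source. A minimal Python sketch follows; it works on IEEE 754 single-precision bit patterns, uses the well-known constant 0x5F3759DF with one Newton-Raphson step, and only handles positive finite inputs. The function names are hypothetical, and this is not something the video series uses.

```python
import struct


def fast_inv_sqrt(x):
    """Approximate 1/sqrt(x) for positive x via the float bit-pattern trick."""
    # Reinterpret the 32-bit float as an unsigned integer.
    i = struct.unpack("<I", struct.pack("<f", x))[0]
    # Halving the bit pattern roughly halves the exponent; the constant
    # corrects the bias and gives a good first guess.
    i = 0x5F3759DF - (i >> 1)
    y = struct.unpack("<f", struct.pack("<I", i))[0]
    # One Newton-Raphson iteration to refine the guess.
    return y * (1.5 - 0.5 * x * y * y)


def fast_sqrt(x):
    """Approximate sqrt(x) as x * (1/sqrt(x))."""
    return x * fast_inv_sqrt(x)


if __name__ == "__main__":
    for value in (0.25, 2.0, 100.0):
        print(value, fast_sqrt(value), value ** 0.5)
```

The trick relies on the floating-point format itself, which is why the reply above limits it to floating point; the fixed-point and rational values discussed in the thread would need a different approach.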
@tehdasi 1 month ago
The problem is that he's trying to do this on limited memory, and a lookup table with small enough error to be useful would prolly be too large to fit in the limited memory of an XT. Then again if he does get this working, I reckon at some point he's going to have to introduce error that he'd rather not have, so it might be a case of which method has the better trade-off between speed and error. Anyways, we'll see.
@andrewdunbar828 1 month ago
A filled general ellipse is much harder than a one-pixel outline general ellipse. You convert the rotated ellipse formula to a sheared circle and then you can go through getting the left and right extent for every Y value. You can get any ellipse this way. For ellipses that are very skinny you often have to worry about precision artefacts, so you'll want some logic to decide whether to do horizontal or vertical scanlines. I seem to remember that when you get to a certain point general conics come into play. My maths skills fail me at this point. I never had the intuitive understanding for parabolae and hyperbolae that I had for ellipses either, which doesn't help.
@pcretroprogrammer2656 1 month ago
I may end up doing things this way too for comparison. I'm pretty convinced of the benefits of both methods at this point. I hope we'll get to parabolae and hyperbolae. But on this early hardware, there are no guarantees.
@andrewdunbar828 1 month ago
@pcretroprogrammer2656 I was also thinking, if you're going to go to the trouble of implementing floating point, would implementing fixed point bignums be another alternative?
@pcretroprogrammer2656 1 month ago
@andrewdunbar828 I think arbitrary precision is too general here and would be too expensive. You lose the benefits of fixed point. Of course, bignums might be a possibility for the corner cases. I think it sounds like a lot of work though. At least for floating point you could start with some routines used by a C compiler, e.g. TASM and just improve their code (if possible; more likely you would just adapt it to the specific requirements you actually need).
@koteth4505 1 month ago
I think it would also be possible (or at least interesting) to explore Feynman's construction. Feynman uses it for Kepler's orbits. This method allows the construction of ellipses and hyperbolas. Square and compass construction uses a reference circle and lines. The tools required are straight lines and the ability to draw an orthogonal line from the centre of another line. It sounds complex, but there may be arithmetic and algorithmic shortcuts. en.wikipedia.org/wiki/Feynman%27s_Lost_Lecture
@pcretroprogrammer2656 1 month ago
That's certainly an interesting one. I don't know if this could be done without square roots, but it would definitely be an interesting one to look at. Of course, back in the time of Kepler, everything was done the way the ancient Greeks did it, with a few additional tricks that were developed subsequently.
@JimLeonard 1 month ago
You may want to fire up DeluxePaint II for the PC to see Dan Silva's solution to this. In that program, you first draw the standard ellipse, then hold down a mouse button to rotate it to the number of degrees you want. This implies that the algorithm is 1. calc regular ellipse, then 2. rotate all previously generated points around the ellipse's center point. This seems pretty fast to do since the rotations are only one axis. For ensuring there are no gaps in the rotated points when drawn, you draw lines between the points rather than the points themselves.
@pcretroprogrammer2656 1 month ago
It's really a surprising coincidence that the person who designed the general ellipse algorithm I used was Dilip Da Silva and the person who wrote Deluxe Paint was Dan Silva. They are definitely not the same person and it seems doubtful they are related. Really fun coincidence. Anyway, I will definitely take a close look at the ellipse rotation in Deluxe Paint II. It will be interesting to see how the pixel gap problem was solved, if at all.
@benjamindeharo314 1 month ago
Do you plan to make a tutorial about coding an emulator for an 8-bit PC like the ZX Spectrum someday?
@pcretroprogrammer2656 1 month ago
I don't have plans for that as there are very good emulators already. I don't think I could improve on them.
@benjamindeharo314 1 month ago
@pcretroprogrammer2656 Oh I didn't mean for you to do something nobody has ever done. There's only 1 tutorial about how to make a ZX Spectrum emulator on YouTube, and I find it hard to follow. So it could be useful for educational purposes, on top of just being a fun exercise.
@adriansdigitalbasement 1 month ago
Hi there. Some RAM chips have extremely long retentions. I've had machines retain most of their RAM contents several seconds after power off with just minor decay. It seems to vary wildly from brand to brand, series to series. I have others that 1 second after power off are totally cleared. I assume the stock refresh time is set up to be safe across all types of tested chips from back in the day. Unless your XT has 256k total on the motherboard, banks 1 and 2 are always 41256, with 3 and 4 being 4164. And then only the very earliest 16-64k 5150 machines use 4116; all the later ones use 4164. Parity errors only happen on reads, as the circuit generates a non-maskable interrupt. If you don't hit one then the contents of DRAM match what was originally stored in the parity chip, so your refresh timer is good. I have not confirmed this but I'm pretty sure the RAM that's on the CGA card itself is refreshed by the 6845 redrawing the screen 60 times a second. I'll need to dig into it a bit more to be sure.
@pcretroprogrammer2656 1 month ago
Thanks for confirming my suspicions regarding long retention times and the other info. Regarding the CGA RAM, yes it is definitely refreshed by the CRTC. No parity bit though of course. The info you gave regarding the XT motherboards has me wondering now. I'll have to check again tonight. I see in the IBM documentation, the 4164's are usually in banks 0 and 1, but if you install more then you are supposed to switch the chips. That leads to a few possibilities: 1) I misread the notations on the board 3 times (most probable) 2) the chips are installed incorrectly or with some hack, 3) it's not an IBM board. I will have to check now that you've pointed this out.
@adriansdigitalbasement 1 month ago
@pcretroprogrammer2656 Here is the specific info on 640k on an XT from IBM: www.minuszerodegrees.net/5160/motherboard/5160_upgrading_256k_motherboard_to_640k.pdf
@adriansdigitalbasement 1 month ago
Banks 0 and 1 are closest to the slots
@pcretroprogrammer2656 1 month ago
@adriansdigitalbasement There's absolutely no question about it (I just checked again), and banks 0 and 1 have 4164's in them and banks 2 and 3 have 41256's in them. It's labeled as a 64-256kb board with a sticker that gives the model number as 6323560. The layout of the board seems to be the IBM layout, and there are markings on the board which match what I see online. I also don't see any evidence of any kind of mod. I can certainly modify contents in 640kb of RAM and it retains them (I checked this for each 64kb block and then went back to check the contents were still different in each block). Bit of a mystery I'd say.
@adriansdigitalbasement 1 month ago
@pcretroprogrammer2656 Fascinating! I had no idea that would work. Perhaps whoever did the 640k mod wired up the extra address line to bank 3 and 4? Definitely on all the 5160s I've seen the mod was done as instructed and they had the 256k chips in the first banks.
@dr_jazza 1 month ago
OSU
@Waccoon 1 month ago
Coming from an Amiga background, it sounds odd to me that you can customize or disable memory refresh on a PC at all! OCS Amigas have a register for setting the current refresh row, so in theory banging it will stall memory refresh. I tried testing this on my AGA system, but I found out the hard way that on AGA systems the register is a dummy and writing it does nothing. Guess I'll have to dig my A500 out of storage sometime and try again. 8)
@GloriousCow 1 month ago
I discovered something interesting investigating the V20 CPU - it will resume from halt nearly instantly (2 cycles) on interrupt, if interrupts are disabled. That's right, it resumes on interrupt even if the I flag is cleared; although it does NOT perform the interrupt in that case. It just immediately starts executing the next instruction after HALT, and so as long as you guarantee the next instruction is fetched going into halt, then you have an almost instant resume. Might have some intriguing applications for demos; not sure!
@pcretroprogrammer2656 1 month ago
That is an interesting discovery. It seems sort of random, as it is hard to imagine there being some need for the V20 to optimize this specifically. I assume this is just a result of various other optimizations they were actually focused on. It's pretty wild that it resumes even if the I flag is cleared.
@leyasep5919 1 month ago
OMG I realise I have gotten older in the last 30 years... these things were my usual subject of exploration when I was younger. Today I couldn't stand all this complexity and even RISC machines make me angry 😛
@IExSet 1 month ago
Like ❤
@tighematt 1 month ago
I have a similar utility on my XT to slow down refresh. Came from a Peter Norton article I think. Same as yours just modifies the timer. I too could set it wildly out of spec. Gives a measurable speed boost, I do see the odd parity error once in a while so it’s definitely doing what we expect. Love the detail in this video series btw!
@josephlunderville3195 1 month ago
I have one last possibility for you -- is it possible that RAS is energized on ALL RAM banks even when they aren't selected? It's valid and shouldn't hurt anything if you don't otherwise enable the chip.
@pcretroprogrammer2656 1 month ago
No, that is unfortunately not the case. We can in fact keep the first bank alive and get parity errors on the other banks, which proves that. I don't know if there is a reason for this design. However, I think the story is different with the DMA controller. The PIT is running at 1.19MHz and with 18 PIT cycles per pulse to the DMA controller, that means it is getting a pulse at about 66.3kHz. If it had to energize 10*256 = 2560 rows at that rate it'd only be doing the whole of RAM every 38.6 ms. So that is clearly energizing all banks simultaneously. Actually, that would still be out of spec if all 256 rows in these KM4164 chips needed to be energized every 2ms. So the 128cycle/2ms figure in the datasheet must not mean precisely what I thought it meant. According to Reenigne's blog, they require energization of 256 rows in 4ms and the bigger chips 512 rows in 8ms. So I think the figure in the datasheet is actually a rate at which the row addresses need to be cycled. So that means the figures I gave in the video were actually off by 2.
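The timing figures in the reply above can be reproduced with a short sketch. It assumes the standard PC timer input clock of 1,193,182 Hz (the "1.19MHz" in the comment) and the divisor of 18 mentioned there; the constant and function names are hypothetical.

```python
PIT_CLOCK_HZ = 1_193_182   # standard PC timer input clock, about 1.19 MHz
PIT_DIVISOR = 18           # PIT cycles per refresh pulse, as stated above


def refresh_pulse_rate_hz(pit_clock_hz=PIT_CLOCK_HZ, divisor=PIT_DIVISOR):
    """Rate at which the DMA controller receives refresh pulses."""
    return pit_clock_hz / divisor


def sweep_time_ms(rows, pulse_rate_hz):
    """Time to refresh the given number of rows at one row per pulse."""
    return 1000.0 * rows / pulse_rate_hz


if __name__ == "__main__":
    rate = refresh_pulse_rate_hz()
    print(f"pulse rate: {rate / 1000:.1f} kHz")
    print(f"2560 rows, one bank at a time: {sweep_time_ms(2560, rate):.1f} ms")
    print(f"256 rows, all banks at once:   {sweep_time_ms(256, rate):.2f} ms")
```

With every bank refreshed on each pulse, 256 rows take a little under 4 ms, which fits the 256-rows-in-4-ms requirement cited from Reenigne's blog, whereas going through 2560 rows one at a time would take roughly ten times longer.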
@josephlunderville3195 1 month ago
Also, I know you say you don't observe too much margin in other test scenarios so maybe don't put too much stock in this speculation, but: my electronic designer spidey sense says that if I was the engineer designing that DRAM I would be specifying that refresh time VERY generously, because I wouldn't necessarily be very confident about the margin on the components as the silicon ages. Remember the RAM is probably specified to work at 80 degrees for 20 years or something, and with say 10x margin left at that point on both the amplifiers and the capacitors, with the supply voltages just barely in spec -- and your RAM is probably operating at a nice cool temperature, has never really been stressed in its life, and maybe the process was just a little better dialed in on that batch, and you can easily see how those margins could multiply out to a 1000x margin overall. But see my other comment -- if you're only checking parity, it could also just be that the RAM really is losing its brain and you're not detecting it because parity kind of sucks!
@pcretroprogrammer2656
@pcretroprogrammer2656 1 month ago
Yeah that Samsung RAM is rated to operate between 0 and 70 degrees C. I'm not sure what the effect of temperature is on decay. One assumes there is some kind of effect though.
@janglur
@janglur 1 month ago
@@pcretroprogrammer2656 Mostly it encourages electromigration and delamination. The traces inside and interconnects get thinner and weaker as electromigration etches away, and components that are layered or bonded on the silicon will flex from expanding and contracting at differing rates from even minor thermal cycling, causing broken traces and even microfracturing. It's an extremely slow process, however, and is highly dependent on the frequency and voltage involved relative to the thickness of everything that conducts, and on the temperature (both the peak and the cyclic difference; going from 32 to 80 and back every hour is going to wreck it way faster than a solid 80). Things in this era were often overdesigned in many areas in beneficial ways for other reasons, like extra oversized interconnect points for making it mechanically easier to produce. So some will last through apocalyptic events compared to smaller, more micronized modern chips. Which is why you see a lot of 16-bit computers cooking their poor Z80's or whatever their entire lifespan, pure out of spec on temp, but they just kept marching. They were built different back then, and it took more abuse to cause the same damage! But like any chip, once they burn out, they're gone. RAM is usually either extremely hardy or extremely flimsy, depending on the area of prodding. There's a lot to do with the density of memory chips, too, and the relation of defects per square millimeter in the silicon making process that makes RAM a more random silicon lottery than most, as even within known good picks the likelihood of a defect, if present, causing an issue is much higher. And many issues, I'd say most, are not readily apparent as outright failure. It can be like a blemish or dust mote spot on the mask that causes a trace to be 1/4 its normal thickness at a spot, which will wear out faster or under more stress than normal. Etc.
@josephlunderville3195
@josephlunderville3195 1 month ago
If the parity is even and all the bits degrade to 0, or if the parity is odd and they all degrade to 1, or in general if your DRAM bits tend to degrade 2 or 4 or 6 at a time, the parity will still come out good. Parity isn't generally a very reliable way of detecting errors, which is why you don't really see it in new designs! You might see a parity error faster if you fill memory with random data or a mixed bit pattern (say 0xAA55CC33) before doing your DRAM parity tests, or alternatively actually write a program to fill with a specific pattern, halt refresh, wait, and check the precise pattern.
@pcretroprogrammer2656
@pcretroprogrammer2656 1 month ago
Yeah it's mainly designed to detect single bit errors (without correction) and isn't at all designed for this situation where we are turning off DRAM refresh. In our situation, it's actually likely to be most useful when many of the cells are in an indeterminate state and all the bits are essentially random, at which point there's a 50% chance of a parity error per byte accessed. After some time, it's probably the case that everything decays one way or the other (though I don't happen to know for sure), in which case it's possible that parity errors disappear. So there might actually be a point in time where parity errors maximise and then drop off after that.
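The points made in this exchange can be sketched in a few lines of Python. This is only an illustration of a per-byte parity check under an assumed even-parity scheme, not a model of the actual 5150/XT circuitry; all function names are hypothetical.

```python
import random


def parity_bit(byte):
    """Even-parity bit: 1 if the byte has an odd number of 1 bits, else 0."""
    return bin(byte & 0xFF).count("1") & 1


def parity_ok(byte, stored_parity):
    """True if the stored parity bit still matches the data byte."""
    return parity_bit(byte) == stored_parity


def flip_bits(byte, positions):
    """Return `byte` with the given bit positions inverted."""
    for p in positions:
        byte ^= 1 << p
    return byte


original = 0b10110010
stored = parity_bit(original)

# A single flipped bit is detected, but two flipped bits cancel out.
single_detected = not parity_ok(flip_bits(original, [0]), stored)
double_detected = not parity_ok(flip_bits(original, [0, 1]), stored)

# Data and parity bit all decayed to 0 pass an even-parity check.
all_zero_detected = not parity_ok(0x00, 0)


def random_error_rate(trials=100_000, seed=1):
    """Fraction of parity errors when all 9 bits are effectively random."""
    rng = random.Random(seed)
    errors = sum(
        not parity_ok(rng.getrandbits(8), rng.getrandbits(1))
        for _ in range(trials)
    )
    return errors / trials


print(single_detected, double_detected, all_zero_detected)
print(random_error_rate())  # close to 0.5
```

Filling memory with a known pattern and comparing it afterwards, as suggested above, sidesteps these blind spots because every bit is checked individually.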
@josephlunderville3195
@josephlunderville3195 1 month ago
Thinking about it more, it seems to me that parity was pretty well understood at this point in digital design and they should have chosen a scheme that was guaranteed to generate errors if refresh failed entirely, but I don't have a 5150 to check on the oscilloscope to be certain. Maybe later I'll try to find a schematic, but it seems more likely to me your low memory refresh is accidentally refreshing all banks somehow.
@pcretroprogrammer2656
@pcretroprogrammer2656 1 month ago
@@josephlunderville3195 You actually had me wondering for a bit. But I went back to some correspondence with Reenigne where he specifically pointed out to me that reading a memory address only refreshes the row on the bank being read and that this differs from what happens when the DMA controller does it. That certainly matches with what I've seen in practice, including when writing the code for this video, where refreshing just the first bank still allowed me to generate parity errors on the second bank. Quite why it only happens so far out of spec remains a mystery to me. Reenigne previously expressed considerable surprise that a PIT count of 76 (or something equivalent) would still give stable results. On his blog he mentions that changing this from 18 to 19 or perhaps 20 was fine back in the day, but that's about it. That was also my experience, though it is easy to fool yourself if you are not actually accessing banks that have decayed. I'm now starting to think that 30 s of decay is sufficient to result in the RAM and parity bits all being wiped out and that if I had much smaller delays I'd get parity errors. I might need to do a follow-up video if that is the case. For example, I could hijack the interrupt handler for parity errors to detect them and try various delays to see how long it takes before parity errors start appearing and possibly disappearing again.
@pcretroprogrammer2656
@pcretroprogrammer2656 1 month ago
According to Reenigne, in the XT the parity bit was inverted, so that if refresh circuitry failed and everything decayed to zero, the parity bit would be incorrect. And apparently the Samsung chips in my XT require only 128 rows rather than 256 to refresh the entire chip, despite what it says later in the datasheet. This explains how they were compatible with the DMA refresh arrangement in the IBM XT (which would otherwise be too slow).
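To make the inverted parity bit idea concrete, here is a small Python sketch comparing a plain even-parity scheme with an inverted one. It is a toy model of what the comment describes, not something checked against the XT schematic, and the names are made up for illustration.

```python
def even_parity_bit(byte):
    """Parity bit that makes the total number of 1s (data + parity) even."""
    return bin(byte & 0xFF).count("1") & 1


def inverted_parity_bit(byte):
    """Inverted scheme: the stored bit is the complement of the even-parity bit."""
    return even_parity_bit(byte) ^ 1


def error_detected(byte, stored_parity, scheme):
    """True if the stored parity bit disagrees with what `scheme` expects."""
    return scheme(byte) != stored_parity


# Every cell, including the parity cell, decayed to 0.
low_plain = error_detected(0x00, 0, even_parity_bit)
low_inverted = error_detected(0x00, 0, inverted_parity_bit)

# Every cell, including the parity cell, decayed to 1.
high_plain = error_detected(0xFF, 1, even_parity_bit)
high_inverted = error_detected(0xFF, 1, inverted_parity_bit)

print(low_plain, low_inverted)    # False True
print(high_plain, high_inverted)  # True False
```

In this model the inverted bit catches a complete decay to zero, but it would miss a complete decay to one, so the choice only helps if the cells read back as low once they go stale.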
@RedCMD
@RedCMD 1 month ago
When a bit corrupts, does it flip, or does it always get set to 0 or to 1?
@pcretroprogrammer2656
@pcretroprogrammer2656 1 month ago
That's a really good question, and it is exactly what I was thinking too. I didn't want to sound off without knowing more though. There's supposedly a transistor and a capacitor, so one presumes the capacitor leaks charge. But I don't know if the data is stored inverted or not.
@janglur
@janglur 1 month ago
@@pcretroprogrammer2656 If it's getting stale it should in theory drop to 0 or whatever the system logic calls low, shouldn't it? I'm sure you can do it either way and have heard of those that invert the logic, but most go with no on low simply to prevent powering unused memory circuits. But if you knew you'd be using most or all of it during normal operation, I guess it may be wiser to invert that for the same reason. Worth investigating!
@pcretroprogrammer2656
@pcretroprogrammer2656 1 month ago
@@janglur Yes, that seems to be what happens in practice, though the designers of the XT apparently invert the parity bit so that if it goes stale it effectively goes to a parity of 1 meaning that staleness would eventually be detected when everything decays to a low enough charge (thanks to Reenigne for pointing this out).
@greyguy2
@greyguy2 1 month ago
cool stuff!
@bananaboy41
@bananaboy41 1 month ago
According to the 8042 keyboard controller datasheet the buffer is just one byte!
@pcretroprogrammer2656
@pcretroprogrammer2656 1 month ago
Ah I did try to look that up, but just got datasheets for some microcontroller with 256 bytes of RAM or something like that. My assumption was that it would be one keypress or one scancode or something like that, but just couldn't find the right info. Thanks for hunting that down.
@bananaboy41
@bananaboy41 1 month ago
@@pcretroprogrammer2656 I found this one: tvsat.com.pl/PDF/W/W83C42P_win.pdf. Not sure if this link will work in the YouTube comments, but it's a W83C42, which seems to be a more modern (relatively speaking) variant. In there it says "The output buffer is an 8-bit read-only register at I/O address hex 60. The keyboard controller uses the output buffer to send the scan code received from the keyboard and data bytes required by command to the system. The output buffer should be read only when the output buffer full bit in the register is 1." I think I found the same note in an older PDF but I can't find the link now for some reason!
@theALFEST
@theALFEST 1 month ago
A long time ago I wrote a program that slowed down RAM refresh to get a slight speed-up. I remember that I could set a very slow rate and the PC still worked fine.
@pcretroprogrammer2656
@pcretroprogrammer2656 1 month ago
Yeah I remember using such a program back in the day. Such a program was very popular at some point. But I remember it crashing the machine for even very small increases. It'd work for a while and then boom, it just stopped.
@alexloktionoff6833
@alexloktionoff6833 1 month ago
Could multiplication by 50 via LEA be faster?
@pcretroprogrammer2656
@pcretroprogrammer2656 1 month ago
On the 8086/8088, LEA doesn't have the multiplications by a constant that the 386 has, for example. But you can use it for adding various registers. So far I've never found a situation where I could get anything more out of it, but it is presumably possible.
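For reference, a multiplication by 50 without LEA scaling can be built from single-bit shifts and adds, since 50 = 32 + 16 + 2. The Python sketch below only illustrates that decomposition, with values masked to 16 bits to mimic a register; it is not the routine from the video, and the function name is made up.

```python
MASK16 = 0xFFFF  # mimic a 16-bit register


def times50(x):
    """Compute 50 * x as 32x + 16x + 2x using only shifts and adds."""
    x2 = (x << 1) & MASK16      # 2x  (one single-bit shift)
    x16 = (x2 << 3) & MASK16    # 16x (three more single-bit shifts)
    x32 = (x16 << 1) & MASK16   # 32x (one more single-bit shift)
    return (x32 + x16 + x2) & MASK16


print(times50(199))  # 9950
```

That comes to five single-bit shifts and two additions, plus whatever register copies are needed to keep the intermediate values around.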
@IExSet
@IExSet 1 month ago
Always like!
@zyansheep
@zyansheep 1 month ago
Me clicking on this vid thinking CGA stood for "conformal geometric algebra" 😭
@dsgowo
@dsgowo 1 month ago
Same, it didn't help that there were circles here too
@webgpu
@webgpu 1 month ago
Oh, that prompt ... I spent a dozen years looking at "c:\>" (until Win95 -- Win 3.1 still required DOS), so many days dealing with autoexec.bat & config.sys .... (my trauma from the "command line" days is so huge I never touched Linux...)
@stanb1455
@stanb1455 1 month ago
Could something like GlaBIOS' CGA optimizations make things even faster?
@pcretroprogrammer2656
@pcretroprogrammer2656 1 month ago
I don't believe so. Drawing pixels is the only thing it really does faster, but using the BIOS to draw pixels is the last thing you want to do if you want high performance code. It is always going to be faster to write directly to video RAM and to simply update the information for drawing pixels as you go, rather than recomputing it every pixel (which is what the BIOS basically has to do).
@Hiphopasaurus
@Hiphopasaurus 1 month ago
I bet it would be interesting to see the difference in speed between direct writes and using the BIOS routines to draw pixels. That and benchmarking the different BIOS ROMs doing it would be neat too.
@pcretroprogrammer2656
@pcretroprogrammer2656 1 month ago
@@Hiphopasaurus From experience, the standard BIOS routine is VERY slow, and the accelerated routines are only a couple of times faster.
@RLstavista
@RLstavista 1 month ago
I don't have enough brain bandwidth for this... 🤪
@anon_y_mousse
@anon_y_mousse 1 month ago
What language is that example code in? Why do you have multiple 1-bit shifts in a row in the assembly instead of combining them?
@pcretroprogrammer2656
@pcretroprogrammer2656 1 month ago
That is the language Julia. I use it because it is pretty close to pseudocode and readable enough. Note that in some videos I use a drawpixel function, which is not an actual command in Julia. I just made that up for the presentation. To combine multiple 1-bit shifts on the 8086/8088, one had to put the shift count in the CL register. There was no multibit shift by an immediate value. The problem with using CL is that it takes up an 8 bit register which we are using for other things, and it wasn't really faster anyway. So typically, unless you want to shift by a variable number of bits, instead of a constant number of bits, you use individual shifts. Each shift by a bit can then be counted as 4 cycles, typically.
@anon_y_mousse
@anon_y_mousse 1 month ago
@@pcretroprogrammer2656 If you're stuck with an 8086, could you perhaps use some of the segment registers to store data? Only problem is that you make multiple function calls, so it'd be difficult to use ss, and you appear to have data all over making ds and cs difficult.
@theALFEST
@theALFEST 1 month ago
@@pcretroprogrammer2656 Usually it's more like 8 cycles, because each opcode byte fetch takes 4.
@supercompooper
@supercompooper 1 month ago
Ellipses are like communist rectangles
@_ttaneff_
@_ttaneff_ 1 month ago
KZfaq's algorithm at its best - subscribe! :) Brought back so many memories of my first steps in programming - on IBM/Apple clones in the early 90's (eastern europe thing); I miss those (failed) teenage attempts at 3D rasterization on an 80286/CGA... and 30 years later, I still look at the assembly output of my code and itch for (micro)optimizations. Thanks, I will have fun watching your videos!
@pcretroprogrammer2656
@pcretroprogrammer2656 1 month ago
Yeah the algorithm did surprisingly well finding people to watch this one. It's by far the largest number of viewers for a video on this channel.
@procactus9109
@procactus9109 1 month ago
Awesome
@erajoj
@erajoj 1 month ago
Fond memories of early graphics and assembler :)
@D0Samp
@D0Samp 1 month ago
Since we're running the PIT system counter roughly three times faster than before, we could count the number of times the interrupt is run and adjust the tick counter in the BIOS data area by about 30% of it before the program exits, so the DOS clock stays largely untouched. Alternatively, every third interrupt call could JMP FAR [handler_offset] rather than returning itself, although this is probably not great for avoiding jitter.
@pcretroprogrammer2656
@pcretroprogrammer2656 1 month ago
I agree you could update the tick counter upon exit. I suppose you could also call the existing routine, but I'm not sure this would be a good idea, as you guessed. I have at least considered doing the former in the past. But on my PC, I can't be bothered putting the date and time in every time I start the system, so it'd be a waste of time for me personally. Given that you are doing something so out-of-spec in the first place, I think you wouldn't be too concerned about this kind of defensive programming. It's not a business application, and nor is the sort of thing you'd ever be doing in a business application. But you are right that you could do this, and it wouldn't be too hard.
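Both ideas in this exchange can be sketched as a small Python simulation. It assumes a fast PIT count of 19912 (one CGA frame, 262 scanlines of 76 PIT cycles each), which is a guess chosen to match the "roughly three times faster" figure rather than a value confirmed in the video; the class and function names are hypothetical.

```python
BIOS_DIVISOR = 65536    # default PIT count, about 18.2 ticks per second
FAST_DIVISOR = 19912    # assumed fast count: 262 scanlines * 76 PIT cycles


def ticks_to_add_on_exit(interrupt_count, fast_divisor=FAST_DIVISOR):
    """BIOS ticks that elapsed while our handler absorbed every interrupt."""
    return interrupt_count * fast_divisor // BIOS_DIVISOR


class TickCompensator:
    """Chain to the original handler only when a full BIOS tick has elapsed."""

    def __init__(self, fast_divisor=FAST_DIVISOR):
        self.fast_divisor = fast_divisor
        self.accumulated = 0    # PIT cycles since the last chained call
        self.bios_ticks = 0     # stands in for calls to the old handler

    def on_interrupt(self):
        self.accumulated += self.fast_divisor
        if self.accumulated >= BIOS_DIVISOR:
            self.accumulated -= BIOS_DIVISOR
            self.bios_ticks += 1    # real code would jump to the old vector here
            return True
        return False


comp = TickCompensator()
for _ in range(65536):
    comp.on_interrupt()

print(comp.bios_ticks)               # 19912
print(ticks_to_add_on_exit(65536))   # 19912
```

With this count the original handler would run on roughly 30% of interrupts, which is where the jitter concern comes from: those interrupts take longer than the rest.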
@TheAndreArtus
@TheAndreArtus 1 month ago
Have you looked into Bresenham's algorithms?
@pcretroprogrammer2656
@pcretroprogrammer2656 1 month ago
Sure. I implemented Bresenham's line drawing one on this channel. The midpoint circle algorithm, which I here generalise for ellipses, is itself a generalization of Bresenham's line drawing algorithm. There are very many versions of it on the web. I haven't looked into any other Bresenham algorithms other than his line drawing routine and the various generalisations to circles and ellipses though. I'm aware there are generalisations to general conics, but these are pretty incomplete as far as actually usable scan conversion goes. Do you have some specific Bresenham algorithm in mind?
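For readers who haven't seen it, the textbook midpoint circle algorithm mentioned above looks roughly like this in Python. This is the standard integer-only version for a circle centred at the origin, not the ellipse generalisation from the video, and the function name is made up.

```python
def midpoint_circle(r):
    """Return the set of integer points on a circle of radius r centred at (0, 0)."""
    points = set()
    x, y = 0, r
    d = 1 - r                       # decision variable at the first midpoint
    while x <= y:
        # Reflect the point in the second octant into all eight octants.
        for px, py in ((x, y), (y, x)):
            points.update({(px, py), (-px, py), (px, -py), (-px, -py)})
        if d < 0:
            d += 2 * x + 3          # midpoint inside the circle: keep y
        else:
            d += 2 * (x - y) + 5    # midpoint outside or on the circle: step y down
            y -= 1
        x += 1
    return points


pts = midpoint_circle(5)
print(len(pts))         # 28
print((3, 4) in pts)    # True
```

Only additions, subtractions and doublings are needed per step, which is why this family of algorithms suits an 8088.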
@TheAndreArtus
@TheAndreArtus 1 month ago
@@pcretroprogrammer2656 Those are the ones. I implemented versions in ASM (with Turbo Pascal calling convention) in the 90s (based on descriptions in Richard Ferraro's book "Programmers Guide to the EGA and VGA cards", Chris D. Watkins' code and descriptions [various books]) and they were quite fast compared to the standard graphics library that came with TP 5.5. Of course I am going by how I experienced it ~30 years ago; it would probably feel slow today. We had to do bit plane switching for some [4 bit EGA/VGA] modes, which complicated things a bit (you needed to know if you were in packed display or bit plane mode) beyond the bit masking required by having multiple pixels per byte. Of course you don't want to address a single pixel at a time (masking & switching) when drawing the horizontal runs (top and bottom 1/4). This was the first video of yours I've watched; it made me a bit nostalgic.