Advanced CGA Graphics: DRAM Refresh Mysteries

627 views

1 month ago

In this video we continue our quest towards cycle-accurate graphics effects on the IBM CGA card for the IBM PC. In this episode we show how to keep DRAM refreshed ourselves, without relying on the DMA controller that normally does it. We encounter some bizarre mysteries which remain unresolved.
CTRL-ALT-REES article:
ctrl-alt-rees.com/2021-01-06-...
Code for this episode:
github.com/wbhart/PCRetroProg...

Comments: 31

@tighematt 1 month ago

I have a similar utility on my XT to slow down refresh. Came from a Peter Norton article I think. Same as yours just modifies the timer. I too could set it wildly out of spec. Gives a measurable speed boost, I do see the odd parity error once in a while so it’s definitely doing what we expect. Love the detail in this video series btw!

@GloriousCow 1 month ago

I discovered something interesting investigating the V20 CPU - it will resume from halt nearly instantly (2 cycles) on interrupt, if interrupts are disabled. That's right, it resumes on interrupt even if the I flag is cleared; although it does NOT perform the interrupt in that case. It just immediately starts executing the next instruction after HALT, and so as long as you guarantee the next instruction is fetched going into halt, then you have an almost instant resume. Might have some intriguing applications for demos; not sure!

@pcretroprogrammer2656 1 month ago

That is an interesting discovery. It seems sort of random, as it is hard to imagine there being some need for the V20 to optimize this specifically. I assume this is just a result of various other optimizations they were actually focused on. It's pretty wild that it resumes even if the I flag is cleared.

@theALFEST 1 month ago

A long time ago I wrote a program that slowed down RAM refresh to get a slight speed-up. I remember that I could set a very slow rate and the PC still worked fine.

@pcretroprogrammer2656 1 month ago

Yeah I remember using such a program back in the day. Such a program was very popular at some point. But I remember it crashing the machine for even very small increases. It'd work for a while and then boom, it just stopped.

@Waccoon 1 month ago

Coming from an Amiga background, it sounds odd to me that you can customize or disable memory refresh on a PC at all! OCS Amigas have a register for setting the current refresh row, so in theory banging it will stall memory refresh. I tried testing this on my AGA system, but I found out the hard way that on AGA systems the register is a dummy and writing it does nothing. Guess I'll have to dig my A500 out of storage sometime and try again. 8)

@adriansdigitalbasement 1 month ago

Hi there. Some RAM chips have extremely long retention times. I've had machines retain most of their RAM contents several seconds after power off with just minor decay. It seems to vary wildly from brand to brand, series to series. I have others that 1 second after power off are totally cleared. I assume the stock refresh time is set up to be safe across all types of tested chips from back in the day. Unless your XT has 256k total on the motherboard, banks 1 and 2 are always 41256, with 3 and 4 being 4164. And only the very earliest 16-64k 5150 machines use 4116; all the later ones use 4164. Parity errors only happen on reads, as the circuit generates a non-maskable interrupt. If you don't hit one then the contents of DRAM match what was originally stored in the parity chip, so your refresh timer is good. I have not confirmed this, but I'm pretty sure the RAM that's on the CGA card itself is refreshed by the 6845 redrawing the screen 60 times a second. I'll need to dig into it a bit more to be sure.

@pcretroprogrammer2656 1 month ago

Thanks for confirming my suspicions regarding long retention times, and for the other info. Regarding the CGA RAM, yes it is definitely refreshed by the CRTC. No parity bit though of course. The info you gave regarding the XT motherboards has me wondering now. I'll have to check again tonight. I see in the IBM documentation, the 4164's are usually in banks 0 and 1, but if you install more then you are supposed to switch the chips. That leads to a few possibilities: 1) I misread the notations on the board 3 times (most probable), 2) the chips are installed incorrectly or with some hack, 3) it's not an IBM board. I will have to check now that you've pointed this out.

@adriansdigitalbasement 1 month ago

@pcretroprogrammer2656 Here is the specific info on 640k on an XT from IBM: www.minuszerodegrees.net/5160/motherboard/5160_upgrading_256k_motherboard_to_640k.pdf

@adriansdigitalbasement 1 month ago

Banks 0 and 1 are closest to the slots

@pcretroprogrammer2656 1 month ago

@adriansdigitalbasement There's absolutely no question about it (I just checked again): banks 0 and 1 have 4164's in them and banks 2 and 3 have 41256's in them. It's labeled as a 64-256kb board with a sticker that gives the model number as 6323560. The layout of the board seems to be the IBM layout, and there are markings on the board which match what I see online. I also don't see any evidence of any kind of mod. I can certainly modify contents in 640kb of RAM and it retains them (I checked this for each 64kb block and then went back to check the contents were still different in each block). Bit of a mystery I'd say.

@adriansdigitalbasement 1 month ago

@pcretroprogrammer2656 Fascinating! I had no idea that would work. Perhaps whoever did the 640k mod wired up the extra address line to bank 3 and 4? Definitely on all the 5160s I've seen the mod was done as instructed and they had the 256k chips in the first banks.

@josephlunderville3195 1 month ago

If the parity is even, and all the bits degrade to 0, or if the parity is odd and they all degrade to 1, or in general if your DRAM bits tend to degrade 2 or 4 or 6 at a time, the parity will still come out good. Parity isn't generally a very reliable way of detecting errors, which is why you don't really see it in new designs! You might see a parity error faster if you fill memory with random data or a mixed bit pattern (say 0xAA55CC33) before doing your DRAM parity tests, or alternatively actually write a program to fill with a specific pattern, halt refresh, wait, and check the precise pattern.
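To make the point about even numbers of flipped bits concrete, here is a minimal Python sketch. It assumes a plain even-parity scheme over 8 data bits plus 1 parity bit; the function names are made up for illustration and this is not how the PC hardware is actually wired.

```python
def even_parity_bit(byte: int) -> int:
    """Parity bit that makes the total number of 1s (data + parity) even."""
    return bin(byte & 0xFF).count("1") & 1


def parity_error(byte: int, stored_parity: int) -> bool:
    """True if the byte read back no longer matches its stored parity bit."""
    return even_parity_bit(byte) != stored_parity


original = 0xAA                      # 10101010, four 1 bits
stored = even_parity_bit(original)   # parity bit written alongside the byte

one_flip = original ^ 0b00000001     # a single bit decays
two_flips = original ^ 0b00000011    # two bits decay together

print(parity_error(one_flip, stored))   # True: the single flip is caught
print(parity_error(two_flips, stored))  # False: the double flip slips through
```

Any even number of flips within the same byte cancels out in the same way, which is why a fill, wait, and compare test against a known pattern is the more reliable check.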

@pcretroprogrammer2656 1 month ago

Yeah it's mainly designed to detect single bit errors (without correction) and isn't at all designed for this situation where we are turning off DRAM refresh. In our situation, it's actually likely to be most useful when many of the cells are in an indeterminate state and all the bits are essentially random, at which point there's a 50% chance of a parity error per byte accessed. After some time, it's probably the case that everything decays one way or the other (though I don't happen to know for sure), in which case it's possible that parity errors disappear. So there might actually be a point in time where parity errors maximise and then drop off after that.
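The 50% figure can be checked by brute force. The sketch below assumes that "essentially random" means every 9-bit pattern (8 data bits plus the parity bit) is equally likely, and it uses an even-parity convention; the same count comes out for odd parity.

```python
def has_parity_error(word9: int) -> bool:
    """Even-parity convention: a valid 9-bit word has an even number of 1s."""
    return bin(word9 & 0x1FF).count("1") % 2 == 1


# Enumerate every possible 9-bit word (8 data bits + 1 parity bit).
failures = sum(has_parity_error(w) for w in range(512))
fraction = failures / 512

print(failures, fraction)  # 256 0.5
```

Exactly half of all patterns fail, so a fully randomised byte has a one in two chance of raising a parity error on each read.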

@josephlunderville3195 1 month ago

Thinking about it more, it seems to me that parity was pretty well understood at this point in digital design and they should have chosen a scheme that was guaranteed to generate errors if refresh failed entirely, but I don't have a 5150 to check on the oscilloscope to be certain. Maybe later I'll try to find a schematic, but it seems more likely to me your low memory refresh is accidentally refreshing all banks somehow.

@pcretroprogrammer2656 1 month ago

@josephlunderville3195 You actually had me wondering for a bit. But I went back to some correspondence with Reenigne where he specifically pointed out to me that reading a memory address only refreshes the row on the bank being read and that this differs from what happens when the DMA controller does it. That certainly matches what I've seen in practice, including when writing the code for this video, where refreshing just the first bank still allowed me to generate parity errors on the second bank. Quite why it only happens so far out of spec remains a mystery to me. Reenigne previously expressed considerable surprise that a PIT count of 76 (or something equivalent) would still give stable results. On his blog he mentions that changing this from 18 to 19 or perhaps 20 was fine back in the day, but that's about it. That was also my experience, though it is easy to fool yourself if you are not actually accessing banks that have decayed. I'm now starting to think that 30s of decay is sufficient to result in the RAM and parity bits all being wiped out and that if I had much smaller delays I'd get parity errors. I might need to do a follow-up video if that is the case. For example, I could hijack the interrupt handler for parity errors to detect them and try various delays to see how long it takes before parity errors start appearing and possibly disappearing again.

@pcretroprogrammer2656 1 month ago

According to Reenigne, in the XT the parity bit was inverted, so that if refresh circuitry failed and everything decayed to zero, the parity bit would be incorrect. And apparently the Samsung chips in my XT require only 128 rows rather than 256 to refresh the entire chip, despite what it says later in the datasheet. This explains how they were compatible with the DMA refresh arrangement in the IBM XT (which would otherwise be too slow).
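As a rough illustration of why inverting the parity bit helps, here is a small Python sketch. It assumes "inverted" means odd parity across the 8 data bits plus the parity bit, and that a fully decayed location reads back as all zeros; the function name is hypothetical.

```python
def passes_check(data: int, parity: int, odd: bool) -> bool:
    """True if the 9 bits (8 data + 1 parity) satisfy the chosen parity rule."""
    ones = bin(data & 0xFF).count("1") + (parity & 1)
    return ones % 2 == (1 if odd else 0)


# Fully decayed location: data bits and parity bit all read as 0.
all_zero_even = passes_check(0x00, 0, odd=False)  # True: even parity misses it
all_zero_odd = passes_check(0x00, 0, odd=True)    # False: odd parity flags it

# The mirror case: everything reads as 1.
all_one_odd = passes_check(0xFF, 1, odd=True)     # True: odd parity misses this one

print(all_zero_even, all_zero_odd, all_one_odd)   # True False True
```

Under these assumptions the inverted scheme catches a decay to all zeros but would not catch a decay to all ones, so it only helps if decayed cells read back as 0.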

@IExSet 1 month ago

Like ❤

@josephlunderville3195 1 month ago

I have one last possibility for you -- is it possible that RAS is energized on ALL RAM banks even when they aren't selected? It's valid and shouldn't hurt anything if you don't otherwise enable the chip.

@pcretroprogrammer2656 1 month ago

No, that is unfortunately not the case. We can in fact keep the first bank alive and get parity errors on the other banks, which proves that. I don't know if there is a reason for this design. However, I think the story is different with the DMA controller. The PIT is running at 1.19 MHz and with 18 PIT cycles per pulse to the DMA controller, that means it is getting a pulse at about 66.3 kHz. If it had to energize 10*256 = 2560 rows at that rate it'd only be doing the whole of RAM every 38.6 ms. So the DMA refresh is clearly energizing all banks simultaneously. Actually, that would still be out of spec if all 256 rows in these KM4164 chips needed to be energized every 2 ms. So the 128 cycle/2 ms figure in the datasheet must not mean precisely what I thought it meant. According to Reenigne's blog, they require energization of 256 rows in 4 ms and the bigger chips 512 rows in 8 ms. So I think the figure in the datasheet is actually a rate at which the row addresses need to be cycled. So that means the figures I gave in the video were actually off by a factor of 2.
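The arithmetic above can be reproduced with a few lines of Python. This is only a sketch: it assumes the PIT clock is 1,193,182 Hz (the "1.19 MHz" figure) and one refreshed row per PIT-triggered pulse; the constant and function names are made up for illustration.

```python
PIT_HZ = 1_193_182  # assumption: exact value behind the "1.19 MHz" figure


def refresh_rate_khz(pit_count: int) -> float:
    """Refresh pulses per second (in kHz) for a given PIT reload count."""
    return PIT_HZ / pit_count / 1000


def sweep_ms(rows: int, pit_count: int) -> float:
    """Milliseconds needed to step through `rows` rows, one row per pulse."""
    return rows * pit_count / PIT_HZ * 1000


print(round(refresh_rate_khz(18), 1))   # 66.3 kHz with the stock count of 18
print(round(sweep_ms(10 * 256, 18), 1))  # 38.6 ms if every bank were refreshed separately
print(round(sweep_ms(256, 18), 2))       # 3.86 ms for 256 rows, inside a 4 ms window
print(round(sweep_ms(128, 18), 2))       # 1.93 ms for 128 rows, inside a 2 ms window
```

Calling `sweep_ms(128, 76)` gives the corresponding figure for the PIT count of 76 mentioned earlier in the thread, roughly four times longer than with the stock count.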

@josephlunderville3195 1 month ago

Also, I know you say you don't observe too much margin in other test scenarios so maybe don't put too much stock in this speculation, but: my electronic designer spidey sense says that if I was the engineer designing that DRAM I would be specifying that refresh time VERY generously, because I wouldn't necessarily be very confident about the margin on the components as the silicon ages. Remember the RAM is probably specified to work at 80 degrees for 20 years or something, and with say 10x margin left at that point on both the amplifiers and the capacitors, with the supply voltages just barely in spec -- and your RAM is probably operating at a nice cool temperature, has never really been stressed in its life, and maybe the process was just a little better dialed in on that batch and you can easily see how those margins could multiply out to a 1000x margin overall. But see my other comment -- if you're only checking parity, it could also just be that the RAM really is losing its brain and you're not detecting it because parity kind of sucks!

@pcretroprogrammer2656 1 month ago

Yeah that Samsung RAM is rated to operate between 0 and 70 degrees C. I'm not sure what the effect of temperature is on decay. One assumes there is some kind of effect though.

@janglur 1 month ago

@pcretroprogrammer2656 Mostly it encourages electromigration and delamination. The traces inside and interconnects get thinner and weaker as electromigration etches away, and components that are layered or bonded on the silicon will flex from expanding and contracting at differing rates from even minor thermal cycling, causing broken traces and even microfracturing. It's an extremely slow process, however, and is highly dependent on the frequency and voltage involved relative to the thickness of everything that conducts, and the temperature (both how high it gets and how much it cycles; going from 32 to 80 and back every hour is going to wreck it way faster than a solid 80). Things in this era were often overdesigned in many areas in beneficial ways for other reasons, like extra oversized interconnect points for making it mechanically easier to produce. So some will last through apocalyptic events compared to smaller, more micronized modern chips. Which is why you see a lot of 16-bit computers cooking their poor Z80's or whatever their entire lifespan, pure out of spec on temp, but they just kept marching. They were built different back then, and it took more abuse to cause the same damage! But like any chip, once they burn out, they're gone. RAM is usually either extremely hardy or extremely flimsy, depending on the area of prodding. There's a lot to do with the density of memory chips, too, and the relation of defects per square millimeter in the silicon making process that makes RAM a more random silicon lottery than most, as even within known good picks the likelihood of a defect, if present, causing an issue is much higher. And not all issues (I'd say most) are readily apparent by outright failure. It can be like a blemish or dust mote spot on the mask that causes a trace to be 1/4 its normal thickness at a spot, which will wear out faster or under more stress than normal. Etc.

@leyasep5919 1 month ago

OMG I realise I have gotten older in the last 30 years... these things were my usual subject of exploration when I was younger. Today I couldn't stand all this complexity and even RISC machines make me angry 😛

@RedCMD 1 month ago

When a bit corrupts, does it flip, or does it always get set to 0 or 1?

@pcretroprogrammer2656 1 month ago

That's a really good question, and it is exactly what I was thinking too. I didn't want to sound off without knowing more though. There's supposedly a transistor and a capacitor, so one presumes the capacitor leaks charge. But I don't know if the data is stored inverted or not.

@janglur 1 month ago

@pcretroprogrammer2656 If it's getting stale it should in theory drop to 0 or whatever the system logic calls low, shouldn't it? I'm sure you can do it either way and have heard of those that invert the logic, but most go with no on low simply to prevent powering unused memory circuits. But if you knew you'd be using most or all of it during normal operation, I guess it may be wiser to invert that for the same reason. Worth investigating!

@pcretroprogrammer2656 1 month ago

@janglur Yes, that seems to be what happens in practice, though the designers of the XT apparently invert the parity bit so that if it goes stale it effectively goes to a parity of 1, meaning that staleness would eventually be detected when everything decays to a low enough charge (thanks to Reenigne for pointing this out).