[CUBE] ECC when it come to memory

Bumala, Robert W robert.w.bumala at lmco.com
Tue Sep 2 09:56:52 PDT 2003



-----Original Message-----
From: Joost van de Griek [mailto:joost at jvdg.net]
Sent: Tuesday, September 02, 2003 2:06 AM
To: Cube List
Subject: Re: [CUBE] ECC when it come to memory


On 2003-08-18 18:13, Robert W. Bumala wrote:

> I believe you're thinking of EDAC (Error Detection and Correction).  We
> use that for RAM that's exposed to a high radiation environment, such as
> space.  Normal ECC RAM just has a parity bit, and most IBM Wintel
> machines use it.  Way back, about 5 BM (Before Macintosh), IBM
> decided that DRAM memory needed a parity bit to detect errors.  It
> didn't fix errors, just crashed the machine with a parity error.  Apple
> figured that if the memory was bad it'd crash all by itself, so they
> didn't need a parity bit.  This is why, when you buy memory for a Mac,
> it has a missing chip; that's the parity bit.  It's fairly useless, but
> Wintel machines still use it for some odd reason.  People in the
> IBM-like world jump through all kinds of hurdles to maintain
> compatibility with an old, obsolete architecture.  That's why they lived
> with the dreaded 1 megabyte limit for so long.

> Well, that's stretching things a bit...

> First off, parity SIMMs detect data errors, not memory errors.

I guess I don't appreciate the difference.  You could say there are two
sources of errors: the memory itself, or the parity generator / checker.
If the data error originates elsewhere, the parity is applied and checked
without a problem.
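To illustrate what a parity generator / checker actually does, here is a
minimal sketch of even parity over a single byte (the function names are
my own, purely for illustration):

```python
def parity_bit(byte: int) -> int:
    """Generate the even-parity bit: it makes the total count of 1s even."""
    return bin(byte & 0xFF).count("1") % 2

def parity_check(byte: int, stored_parity: int) -> bool:
    """Detect (but not locate or correct) an odd number of flipped bits."""
    return parity_bit(byte) == stored_parity

word = 0b10110010               # four 1s, so the parity bit is 0
p = parity_bit(word)
assert parity_check(word, p)    # a clean read passes the check
flipped = word ^ 0b00001000     # a single flipped bit...
assert not parity_check(flipped, p)  # ...is detected, but can't be fixed
```

Note that two flipped bits in the same word cancel out and pass the
check undetected, which is one reason parity alone is so limited.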

> Secondly, while a data error in application code may cause a machine to
> crash, hang, or behave erratically, a data error in actual data will
> cause it to run just fine, but with errors in your results (garbage in,
> garbage out).  This is the main reason that many professional computer
> hardware manufacturers, such as IBM, SGI and Sun, use parity memory.

If I set or reset a bit in the execution code, I don't think it'll run
just fine.  All of this stems from the dark ages of computing, when
machines used core memory and discrete RAM cells.  Because the technology
was so unreliable, they needed a way of checking for errors, hence
parity.  IBM put it in the PC because they always had.  Modern DRAM is so
fast and reliable that it's just not needed.  I've never had a memory
error on any of the Macintoshes I've owned (Macs have never used parity).
They seem to be well engineered.  The only time I've ever even seen a
memory error was in the early days of the IBM clones, when the clone
makers kept cutting margins to be a little faster than the others.

> I'm sure we can all appreciate the inclusion of data integrity checking
> mechanisms in the machines that control nuclear power plants and
> weapons systems, right?

Most high-reliability systems use EDAC if there's a possibility of a
problem with memory; otherwise, you design in enough margin.
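For what it's worth, EDAC hardware is typically built on a Hamming-style
code, which can locate a flipped bit rather than merely detect one.  A
minimal Hamming(7,4) sketch (illustrative only, not any particular
controller's implementation):

```python
def hamming74_encode(data: int) -> int:
    """Encode 4 data bits into a 7-bit codeword with 3 parity bits."""
    d = [(data >> i) & 1 for i in range(4)]      # d0..d3
    p1 = d[0] ^ d[1] ^ d[3]                      # covers positions 1,3,5,7
    p2 = d[0] ^ d[2] ^ d[3]                      # covers positions 2,3,6,7
    p4 = d[1] ^ d[2] ^ d[3]                      # covers positions 4,5,6,7
    bits = [p1, p2, d[0], p4, d[1], d[2], d[3]]  # codeword positions 1..7
    return sum(b << i for i, b in enumerate(bits))

def hamming74_decode(cw: int) -> int:
    """Correct any single flipped bit, then return the 4 data bits."""
    bits = [(cw >> i) & 1 for i in range(7)]
    # The syndrome is the 1-based position of a single flipped bit
    # (0 means the codeword is clean).
    s1 = bits[0] ^ bits[2] ^ bits[4] ^ bits[6]
    s2 = bits[1] ^ bits[2] ^ bits[5] ^ bits[6]
    s4 = bits[3] ^ bits[4] ^ bits[5] ^ bits[6]
    syndrome = s1 | (s2 << 1) | (s4 << 2)
    if syndrome:
        bits[syndrome - 1] ^= 1                  # flip it back
    return bits[2] | (bits[4] << 1) | (bits[5] << 2) | (bits[6] << 3)
```

Flip any single bit of a codeword and the decoder still recovers the
original nibble, which is exactly the "correction" half of EDAC that a
lone parity bit can't provide.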

Bob.



