ECC
ECC (either "error correction [or correcting]
code" or "error checking
and correcting") allows data that is being read or
transmitted to be checked for errors and, when necessary,
corrected "on the fly." It differs from parity-checking
in that errors are not only detected but also corrected.
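To show why parity-checking can only detect errors, here is
a minimal sketch of even parity over one byte (the function
names are illustrative, not from any particular system). A
single flipped bit changes the parity, but nothing indicates
which bit it was, and two flipped bits cancel out entirely.

```python
def even_parity(bits: list[int]) -> int:
    """Return the even-parity bit for a list of 0/1 values."""
    return sum(bits) % 2


def parity_ok(bits: list[int], parity_bit: int) -> bool:
    """True if the stored parity bit still matches the data."""
    return even_parity(bits) == parity_bit


data = [1, 0, 1, 1, 0, 0, 1, 0]
p = even_parity(data)            # stored alongside the data

one_flip = data.copy()
one_flip[5] ^= 1                 # one bit corrupted in storage

two_flips = data.copy()
two_flips[0] ^= 1                # two bits corrupted in storage
two_flips[1] ^= 1

print(parity_ok(data, p))        # True
print(parity_ok(one_flip, p))    # False: detected, but not located
print(parity_ok(two_flips, p))   # True: the double error is missed
```

Because the parity check yields only one bit of information,
it can say "something is wrong" but cannot say where, which
is exactly the gap ECC fills.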
ECC is increasingly being designed into data storage and
transmission hardware as data rates (and therefore error
rates) increase.
Here's how it works for data storage:
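When a unit of data (or "word") is stored
in RAM or peripheral storage, a code that describes the
bit sequence in the word is calculated and stored along
with the unit of data. For each 64-bit word, an extra
7 bits are needed to store a code that can correct any
single-bit error; an 8th bit is added if the code must also
detect (though not correct) double-bit errors, which is why
ECC memory commonly stores 72 bits per 64-bit word.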
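The 7-bit figure follows from the Hamming bound for a
single-error-correcting code: the r check bits must be able
to name either "no error" or any one of the k + r stored bit
positions, so 2^r must be at least k + r + 1. A small sketch
(the function name is illustrative) that finds the smallest
such r:

```python
def hamming_check_bits(k: int) -> int:
    """Smallest r with 2**r >= k + r + 1, i.e. the number of check bits a
    single-error-correcting Hamming code needs for k data bits."""
    r = 0
    while 2 ** r < k + r + 1:
        r += 1
    return r


for k in (8, 32, 64):
    r = hamming_check_bits(k)
    # One extra overall-parity bit turns SEC into SECDED
    # (single-error correction, double-error detection).
    print(f"{k}-bit word: {r} check bits for SEC, {r + 1} for SECDED")

# 8-bit word: 4 check bits for SEC, 5 for SECDED
# 32-bit word: 6 check bits for SEC, 7 for SECDED
# 64-bit word: 7 check bits for SEC, 8 for SECDED
```

For a 64-bit word, 2^7 = 128 is at least 64 + 7 + 1 = 72,
while 2^6 = 64 is not, so 7 bits is the minimum.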
When the unit of data is requested for reading, a code for
the stored and about-to-be-read word is again calculated
using the original algorithm. The newly generated code is
compared with the code generated when the word was stored.
If the codes match, the data is assumed to be free of
errors and is sent. If the codes don't match, the
difference between them (called the syndrome) identifies
the missing or erroneous bits, and the bit or bits are
supplied or corrected.
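The store, recompute, and compare cycle can be sketched
with a classic Hamming code, assuming single-bit errors
only; the function names are illustrative. Check bits sit
at the power-of-two positions (1, 2, 4, ...), and each one
covers every position whose index has that bit set. XOR-ing
the positions of all set bits gives the same result as
recomputing each check bit from the data and comparing it
with the stored one, so that value is the syndrome: 0 means
the codes match, anything else is the position of the
flipped bit.

```python
def hamming_encode(data: list[int]) -> list[int]:
    """Return the codeword: data bits at non-power-of-two positions
    (1-indexed) plus even-parity check bits at power-of-two positions."""
    k = len(data)
    r = 0
    while 2 ** r < k + r + 1:
        r += 1
    n = k + r
    code = [0] * (n + 1)              # index 0 unused; positions are 1..n
    bits = iter(data)
    for pos in range(1, n + 1):
        if pos & (pos - 1):           # not a power of two: data position
            code[pos] = next(bits)
    for i in range(r):
        p = 1 << i
        # Check bit p covers every position whose index has bit i set.
        code[p] = sum(code[j] for j in range(1, n + 1) if j & p and j != p) % 2
    return code[1:]


def syndrome(code: list[int]) -> int:
    """XOR of the 1-indexed positions of all set bits. Equivalent to
    XOR-ing the recomputed check bits with the stored ones."""
    s = 0
    for pos, bit in enumerate(code, start=1):
        if bit:
            s ^= pos
    return s


def hamming_decode(code: list[int]) -> tuple[list[int], int]:
    """Return (corrected data bits, syndrome). Only single-bit errors are
    handled; multi-bit errors may be missed or miscorrected."""
    fixed = code.copy()
    err = syndrome(fixed)
    if 0 < err <= len(fixed):
        fixed[err - 1] ^= 1           # flip the bit the syndrome points at
    data = [b for pos, b in enumerate(fixed, start=1) if pos & (pos - 1)]
    return data, err


word = [int(c) for c in format(0xDEADBEEFCAFEF00D, "064b")]
stored = hamming_encode(word)         # 71 bits: 64 data + 7 check
stored[40] ^= 1                       # a transient error at position 41
data, err = hamming_decode(stored)
print(len(stored), err, data == word) # 71 41 True
```

Only the copy handed to the reader is corrected; the list
named stored still holds the flipped bit, mirroring the
behavior described next.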
No attempt is made to correct the data that is still in
storage. Eventually, it will be overlaid by new data and,
assuming the errors were transient, the incorrect bits will
"go away."
Any error that recurs at the same place in storage after
the system has been turned off and on again indicates a
permanent hardware error, and a message is sent to a log or
to a system administrator identifying the location of the
recurrent errors.
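The source does not say how a recurrence is recognized. One
way to picture the bookkeeping is the hypothetical tracker
below: it records in which power-on sessions each address
produced a corrected error and logs the address the first
time it shows up in a second session. The class and method
names are invented for illustration; real platforms report
such errors through firmware or operating-system facilities
(for example, the Linux EDAC subsystem), whose interfaces
differ.

```python
from collections import defaultdict


class RecurrentErrorTracker:
    """Hypothetical bookkeeping for corrected-error reports: an address
    that produces errors in more than one power-on session is flagged."""

    def __init__(self) -> None:
        self.session = 0                          # bumped on each power cycle
        self.sessions_by_address = defaultdict(set)
        self.log: list[str] = []

    def power_cycle(self) -> None:
        self.session += 1

    def report_corrected_error(self, address: int) -> None:
        sessions = self.sessions_by_address[address]
        sessions.add(self.session)
        # A transient error disappears once the word is overwritten; one
        # that reappears after a restart points to failing hardware.
        # Log it only once, when the second session is first seen.
        if len(sessions) == 2:
            self.log.append(f"recurrent ECC error at address {address:#x}")


tracker = RecurrentErrorTracker()
tracker.report_corrected_error(0x1F40)
tracker.report_corrected_error(0x2A00)
tracker.power_cycle()
tracker.report_corrected_error(0x1F40)   # same location, new session
print(tracker.log)   # ['recurrent ECC error at address 0x1f40']
```

Repeated reports within a single session do not trigger the
log entry, since those could still come from one transient
upset that has not yet been overwritten.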
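At the 64-bit word level, parity-checking and ECC require
the same number of extra bits: 8 parity bits (one per byte)
versus 8 check bits for a code that corrects single-bit and
detects double-bit errors. In general, ECC increases the
reliability of any computing or telecommunications system
(or part of a system) without adding much cost. Reed-Solomon
codes are commonly implemented; they operate on multi-bit
symbols rather than single bits and can restore "erased"
symbols (those known to be missing) as well as incorrect
ones.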
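A full Reed-Solomon decoder is too long for a short example,
but the erasure property can be sketched with a simplified
variant. Assumptions: arithmetic is done modulo the prime
257 rather than in GF(2^8), the field real implementations
usually use (so a parity symbol can come out as 256, which
does not fit in a byte), and only erasures at known
positions are handled, not errors at unknown positions. The
k data symbols are treated as the values of a polynomial of
degree less than k at x = 0..k-1, and the parity symbols are
its values at further points, so any k surviving symbols
pin the polynomial down again. Function names are
illustrative.

```python
from typing import Optional

P = 257  # prime modulus; real Reed-Solomon codes usually work in GF(2**8)


def lagrange_eval(points: list[tuple[int, int]], x: int) -> int:
    """Evaluate at x the unique polynomial of degree < len(points)
    passing through the given (xi, yi) points, modulo P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, P - 2, P) is the modular inverse (Fermat's little theorem).
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total


def rs_encode(data: list[int], n_parity: int) -> list[int]:
    """Systematic encoding: data symbols are the polynomial's values at
    x = 0..k-1; parity symbols are its values at x = k..k+n_parity-1."""
    k = len(data)
    pts = list(enumerate(data))
    return data + [lagrange_eval(pts, x) for x in range(k, k + n_parity)]


def rs_recover(received: list[Optional[int]], k: int) -> list[int]:
    """Rebuild the k data symbols from any k surviving symbols
    (None marks an erased symbol)."""
    known = [(x, y) for x, y in enumerate(received) if y is not None]
    if len(known) < k:
        raise ValueError("too many erasures to recover")
    pts = known[:k]
    return [lagrange_eval(pts, x) for x in range(k)]


data = list(b"ECC!")                 # 4 data symbols
codeword = rs_encode(data, 2)        # 6 symbols: 4 data + 2 parity
received: list[Optional[int]] = list(codeword)
received[1] = None                   # two symbols lost ("erased")
received[4] = None
print(rs_recover(received, len(data)) == data)   # True
```

With 2 parity symbols, any 2 erasures can be repaired; in
general, n_parity extra symbols tolerate up to n_parity
erasures, or half that many errors at unknown positions
when a full decoder is used.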