[G4] EIDE and Ultra ATA RAID
Philip J Robar
philip.robar at myrealbox.com
Wed Apr 27 18:48:20 PDT 2005
On Apr 25, 2005, at 4:40 PM, Ralph Garrett wrote:
> On Apr 25, 2005, at 5:51 PM, Philip J Robar wrote:
>
>> This is simply not true. Every drive you add increases the chance
>> of the array failing. See
>> http://www.pcguide.com/ref/hdd/perf/raid/concepts/rel_Rel.htm.
>>
>> Phil
>
> Sorry but that concept of reliability is based on flawed logic. (By
> the same logic, buying multiple Lotto tickets would greatly
> increase my chances of hitting it rich) For the RAID to fail, only
> one drive has to fail. So the MTBF for the Array is the same as any
> single unit. Adding more components doesn't increase the likelihood
> of any single unit failing (as long as heat is being dealt with
> properly).
No, it's not. And it's rather unkind of you to distort my claim by
adding the word "greatly" to the analogy so as to make your position
seem correct. Buying more than one lottery ticket does increase your
chances of winning. I don't have the figures to say by how much, and
since I'm not you, I can't say whether the increase is worth the cost
to you.
I made no claims about the statistical strength of the failure rate
as drives are added to an array; however, there is no doubt that
adding drives to any array increases the chance that you will
experience the failure of an individual drive. This is why it is
important to realize that a so-called "RAID" 0 set is not actually
RAID at all, as it has no "Redundant" drives. A stripe fails when any
one of its drives fails, so the array's MTBF is roughly the
single-drive MTBF divided by the number of drives. As you increase
the number of drives in a "RAID" 0 setup, the chance of an individual
drive failing increases, and with it the chance of the entire array
failing.
Assume an individual drive MTBF (Mean Time Between Failures) of
1,000,000 hours. (This is typical of server-oriented drives and is
becoming common for consumer drives.) You have a 4.29% chance of the
drive failing within 5 years. Adding an identical drive increases the
chance of a failure to 8.39%. Even more important to note is that the
risk compounds with each drive added: the probability that all of the
drives survive is the single-drive survival probability raised to the
nth power, so it falls geometrically as drives are added.
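A quick sanity check of those numbers, assuming independent drives
with exponentially distributed lifetimes (the standard, if
simplified, reliability model; the 8760 hours/year figure and the
helper function are mine, not from the thread):

    import math

    MTBF = 1_000_000        # hours, per drive (assumed, as above)
    HOURS_5Y = 5 * 8760     # 43,800 hours in five years

    def p_any_failure(n_drives, hours=HOURS_5Y):
        # Chance that at least one of n independent drives fails
        # within `hours`, given exponential lifetimes.
        return 1 - math.exp(-n_drives * hours / MTBF)

    print(p_any_failure(1))  # ~0.0429 -> 4.29%
    print(p_any_failure(2))  # ~0.0839 -> 8.39%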
True RAID works around this problem by keeping redundant copies of
the data (e.g. RAID 1) or by storing parity so that the data can be
recreated on the fly (RAID 5). For instance, a two-drive mirror with
the above per-drive MTBF has an array MTBF of 1,500,000 hours, which
works out to a 5-year failure rate of 2.88%. In practical terms,
though, most of us can count on a mirrored array never failing, as
long as we replace failed drives as soon as they fail.
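For the curious, the 1,500,000-hour figure is the classic result for
a two-unit redundant pair without repair, MTBF_pair = MTBF * (1 +
1/2). Treating the pair as a single exponential unit with that
combined MTBF is an approximation, but it reproduces the 2.88% figure
(a sketch, same assumptions as above):

    import math

    MTBF = 1_000_000                 # hours, per drive
    HOURS_5Y = 5 * 8760
    mtbf_mirror = MTBF * (1 + 1/2)   # 1,500,000 hours for the pair

    # Approximate 5-year failure probability of the mirror.
    print(1 - math.exp(-HOURS_5Y / mtbf_mirror))  # ~0.0288 -> 2.88%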
To put things in a slightly different perspective, a site with a 112
drive array will experience a drive failure about once a year. A site
with 448 drives will lose about four drives a year.
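Those rates fall straight out of the same per-drive MTBF; a one-liner
to check (again assuming 8760 hours per year):

    # Expected drive failures per year in a population of n drives,
    # each with a 1,000,000-hour MTBF.
    failures_per_year = lambda n: n * 8760 / 1_000_000
    print(failures_per_year(112))  # ~0.98, about one a year
    print(failures_per_year(448))  # ~3.92, about four a year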
Since RAID 0 is not a performance win for most desktop users, it
really comes down to a personal decision. Does the convenience of
having all of your drive space collected into a single volume offset
the somewhat increased chance, in the near term, of losing, and
having to restore, all of that data?
Phil
For a more detailed explanation of hard drive availability and MTBF
(Mean Time Between Failure) see this excellent white paper:
http://www.zzyzx.com/products/whitepapers/pdf/MTBF_and_availability_primer.pdf
http://tinyurl.com/7fp7q