[X Servers] XServe RAID 1 (mirroring)

Peter Clarke P.J.Clarke at westminster.ac.uk
Tue Jul 1 09:05:46 PDT 2003


>Peter Clarke wrote on Tuesday, July 1, 2003:
>>Here is an interesting scenario:
>>    You want to replace the faulty disk and re-raid.
>>    The server has to be taken off-line,
>>    The replacement disk fitted.
>
>This actually surprises me.  I thought all of the drives in the 
>XServe were hot-swappable?

	They are.

>  As such, I would have assumed that one could hot-swap out a bad 
>drive and let it re-mirror automatically.

You may be right - I did say I don't have one.

But as far as I have been able to find out, it does not re-raid
automatically - it has to be done manually.

- This is an example of where Apple don't provide enough information.
- I would like to see the exact procedure fully documented in their
technical notes, but it isn't; it's hardly even mentioned.

Instead they offer telephone support to tell you how to do it
- provided that you have taken out a maintenance contract with them
- (which is perhaps a good idea anyway)

Really though, such a procedure should be clearly written down in the 
full technical documentation.

- Perhaps someone (who actually has an Xserve and has done this) can
correct me on this - but I think the process I outlined (not fully)
is what's needed. (One step I left out: you need to run Disk Utility 
to re-raid the disks.)

Personally I wouldn't be happy buying an Xserve without knowing 
exactly how to fix it. To my mind, Apple not documenting this in the 
tech docs is not good.
- I even asked this question at a seminar, and got the spiel about 
taking out a maintenance contract, after which it would be OK.

A disk failure is the most likely cause of failure, so being able to 
recover from it is essential. It's bound to happen at some point: it 
might take several years, but eventually it will happen.
By the way, 'SMART' drives usually give you some warning about 
drive problems prior to a major failure.


>>    The server booted from ANOTHER disk
>>    (either firewire or CD)  (During the re-raiding process)
>>    - You can't run off the second disk 'during re-raiding'
>
>OK, that is somewhat nettlesome.  I suppose I could have a third 
>drive as an emergency boot volume.

	- that's what I was suggesting, keeping it synced using psync,
	- and a cron script to back up 'changed' files.

	That way you can reboot from that drive, and re-raid the others.
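
The 'changed files' part of that cron job could be sketched roughly as below. This is only an illustration in Python, not psync itself: the function name backup_changed_files and both directory arguments are made up for the example, and a plain file copy like this does not by itself give you a bootable volume (that is the job I would leave to psync). It copies a file only when the destination copy is missing, older, or a different size.

```python
import os
import shutil


def backup_changed_files(src_root, dst_root):
    """Copy files that are new or changed from src_root into dst_root.

    Hypothetical helper for illustration only. Returns the list of
    relative paths that were copied on this run.
    """
    copied = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel_dir = os.path.relpath(dirpath, src_root)
        dst_dir = os.path.normpath(os.path.join(dst_root, rel_dir))
        os.makedirs(dst_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(dst_dir, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(src) or not os.path.isfile(src):
                continue
            changed = (
                not os.path.exists(dst)
                or os.path.getmtime(src) > os.path.getmtime(dst)
                or os.path.getsize(src) != os.path.getsize(dst)
            )
            if changed:
                # copy2 keeps the modification time, so an unchanged
                # file is not copied again on the next run.
                shutil.copy2(src, dst)
                copied.append(os.path.normpath(os.path.join(rel_dir, name)))
    return copied
```

A cron entry would then call a small script that runs this function against the real source folder and the emergency boot volume at whatever interval suits you.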


>  This XServe will be at a co-lo facility, so traipsing in with an 
>external drive to plug in is not that convenient.  (Admittedly, if I 
>have a drive die I'll be visiting the co-lo anyway -- so maybe 
>that's not a valid concern.)
>
>But three drives would, I think, be overkill for this server.  I 
>don't need 99.99% uptime.  99% would be more than adequate.

Two's probably good enough then.
- But at least it's nice to know what your options are.

By the way: '99% uptime' actually allows more than 3 whole days per 
year off-line (1% of 365 days is about 3.65 days).
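
To put numbers on the uptime figures mentioned here, a few lines of Python will do the arithmetic. The function name downtime_hours_per_year is just something I made up for the illustration, and it assumes a 365-day year.

```python
HOURS_PER_YEAR = 365 * 24  # assumes a non-leap year


def downtime_hours_per_year(uptime_percent):
    """Hours per year a server may be off-line at a given uptime percentage."""
    return (100.0 - uptime_percent) / 100.0 * HOURS_PER_YEAR


# Compare the two uptime targets mentioned in this thread.
for pct in (99.0, 99.99):
    hours = downtime_hours_per_year(pct)
    print(f"{pct}% uptime: {hours:.2f} hours ({hours / 24:.2f} days) off-line per year")
```

At 99% that works out to 87.6 hours a year, while 99.99% leaves a little under an hour.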

>Thanks for the feedback.  I'm thinking now more along the lines of 
>two drives with separate, bootable, volumes.  I could install 
>Retrospect on both and have the main volume backup to the secondary 
>drive every 6 hours or so (create two partitions on the second 
>drive, one OS and one backup volume).  A failure of the primary 
>drive would only require a reboot from the secondary drive and a 
>restore.  It might even be possible to accomplish that remotely if 
>it's not a hardware failure.

Where the raid option helps though, is that if a drive fails
- the system keeps running
- you can then fix it when it suits you. (more or less).

With the method you mention, when the primary drive fails the server 
goes off-line until someone fixes it.  Although in this case, the 
'minimum fix' would be to simply reboot the system - it would then 
find the second drive and boot from that.

Hopefully whoever did the reboot would then tell you about it, or you 
would check regularly to see how the server is doing.  (The Xserve 
has tools to do this.)

Using raided disks keeps things working, without having to reboot, so 
you get to pick when to do the fix.

-- 
Peter Clarke, Harrow Campus Computing Services,  Technical Support. (Projects)
Ext 4685; Room L1.07; Learning Resource Centre.


