[X-Unix] dirty restart problems ...

Kevin Stevens groups at pursued-with.net
Thu Jun 10 09:33:29 PDT 2004



On Thu, 10 Jun 2004, William H. Magill wrote:

> On 09 Jun, 2004, at 13:59, Kevin Stevens wrote:
> > ... Power cycle is the only way I know to resolve the situation.
> > Followed by a reboot, BTW, as one REALLY, REALLY, BAD THING about
> > OS X 10.3.x is that the system will not come up reliably after a
> > power cycle.  I don't mean unreliable as in fsck gets stuck and
> > you have to manually intervene, I mean unreliable as in it appears
> > to come up properly, after an expected interval of "checking local
> > disks", but is not in fact running critical services such as mail,
> > DNS, or remote access (ssh).  You HAVE to do a normal reboot after
> > a dirty start in order to ensure proper operation. This is a problem
> > if you aren't physically at the box, because you can't log in remotely
> > to do it.
> >
> > My assumption is that the rc dependencies aren't bulletproof, but
> > whatever the cause, it's a very, very bad thing.
>
> I started another thread with this because it is a very different and
> important problem, which needs to be addressed.
>
> In a word, I believe the answer to your assumption is "Yes!"
>
>  From what I can tell the new concepts of the boot process are far from
> well thought out.
>
> They only address the "optimal" situation -- where everything works "as
> expected."

Agreed.  Now, how do we get Apple to address it?  I can't imagine opening
an AppleCare case for this; is there a developer mechanism for initiating
trouble tickets?  At the very least they could include the OS X Server
watchdog utility, and use that as a sanity check.
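
As a rough illustration of the kind of post-boot sanity check meant here,
the sketch below probes a few TCP ports and reports which services are not
answering.  It is not the OS X Server watchdog and makes no claim about how
that utility works; the service list, host, and function names are all
assumptions chosen for the example.

```python
import socket

# Hypothetical list of critical services. The ports are the standard ones
# for ssh, SMTP, and DNS over TCP; which services matter is an assumption.
CRITICAL_SERVICES = {"ssh": 22, "smtp": 25, "dns": 53}


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable: all count as "not answering".
        return False


def failed_services(host="127.0.0.1", services=CRITICAL_SERVICES,
                    probe=port_open):
    """Return the sorted names of services that do not accept connections."""
    return sorted(name for name, port in services.items()
                  if not probe(host, port))
```

Run a few minutes after boot, a non-empty result from failed_services()
would be the cue to log the failure and request a clean reboot.  An open
port only shows that something is listening, not that it answers
correctly, so a real check would want to go further than this.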

> I happen to have 10.2.8 running on my Beige G3 along with 12-9gig Ultra
> SCSI disks, and any time I have a crash or need to force a reboot
> without a clean shutdown, I have to do it twice. WHY? Because certain
> things which need to be running are not running, and things which
> depend upon them simply fail (lookupd being the most notorious), and
> even though the system "comes up" and "looks ok" from the single user
> GUI (login window) point of view, it doesn't work correctly. [And yes,
> my G4 with the 160gig FW boot drive has the same problems, but
> differently.]

It's good to know this was a 10.2 issue as well; I wasn't running OS X in
a server role then (and frankly didn't have many dirty startup situations
before the syslogd issue).  I didn't realize it went back so far.

> Granted, a portion of this is the way the daemons are written, but the
> reality is that some of the startup processes "take time" before they are
> working, and part of the startup process expects everything to already
> be working correctly when it is "its turn" and no mechanism exists to
> hold those later parts until the former complete. (Or if it does exist in
> the new boot scheme, it doesn't work correctly.)

That's my perception from reviewing system logs, as well.  This is
fundamental stuff; it is as unacceptable from OS X as it would be from
Solaris/*BSD/Linux.
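
The missing mechanism described above, holding later startup steps until
the earlier ones are actually working, can be sketched as a simple
readiness gate.  This is only an illustration under stated assumptions: it
is not how the OS X startup code is written, and wait_until_ready,
start_in_order, and the (name, start, is_ready) step format are
hypothetical names invented for the example.

```python
import time


def wait_until_ready(check, timeout=60.0, interval=1.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll check() until it returns True or `timeout` seconds elapse."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def start_in_order(steps, timeout=60.0, interval=1.0, sleep=time.sleep):
    """Start each step in order, gating the next one on readiness.

    `steps` is a list of (name, start, is_ready) tuples, where start()
    launches the service and is_ready() reports whether it is usable yet.
    Returns (started_names, failed_name); failed_name is None on success.
    """
    started = []
    for name, start, is_ready in steps:
        start()
        if not wait_until_ready(is_ready, timeout, interval, sleep=sleep):
            # Stop here rather than launch dependents against a service
            # that is running but not yet giving usable answers.
            return started, name
        started.append(name)
    return started, None
```

The point of the design is that "the process exists" and "the process
works" are tested separately, and that a failure is reported instead of
silently producing a system that merely looks up.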

> If I remember correctly from my pokings around a while back, lookupd is
> the base culprit here. Given time, the pure unix stuff all works
> correctly, but anything which depends upon lookupd is screwed. It's not
> that lookupd is not running, it is, but it is giving out wrong
> information. One work-around is to change the look-up order to include
> flat-files. That way any of the pure unix apps will work "as expected"
> without lookupd getting in their face.

Have you verified that workaround, or is it a hypothesis?  Lookupd seems
likely, as it's a common element to the things that don't work, but I
didn't think that changing the lookup order took lookupd out of the
picture; doesn't it just give lookupd a different input source?

KeS


