[Warning: techie dribble to follow]
Yesterday was such a pain in the ass. One of our servers (the new Dell box) just up and died in the wee hours of the morning. The CSS switch didn't properly pluck it out of the pool of servers (that's its job), so 50% of traffic to our site was being directed at a dead box.
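For the curious, here is roughly what "plucking a dead box out of the pool" means. This is only a minimal sketch in Python, not how the CSS switch actually does it: the function names `is_alive` and `prune_pool` are made up for illustration, and I'm assuming a plain TCP connect is a good enough sign of life.

```python
import socket


def is_alive(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Assumption: a successful TCP connect counts as "healthy". A real load
    balancer would likely also check an HTTP response or similar.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable, or unresolvable: treat as dead.
        return False


def prune_pool(pool, check=is_alive):
    """Return only the (host, port) entries that pass the health check."""
    return [(host, port) for host, port in pool if check(host, port)]
```

Run something like `prune_pool` on a timer and only send traffic to whatever comes back, and a dead box stops getting half the hits.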
The higher security of our data center really pisses me off. I miss the old days of dealing with Ticnet, where you could just show up at 3am with a door code and stroll on in to where your cage is. It took half an hour to get somebody to escort me to the machine, and that somebody then asked stupid, unrelated questions while I was trying to work.
It turns out that the EXT3 filesystem spewed a kernel bug (Assertion failure in commit.c). I don't know if this is a side effect of a hardware problem or a genuine kernel bug. As a result, /bin/login would not run, and I couldn't log in from anywhere on the machine. I finally got the machine back up and running fine(?), at least for the time being. We also figured out why the CSS switch was not properly monitoring machines in the pool: slacker admins who don't do what you ask them to (what the hell do we pay them for, heh?).
I was up at work late last night (till 9:15) setting up monitoring tools to fill up my mailbox with periodic diagnostics. I was fairly tired by the time I got home, and went to sleep relatively early (for a Friday night). Here I am, now after 8am, awake on a Saturday morning. I hate it when that happens, 'cause I'm going to be a zombie tonight.