Network Outages
Saturday, August 27, 0237 Central Time:
I'm pulled into consciousness by something loud. I find my way out of bed. The floor, cold and hard, greets my feet.
The alarm clock. Walk over, flip the switch. That should be the end of the racket. Bathroom next, then clothes.
Wait.
I force my eyes to focus on the green figures. That's early... too early for the Listserv migration. Or is it? I have a condition. I get confused sometimes when I haven't had my nap.
Walk over, flip the switch. The numbers change to 7:20. Damn thing isn't supposed to go off for almost five hours. Then what? It must be that god-damned pager.
Still barely able to think, I try to find my pants. I usually leave them by the cell phone chargers, but not tonight. I stumble across the cool almost-wooden floor toward my dresser, nearly falling over a hamper of clean laundry on the way. I once again find myself in front of the glow of the clocks. I feel around with my foot. There's something odd there. It's lumpy and hard and moving.
My keys!
I pick up my pants, feel around the belt, and grab my pager. I need to find some light to read why the damn thing was so upset. I should have remembered that the pager's got a back light, but like I said, I've got a condition.
There's the hallway, flooded with moonlight. Finally, I can see something. Not enough to read the pager's display, though. I step in the guest bathroom and flip the switch.
Pain.
I open my eyes to a squint just long enough to read the news: SSH to bismarck failed. That son of a bitch.
Down the stairs in the moonlight, across the front room to the datacenter, where I'm greeted by a familiar blue glow.
The pager, my personal siren, screams its song of hate.
Another system. Coincidence? It can't be, not like this. There's something larger at work here, and I need to get to the bottom of it.
I'm beginning to feel alert. There's the keyboard. Sit down, move the mouse. The monitor comes out of suspend mode and I feel for it. I'm sorry buddy, but we're in this together.
It's time for action. I jump to my SSH window that's connected to eddie. It's not responding. Shit.
I throw open a new tab in Firefox. First step: turn off the automated paging. I send my browser on its way, but I'm met with failure. Ok, maybe it's localized. Try status.uiuc.edu. Same thing.
Now what?
I know it's a network problem. I know that if things aren't fixed soon, my pager will only get worse. I know that it's out of my hands. I know that if I'm needed, the opcenter will call my cell phone.
I turn the pager off and go to bed.
At 0720, the alarm goes off. It's calmer this time. I've got scheduled maintenance. Sunlight is making its way around the curtains, so I make my way downstairs without issue. Status loads. They've at least fixed that. But it tells me that UIUCNet is still in an awfully sorry state and the Listserv work has been postponed.
Back to bed.
As if that weren't enough, a crucial (for me) part of SBC's network broke in Chicago yesterday. It got fixed in such a way that my router got caught in a state where it could ping a few hops into SBC's network, but it couldn't get to anything useful. As a result, my script that pings SBC's DNS servers every five minutes and reboots the switch in the case of failure didn't reboot it that one last time that was needed. Around midnight, I manually power-cycled it and things got be'ah.
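For the curious, a watchdog of that general shape could look something like the sketch below. This is not the actual script: the function names, the Linux-style ping flags, and the placeholder addresses (taken from the documentation range) are all assumptions made for illustration. It also shows the weakness: as long as the probed hosts still answer, nothing gets power-cycled, no matter how broken everything beyond them is.

```python
import subprocess
from typing import Callable, Iterable


def host_is_reachable(host: str, timeout_s: int = 2) -> bool:
    """Send a single ICMP echo using the system ping.

    Assumes Linux-style flags: -c is the packet count, -W the timeout in seconds.
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def link_is_down(
    hosts: Iterable[str],
    probe: Callable[[str], bool] = host_is_reachable,
) -> bool:
    """Treat the link as down only when every probed host fails to answer."""
    return not any(probe(host) for host in hosts)


def check_once(
    hosts: Iterable[str],
    power_cycle: Callable[[], None],
    probe: Callable[[str], bool] = host_is_reachable,
) -> bool:
    """Run one check; power-cycle the device if the link looks down.

    Returns True if a power cycle was triggered. Intended to be run from
    a scheduler (for example cron) every five minutes.
    """
    if link_is_down(hosts, probe):
        power_cycle()
        return True
    return False
```

Calling `check_once(["192.0.2.1", "192.0.2.2"], power_cycle=some_function)` from a five-minute cron job would give roughly the behavior described. Probing a host or two well outside the provider's network as well would catch the "can ping a few hops but nothing useful" case, at the cost of a reboot whenever those far hosts have their own problems.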
Since the advertised URL for my website (ck.cx) uses both my DSL and UIUCNet, that amounts to a total of 28 hours of network unavailability over three days. That pulls me down to two 9's of website availability this year:
((365*24)-28)/(365*24) = .9968
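The same arithmetic as a small Python sketch, in case anyone wants to plug in their own downtime. The function names are made up for illustration, and the year is taken as 365 days, as in the figure above.

```python
import math

HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def availability(downtime_hours: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Fraction of the period during which the site was reachable."""
    return (period_hours - downtime_hours) / period_hours


def whole_nines(downtime_hours: float, period_hours: float = HOURS_PER_YEAR) -> int:
    """Count the leading nines of the availability figure.

    Works from the downtime fraction: 28 hours out of 8760 is about 0.0032,
    which sits between 0.01 (two nines) and 0.001 (three nines).
    """
    if downtime_hours <= 0:
        raise ValueError("zero downtime has no finite number of nines")
    return math.floor(-math.log10(downtime_hours / period_hours))


print(round(availability(28), 4))  # 0.9968
print(whole_nines(28))             # 2
```

For scale, three nines over a 365-day year allows about 8.76 hours of downtime, so 28 hours is well past that budget.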