September 28, 2004 09:45 AM

Broken: Air traffic control run by Windows

From Boing Boing today:

Southern California air-traffic systems were migrated from stable, Unix-based systems to Microsoft Windows-based PCs in the past three years. These systems required regular reboots - and when a tech failed to perform the reboot correctly, the systems died and wouldn't come back up, stranding 800 planes in the skies over Lalaland.
A TechWorld article has the full story.

Comments:

Just an observation here: The Boing Boing poster, Cory Doctorow, threw in that term "stable" (referring to the old Unix systems), implying that this is yet another example of the evil that is Microsoft. But if you actually take the time to read the linked articles, there is no basis to conclude that the old systems would have been any safer.

I can identify at least three points of failure outside the requirement to regularly reboot the Microsoft systems: 1) "Fixing" the problem by implementing a manual workaround; 2) "Improper training", as they put it, of the guy whose job it was to reset the systems; and 3) the mysterious failure of the backup systems -- perhaps the old Unix systems? -- that is only briefly alluded to (the L.A. Times also blames that failure on human error, but does not elaborate).

In short, yes, something's broken here, but it goes far beyond a simple "They shoulda stuck to Unix." I would really like to know what the rationale was in switching to Microsoft systems 3 years ago. Maybe there was a sale at dell.com -- but maybe the Unix boxes just didn't work.

Posted by: E.T. at September 28, 2004 12:50 PM

There's no reason other than poor programming that would require a reboot every 49 days. When looking at the details, it seems more likely that they rely on a 32-bit time (milliseconds) counter, and it overflows in 49.71 days. So, ANY platform would have this same problem if you have incapable people writing software. This wasn't anything intrinsic to Windows.

Posted by: Michael Giagnocavo at September 28, 2004 03:48 PM
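
The 49.71-day figure in the comment above follows directly from the counter width. A minimal Python sketch of the arithmetic, assuming an unsigned 32-bit counter that ticks once per millisecond (the names here are illustrative and do not come from any of the linked articles):

```python
# Assumption: an unsigned 32-bit counter incremented once per millisecond.
COUNTER_BITS = 32
MS_PER_DAY = 24 * 60 * 60 * 1000  # 86,400,000 ms in a day


def days_until_wrap(bits: int = COUNTER_BITS) -> float:
    """Days until an unsigned millisecond counter of the given width wraps to zero."""
    return (2 ** bits) / MS_PER_DAY


print(f"{days_until_wrap():.2f} days")  # prints "49.71 days"
```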

This has absolutely NOTHING to do with Windows... I hate this pointless Microsoft-bashing you see all the time.

Microsoft *did* have a system library that measured time in a 32-bit milliseconds timer... in Windows 95! It was fixed in 98, ME and never even existed in Windows NT 3.5, NT 4, 2000 or XP. Even the original bug in Windows 95 was patched... you can download it from Microsoft's tech library.

But for some reason, whenever this article comes up, it's always the switch to Windows that's the fault. How stupid! The code they were running that required the reboot would fail in Linux, Windows, Unix or MacOS... it's buggy code! Buggy code is buggy no matter what the operating system is.

People sometimes accuse Microsoft of spreading FUD (Fear, Uncertainty, Doubt) by putting out press releases that say things like, "Linux solutions have fewer support avenues than Microsoft solutions." I guess the FUD goes both ways.

Most of the people who write crap like this have Windows on their home machines anyway, the hypocrites.

Posted by: James Schend at September 28, 2004 03:58 PM

About Windows... It can be unstable at times, but only if you have done something to make it that way. (I know: my laptop has XP and it became corrupted from SP2, and my old Win 95 desktop got stuck in DOS for some reason.) Usually it goes awry only if you've done something wrong to it. Now, about air traffic control being run by Windows: McCarran Airport here in Las Vegas, NV went completely dark a month or two ago. No radar, no lights, and planes were landing 'at their own risk.' The reason: someone forgot to reset the air traffic control system, and so it shut itself down.

Posted by: brandon at September 28, 2004 06:19 PM

Just as a bit of an aside... If you go to the Techworld article, there is some related advertising. So the article title "Microsoft server crash nearly causes 800-plane pile-up" is juxtaposed against the ad "Make a name for yourself with Windows Server System".

(It might take a couple of refreshes to get the Microsoft ad to cycle up).

Posted by: Chris Law at September 28, 2004 09:08 PM

Um, the problem was crappy software that couldn't handle running for 49 days straight. (That number is significant because Windows keeps the system uptime as milliseconds in a DWORD, 2^32 ms ~= 49 days. When that quantity rolled over, the program broke.) Crappy software exists on all OSes, not just Windows.

Posted by: Jacques Troux at September 28, 2004 09:53 PM
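
To illustrate how a program might break when a 32-bit millisecond tick count rolls over, here is a hedged Python sketch. It is not the air-traffic code (nobody in this thread has seen that); the function names and the timeout scenario are hypothetical, and the counter is simulated by masking to 32 bits. It contrasts a naive deadline comparison with the usual wrap-safe modular subtraction:

```python
MASK32 = 0xFFFFFFFF  # keeps values in the range of an unsigned 32-bit DWORD


def tick_after(start_tick: int, elapsed_ms: int) -> int:
    """Simulate a 32-bit millisecond tick counter advancing by elapsed_ms."""
    return (start_tick + elapsed_ms) & MASK32


def naive_timed_out(start: int, now: int, timeout_ms: int) -> bool:
    """Buggy: the deadline start + timeout_ms is never wrapped, so it can exceed any tick value."""
    return now >= start + timeout_ms


def safe_timed_out(start: int, now: int, timeout_ms: int) -> bool:
    """Wrap-safe: modular subtraction yields the true elapsed time for intervals under 2**32 ms."""
    return ((now - start) & MASK32) >= timeout_ms


start = MASK32 - 1000          # one second before the counter wraps
now = tick_after(start, 5000)  # five seconds later; the counter has wrapped to 3999

print(naive_timed_out(start, now, 2000))  # False: the naive check misses the timeout
print(safe_timed_out(start, now, 2000))   # True: 5000 ms elapsed, past the 2000 ms timeout
```

The wrap-safe version only works for intervals shorter than one full counter period, which is why a wider (for example 64-bit) counter is the more robust choice where one is available.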

The more fundamental brokenness is the idea of running *everything* on your computers. Regular old solid state radios don't require rebooting. They just work. When they fail you fix the component or replace the thing altogether.

Posted by: Reed at September 29, 2004 11:03 AM

Please... don't blame Windows for every darn thing, even when programmers screw up and sysadmins don't cluster servers like they should.

More clued-in discussion here on Joel on Software:

http://discuss.joelonsoftware.com/default.asp?pg=pgDiscussThread&ixDiscussTopicParent=9430&ixDiscussGroup=3&cReplies=16

and

http://discuss.joelonsoftware.com/default.asp?pg=pgDiscussThread&ixDiscussTopicParent=9845&ixDiscussGroup=3&cReplies=27

Posted by: MadMan at September 29, 2004 03:44 PM

I am a programmer who programs in and uses Microsoft products all day long. So don't bother labeling me a communist Unix user. :)

I just want to point out that trying to place blame on the programmers instead of Microsoft reminds me of a situation about 100 years ago. The automobile industry was rapidly growing - and so were deaths due to accidents. These deaths were blamed on human/driver error - not on the car. How could a machine be responsible for an accident? The stupid human was telling the machine what to do, and hitting a tree is not good driving practice.

It was not until automobile manufacturers got over themselves and realized the cars should include safety features that accidents and deaths declined.

In this case it may be true that bad programming practices were used, but look at what they were working with - OSs that can't count past 2^32. Whose fault is it really?

Posted by: dege at September 29, 2004 04:36 PM

dege, you miss the point...

Unless this important air traffic control system was running on Windows 95, Windows *can* count past 2^32. It was ONLY EVER a bug in Windows 95... it's been fixed for almost a decade now, and it never even existed in the Windows NT line of operating systems. You can't possibly blame anything about this on Windows.

Posted by: James Schend at September 29, 2004 05:32 PM

800? That sounds a bit exaggerated, but even if it was that high, planes are required to have 45 minutes of extra fuel, there are alternate airports, and I'm sure the pilots didn't care since they're getting paid for it.

Posted by: Matt at October 4, 2004 04:26 PM

800 is close to true, though NYT and LAT are not really worth lining the canary cage with.

It was quoted as that from several different media that day. Planes had to be rerouted, international and national. I thought it was due to a terrorist incident.

Posted by: Scott Packard at October 7, 2004 11:47 PM

Comments on this entry are closed


