Matt Makes Excuses, Do You?
Here’s a snippet from a recent post on Matt Cutts’ blog.
I had to wait for PayPal to come back up to send some moolah. Every website has downtime now and then; (emphasis added) it’s just a bother when you want to send some money that second…
Sorry, Matt, but I emphatically disagree. The entire Internet experience (and perhaps many businesses’ profitability) would be enhanced if the movers and shakers of the ‘net would just get off this “last century” attitude of accepting failure. The technology is available to see to it that this does not happen, but as long as we operate in the “failure is inevitable” mode, we will get what we expect.
Especially for a huge money-handling site like PayPal, or a site that serves millions of users (like BlogSpot, perhaps), these unplanned outages should be a thing of the past. Real-time server mirroring combined with geographic diversity of the servers (to avoid single points of failure) is “nickel-dime” in relation to the money at stake … but we accept excuses instead of progress.
When Will They Ever Learn ….
Pete Seeger penned those magic words back in 1961. They made some money for him and a lot of money and fame for Peter, Paul & Mary … but even though the song’s been around longer than many of today’s business IT execs, few of them seem to have ever listened. Just look at yesterday’s news:
Close to 300 flights were delayed or canceled Wednesday after United’s flight operations computer system Unimatic, which supplies information to pilots, shut down from approximately 8 a.m. to 10 a.m. CDT.
Chief Operating Officer Pete McDonald said the error occurred during routine system testing.
“Yesterday, an employee made a mistake and caused the failure of both Unimatic and our backup system,” he said in the recorded call to employees. He did not elaborate on the error. …source here:
Well, I’ll elaborate on the error … of course I don’t get paid millions per year to drive a company into bankruptcy and cheat the employees, but I have been keeping computer systems and other ops-critical equipment running for nearly 40 years.
People have this annoying but inescapable trait. We make mistakes. We can browbeat employees, we can spend a fortune training them, we can hang posters that say “caution” and we can fire them after the fact (bet the guy or gal McDonald was talking about is already sending out resumes), but the errors will still happen. Or, as we used to say in Space Command, “anomalies occur”.
Mission critical systems simply must be designed so that one person does not have access to everything. You can buy redundancy out the ying-yang but if you let the same worker hold the passwords or keys to both systems, shit … and I do characterize United’s business security methods as pure shit, amply demonstrated here …WILL happen.
This incident was 100% preventable at virtually no cost at all … except the seemingly insurmountable cost of forethought. Or so think I.
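To make the "one person does not have access to everything" rule concrete, here is a minimal sketch of a separation-of-duties check. Everything in it is an assumption made for illustration: the `AccessRegistry` class, the operator names and the system names "primary" and "backup" are hypothetical, and nothing here describes how Unimatic or any United system actually handles access.

```python
from dataclasses import dataclass, field


class SeparationOfDutiesError(Exception):
    """Raised when a grant would give one operator both halves of a redundant pair."""


@dataclass
class AccessRegistry:
    # Pairs of systems that must never share an operator (hypothetical policy).
    exclusive_pairs: list = field(default_factory=list)
    # Maps each system name to the set of operators allowed to change it.
    grants: dict = field(default_factory=dict)

    def grant(self, operator: str, system: str) -> None:
        """Give an operator access to a system unless that breaks the policy."""
        for first, second in self.exclusive_pairs:
            if system not in (first, second):
                continue
            other = second if system == first else first
            if operator in self.grants.get(other, set()):
                raise SeparationOfDutiesError(
                    f"{operator} already has access to {other}; "
                    f"refusing access to {system}"
                )
        self.grants.setdefault(system, set()).add(operator)

    def can_modify(self, operator: str, system: str) -> bool:
        """Report whether the operator currently holds access to the system."""
        return operator in self.grants.get(system, set())


# Hypothetical usage: two operators, one redundant pair of systems.
registry = AccessRegistry(exclusive_pairs=[("primary", "backup")])
registry.grant("alice", "primary")
registry.grant("bob", "backup")

blocked = False
try:
    # The same person asking for the keys to the backup is refused.
    registry.grant("alice", "backup")
except SeparationOfDutiesError:
    blocked = True
```

The check runs at the moment access is granted, before anyone can make the mistake, so a single slip by one operator can take down at most one half of the pair.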