Anatomy of a Meltdown: Lessons in Crisis Management

Have you seen my website lately? Neither have I. Actually it's back up now, but due to a massive crash at my hosting company - IX Web Hosting - my site was down from Sunday morning through Thursday afternoon. That's right. Four days.

In my prior life as VP/GM for a global supply chain company, one of the first things prospective customers would grill us on was our IT disaster recovery plan. Apparently IX didn't get the memo.

Don't get me wrong. I understand nothing in life is perfect and "stuff" happens, but to be down that long in this day and age is mind-boggling and inexcusable. Luckily, only my site is on IX and my email is on a different service, so my business was only marginally interrupted. Many other IX customers weren't so lucky, as the screams of pain on the IX incident status page illustrated.

Despite the angst - and thousands of dollars in lost revenue for many customers - there are valuable customer service and crisis management lessons to be learned from this mess. I'm not talking about the technical aspects of proper backup hardware and an effective recovery process. That's a subject someone else can handle better than I can. I want to focus on the way IX handled the crisis from a customer perspective - not good to start, better near the end. Here are my takeaways from the event.

1. Communication is key. No matter how bad the situation, always be open and honest about what happened and what you are doing about it. To their credit, IX had people handling the irate messages on their status board on an ongoing basis. Unfortunately, for the first couple of days of the incident, it appeared management was simply pushing these poor people up out of the trenches and into the line of fire without any ammunition.

Customers wanted to know what was happening and, more importantly, when they were going to be back online. The only answer these unfortunate folks were armed with was "we do not currently have an ETA." After a couple of days of that response, angry customers were doing everything but marching on IX with pitchforks and torches.

Finally on Tuesday, two days after the crash, IX posted a schedule for when each server would be restored and a way to determine which server you were on. Mine was scheduled for Thursday at 3 pm. I wasn't pleased, but at least I knew where I stood. Telling people what was actually going on and how long it was going to take had to be painful for IX, but it let customers know the end was near. My site was actually up a couple of hours earlier than promised, so there's that.

2. Focus on the problem. In the first hours following the crash, the IX folks got sucked into dialogue about compensation and arguments over uptime claims. That did no good for either IX or the customer base. The message needed to be "all our effort right now is on solving the current problem."

3. Be honest. Quickly. IX eventually put up a pretty detailed explanation of what caused the outage, but until they did, people were spewing all kinds of sinister theories: sabotage, terrorist activity, disgruntled employees, plagues of Egypt. The lesson here, despite what Colonel Jessep said in A Few Good Men, is that people CAN handle the truth. And if they aren't told the truth, they will invent scenarios and theories that would make the folks at TMZ blush.

4. Provide corrective action. The damage that this crash caused to IX's business will be enormous. They might be able to stem the exodus of customers if they quickly come out with a detailed plan for what they are going to do to prevent this from ever happening again. When we had a problem at our supply chain company, customers would ask, "How did this happen? You're ISO certified!" I would respond that indeed we were, but that certification didn't come from the Vatican. We were not infallible. However, we would correct our processes and guarantee that the problem would never happen again. IX still hasn't taken this final step. Their incident status page simply says "resolved" and the latest on the company blog is a Happy Holidays post from December 26th.

My reaction during the outage was that I was going to move my site off of IX as quickly as possible. Now that it's back up, I'm not so sure. If IX can provide a clear, reasoned explanation of what happened, why it happened, and what preventive measures they plan to put in place for the future, maybe I'll reconsider.

Have you had to deal with a serious issue from either the customer or vendor side? How did you handle it?


Photo Credit: Jenn and Tony Bot via Compfight cc