I run a hosting company so I know all about outages and unexpected issues.
They can and will happen.
No matter what happens you will never please all of your clients. You can try, but you won’t succeed. It’s not that your best efforts won’t be good enough, but simply that there will always be an “expectation gap”. The best you can hope for is pleasing most of them most of the time.
One thing I have learnt (the hard way) is that being honest and upfront can be both painful and rewarding.
You need to swallow your pride and “eat humble pie” when you screw up.
No matter what you as the provider may think, as far as the client is concerned their little site is the centre of the universe.
You can say what you like behind closed doors, but you need to be respectful in public.
I’m not saying that we all manage to do this all of the time or that it is easy, but it’s the sanest way of dealing with things.
Some providers simply don’t “get” that.
Stephen McCarron says
Honest and upfront? What part wasn’t?
We experienced an issue today that lasted 18 minutes, and collateral issues that extended up to an hour after the primary problem was resolved.
We openly and rapidly apologised and gave updates on what was going on the moment we had them.
In fairness, I’m super proud of the team and how they handled things, on phones, live chat, email and the status site.
David Behan says
Ah Steve, it was more like 2-3 hours before things were fully back to normal for an outsider trying to get into the websites/emails. It was just after 1pm that things went wrong and it was about 3.15pm or so when I could finally get access to email/websites again. 3 separate years, 3 separate site-wide outages! I was going to just keep my reseller account but you just spurred me to get all my sites over onto my dedicated server with Blacknight. I feel for you Steve – it must be so difficult to deal with – but you really must look into redundancy a lot more. I thought you’d learned lessons from the last 2 failures. I can’t recommend you at all anymore. I know it’s a leap year and all but you needed to be hosting366 this year! 😀
Michele Neylon says
Stephen
I think it may have been your intention to handle things in an honest and open manner, but I don’t think the public face of the situation, i.e. your status blog, was managed very well today.
Regards
Michele
Stephen McCarron says
In fairness to Ed, I think he did an excellent job. Updates were posted in real time, as we had the information. To be honest I see no difference between this and the outages you cover on your own blog (I see your offsite status blog hasn’t yet been done), just that perhaps we get more flak as we impact more people.
To be fair also, the volume of trolls, posts from competitors and other hosts was quite amazing (more than 30 posts had to be culled)
I’ll be posting an official statement in the morning.
Michele Neylon says
Stephen
It’s how you handle the trolls etc., that makes all the difference 🙂
Michele
Stephen McCarron says
@Dave To be completely honest with you, some customers were completely unaffected and others were affected for more than two hours.
98% of customers were fine within an hour.
You can plan and invest as much as we have and still experience issues – look at http://url.ie/8xh as an example.
After the first power issue (substation fire) we built a new substation and MV room; after the second issue, we built completely new boards and UPS, and hired one of the best facilities managers money could buy. This is a completely unrelated issue that was simply compounded by a DNS screwup (more on which I’ll be officially posting tomorrow). To say we haven’t learned any lessons is not being fair to us. Also, we’ve had a total of under 3 hours of outage in 30 months. Including DNS resolution and all other ancillary and third party impacts, that number is still less than 7 hours. That’s still better than 99.999% uptime for that period. I’m not throwing any stones, but even Blacknight have suffered two outages since last October that have been pretty much site wide also.
There are no excuses, only plans to fix. I’d worry more about a provider that’d never had to deal with any problems!
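As a rough check on the uptime arithmetic quoted above, here is a minimal Python sketch. It assumes an average month of 30.44 days and treats the quoted 3 and 7 hours as total downtime over 30 months; the function names are hypothetical and only for illustration.

```python
# Rough availability arithmetic for the figures quoted in the thread.
# Assumption: an average month is 30.44 days (365.25 / 12).
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30.44


def uptime_percent(downtime_hours: float, months: float) -> float:
    """Percentage of the period during which the service was up."""
    total_hours = months * DAYS_PER_MONTH * HOURS_PER_DAY
    return 100.0 * (1.0 - downtime_hours / total_hours)


def allowed_downtime_hours(target_percent: float, months: float) -> float:
    """Downtime budget (in hours) implied by a target uptime percentage."""
    total_hours = months * DAYS_PER_MONTH * HOURS_PER_DAY
    return total_hours * (1.0 - target_percent / 100.0)


if __name__ == "__main__":
    for hours in (3, 7):
        print(f"{hours}h down in 30 months: {uptime_percent(hours, 30):.3f}% uptime")
    budget_minutes = allowed_downtime_hours(99.999, 30) * 60
    print(f"99.999% over 30 months allows about {budget_minutes:.0f} minutes")
```

Under those assumptions, 3 hours of downtime works out to roughly 99.986% uptime and 7 hours to roughly 99.968%, while a 99.999% target over the same 30 months would leave a budget of only about 13 minutes.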
Ewan Oughton says
I would have thought you dudes would have been all over the redundant power problem after the last epic outage.
Stephen McCarron says
@ Ewan – it was nothing to do with redundant power – power and connectivity were maintained 100% throughout. The issue was cooling. The chillers were down for about 20 minutes while we replaced a fuse. The problem was that the DC temp rose to nearly 50 degrees C in that time, requiring shut downs to protect the kit.
We’re looking at beefing up the cooling to add redundancy.
Ewan Oughton says
Stephen: It was clearly an issue with redundant power if replacing a fuse brought down your HVAC.
Paul Kelly says
Steve,
Your DNS was down for 3+ hours, effectively crippling your services.
I have heard stories of machines only coming back up after 5pm etc. The issue definitely lasted more than 40 minutes. You really need to admit this in public and start writing cheques to people whose SLAs were breached.
Paul
Peter Kelly says
Having read that blog post as a complete non-client of h365, all I can say is WTF. If I was a customer the kind of run-around I just read about would be completely unacceptable to me.
And let’s face it, the information being given to the clients, and also the replies from Ed Byrne, lead me to conclude that h365 were trying to fight a fire with a glass of water instead of a hose, and were in some cases less than fully honest with some people.
Stephen McCarron says
@ Paul – appropriateness of a competitor stating facts aside (we certainly don’t post about your issues) – what you are saying is simply untrue. The DNS issue compounded problems, but any claim of 3+ hours is down to DNS caching, and not our DNS being ‘down’.
Stephen McCarron says
@ Peter – can you give me more details on what we could have done better? As I said before, we’re the only provider that even has an offsite communications tool to keep customers in the loop.
I’ve posted a statement at http://www.hosting365status.com
Peter Kelly says
Again I am speaking from a non-technical, non-customer background – so asking me what more you could have done is kind of a moot point.
However, in the glaring face of some comments left by your customers and the replies from Ed Byrne, there were obviously huge gaps in communication, and between what Ed Byrne knew and what he was relating, especially when set against the reported issues that your customers were experiencing. One commenter (and I’ll admit it’s only one out of the 200 comments that I can remember) said that the customer service support lines were less than adequate. In fact, I’ll quote them as saying:
“by the way, i raised this issue on the premium support line at 12.58 after several minutes of ping timeouts to hosting365.ie. the guy told me “we have no reports of any problems” and more or less implied that it was my problem. he did say i could follow it up with an email for a more detailed investigation, but i would have appreciated a little more initiative on the part of H365.” – tim mackey
Stephen McCarron says
Hi Peter,
Appreciate what you are saying but, even from our own checks, a great deal of the posts on that item are not from customers, but rather from competitors and trolls.
Our live chat, ticket and phone teams were talking to customers throughout, our sales teams and account managers were ringing their customers proactively, and the status blog worked exactly as it is supposed to – relaying information as we have it.
The only issue I have is third parties (with vested interests) ‘stirring’ the issue, and representing themselves as customers or ‘concerned parties’ when neither is the case.
As I’ve made a formal statement on our own status site, I won’t be responding to external blogs any further.
Rahood says
@”The DNS issue compounded problems, but any claim of 3+ hours is DNS caching, and not our DNS ‘down.”
The fact that 1-2 and 3 went dark is the issue. Cooling within the room is a total non-issue imho. Am I correct in reading into it that all 3 of your DNS servers are in the same rack/room? Because if that’s the case, and as more information seeps out it would seem to be very much the case, cooling is the least of your problems.
Sean MacGabhann selling e100 worth of tat through his sh1tty little site has no understanding of just what a DNS cache is, in the same way he does not understand the difference between F5 and Ctrl-F5, click on IE and he is ‘surfing the net’
A fuse blew and we lost cooling... you know as well as I do that the fuse only served to highlight a pressing issue within your facility. One I myself noted on a tour, pointed out, received no joy, shrugged it off and took out a 5 year plan with another provider (not Blacknight btw).
Rahood says
Just so we know I moved 19U into a half cage and they done did give me a proper service.
Michele Neylon says
Rahood – I’m a bit confused by your last comment. Who gave you proper service? Hosting365, or the other company you mentioned but didn’t name?