Our IT team is obsessed with redundancy and with providing a 100% network and power SLA, as shown here. Some brainstorm about clusters, automated failover, and redundant software and hardware to make sure there is no single point of failure. One recent example is the recursive DNS servers, which are now served by two geographically balanced failover clusters. Others worry about natural disasters, thunderstorms, fires, denial-of-service attacks, hacker attacks, etc.
At iWeb, we’ve been through all of those incidents. Yesterday, though, we had another one: a fiber cable cut.
This kind of incident is frequent, and the IT team is drilled and prepared to face cuts by rerouting traffic to one of our multiple other providers. The cause, though, was a very rare occurrence: a squirrel chewed through the cable.
I have to admit squirrels weren’t accounted for in the redundancy plan. We’re currently looking for the culprit, thus the poster… In all seriousness though, this incident proves that even if you spend weeks and months planning for 100% availability and 100% scalability, and running functional and integration tests, nothing beats running it for real to see how it performs in the real world … and to be ready for the squirrel!