Incident: Network infrastructure – iWeb-NE Data Center (Resolved)

February 17, 2015 by iWeb Technologies in: Status

RFO (Reason for Outage) Report

Event Description: Network outage for the VPR, BareMetal servers and Cloud
Event Start Time: February 17, 2015 06:18 EST
Event End Time: February 17, 2015 06:39 EST
RFO Issue Date: February 19, 2015

Scope of Impact Summary

On February 17th, the distribution routers NE-DR11 and NE-DR12 lost their upstream connectivity to the core routers because of high CPU usage.

Event Timeline

Please note that all times listed in the timeline below are in 24-hour clock format, and refer to Eastern Standard Time.

February 17, 2015

06:18 CPU usage of NE-DR11 and NE-DR12 reached 100%, BFD brought OSPF down and the routers lost their BGP sessions to the core routers
06:25 The technical team started to investigate the issue
06:28 The technical team found that many hosts had created a broadcast storm
06:30 The VLAN interfaces where these hosts reside were shut down
06:30 CPU usage started to subside and the routing protocols (OSPF, LDP and BGP) converged
06:30 Connectivity to the BareMetal servers and Cloud was restored
06:35 The technical team noticed that the uplinks to the aggregation switches for the VPR were down in err-disabled state
06:38 The ports to the aggregation switches of the VPR were brought up and network connectivity was restored

Root Cause Analysis

The iWeb technical team isolated the issue to a broadcast storm originating from several servers belonging to one client.

Next Steps

Over the next three months, more aggressive storm control policies will be implemented on each physical interface facing client servers in order to protect the distribution routers.
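
To illustrate the idea behind storm control, the following Python sketch models a per-port limiter that forwards broadcast frames up to a threshold within each measurement window and drops the rest. It is only a simplified model, not iWeb's configuration or any vendor's implementation: the names StormControl and simulate, the threshold of 500 frames per second and the one-second window are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class StormControl:
    """Illustrative per-port broadcast limiter (hypothetical model)."""
    threshold_pps: int         # assumed limit: broadcast frames forwarded per window
    window_s: float = 1.0      # assumed measurement window, in seconds
    _window_start: float = 0.0
    _count: int = 0
    dropped: int = 0

    def allow(self, now: float) -> bool:
        """Return True if a broadcast frame arriving at time `now` is forwarded."""
        if now - self._window_start >= self.window_s:
            # A new measurement window begins; reset the frame counter.
            self._window_start = now
            self._count = 0
        self._count += 1
        if self._count > self.threshold_pps:
            # Over the threshold for this window: drop the frame.
            self.dropped += 1
            return False
        return True


def simulate(frames_per_second: int, seconds: int, threshold_pps: int) -> Tuple[int, int]:
    """Send evenly spaced broadcast frames to one port; return (forwarded, dropped)."""
    port = StormControl(threshold_pps=threshold_pps)
    forwarded = 0
    for i in range(frames_per_second * seconds):
        if port.allow(now=i / frames_per_second):
            forwarded += 1
    return forwarded, port.dropped


if __name__ == "__main__":
    # A host flooding 10,000 broadcast frames per second for 2 seconds.
    print(simulate(frames_per_second=10_000, seconds=2, threshold_pps=500))
    # Ordinary background broadcast traffic stays under the limit.
    print(simulate(frames_per_second=100, seconds=2, threshold_pps=500))
```

In this model the flooding host is capped at the threshold in every window, while ordinary traffic below the threshold passes untouched. Applied at the server-facing port, such a limit would keep excess broadcast frames from reaching the distribution routers.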

Update #2 – 2015-02-17, 6:39 am EST: The network incident has been successfully resolved. In the coming hours, we will monitor the performance and stability of the affected services to make sure that everything is working normally.

We apologize for any inconvenience this situation may have caused. If you have any questions, please do not hesitate to contact our support team.

Thank you for your patience and understanding.

Update #1 – 2015-02-17, 6:30 am EST: Our Classic Servers services are back online.

Our technical team is actively working to fully restore all our services.

We will continue to communicate all updates.

Start: 2015-02-17, 6:18 am EST
Impact: Network problem on one segment of our network in our iWeb-NE data center

We are currently investigating a situation that has affected a segment of our network infrastructure in our iWeb-NE data center. Our Classic Servers, Virtual Private Racks and one segment of our Cloud Servers services are affected by this incident.

Our technical team is actively working on finding the cause of this problem.

We apologize for any inconvenience this situation may cause. If you have any questions, please do not hesitate to contact our support team.

Thank you for your patience and understanding.
