Intermittent outages on 2016 (Resolved) Issue / Performance
19 days

All work was completed successfully and things have been moving smoothly since Saturday morning. The new configuration was tested and failover is now behaving as intended. 

Update 02/21/2020 06:50 AM 23 days

We successfully replaced the core router with a spare that we had on hand. We will still replace that unit with the replacement shipped from Cisco on Friday or Saturday evening. This NOC posting will be updated once the swap-out is completed.

Update 02/21/2020 05:49 AM 23 days

We will be performing the hardware swap-out in 10 minutes. Service is expected to be back online by 1:15 AM Eastern.

Update 02/21/2020 05:11 AM 23 days

We're in the process of reprogramming one of our spare Cisco routers in order to temporarily replace the current master. We've elected to go this route to allow us to flip back to the original master without a conflict of configurations and addresses. 

We will attempt to activate the spare router in one hour, at 1 AM Eastern. During the swap, users will not be able to access their mailboxes. We've allocated a 45-minute window to perform the swap. If we are unable to complete the swap within that window, we will revert the changes and continue to work on a temporary solution.

Update 02/20/2020 11:03 PM 23 days

Our team has been working on this issue all day. After consulting with Cisco, we are going to attempt a temporary solution that we hope will minimize the impact on the 2016 infrastructure until the routers can be addressed on Saturday, during our weekly maintenance interval, when the work will cause the least interruption for our clients.

We understand this is a serious issue, as any impact on performance is an impact on the productivity of the people we serve. We have been fortunate that Outlook is fairly resilient, and we have had only a few complaints. Mail isn't bouncing, and disconnections last from a few moments to a few minutes and happen every few hours. We will attempt a workaround at 2AM EST and hope to restore normal performance at least until Saturday, when a full maintenance cycle can be performed. In plain English: we're trying to patch it along so it can make it to the weekend, when we can sustain a potential outage of 30 minutes.

Update 02/20/2020 03:33 PM 23 days

We've identified an issue with a core ingress router to our Exchange 2016 network. We will replace the router during our core maintenance cycle this weekend. During the router replacement, we will actively keep connections from opening while we validate the performance and stability of the replacement router. We'll update this posting during the replacement.