As described in our post-mortem from last week, we had a major interruption because our UPS upgrade did not go as planned. During the 26-second power outage, power surges damaged some hardware and also corrupted some software and databases. We were aware of this at the time, but the fix had to wait a few days until things had calmed down.
As announced on cnstatus.com, we scheduled follow-up maintenance for early this morning that was expected to affect only the admin interface. We started the work at 01:30 and expected it to take less than one hour. However, the power outage and surges had affected other parts of the environment beyond what we already knew needed fixing. Our plan was to restore some databases and verify that no software showed any corruption or inconsistencies. After our updates, some hosts (blade servers) were not behaving as they should, and we realized there were still inconsistencies affecting VMs running on a few specific hosts. We believe all of these inconsistencies were created during the power failure, as some disks had been corrupted.
Once the databases had been updated again and we had confirmed that the management nodes saw exactly what was in the databases, things looked better and we started booting the affected VMs. Fewer than 100 servers were affected, but those VMs went up and down during the early morning hours: we brought them up a couple of times but had to take them down again to make sure our information was completely accurate. The final start of the VMs began around 07:00, and most VMs were then up and running, with a few taking somewhat longer to come back fully.
At this point we see no inconsistencies, and all hosts are running normally. We will continue to monitor closely and, as always, report anything that might affect operations on cnstatus.com. We expect this to be the last issue related to last week's power failure, which occurred while our partner companies were getting our new UPS system ready for our improved data center.
We apologize for any inconvenience this may have caused you.