IT Services incident report on recent outage

by Kathleen McCoy

Service disruption: Sunday, Nov. 8, 4:30 p.m. to Thursday, Nov. 12, 6:30 a.m.

This report summarizes findings on a major disruption of multiple technology services that occurred between Nov. 8, 2009, and Nov. 12, 2009. Its purpose is to give our customers a general understanding of the disruption: its specific impacts, a chronology of events, the root cause and the specific steps currently being taken to avoid future occurrences.

IT Services staff are acutely aware of the impact this disruption has had on University operations and students' classes. The type of hardware failure that caused this disruption has been seen at only a handful of other organizations within the U.S. Still, we regret the disruption and appreciate the patience of the UAA community as we worked through the underlying problems with our vendors.

Overview

At approximately 7:30 p.m. Sunday, Nov. 8, IT Services began responding to reports of major system failures. After arriving on-site, technicians discovered that a Hewlett-Packard (H-P) enterprise storage array in the Anchorage Data Center (ADC) had failed earlier that afternoon, at 4:30 p.m. The storage array houses over 25TB of storage (terabytes; equivalent to roughly 25,000,000 megabytes). The array failure was caused by a rapid, progressive failure of multiple disk drives within the array, which in turn corrupted critical data housed on the array. Working with H-P, IT Services stabilized the storage array hardware around 7:30 a.m. Monday, Nov. 9. Restoring the corrupted files required access to backup tapes and a tape-to-disk restoration process that was finally completed Thursday, Nov. 12 at 6:30 a.m.

Specific Impacts

  • UAA Web site, UAA WiFi services, eLive!, and Moodle were unavailable from 4:30 p.m. Sunday, Nov. 8 until 7:30 a.m. Monday, Nov. 9;
  • Anchorage employee e-mail (Campus Exchange only) was partially unavailable from 4:30 p.m. Sunday, Nov. 8 until 10 a.m. Wednesday, Nov. 11 when full service was restored;
  • UAA Blackboard™ was unavailable from 4:30 p.m. Sunday, Nov. 8 until 6:30 a.m. Thursday, Nov. 12 when full service was restored;
  • There is a possibility of limited Anchorage employee e-mail loss for messages sent/received between 12:54 a.m. and 4:30 p.m. Sunday, Nov. 8. Student e-mail was unaffected;
  • Blackboard™ content, assignments or grades submitted between 4:30 p.m. and 7:30 p.m. on Sunday, Nov. 8 will need to be re-submitted.