Report Blames Northrop Grumman For Virginia Outages - InformationWeek

A review of the week-long statewide network outage said it was caused by a combination of human error, faulty hardware, and a failure to follow best practices.

Faulty hardware and Northrop Grumman's failure to follow best practices were responsible for a statewide IT system failure in Virginia last summer that affected online services and network operations for a week, according to a report on the incident released by Virginia Gov. Robert McDonnell.

The independent review -- prepared by Agilysys, an IT services firm -- found that the combination of the failure of a data storage system and then human error during an attempt to replace one of the failed memory boards caused the unprecedented outage, which affected more than 20 government agencies.

The report also faulted Northrop Grumman, which has a $2.3 billion contract to work with the Virginia Information Technologies Agency (VITA) to look after communications and computer services for the state, for not adhering to industry best practices following the incident. VITA was created in 2003 to maintain and modernize the state's IT operations.

The commonwealth's trouble began Aug. 25 when two memory boards that were meant to back each other up both failed. Analysis by EMC, the manufacturer of the boards, said a so-called "electrical over stress condition at the component level" caused the dual failure, which resulted in a loss of data.

Following that, "human error during the memory board replacement process resulted in the incurred extended outage," according to the report.

The outage also was exacerbated by a gap in the Information Technology Service Continuity Management (ITSCM) processes, which resulted in the spread of corrupt data. Lack of a continuity procedure also was one of the reasons it took 18 hours to get the system back up and running, according to the report. Full service to all affected operations and agencies did not return until about a week later.

Specifically, parties responsible for responding to the incident did not suspend what's called Symmetrix Remote Data Facility (SRDF) before the memory board replacement process, which "negatively impacted the data recovery procedures" and allowed corrupt data to be replicated.

SRDF is a process used to replicate data from a local storage array to a remote storage array. The report cites Northrop Grumman as the responsible party for managing risk during the SRDF process.
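
To illustrate why suspending replication before a risky repair matters, here is a minimal Python sketch. It is a toy model, not EMC's SRDF interface or the procedure described in the report: the class ReplicatedVolume, the functions run_maintenance and simulate, and the simulated fault are all hypothetical names and assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicatedVolume:
    """Toy model of a local array mirrored to a remote array (not EMC's API)."""
    local: dict = field(default_factory=dict)
    remote: dict = field(default_factory=dict)
    replication_active: bool = True

    def write(self, block: int, data: str) -> None:
        # Every local write is copied to the remote side while the link is active.
        self.local[block] = data
        if self.replication_active:
            self.remote[block] = data

    def suspend(self) -> None:
        # Stop copying local writes to the remote array.
        self.replication_active = False


def run_maintenance(volume: ReplicatedVolume, suspend_first: bool) -> None:
    if suspend_first:
        volume.suspend()
    # Hypothetical fault: the repair attempt corrupts block 0 on the local array.
    volume.write(0, "CORRUPT")


def simulate(suspend_first: bool) -> str:
    """Return what the remote copy of block 0 holds after the repair attempt."""
    volume = ReplicatedVolume()
    volume.write(0, "good data")
    run_maintenance(volume, suspend_first)
    return volume.remote[0]


if __name__ == "__main__":
    print("replication left running:", simulate(suspend_first=False))
    print("replication suspended first:", simulate(suspend_first=True))
```

In this simplified model, leaving replication running lets the corrupted block overwrite the remote copy, while suspending it first leaves the remote copy intact as a recovery point.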

Northrop Grumman spokeswoman Christy Whitman said the company has been "working hard" since the outage to "make the appropriate improvements to help avoid or mitigate similar disruptions."

The company also is ready to talk with Virginia officials about how best to implement report recommendations, she added.

It's still not known how much the outage will cost the commonwealth, or whether and how Northrop's relationship with VITA will be affected. State officials long have criticized the partnership, which has had its troubles over the years.
