The Library of Congress debuts a site that gives the public access to all Web pages related to the 2002 elections, with other selected events to follow.
Researchers say the average Web site has a lifespan of 44 to 70 days. Some Web pages are outdated and replaced even sooner. But once those pages are removed from a site, most are gone for good.
To save a piece of digital history, the Library of Congress has begun preserving and cataloging all Web pages related to selected events. This month it launched a site that gives the public access to one such archived event: the elections of 2002. The collection contains more than 3,000 URLs, 1.3 terabytes of data, and about 50 million Web objects. The archive represents the daily content from 1,100 Web sites of candidates who ran for the U.S. Senate, House of Representatives, or governor in 2002.
By preserving each Web page as it appeared on each candidate's site daily, rather than just pieces of content, the archive shows the context of how events unfolded, says Steven Schneider, a professor at the State University of New York Institute of Technology and a co-director of the joint project. An example of election 2002 history unfolding on the Web is the tragic death of Minnesota Sen. Paul Wellstone, who was killed in a plane crash weeks before the election. The archive preserves the evolution of the campaign, including the dramatic changes to Wellstone's site after his accident.
The archive, which covered the campaign season from July 2002 through the end of November, also includes the evolving election-night results, in which 12,000 Web pages were captured hourly.
Key to the archive are the indexing and cataloging software services that let users search and retrieve the stored Web material, says Schneider, who developed the cataloging services. "The cataloging capabilities are what enables a person to make sense of the Web archive and all its material," he says.
The Library of Congress' Election 2002 archive represents hundreds of sites that are being preserved for historical purposes. However, this kind of archiving and cataloging has potential for other government sites, as well as for smaller-scale projects by companies that want to preserve their external or internal Web material. Kirsten Foot, a professor of communications at the University of Washington and the other co-director of the joint project, estimates that approximately 7.3 million Web pages are added each day to the 4 billion pages that already exist online. "We're in the process of exploring how this cataloging service can be offered to the commercial world," she says.
The Library of Congress will archive and catalog other events, although they have not yet been chosen, a spokesman says. Foot and Schneider also worked on a Library of Congress project that archived and cataloged Web pages related to Sept. 11, 2001.