Saving Pieces Of Digital History - InformationWeek

Saving Pieces Of Digital History

The Library of Congress debuts a site that gives the public access to all Web pages related to the 2002 elections, with other selected events to follow.

Researchers say the average Web site has a lifespan of 44 to 70 days. Some Web pages are outdated and replaced even sooner. But once those pages are removed from a site, most are gone for good.

To save a piece of digital history, the Library of Congress has begun preserving and cataloging all Web pages related to selected events. This month it launched a site that gives the public access to the first such archived event: the elections of 2002. The collection contains more than 3,000 URLs, 1.3 terabytes of data, and about 50 million Web objects. The archive represents the daily content from 1,100 Web sites of candidates who ran for the U.S. Senate, House of Representatives, or governor in 2002.

By preserving each Web page as it appeared on each candidate's site daily, rather than just pieces of content, the archive shows the context of how events unfolded, says Steven Schneider, a professor at the State University of New York Institute of Technology and a co-director of the joint project. One example of election 2002 history unfolding on the Web is the tragic death of Minnesota Sen. Paul Wellstone, who was killed in a plane crash weeks before the election. The archive preserves the evolution of the campaign, including the dramatic changes to Wellstone's site after the crash.

The archive, which covers the campaign season from July 2002 through the end of November, also includes the evolving election-night results, for which 12,000 Web pages were captured hourly.

Key to the archive are the indexing and cataloging software services that let users search and retrieve the stored Web material, says Schneider, who developed the cataloging services. "The cataloging capabilities are what enables a person to make sense of the Web archive and all its material," he says.

The Library of Congress' Election 2002 archive represents hundreds of sites that are being preserved for historical purposes. However, this kind of archiving and cataloging has potential for other government sites, as well as for smaller-scale projects by companies that want to preserve their external or internal Web material. Kirsten Foot, a professor of communications at the University of Washington and the other co-director of the joint project, estimates that approximately 7.3 million Web pages are added daily to the 4 billion pages that exist online. "We're in the process of exploring how this cataloging service can be offered to the commercial world," she says.

The Library of Congress will archive and catalog other events, although they have not yet been chosen, a spokesman says. Foot and Schneider also worked on a Library of Congress project that archived and cataloged Web pages related to Sept. 11, 2001.
