The Library of Congress debuts a site that gives the public access to all Web pages related to the 2002 elections, with other selected events to follow.
Researchers say the average Web site has a lifespan of 44 to 70 days. Some Web pages are outdated and replaced even sooner. But once those pages are removed from a site, most are gone for good.
To save a piece of digital history, the Library of Congress has begun preserving and cataloging all Web pages related to selected events. This month it launched a site that gives the public access to one such archived event: the elections of 2002. The collection contains more than 3,000 URLs, 1.3 terabytes of data, and about 50 million Web objects. The archive represents the daily content from 1,100 Web sites of candidates who ran for the U.S. Senate, House of Representatives, or governor in 2002.
By preserving each Web page as it appeared on each candidate's site daily, rather than just pieces of content, the archive shows the context in which events unfolded, says Steven Schneider, a professor at the State University of New York Institute of Technology and a co-director of the joint project. One example of election 2002 history unfolding on the Web is the death of Minnesota Sen. Paul Wellstone, who was killed in a plane crash weeks before the election. The archive preserves the evolution of the campaign, including the dramatic changes to Wellstone's site after the crash.
The archive, which covered the campaign season from July 2002 through the end of November, also includes the evolving election-night results, in which 12,000 Web pages were captured hourly.
Key to the archive are the indexing and cataloging software services that let users search and retrieve the stored Web material, says Schneider, who developed the cataloging services. "The cataloging capabilities are what enables a person to make sense of the Web archive and all its material," he says.
The Library of Congress' Election 2002 archive represents hundreds of sites that are being preserved for historical purposes. However, this kind of archiving and cataloging has potential for other government sites, as well as for smaller-scale projects by companies that want to preserve their external or internal Web material. Kirsten Foot, a professor of communications at the University of Washington and the other co-director of the joint project, estimates that approximately 7.3 million Web pages are added daily to the 4 billion pages that exist online. "We're in the process of exploring how this cataloging service can be offered to the commercial world," she says.
The Library of Congress will archive and catalog other events, although they have not yet been chosen, a spokesman says. Foot and Schneider also worked on a Library of Congress project that archived and cataloged Web pages related to Sept. 11, 2001.