Delivering A Network-Utility Vision
If there's any doubt that computer storage is becoming increasingly distributed, look no further than government and industry research labs around the country. Their work suggests that storage will become a ubiquitous network service--as basic to computing as dial tone is to the voice network--if the scientists succeed.
Centralized storage isn't likely to go away soon. But the growing quantities of data that consumers and businesses create necessitate massive amounts of additional storage. The only way to meet this demand cost-effectively is to take advantage of all computing devices on a network, some researchers say.
Computer scientists at Hewlett-Packard Laboratories, Microsoft Research, the University of California at Berkeley, the National Science Foundation-backed Internet2 project, and the Canadian government-funded CA*net 4 initiative are working on separate efforts to create peer-to-peer networks of easily accessible storage that don't require the management and administrative overhead found in today's commercial storage systems. Some of these projects don't even require servers for storage; instead, they leverage the network and PCs to provide capacity.
Canadian researchers, under the auspices of Canarie Inc., a nonprofit Internet research consortium working on the CA*net 4 project, have the most mind-bending approach to storage: using wavelengths of light traveling through the network as a data-storage medium, so that data is kept perpetually in transit rather than written to disk. They hope the Wavelength Disk Drive will provide the large amounts of storage needed for scientific computing grids used for weather forecasting, computational fluid dynamics, genome research, and pharmaceutical modeling. The CA*net 4 project is building an advanced optical network capable of supporting a peer-to-peer computing grid that could greatly exceed the raw horsepower found in a centralized supercomputer--the main class of computers currently used for scientific modeling.
The OceanStore Project at UC Berkeley envisions a persistent data store that can scale to billions of users. OceanStore caches data on servers distributed throughout the network, and any computer can join the storage network to contribute storage capacity or provide user access.
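A common way to decide which machine in such a network holds a given piece of data is consistent hashing, which lets nodes join or leave with minimal re-mapping. The sketch below is a generic illustration of that idea, not OceanStore's actual routing algorithm; the node names and class API are assumptions.

```python
import hashlib
from bisect import bisect_right

class HashRing:
    """Toy consistent-hash ring mapping object IDs to storage nodes.

    Each node is hashed onto the ring at several virtual points to
    even out load; an object belongs to the first node point found
    clockwise from the object's own hash.
    """

    def __init__(self, nodes, points=3):
        self.ring = {}    # hash value -> node name
        self.keys = []    # sorted hash values
        for node in nodes:
            self.add_node(node, points)

    def _hash(self, value):
        return int(hashlib.sha256(value.encode()).hexdigest(), 16)

    def add_node(self, node, points=3):
        for i in range(points):
            h = self._hash(f"{node}#{i}")
            self.ring[h] = node
            self.keys.append(h)
        self.keys.sort()

    def lookup(self, object_id):
        # First ring point at or after the object's hash, wrapping around
        h = self._hash(object_id)
        idx = bisect_right(self.keys, h) % len(self.keys)
        return self.ring[self.keys[idx]]

ring = HashRing(["pc-01", "pc-02", "pc-03"])
owner = ring.lookup("report-2002.doc")  # always the same node for this ID
```

Because only the ring points adjacent to a new node change owners, a PC joining the storage network takes over a small slice of objects instead of forcing a global reshuffle.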
In a similar scheme, researchers at Microsoft are developing a serverless distributed file system that uses the storage capacity of the myriad PCs distributed throughout the network. The Farsite research project is designed to provide a highly available and reliable service, ensuring user privacy and data integrity in an environment that has no centrally trusted authority. Another goal of the project is for the service to automatically configure and tune itself so it can react to component failures, usage variations, and environmental changes.
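Farsite's twin goals--availability from unreliable desktops and integrity without a trusted host--can be illustrated with replication plus cryptographic digests. This is a minimal sketch of that general technique, not Microsoft's implementation; the peer names and in-memory dictionaries stand in for real machines.

```python
import hashlib
import random

# Eight untrusted desktop PCs, each modeled as a dict of file contents
peers = {f"desktop-{i:02d}": {} for i in range(8)}

def store(name, data, copies=3):
    """Place copies of the file on several randomly chosen peers.

    Returns the file's SHA-256 digest, which the client (or a
    directory service) keeps so replicas can be verified later.
    """
    digest = hashlib.sha256(data).hexdigest()
    for peer in random.sample(list(peers), copies):
        peers[peer][name] = data
    return digest

def fetch(name, digest):
    """Try replicas until one passes the integrity check."""
    for contents in peers.values():
        data = contents.get(name)
        if data is not None and hashlib.sha256(data).hexdigest() == digest:
            return data
    raise IOError("no intact replica found")

d = store("budget.xls", b"quarterly numbers")
original = fetch("budget.xls", d)
```

A tampered or corrupted replica simply fails the digest check and is skipped, so no individual PC has to be trusted--any surviving honest copy suffices.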
The technology is aimed at the business market. Should it become commercially available in a few years, it could challenge centralized storage, given that this scheme would let IT departments eliminate some expensive storage servers and administrators, and use the excess storage capacity available on existing PCs.
"There's a lot of excess CPU and storage capacity not being used in corporations. That unused capacity is free," says Bill Bolosky, a senior Microsoft researcher. Microsoft conducted an audit of its PCs in 2000 and found that on average, the machines had 14 Gbytes of memory but only used about 6 Gbytes.
Researchers at the University of Tennessee, under the guidance of the government-backed Internet2 project, have created a protocol that lets storage nodes communicate across the Internet. The asynchronous Internet Backplane Protocol will let service providers such as communications carriers create point-to-point storage overlay networks. In this project, storage is passive and not inherently intelligent, says Micah Beck, director of the Logistical Computing and Internetworking Laboratory at the university. Users will add the intelligence for data management, content delivery, and security, Beck says. "The overlay network is like a logistical supply chain with highways, depots, and trucks. Storage is one of many shared resources," he adds.
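The division of labor Beck describes--dumb depots, smart clients--can be sketched as a storage node that only allocates, stores, and loads raw bytes, while the client handles everything else. The class and method names below are illustrative, not the real IBP calls.

```python
import uuid

class Depot:
    """A "passive" storage depot: it manages byte buffers and nothing else.

    Replication, naming, content delivery, and security are all the
    client's responsibility, as in the supply-chain analogy.
    """

    def __init__(self):
        self._allocations = {}

    def allocate(self, size):
        # Return an opaque capability the client must present later
        cap = uuid.uuid4().hex
        self._allocations[cap] = bytearray(size)
        return cap

    def store(self, cap, offset, data):
        buf = self._allocations[cap]
        buf[offset:offset + len(data)] = data

    def load(self, cap, offset, length):
        return bytes(self._allocations[cap][offset:offset + length])

# The client supplies the intelligence--here, striping one payload
# across two depots and reassembling it on read.
depots = [Depot(), Depot()]
caps = [d.allocate(8) for d in depots]
payload = b"forecast-data-ab"
half = len(payload) // 2
depots[0].store(caps[0], 0, payload[:half])
depots[1].store(caps[1], 0, payload[half:])
recovered = depots[0].load(caps[0], 0, half) + depots[1].load(caps[1], 0, half)
```

Because the depot never interprets what it holds, new policies--mirroring, caching near consumers, encryption--can be layered on by clients without changing the storage nodes themselves.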
Freed of complex data-management schemes, this storage network can scale in ways that today's storage area networks and network-attached storage technologies, with their inherent scale limits, cannot. The protocol software is available, and the specification will be published later this spring.
Researchers at HP Labs also are designing next-generation distributed storage. The iShadow project envisions storage as a shared network utility that's physically distributed throughout the network but logically centralized in a massive data utility center. Two of the most innovative aspects of the research are the self-managing and security technologies for storage.
"Security has been ignored in the storage space for a long time," says Simon Towers, research manager for storage systems at HP Labs. A contract worker could walk into a company's data center and walk off with a hot-swappable storage disk complete with customer records and no one would be the wiser, Towers says. HP is working to bring encryption and cryptographic key distribution and management to storage to avoid such calamities.