Jon Toigo

A Database Fix For The File System

Many in the industry argue that migrating unstructured files into a structured, object-oriented database is the best way to manage the explosion of enterprise data. We explain where this effort stands and the obstacles to creating a better file-system infrastructure.

Tired of slogging through reams of folders to find a specific file? The only comprehensive way to erase the complex hierarchy of direct and indirect block pointers that make it difficult to sort, index, manage and retrieve a file is to transform your file system into a database-centric storage architecture.

An object-oriented database repository stores files as large binary objects. This allows controlled file-sharing across different OS platforms, quick and granular file searching, and easier overall management, according to Microsoft, Oracle and other software developers that favor the approach. These new file databases also let you easily classify file objects by their retention, accessibility, security and privacy requirements--good for ILM (information life-cycle management).
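As a minimal sketch of the approach (the schema and classification labels are hypothetical, and SQLite stands in for any vendor's object repository), a file stored as a binary object alongside queryable classification metadata might look like:

```python
import sqlite3

# Hypothetical schema: each file is a binary object plus queryable
# classification metadata (retention class, security label, etc.).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE file_objects (
        name      TEXT PRIMARY KEY,
        content   BLOB,
        retention TEXT,   -- e.g. '7-years', 'ephemeral'
        security  TEXT    -- e.g. 'public', 'confidential'
    )
""")

# Store a file as a large binary object with its classification.
conn.execute(
    "INSERT INTO file_objects VALUES (?, ?, ?, ?)",
    ("q3-report.doc", b"<file bytes here>", "7-years", "confidential"),
)

# Granular searching: find everything subject to long-term retention.
rows = conn.execute(
    "SELECT name FROM file_objects WHERE retention = '7-years'"
).fetchall()
print(rows)  # [('q3-report.doc',)]
```

The point of the design is that retention, security and privacy attributes become first-class, indexable columns rather than conventions buried in file names and folder paths.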

The limitations of today's "unstructured" file systems affect organizations in different ways (see the diagram at right of a typical Unix file system). An audio/video postproduction house or streaming media company with hefty multimedia files, for instance, may hit the wall with file-size limits--anywhere from 2 TB to 16 TB, depending on the file system--and may even find constraints in the maximum number of files that the file system can support.

However, improved headroom doesn't alter the fact that file systems are self-destructive: every time you save a file, you overwrite the last valid copy of the data. This goes back to the roots of file-system design in the 1960s and '70s, when software engineers opted to conserve expensive resources like storage rather than add journaling or versioning techniques to their software to protect earlier file versions.


In addition, most file systems today don't automatically provide detailed descriptions of the data. And the stored metadata (data about data) doesn't say much about the contents or usage of a file, which makes ILM and automatic provisioning impossible. Users name their files, and applications such as Microsoft Office let them add content descriptions that are saved with the files. But it's up to the user to complete the information page when each file is saved. Few actually bother.
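The thinness of that stored metadata is easy to demonstrate. A short Python sketch of the per-file attributes a conventional file system exposes through `os.stat` shows size, timestamps and ownership, and nothing at all about what the file contains:

```python
import os
import tempfile

# Create a throwaway file and inspect the metadata the file system keeps.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as f:
    f.write(b"Quarterly compliance report, retain 7 years.")
    path = f.name

info = os.stat(path)

# Everything the file system knows: size, timestamps, ownership, mode.
# Nothing here describes the file's contents or retention requirements --
# that burden falls entirely on naming conventions and user discipline.
print(info.st_size)   # length in bytes
print(info.st_mtime)  # last-modified timestamp
print(info.st_uid)    # owning user id

os.remove(path)
```

Any richer description (content class, retention period, business owner) has to come from somewhere outside the file system itself.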

Without rigorous file-naming methodologies or consistent application-level file descriptions, recent regulations such as Sarbanes-Oxley and HIPAA (Health Insurance Portability and Accountability Act) are causing big headaches. It's tough to identify which files must be retained in special repositories for regulatory compliance if the files don't include descriptive information. Just try retrieving the correct files quickly, or segregating files that require special protection, when you're under the pressures of an SEC investigation.

[Diagram: Typical Inode Structure]

ILM won't mean much, either, if you don't have the file information necessary to create logical classes. Intelligent data-migration policies require a granular understanding of file content, access requirements, platform cost and capability, and other considerations. Without effective file naming, you just can't cherry-pick files for storage on appropriate platforms.
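As an illustration (the class names and tiers here are hypothetical), a data-migration policy is essentially a mapping from file attributes to a storage tier, and it breaks down the moment those attributes are missing:

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical ILM policy: route a file to a storage tier based on its
# content class and last-access age. A real policy engine would draw
# these attributes from repository metadata, not from file names.
def storage_tier(content_class: Optional[str], last_access: datetime) -> str:
    if content_class is None:
        # Unclassified file: no safe choice but expensive primary storage.
        return "primary"
    if content_class == "regulatory":
        return "compliance-archive"  # retained regardless of age
    if datetime.now() - last_access > timedelta(days=365):
        return "nearline"            # stale data moves off primary storage
    return "primary"

# With good metadata, files land on appropriate platforms...
print(storage_tier("regulatory", datetime.now()))
# ...without it, even year-old data defaults to primary storage.
print(storage_tier(None, datetime.now() - timedelta(days=900)))
```

The `None` branch is the crux: absent descriptive metadata, every file falls through to the default tier, which is exactly the cherry-picking failure described above.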

Even if ILM isn't in your plans and you're immune from regulatory pressures, the lack of detailed file information can still create problems in your day-to-day business. It's difficult to locate the files your users need if you have nondescript file names and inadequate directory or folder hierarchies.

This problem is exacerbated as organizations grow, and data-sharing among distributed users and applications becomes more pronounced. The larger the organization, the more files get lost in the shuffle.
