Many in the enterprise software industry agree that migrating unstructured files into a structured, object-oriented database is the best way to manage the explosion of enterprise data. We explain where this effort stands and the obstacles to building a better file-system infrastructure.
Tired of slogging through reams of folders to find a specific file? The only comprehensive way to escape the complex hierarchy of direct and indirect block pointers that makes it difficult to sort, index, manage and retrieve a file is to transform your file system into a database-centric storage architecture.
An object-oriented database repository stores files as large binary objects. This allows controlled file-sharing across different OS platforms, quick and granular file searching, and easier overall management, according to Microsoft, Oracle and other software developers that favor the approach. These new file databases also let you easily classify file objects by their retention, accessibility, security and privacy requirements--good for ILM (information life-cycle management).
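The idea can be sketched in miniature: treat each file as a database row holding the raw bytes as a binary object plus the descriptive columns that conventional file systems lack. This is an illustrative sketch using SQLite; the table and column names (retention, security, description) are our own assumptions, not the schema of any shipping product from Microsoft, Oracle or anyone else.

```python
import sqlite3

# A file "object": raw bytes stored as a BLOB, alongside the kind of
# classification metadata (retention, security, description) that ILM needs.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE file_objects (
        id          INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        content     BLOB NOT NULL,
        retention   TEXT,   -- e.g. '7-years' for a compliance hold
        security    TEXT,   -- e.g. 'confidential'
        description TEXT    -- searchable summary of the contents
    )
""")
conn.execute(
    "INSERT INTO file_objects (name, content, retention, security, description) "
    "VALUES (?, ?, ?, ?, ?)",
    ("q3-report.doc", b"<file bytes>", "7-years", "confidential",
     "Q3 earnings draft"),
)

# Granular retrieval by metadata, not by folder path:
rows = conn.execute(
    "SELECT name FROM file_objects WHERE retention = '7-years'"
).fetchall()
print(rows)  # [('q3-report.doc',)]
```

Because the classification lives in queryable columns rather than in folder names, a single SQL statement can pull every file subject to a given retention or security policy.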
The limitations of today's "unstructured" file systems affect organizations in different ways (see the diagram at right of a typical Unix file system). An audio/video postproduction house or streaming media company with hefty multimedia files, for instance, may hit the wall with file-size limits--anywhere from 2 TB to 16 TB, depending on the file system--and may even find constraints in the maximum number of files that the file system can support.
However, improved headroom doesn't alter the fact that file systems are self-destructive. Every time you save a file, you overwrite the last valid copy of the data. This goes back to the roots of file-system design in the 1960s and '70s, when software engineers chose to conserve expensive resources such as storage rather than add journaling or versioning techniques to their software to protect earlier file versions.
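The versioning alternative those early engineers passed over is simple in principle: write each save as a new copy and leave the previous one untouched. Here is a minimal sketch of that idea; the function name and directory layout are our own, not taken from any real versioning file system.

```python
import os

def save_version(path, data, versions_dir="versions"):
    """Write a new numbered copy of the file instead of overwriting
    the last one, so every earlier version remains recoverable."""
    os.makedirs(versions_dir, exist_ok=True)
    base = os.path.basename(path)
    existing = [f for f in os.listdir(versions_dir)
                if f.startswith(base + ".v")]
    next_ver = len(existing) + 1
    version_path = os.path.join(versions_dir, f"{base}.v{next_ver}")
    with open(version_path, "wb") as fh:  # earlier versions stay intact
        fh.write(data)
    return version_path
```

Each save costs extra storage, which is exactly the trade-off 1960s designers refused; with today's cheap disks, the calculus favors keeping the history.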
In addition, most file systems today don't automatically provide detailed descriptions of the data. The stored metadata (data about data) says little about a file's contents or usage, which makes ILM and automatic provisioning impractical. Users name their files, and applications such as Microsoft Office let users add content descriptions that are saved with the files. But it's up to the user to complete the information page when each file is saved, and few actually bother.
Without rigorous file-naming methodologies or consistent application-level file descriptions, recent regulations such as Sarbanes-Oxley and HIPAA (Health Insurance Portability and Accountability Act) are causing big headaches. It's tough to identify which files must be retained in special repositories for regulatory compliance if the files don't include descriptive information. Just try retrieving the correct files quickly, or segregating files that require special protection, when you're under the pressures of an SEC investigation.
ILM won't mean much, either, if you don't have the file information necessary to create logical classes. Crafting intelligent data-migration policies requires a granular understanding of file content, access requirements, platform cost and capability, and other considerations. Without effective file naming, you just can't cherry-pick files for storage on appropriate platforms.
Even if ILM isn't in your plans and you're immune from regulatory pressures, the lack of detailed file information can still create problems in your day-to-day business. It's difficult to locate the files your users need if you have nondescript file names and inadequate directory or folder hierarchies.
This problem is exacerbated as organizations grow, and data-sharing among distributed users and applications becomes more pronounced. The larger the organization, the more files get lost in the shuffle.