The NoSQL database system for handling big data as documents adds several features beyond the much-requested "write-ahead" journaling, including covered and sparse indexes.
An often-requested feature for NoSQL systems is the long-established database service of journaling. The 1.8 release of MongoDB adds "write-ahead" journaling to the system. In the event of a crash, the writes that were in progress can be reconstructed from the journal.
Write-ahead journaling means each change is recorded in the journal before it is applied to the database itself. It's a standard feature of relational database systems and can be found in Oracle, IBM's DB2, Microsoft SQL Server, and the open source MySQL.
"It was definitely our number one requested feature. It represents a maturing of the product," Dwight Merriman, founder and CEO of 10Gen, the principal company behind MongoDB, said in an interview.
Prior to release 1.8, a MongoDB administrator could recover from a crash, but the process was much more painful and time-consuming, Merriman explained. Write-ahead journaling is the best-established approach to crash recovery, allowing each write that preceded the crash to be replayed and the data restored.
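The journal-then-apply ordering described above can be illustrated with a minimal Python sketch. This is a toy key-value store, not MongoDB's implementation: the `JournaledStore` class, its JSON-lines journal format, and the `demo_journal` helper are all assumptions made for illustration.

```python
import json
import os
import tempfile


class JournaledStore:
    """Toy key-value store that logs each write before applying it (hypothetical)."""

    def __init__(self, journal_path):
        self.journal_path = journal_path
        self.data = {}  # stands in for the database's data files

    def write(self, key, value):
        # Step 1: append the intended change to the journal and force it to disk.
        with open(self.journal_path, "a", encoding="utf-8") as journal:
            journal.write(json.dumps({"key": key, "value": value}) + "\n")
            journal.flush()
            os.fsync(journal.fileno())
        # Step 2: only now apply the change to the data itself.
        self.data[key] = value

    def recover(self):
        # After a crash, rebuild the data by replaying the journal in order.
        self.data = {}
        if not os.path.exists(self.journal_path):
            return
        with open(self.journal_path, encoding="utf-8") as journal:
            for line in journal:
                entry = json.loads(line)
                self.data[entry["key"]] = entry["value"]


def demo_journal():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "journal.log")
        store = JournaledStore(path)
        store.write("user:1", {"name": "Ada"})
        store.write("user:1", {"name": "Ada L."})
        # Simulate a crash: the in-memory state is lost, the journal survives.
        restarted = JournaledStore(path)
        restarted.recover()
        return restarted.data
```

Because every change reaches the journal before it reaches the data, a restarted store can replay the journal and arrive at the same state the crashed one had.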
In addition, MongoDB 1.8 has added covered indexes. When an index contains all the document fields needed to satisfy a given query, the query can be answered from the index alone, without fetching the documents themselves, which makes the database system more efficient. "The index is a lot smaller (than the stored data as a whole) and it may be located in random access memory," Merriman said, where it can be accessed more quickly than data read from disk.
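A rough Python sketch can show what "covered" means in practice. It is a toy model under stated assumptions, not MongoDB's index structure: the `IndexedCollection` class, its linear scan, and the `document_reads` counter are hypothetical names introduced here.

```python
class IndexedCollection:
    """Toy collection with one index over a fixed list of fields (hypothetical)."""

    def __init__(self, documents, index_fields):
        self.documents = documents
        self.index_fields = tuple(index_fields)
        # Each index entry holds only the indexed values plus the document's position.
        self.index = [
            (tuple(doc[f] for f in self.index_fields), pos)
            for pos, doc in enumerate(documents)
        ]
        self.document_reads = 0  # counts fetches of full documents

    def find(self, field, value, projection):
        column = self.index_fields.index(field)
        # The query is covered when every requested field lives in the index.
        covered = set(projection) <= set(self.index_fields)
        results = []
        for key, pos in self.index:
            if key[column] != value:
                continue
            if covered:
                # Answer straight from the index entry.
                results.append(
                    {f: key[self.index_fields.index(f)] for f in projection}
                )
            else:
                # Fall back to fetching the full document.
                self.document_reads += 1
                results.append({f: self.documents[pos][f] for f in projection})
        return results
```

Asking only for fields that are in the index leaves `document_reads` at zero; asking for any other field forces a fetch of the stored document, which is the slower path Merriman describes.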
The 1.8 release of MongoDB also includes sparse indexes, meant for collections in which some documents are missing one or more fields found in the rest of the set. A database system typically records a null for the missing field and, in some cases, an index ends up holding as many nulls as stated values. A sparse index is an optimization that lets the system omit the nulls when building the index, shrinking its size and allowing it to respond more quickly to queries.
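The size difference is easy to see in a small Python sketch. The `build_index` function and its `sparse` flag are hypothetical, chosen only to mirror the description above; they are not MongoDB's API.

```python
def build_index(documents, field, sparse=False):
    """Return a list of (value, position) index entries for one field."""
    index = []
    for pos, doc in enumerate(documents):
        if field in doc:
            index.append((doc[field], pos))
        elif not sparse:
            # A regular index still records the document, with a null value.
            index.append((None, pos))
        # A sparse index skips documents that lack the field entirely.
    return index
```

For a collection of four documents in which only two carry the indexed field, the regular index holds four entries, two of them nulls, while the sparse index holds two.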
Covered indexes and sparse indexes were pioneered by relational systems. Their addition to a NoSQL system represents an effort to bring more relational-like characteristics to systems that still have their own unique properties. In terms of similarities, MongoDB uses Query, Insert, Update, and Remove as commands for managing JSON documents; a relational system uses Select, Insert, Update, and Delete for equivalent functions in handling data. But MongoDB does not enforce what a relational system would consider a fixed schema: new fields and objects can keep being added to the documents it stores without the system losing track of them.
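As an illustration of those four operations working without a fixed schema, here is a minimal in-memory sketch in Python. The `Collection` class and its method names are assumptions that echo the commands named above; this is not MongoDB's driver API.

```python
class Collection:
    """Toy schema-less collection of dict documents (hypothetical)."""

    def __init__(self):
        self.docs = []

    @staticmethod
    def _matches(doc, criteria):
        # A document matches when it has every criteria field with an equal value.
        return all(k in doc and doc[k] == v for k, v in criteria.items())

    def insert(self, doc):
        # Any shape of document is accepted; no schema is checked.
        self.docs.append(dict(doc))

    def query(self, criteria):
        return [d for d in self.docs if self._matches(d, criteria)]

    def update(self, criteria, changes):
        matched = self.query(criteria)
        for doc in matched:
            doc.update(changes)  # may introduce fields no other document has
        return len(matched)

    def remove(self, criteria):
        before = len(self.docs)
        self.docs = [d for d in self.docs if not self._matches(d, criteria)]
        return before - len(self.docs)
```

Two documents with different sets of fields can sit side by side in the same collection, and an update can add a field that did not exist when the collection was created.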
NoSQL systems also spread themselves and their data over a server cluster and handle storage and retrieval functions in parallel, greatly increasing the amount of data they can deal with at one time. That property is what gives NoSQL systems the reputation for being "Big Data" systems, with some instances handling petabytes.
One area of continued difference with relational systems is consistency, or the ability to answer the same query from two different parties in exactly the same way. NoSQL systems will serve reads from a slightly out-of-date replica in order to avoid imposing the locks that would be needed if every read and write went to a single copy of the data. That means one user might get an answer from a stale replica and another user, a fraction of a second later, would get a slightly different answer from an updated one. In most cases, such as Facebook updates, social networking, or online games, the difference is minor or inconsequential. But it means NoSQL systems are not used in equity trading, bond trading, or other settings where consistency is required rather than optional.
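The stale-read behavior can be modeled in a few lines of Python. This is a deliberately simplified sketch with one primary and one replica; the `ReplicatedValue` class, its explicit `replicate` step, and the `from_replica` flag are hypothetical and do not describe how any particular NoSQL system replicates data.

```python
class ReplicatedValue:
    """Toy primary copy with one asynchronously updated replica (hypothetical)."""

    def __init__(self, value):
        self.primary = value
        self.replica = value
        self.pending = []  # writes not yet applied to the replica

    def write(self, value):
        # Writes land on the primary immediately and are queued for the replica.
        self.primary = value
        self.pending.append(value)

    def replicate(self):
        # Apply the queued writes to the replica in order.
        for value in self.pending:
            self.replica = value
        self.pending.clear()

    def read(self, from_replica=False):
        # Replica reads avoid contending with writers but may return stale data.
        return self.replica if from_replica else self.primary
```

Between a write and the next replication pass, a reader on the replica sees the old value while a reader on the primary sees the new one; after replication, both agree.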
Merriman said MongoDB developers are working with the idea of eventual consistency, where the system can be tuned to be highly efficient in terms of satisfying millions of requests for data while tolerating a low level of consistency, or tuned to be a little less efficient but more consistent in its answers to queries. In the NoSQL world, the idea of consistency can be placed on a slider and moved back and forth, depending on the user's needs in the setting in which the system will be operating.
That slider does not yet exist. Eventual consistency is a concept that is regularly toyed with by various NoSQL development teams, but there is no definite timeframe for when one of them plans to implement it.