Making The Case For Hadoop: Variety, Not Volume
User Rank: Apprentice
12/5/2013 | 12:23:45 PM
analysis is still required
The fields in a _good_ relational database have been defined by experts to cover the problem. It is restrictive, sort of a waterfall approach to data management. You can easily get around that by using non-relational data tools but I don't think anyone gets to bypass the actual analysis.

The reason I think that is important is that many average users (think managers and VPs) assume anything with numbers in it is correct when it may be way off.

A standard regression line on good data is almost always more accurate than any set of experts. But a regression line on bad data is rather useless, no matter how many zillion different data points you have.
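
To make the point about bad data concrete, here is a minimal sketch of an ordinary least-squares line fitted twice: once to clean data and once to the same data with a few corrupted values. The data, the `fit_line` helper, and the "recorded in the wrong unit" corruption are all made up for illustration; they are not from the article.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical clean data that follows y = 2x + 1 exactly.
xs = list(range(10))
good_ys = [2 * x + 1 for x in xs]

# Hypothetical bad data: the last three values were recorded in the wrong unit
# (100 times too large). Nothing in the fit itself flags the problem.
bad_ys = good_ys[:7] + [y * 100 for y in good_ys[7:]]

good_slope, good_intercept = fit_line(xs, good_ys)
bad_slope, bad_intercept = fit_line(xs, bad_ys)

print("good data:", good_slope, good_intercept)
print("bad data: ", bad_slope, bad_intercept)
```

Both fits run without complaint and both return tidy numbers; only a check of the inputs shows that the second line describes the recording error rather than the underlying relationship.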

What is making the industry look good now is that the only way to use Hadoop and other big data tools is to be well-versed in math and logic. Once someone comes up with a turnkey approach to big data, those results will probably be as useless as most of the Microsoft Access "databases" I've tried to understand and fix.
Li Tan
User Rank: Ninja
12/5/2013 | 12:23:27 AM
Re: NoSQL takes its place in managing data
The major advantage of an RDBMS is its ACID compliance. By loosening these rules, a NoSQL DB becomes more powerful at handling big data with large variety. I do agree with the title of this post: the outstanding characteristic of big data is its variety, not just its large volume. Velocity is another factor to consider; in addition to big data at rest, we need to take care of data in motion. For variety, Hadoop and NoSQL DBs do pretty well thanks to their distributed and rather loosely organized structure. For velocity, something more like Stream, S4, etc. is needed.
User Rank: Strategist
12/4/2013 | 4:57:42 PM
NoSQL takes its place in managing data
Doug is right: the beauty of a relational database is its column and row structure, which allows the ACID rules to take effect. They impose consistency on the data throughout the database and allow you to do transactions that never have the numbers screwed up. By relaxing the rules, however, the NoSQL systems gather in many different data types under the same roof, sometimes housing two different types that use the same name. They may also serve up an answer that is slightly out of date, such as reporting four competitors playing against you in an online game when in fact a fifth just joined. The value gained from the NoSQL systems in pattern detection far outstrips their limitations. Just don't use them for big, multi-currency transactions: you'll get your total in croners when you meant kroners.
D. Henschen
User Rank: Author
12/4/2013 | 11:36:14 AM
Re: RDBMS vendors fighting back?
It's not the queries that are inflexible; it's the storage of data predefined into columns and rows. With Hadoop you load anything and come up with the schema (the dimensions of interest) on read, using algorithms, MapReduce, Hive, SQL-on-Hadoop tools, etc. to boil down to the data of interest within that great big lake (a.k.a., Enterprise Data Hub) of information. Some RDBMS vendors are trying to make data modeling more flexible (Teradata being one example). Others are finding ways to bring unstructured data into the picture -- by, for example, extending SQL queries into Hadoop.
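
As a rough illustration of "schema on read", the sketch below keeps raw records exactly as they arrived and picks the fields of interest only when a question is asked. It uses plain Python and JSON lines rather than Hadoop, Hive, or MapReduce; the records, field names, and helper functions are hypothetical.

```python
import json

# Hypothetical raw records, stored as-is. No columns were defined up front,
# so records are free to carry different fields.
RAW_LINES = [
    '{"store": "NE-01", "item": "cheesecake", "category": "dessert", "party": {"kids": 2}}',
    '{"store": "SE-07", "item": "cherry pie", "category": "dessert"}',
    '{"store": "NE-01", "item": "coffee", "loyalty_id": "A123"}',
]


def read_with_schema(lines, fields):
    """Apply a schema at read time: keep only the requested fields.

    Fields a record does not have come back as None instead of failing.
    """
    for line in lines:
        record = json.loads(line)
        yield {name: record.get(name) for name in fields}


def dessert_counts_by_store(lines):
    """One 'dimension of interest' chosen after the data was loaded."""
    counts = {}
    for row in read_with_schema(lines, ("store", "item", "category")):
        if row["category"] == "dessert":
            key = (row["store"], row["item"])
            counts[key] = counts.get(key, 0) + 1
    return counts


print(dessert_counts_by_store(RAW_LINES))
```

A different question, say about loyalty IDs or parties with kids, would reuse the same raw lines with a different field list; nothing has to be reloaded or remodeled first.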

But you can't get around the fact that the best use for RDBMS is structured, consistent data that doesn't change a lot. NoSQL databases are taking off in the transactional and content realm because they also get around this predefined-data-model obstacle.
Lorna Garey
User Rank: Author
12/4/2013 | 10:05:56 AM
RDBMS vendors fighting back?
Doug, are there projects in the works to make conventional DBs more flexible for queries? Seems like if these vendors want to hang on to market share they need to take on that challenge.
User Rank: Author
12/4/2013 | 9:51:27 AM
Great examples
These examples of pattern spotting, such as people dining with kids, will be music to marketers' ears. The restaurant owners must see clear results from the well-targeted promotions.
D. Henschen
User Rank: Author
12/4/2013 | 9:24:14 AM
Details, Details
The whole data lake/enterprise data hub discussion around Hadoop is about capturing full-fidelity (raw) data on an affordable, high-scale platform and then creating the "schema on read" as particular dimensions of data are deemed relevant. Before Hadoop, Paytronix had to throw away the right detail to get everything into a predefined schema. You're probably thinking that each restaurant chain has access to this data, but many are midsized businesses that don't have BI and analytics chops. They're too busy putting food on tables and planning new menus. They hired Paytronix to help them with marketing and loyalty program optimization.

What I love about the Paytronix story is that it's easily understandable. Restaurants don't just want to know that desserts are doing well; they want to know that it's the new cheesecake that's popular in the Northeast while the Southeast is going for cherry pie.
