Commentary
12/4/2013
09:06 AM
Doug Henschen

Making The Case For Hadoop: Variety, Not Volume

Paytronix manages only tens of terabytes, but it offers the perfect example of why we need more than relational databases.

Panera is one of the restaurant chains that Paytronix supports.

Comments
anon3593880708,
User Rank: Apprentice
12/5/2013 | 12:23:45 PM
analysis is still required
The fields in a _good_ relational database have been defined by experts to cover the problem. It is restrictive, sort of a waterfall approach to data management. You can easily get around that by using non-relational data tools but I don't think anyone gets to bypass the actual analysis.

I think that is important because many average users (think managers and VPs) assume anything with numbers in it is correct when it may be way off.

A standard regression line on good data is almost always more accurate than any set of experts. But a regression line on bad data is rather useless, no matter how many zillion different data points you have.

What makes the industry look good now is that you must be well-versed in math and logic to use Hadoop and other big data tools. Once someone comes up with a turnkey approach to big data, those results will probably be as useless as most of the Microsoft Access "databases" I've tried to understand and fix.
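The point about regression on bad data can be illustrated with a small sketch. It is not taken from the article or from Paytronix; the data, the true relationship `y = 2x + 1`, and the single mis-keyed value are all made up for illustration. An ordinary least-squares line is fitted twice, once on clean data and once after one value has been entered as 190 instead of 19.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical clean data that follows y = 2x + 1 exactly.
xs = list(range(10))
good_ys = [2 * x + 1 for x in xs]

# Same data with one assumed data-entry error: 19 keyed in as 190.
bad_ys = list(good_ys)
bad_ys[9] = 190

good_slope, good_intercept = fit_line(xs, good_ys)
bad_slope, bad_intercept = fit_line(xs, bad_ys)

print("clean data:    slope=%.2f intercept=%.2f" % (good_slope, good_intercept))
print("one bad value: slope=%.2f intercept=%.2f" % (bad_slope, bad_intercept))
```

On the clean data the fit recovers the underlying line; with one bad value the slope and intercept move far away from it. Adding more rows of the same quality would not repair this, which is the commenter's argument for doing the analysis before trusting the numbers.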
Li Tan,
User Rank: Ninja
12/5/2013 | 12:23:27 AM
Re: NoSQL takes its place in managing data
The major advantage of an RDBMS is its ACID compliance. By loosening this rule, a NoSQL DB becomes more powerful in handling big data with large variety. I do agree with the title of this post: the outstanding characteristic of big data is its variety, not just its large volume. Velocity is another factor to consider: in addition to big data at rest, we need to take care of data in motion. Hadoop and NoSQL DBs handle variety pretty well thanks to their distributed and rather loosely organized structure. For velocity, something more like Stream, S4, etc. is needed.
cbabcock,
User Rank: Strategist
12/4/2013 | 4:57:42 PM
NoSQL takes its place in managing data
Doug is right: the beauty of a relational database is its column-and-row structure, which allows the ACID rules to take effect. They impose consistency on the data throughout the database and allow you to do transactions that never have the numbers screwed up. By relaxing the rules, however, the NoSQL systems gather many different data types under the same roof, sometimes housing two different types that use the same name. They may also serve up an answer that is slightly out of date, such as telling you there are four competitors playing against you in an online game when in fact a fifth just joined. The value gained from the NoSQL systems in pattern detection far outstrips their limitations. Just don't use them for big, multi-currency transactions: you'll get your total in croners when you meant kroners.
D. Henschen,
User Rank: Author
12/4/2013 | 11:36:14 AM
Re: RDBMS vendors fighting back?
It's not the queries that are inflexible; it's the storage of data predefined into columns and rows. With Hadoop you load anything and come up with the schema (the dimensions of interest) on read, using algorithms, MapReduce, Hive, SQL-on-Hadoop tools, etc. to boil down to the data of interest within that great big lake (a.k.a., Enterprise Data Hub) of information. Some RDBMS vendors are trying to make data modeling more flexible (Teradata being one example). Others are finding ways to bring unstructured data into the picture -- by, for example, extending SQL queries into Hadoop.

But you can't get around the fact that the best use for RDBMS is structured, consistent data that doesn't change a lot. NoSQL databases are taking off in the transactional and content realm because they also get around this predefined-data-model obstacle.
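The "schema on read" idea described above can be sketched in a few lines of plain Python. This is only an illustration, not Hadoop, Hive, or anything Paytronix runs: the records, field names (`region`, `item`, `category`, `qty`, `kids_meal`, `coupon`), and function names are hypothetical. Raw events are kept exactly as they arrived, with fields that vary from record to record, and the columns of interest are chosen only when the data is read.

```python
import json

# Hypothetical raw point-of-sale events, stored as-is with no fixed schema.
RAW_EVENTS = [
    '{"region": "Northeast", "item": "cheesecake", "category": "dessert", "qty": 2}',
    '{"region": "Southeast", "item": "cherry pie", "category": "dessert", "qty": 1, "kids_meal": true}',
    '{"region": "Northeast", "item": "cheesecake", "category": "dessert", "qty": 1, "coupon": "FALL10"}',
    '{"region": "Southeast", "item": "soup", "category": "entree", "qty": 3}',
]


def read_with_schema(raw_lines, fields):
    """Parse raw records and project the requested fields at read time.

    Fields a record does not carry come back as None instead of being
    rejected at load time.
    """
    for line in raw_lines:
        record = json.loads(line)
        yield {name: record.get(name) for name in fields}


def dessert_units_by_region(raw_lines):
    """Total dessert units per (region, item), using a schema chosen on read."""
    totals = {}
    schema = ("region", "item", "category", "qty")
    for row in read_with_schema(raw_lines, schema):
        if row["category"] != "dessert":
            continue
        key = (row["region"], row["item"])
        totals[key] = totals.get(key, 0) + row["qty"]
    return totals


if __name__ == "__main__":
    for (region, item), qty in sorted(dessert_units_by_region(RAW_EVENTS).items()):
        print(region, item, qty)
```

Because nothing was discarded at load time, a later question (say, which orders included a kids' meal) only needs a different field list passed to `read_with_schema`, not a reload of the data into a new table design.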
Lorna Garey,
User Rank: Author
12/4/2013 | 10:05:56 AM
RDBMS vendors fighting back?
Doug, are there projects in the works to make conventional DBs more flexible for queries? Seems like if these vendors want to hang on to market share they need to take on that challenge.
Laurianne,
User Rank: Author
12/4/2013 | 9:51:27 AM
Great examples
These examples of pattern spotting, such as people dining with kids, will be music to marketers' ears. The restaurant owners must see clear results from the well-targeted promotions.
D. Henschen,
User Rank: Author
12/4/2013 | 9:24:14 AM
Details, Details
The whole data lake/enterprise data hub discussion around Hadoop is about capturing full-fidelity (raw) data on an affordable, high-scale platform and then creating the "schema on read" as particular dimensions of data are deemed relevant. Before Hadoop, Paytronix had to throw away the right detail to get everything into a predefined schema. You're probably thinking that each restaurant chain has access to this data, but many are midsized businesses that don't have BI and analytics chops. They're too busy putting food on tables and planning new menus. They hired Paytronix to help them with marketing and loyalty program optimization.

What I love about the Paytronix story is that it's easily understandable. Restaurants don't just want to know that desserts are doing well; they want to know that it's the new cheesecake that's popular in the Northeast while the Southeast is going for cherry pie.