Commentary
6/30/2010
03:57 PM
Curt Monash

Machines Are Driving The Big-Data Era

Future big-data growth will be in the area of machine-generated data. In contrast to human-generated data, which can grow only as fast as human data-generating activities allow it to, machine-generated data is limited only by capital budgets and Moore's Law.

Not long ago I pointed out that much future Big Data growth will be in the area of machine-generated data, examples of which include:

  • Computer, network, and other equipment logs
  • Satellite and similar telemetry (whether for espionage or science)
  • Location data such as RFID chip readings, GPS system output, etc.
  • Temperature and other environmental sensor readings
  • Sensor readings from factories, pipelines, etc.
  • Output from many kinds of medical device, in hospitals and (increasingly) homes alike

The core idea here is that human-generated data can grow only as fast as human data-generating activities allow it to, but machine-generated data is limited only by capital budgets and Moore's Law. So machines' ability to generate data is growing a lot faster than humans'.

Up to this point, I think there's broad agreement, at least on the part of anybody who's thought about it this way for very long. But that still leaves open questions as to which kinds of machine-generated data will matter first. The big five that matter right now are:

  • Web logs (partially machine-generated, but tied to human actions)
  • Call detail records (CDRs -- ditto)
  • Financial instrument trades (some purely machine-generated, some human-based)
  • Network event logs (commonly associated with web logs)
  • Telemetry collected by the government (especially for intelligence purposes)
A large fraction of all the 100 TB+ or petabyte+ data warehouse activity I know of falls into those areas.
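
To make the growth argument above a bit more concrete, here is a minimal back-of-the-envelope sketch. The two rates in it are purely hypothetical assumptions of mine for illustration (5% annual growth for human-driven data, a doubling every two years for machine-driven data); neither figure comes from any measurement, and the function names are invented for this example.

```python
# Illustrative only: both growth rates are assumptions, not measured figures.

def human_volume(year, base=1.0, annual_growth=0.05):
    """Relative data volume when output tracks slowly growing human activity.

    annual_growth=0.05 (5% per year) is a hypothetical rate.
    """
    return base * (1 + annual_growth) ** year


def machine_volume(year, base=1.0, doubling_years=2.0):
    """Relative data volume when output doubles every `doubling_years`.

    A two-year doubling period is a Moore's-Law-style assumption.
    """
    return base * 2 ** (year / doubling_years)


if __name__ == "__main__":
    # Compare the two curves at a few points in time.
    for year in (0, 5, 10):
        h = human_volume(year)
        m = machine_volume(year)
        print(f"year {year:2d}: human x{h:.2f}, machine x{m:.2f}, ratio {m / h:.1f}")
```

Under these assumed rates, the machine-generated series reaches 32 times its starting size after ten years, while the human-generated one reaches only about 1.6 times. The exact numbers matter much less than the shape: one curve compounds slowly, the other keeps doubling.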

Following along quickly are:

  • Online game data. Since late last year, online game companies have come up over and over again as an important category of data warehousing/analytics users. Like most of the categories above, the gaming area actually features a hybrid between human- and machine-generated data.
  • Genetic research data, although I don't know to what extent the investment in data gathering is concentrated among the few obvious big pharmaceutical companies. Other health care data (research or clinical) will come along too, but doesn't seem to be there yet.
Until recently I would have added:
  • Energy exploration, energy production, energy refining, and/or utility network data
But while those areas seemed poised to get hot last year, I haven't heard much about them in the past few months, with a few exceptions.

Finally, I've been assuming that a big area going forward is location data, especially personal movement data. The data volumes involved could be similar to or even greater than those of CDRs. But the privacy concerns there are obviously immense. (Of course, in the case of Foursquare, this sort of overlaps with freely shared game data.)

If you want to make all this more tangible in your mind, one area to look for ideas is in the huge amount of news about various kinds of innovative sensors. Sources include:

  • Somebody named Landon Cox, who maintains a couple feeds of sensor news.
  • A Twitter feed, apparently associated with a Sensor Expo.
  • Another Twitter feed, this one from Sun Labs. (I have no idea what Oracle is or isn't doing with the Sun SPOT project that links to.)
  • Yet another Twitter feed.
