Startups Offer Tools To Simplify Big Data
Not every big data project has a data scientist on board. Some startups are developing analysis tools to help non-specialists target the information they need.
September 7, 2012
Researchers at the University of Wisconsin last month announced a tool that combines two high-profile societal trends: a big data analytic that sifts through 250 million Twitter messages to identify those belonging to school bullies or their victims.
The software, which is designed to identify after-the-fact posts from bullies, victims, accusers, or defenders, tags about 15,000 messages per day as related to a specific incident of bullying.
The ability to identify specific types of incidents from messages posted by participants outside of school should be valuable to school administrators whose only opportunity for assessing bullying in their schools might be an annual survey and whatever facts they can squeeze out of reticent victims, the researchers said.
Unfortunately for the developers, however, bully identification is not high on the list of priorities that venture capitalists are willing to fund.
On the other hand, it's relatively simple to land $2 million in funding for big-data tool development, even if your company has a squirrely name and a shadowy origin as a spinoff of the National Security Agency--that is, as long as your plan is to add enough security to big data apps to let heavily regulated industries like health care and financial services use them.
The company, Sqrrl, is developing security that's granular enough to allow emergency room staff access to a patient's phone number, street address, or list of allergies in an Electronic Health Record, while locking down his or her Social Security number and financial and personal details.
That could create a huge change in the way health care organizations handle digital health records, which currently can be either open or closed and have little ability to hide the most sensitive bits of data.
Big data is important, but big analytics are more important--because it's not having big data that matters, but doing something useful with it, according to data- and information-management analyst Colin White of BI Research.
The long-term difference between a company that uses big data effectively and one that ends up wandering in the desert with no idea where to go may be the organization's ability to focus on the workload needed to address a problem rather than focusing on a new technology such as Hadoop, White said.
While there is definitely a gap between corporate executives' awareness of big data and the questions they would like it to answer, the more immediate question is how to actually get those answers without tools that let non-specialists ask their own questions--and without the infrastructure of tools needed to whip big-data sets into shape in the first place, according to Shalini Das, research director at the CIO Executive Board.
Big data technology and the knowledge base describing how to use it are both desperately immature and, in fact, are impeding each other's development.
Without knowing what questions to ask, how to ask them, or what data needs to be assembled to find the answers they need, business managers--who should be the ultimate consumers of big-data analytics--don't know where to start, according to White. And without specific questions to be answered, goals to be met, and well-defined benefits to be gained from effective analysis, Das explained, big data analytics software developers don't know which types of tools would be most valuable.
Effective tools and best practices will develop in tandem, forcing IT and business analysts to work together to set ground rules and to implement the actual technology, according to White.
Hot Big Data Tools, Startups
Some software tools are clearly important, however. For example, software that can lock down some fields in a record while opening others to queries is a clear advance over current security, which generally allows a record to be either open or shut with no caveats or extra protections for those that are opened, Das said.
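To make the idea of field-level lockdown concrete, here is a minimal Python sketch. It is not Sqrrl's design or any vendor's API; the roles, field names, and the `visible_fields` helper are all hypothetical, chosen only to mirror the emergency-room example described earlier.

```python
# Hypothetical policy: which fields each role may read.
# Role names and field names are illustrative assumptions, not from any real product.
FIELD_POLICY = {
    "er_staff": {"phone", "address", "allergies"},
    "billing": {"phone", "address", "ssn", "insurance_id"},
}


def visible_fields(record: dict, role: str) -> dict:
    """Return only the fields of `record` that `role` is allowed to read."""
    allowed = FIELD_POLICY.get(role, set())  # unknown roles see nothing
    return {name: value for name, value in record.items() if name in allowed}


# Made-up sample record for illustration.
patient = {
    "phone": "555-0100",
    "address": "1 Main St",
    "allergies": ["penicillin"],
    "ssn": "000-00-0000",
    "insurance_id": "X123",
}

er_view = visible_fields(patient, "er_staff")
```

In this sketch the record is neither fully open nor fully shut: `er_view` carries the phone number, address, and allergies, while the Social Security number and insurance details never leave the filter.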
Among the hottest functions still to come is the ability to let non-specialists surf big data, run data visualizations that might show new trends, and get answers to their most pressing questions without requiring the help of a data scientist, according to Ted Cuzzillo, a decision support and business intelligence analyst who blogs at DataDoodle.
But perhaps the most urgent need, according to Das, is for tools that can manage, structure, and analyze unstructured data by applying metadata, cleaning up the language, and squeezing the content into fields that can be processed by analytic software.
Simply cleaning and structuring data isn't enough, however. A CIO Executive Board survey posted in March found that only half of all employees who work with data or analytics get any training in these areas at all. Of those who do, only half report that the training does any good. That means three quarters of the potential big data users in corporate America have had no useful training in what may turn into the most important part of their jobs.
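The three-quarters figure follows from multiplying the two survey proportions. A quick check, treating both "only half" figures as exactly one half (an assumption; the survey's precise percentages are not given here):

```python
from fractions import Fraction

# Both values assume "half" means exactly 1/2, which is a simplification.
trained = Fraction(1, 2)               # share of data workers who get any training
useful_given_trained = Fraction(1, 2)  # share of trained workers who find it useful

usefully_trained = trained * useful_given_trained  # one quarter overall
no_useful_training = 1 - usefully_trained          # the remaining three quarters

print(f"Usefully trained: {usefully_trained}, no useful training: {no_useful_training}")
```

The untrained half plus the quarter whose training did not help add up to the three quarters cited.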
With few users trained in complex tools, simplicity becomes far more important. This is true even for heavily cleaned and processed big data sets, because they still have to be sorted and sifted to find the subset of data that could contain the answers users are looking for, according to CIO Executive Board analyst Andrew Horne.
Even estimating how complex a tool--let alone which specific tool--they need to analyze a particular set of data is not an easy task for most knowledge workers, according to researchers at the University of Pennsylvania's Wharton School. These researchers wrote a paper and built a set of models designed to help non-specialists pick the right tool for the right problem, but to date the paper has not been published and the software is not funded.
Better Tools For Non-Specialists
DataSift, which landed $7.2 million in venture funding in May, started out under the name Tweetmeme, searching for interesting conversational trends. The company then changed direction, devoting its complex data-filtering software to sorting social network and other unstructured data according to gender, location, or even opinion.
A tool from Karmasphere is designed to let users create a graphic representation of a big data set, run ad-hoc queries against it to find trends and patterns they think are significant, then post the results for colleagues.
A third startup, Domo, comes with $63 million in funding and a CEO who founded Omniture, one of the more successful online data analytics companies. Domo provides analytics as a cloud service that gives executives a data dashboard on which they can poke and prod data using almost any device. It's not designed specifically as a big data tool, however. Instead, its goal is to give business managers direct access to the data and business intelligence analytics they previously could get only through the IT department, which often meant the data had gotten stale by the time it was crunched.
WibiData, founded by another celebrity entrepreneur, offers a big data platform that combines the abilities of Hadoop, HBase, and Avro to collect, manage, manipulate, and analyze big data sets without having to go from platform to platform. It's not quite self-service big data building and management, but it comes as close as you'll find anywhere in the business today. (WibiData was founded in 2011 as ObiData, and recently changed its name.)
Startup Metamarkets doesn't bother with structuring or filtering data. It is designed to look into thick flows of transactional data to detect subtle changes in traffic or trends, and to predict how those trends will change over time. Its strength, in fact, is in its predictive--not its current--analytics.
Being able to tell the future is a valuable skill, of course. Metamarkets has gone through two rounds of funding so far, landing a total of $8.5 million to help with its own transactions.
About the Author(s)
Kevin Fogarty is a freelance writer covering networking, security, virtualization, cloud computing, big data and IT innovation. His byline has appeared in The New York Times, The Boston Globe, CNN.com, CIO, Computerworld, Network World and other leading IT publications.