Big Data. Big Decisions
InformationWeek
Special Coverage Series


DataSift Turns Back Clock On Twitter

Social data processing specialist launches searchable archive of tweets going back to January 2010.

Social media architects have been so busy creating the future that they've often given short shrift to the past.

The perishable nature of the status post may have started to change on Facebook, with its new Timeline profiles, but history is still hard to come by on Twitter, where search results typically don't reach back more than days or weeks. Individual users may value Twitter precisely for how well it lives in the moment, but researchers seeking to analyze changing attitudes and trends often need to look backward in order to tell what has changed.

Researchers will be able to get that history starting Tuesday through DataSift's social data processing service. DataSift specializes in the heavy lifting of sorting through billions of tweets and social posts, then indexing them for rapid search and retrieval. It enriches that index through partnerships with Lexalytics for sentiment analysis and Klout for influence scores. DataSift acts as a data processing back end to other social media analytics services, as well as to some enterprise applications, such as those of large news organizations.

Anyone can test a basic version of the service at the DataSift website, composing queries like "interaction.content contains 'obama' AND klout.score > 50" in the firm's Curated Stream Definition Language (CSDL). More data-intensive queries and deeper integration with the service require a commercial license, and historical queries will by definition fall into that commercial category.
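To make the example query concrete, here is a minimal Python sketch of what that filter asks for: posts whose content mentions "obama" and whose author has a Klout score above 50. This is not DataSift's engine or API. The nested record layout, the `matches` function, and the sample posts are all assumptions made for illustration, and the case-insensitive substring test is only a stand-in for whatever matching rules the real `contains` operator applies.

```python
# Illustrative only: a plain-Python stand-in for the example filter,
# not DataSift's engine or API. Field names mirror the query quoted in
# the article; the record layout and matching rules are assumptions.

def matches(interaction: dict) -> bool:
    """True if the post mentions 'obama' and the Klout score is above 50."""
    content = interaction.get("interaction", {}).get("content", "")
    score = interaction.get("klout", {}).get("score", 0)
    # Assumption: 'contains' is approximated by a case-insensitive substring test.
    return "obama" in content.lower() and score > 50


# Hypothetical sample posts, invented for this sketch.
sample_stream = [
    {"interaction": {"content": "Obama speaks tonight"}, "klout": {"score": 62}},
    {"interaction": {"content": "Obama speaks tonight"}, "klout": {"score": 40}},
    {"interaction": {"content": "Lunch was great"}, "klout": {"score": 75}},
]

filtered = [post for post in sample_stream if matches(post)]
```

Only the first sample post passes both conditions: the second fails the influence threshold and the third never mentions the keyword.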

Making it work well is the challenge, CEO Rob Bailey said in an interview. "Companies are overwhelmed handling even the realtime nature of social--it has completely overwhelmed even the social media monitoring companies. Our team has been working on this feverishly over the last few months, and we've now got something like 100 billion tweets stored."

Founder Nick Halstead, who previously founded TweetMeme, said his organization is up to the task, even though "filtering this data is a massive technical task" and means tackling all the complexities of Big Data management technologies like Hadoop and related tools like Pig and Hive. "A lot of these things require data scientists to get involved" in making sense of the data and processing it efficiently at scale, Halstead said.

To keep things simple for the user, or the front-end application developer, DataSift uses the same query language for historical data as for realtime queries, Halstead said. The major difference is that you have to weigh how much history your analysis needs against the data processing cost of querying it, he said.
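That trade-off can be roughed out with simple arithmetic. The sketch below assumes that cost scales linearly with the number of posts scanned, which the article does not confirm. The function names and the `cost_per_million_posts` rate are hypothetical placeholders, not DataSift pricing. The daily volume is an average derived from the article's figure of about 85 billion Twitter posts in 2011.

```python
# Hypothetical sketch: the functions and the per-million rate are
# placeholders for illustration, not DataSift's pricing model.

def posts_scanned(days_of_history: int, posts_per_day: float) -> float:
    """Estimate how many posts a historical query window covers."""
    return days_of_history * posts_per_day


def estimated_cost(days_of_history: int, posts_per_day: float,
                   cost_per_million_posts: float) -> float:
    """Estimate cost, assuming it grows linearly with posts scanned."""
    millions = posts_scanned(days_of_history, posts_per_day) / 1_000_000
    return millions * cost_per_million_posts


# Average daily volume implied by about 85 billion posts in 2011.
POSTS_PER_DAY_2011 = 85e9 / 365

# Compare a 30-day window with a full year at the same (hypothetical) rate.
month = posts_scanned(30, POSTS_PER_DAY_2011)
year = posts_scanned(365, POSTS_PER_DAY_2011)
```

Under these assumptions a full-year query scans roughly 12 times as many posts as a 30-day one, so trimming the window to what the analysis actually needs is the main lever on cost.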

Twitter generated about 85 billion posts in 2011 alone, and DataSift has been working with Twitter to extract data from its own archives, going back to the beginning of 2010. To respect users' rights, DataSift has had to make sure that tweets deleted from the live stream are also removed from the copy of the archives made available for analysis, Bailey said.
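The deletion requirement amounts to filtering the archive against a list of removed posts before anyone queries it. The article does not describe how DataSift does this at scale, so the following is only a minimal sketch. The `purge_deleted` function, the `id` field, and the sample records are assumptions made for illustration.

```python
# Illustrative sketch, not DataSift's pipeline: drop archived posts whose
# IDs appear in a list of deletions. Names and record layout are assumed.

def purge_deleted(archive: list, deleted_ids) -> list:
    """Return the archive without any post whose 'id' was deleted."""
    deleted = set(deleted_ids)  # set membership keeps each lookup cheap
    return [post for post in archive if post["id"] not in deleted]


# Hypothetical archive and deletion notices, invented for this sketch.
archive = [
    {"id": 101, "text": "first post"},
    {"id": 102, "text": "regretted post"},
    {"id": 103, "text": "third post"},
]
deletion_notices = [102]

clean_archive = purge_deleted(archive, deletion_notices)
```

The original list is left untouched and a new, filtered list is returned, so the purge can be rerun whenever new deletion notices arrive.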

To make the data easier to interpret, DataSift also correlates it with events in the news, which function as "mileposts" in your analytic slog through history. For example, the news of the resignations of the Research In Motion co-CEOs is displayed in the DataSift application as a reference point you can see relative to tweets about that company.

Follow David F. Carr on Twitter @davidfcarr. The BrainYard is @thebyard.
