6/2/2014
12:55 PM

Big Data Learns To Write

Automated writing platforms cull streaming data from multiple sources and churn out thousands of articles per second.

Computers can write, and surprisingly well, too. Software from startups like Automated Insights and Narrative Science generates written reports in plain English for targeted markets, including fantasy football, real estate, personal fitness, journalism, and essentially any storytelling niche where algorithms can quickly transform real-time data into readable text.

"When we're producing narratives around a particular data set, we'll produce thousands per second," said Adam Smith, Automated Insights' VP of sales and marketing, in a phone interview with InformationWeek. "We published over 300 million stories for our clients last year, and we'll publish well over a billion this year. We're tailoring a story in a personalized way to an individual user, or about an individual topic."

Automated Insights' clients include Yahoo Fantasy Sports, which uses the company's Wordsmith platform to generate personalized reports for its users, a process too time- and labor-intensive for human writers.

With the Yahoo Fantasy Football recaps, for instance, "we're doing probably 1,500 to 2,000 [stories] per second, and millions over a one- or two-hour period," Smith told us.

Insights from big data streams are often presented in dashboard form, complete with charts, graphs, and other visually oriented infographics -- an approach that requires end-users to "interpret" the data, says Smith.

But with automated written reports, "all you have to do is read. It's like sitting down with a data scientist and having them walk you through the key aspects."

Image: Enokson on Flickr.

Of course, this algorithmic approach to writing works best with data-driven topics.

"We do a lot of work with big data, BI, and analytics. And part of that is, how can we mine data in real time, make it actionable, spot the insights, pull together the insights that are most important, and tell a story about it?"

Software-generated copy is well suited to formulaic topics, too, such as summarizing a baseball game or other sporting event.

"A software platform like ours can look back to the 1800s and analyze every single performance that's ever happened," says Smith. And while a few human sportswriters may possess a near-encyclopedic knowledge of historical baseball stats and scores, none can match the automated system's prodigious output.

How does the robot writer "watch" a game and ingest the data it needs to construct a story? Joe Procopio, Automated Insights' VP of product engineering, explained in a recent blog post:

At the professional sports level (think MLB, NFL, NBA), data is collected not just at the game and player level, but at the play and performance level. We now know how fast each pitch is thrown and where, how many times and in which direction a quarterback goes long, and even whether or not a game-deciding call was blown, thanks to replay.

In all pro sports and even most college and some recreational, there are now all kinds of sensors and cameras tracking the game, sometimes at the individual level, all of which can support qualitative analysis on quantitative facts. For example, when we tell you a hitter is off his swing, we're not playing a hunch, we can see it.
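
To make the idea concrete, here is a minimal Python sketch of how structured game data might be turned into a sentence. It is an illustration only, not Wordsmith's or any vendor's actual method: the GameResult record, its field names, the team names, and the verb thresholds are all assumptions invented for this example.

```python
from dataclasses import dataclass


@dataclass
class GameResult:
    """Hypothetical box-score record; the field names are assumptions."""
    home: str
    away: str
    home_runs: int
    away_runs: int
    innings: int = 9


def describe_margin(margin: int) -> str:
    """Pick a verb from the size of the win (thresholds are arbitrary)."""
    if margin >= 7:
        return "routed"
    if margin >= 3:
        return "beat"
    return "edged"


def write_recap(game: GameResult) -> str:
    """Turn one structured game record into a one-sentence recap."""
    if game.home_runs == game.away_runs:
        raise ValueError("tie scores are not handled in this sketch")
    if game.home_runs > game.away_runs:
        winner, loser = game.home, game.away
        win_runs, lose_runs = game.home_runs, game.away_runs
    else:
        winner, loser = game.away, game.home
        win_runs, lose_runs = game.away_runs, game.home_runs
    verb = describe_margin(win_runs - lose_runs)
    sentence = f"{winner} {verb} {loser} {win_runs}-{lose_runs}"
    # Mention the inning count only when the game went to extra innings.
    if game.innings > 9:
        sentence += f" in {game.innings} innings"
    return sentence + "."


if __name__ == "__main__":
    # Made-up teams and scores, used only to exercise the function.
    print(write_recap(GameResult("Hawks", "Owls", 2, 3, innings=11)))
```

The word choice here follows from the numbers (the margin picks the verb, extra innings add a clause), which is the sense in which a data-driven recap can vary without a human writing each one. A production system would presumably draw on far richer play-level data than this toy record.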

Robot writers are penning more than sports recaps, too. Quakebot, an algorithm created by Los Angeles Times journalist and programmer Ken Schwencke, garnered plenty of attention in March when it developed, wrote, and published a story about a Southern California earthquake in less than three minutes.

No ink-stained wretch could do that.

So should flesh-and-blood journalists be worried? Will the algorithm put them out of work?

On the contrary, Automated Insights claims. By doing data-churning grunt work, robot writers free human journalists to interview people, provide deeper insights, and essentially tell stories that algorithms can't.

Well, not yet, anyway.

Jeff Bertolucci is a technology journalist in Los Angeles who writes mostly for Kiplinger's Personal Finance, The Saturday Evening Post, and InformationWeek.

Comments
Ariella
User Rank: Ninja
6/12/2014 | 8:43:27 AM
Re: Interesting
@Elizabeth very interesting. Thanks for going through the details for me. 
ElizabethFarabee
User Rank: Apprentice
6/12/2014 | 6:23:23 AM
Re: Interesting
Hi Ariella,

Good question!

There are several points which distinguish us more broadly from Narrative Science:
  • Only Yseop writes reports in real time.
  • Only Yseop contains a unique dialogue engine, which allows it to interact with and gather contextual data from the user (using closed questions).
  • Only Yseop writes in multiple languages (English, Spanish, French, German, etc.).
  • Only Yseop can be installed on a customer's servers (so they can maintain data confidentiality). Yseop can also be run in the cloud.
  • Only Yseop allows users to build & maintain applications on their own. 
  • Only Yseop provides non-Regression, Impact and Coherence Testing tools to ensure accuracy and consistency of the generated text.
  • Only Yseop holds a patent on its unique ability to write using synonyms.

However, to answer your question more specifically--it is really this last point which ensures that Yseop writes "like a human being". As I mentioned in my previous comment, we are able to do this because we incorporate synonyms, both in terms of word choice and sentence structure in the software. This ensures that no two sentences are written exactly the same way for the same type of output. Yseop knows the difference between a subject, verb and complement - which allows us to dynamically construct sentences from the rules of grammar. We do not use templates.

I hope that answers your question!

Best regards,

Elizabeth 
Ariella
User Rank: Ninja
6/11/2014 | 8:36:11 PM
Re: Interesting
@Elizabeth How do you distinguish your computer's writing from Narrative Science's? That company also claims that what it produces will sound like the product of a human being.
ElizabethFarabee
User Rank: Apprentice
6/3/2014 | 4:39:30 PM
Re: Interesting
Hi Lorna,

I agree with you that any story or piece of text generated by a robot should be as human-like as possible.

No human would write any story or text in exactly the same way twice. We would use synonyms, varying both our vocabulary and the structure and form of our sentences and paragraphs.

However, only one software company owns the US patent on this feature.

Yseop (full disclosure: I work for them) is natural language generation software based on artificial intelligence that writes truly just like a human being. We are able to do this thanks to a unique, patented aspect of our technology, which allows us to incorporate synonyms, in terms of both word choice and sentence structure, in the text. Each person can specify the vocabulary most appropriate to their industry and the type of output they want the robot to produce, and Yseop ensures the human-like nature of the text!

In our ideal world, readers wouldn't ask this question because they wouldn't notice the difference!

-Elizabeth
Ariella
User Rank: Ninja
6/3/2014 | 9:01:00 AM
Re: Interesting
I wrote about Narrative Science shortly after it received a round of funding in September 2013 that brought its total funding at that point to $20 million. The concept began as a Northwestern University research project called StatsMonkey. Two professors, Kris Hammond and Larry Birnbaum, advised computer science and journalism students on developing software that could generate accounts of baseball games solely from batter statistics. After college, two students, John Templon and Nick Allen, obtained funding to launch the business, which was incorporated as Narrative Science in January 2010. Subsequently, StatsMonkey was replaced by the more sophisticated Quill™, a "patented artificial intelligence authoring platform."

In a guest blog post on HBR entitled "The Value of Big Data Isn't the Data," Hammond argued that algorithms write better narratives from big data than people do, because algorithms seamlessly turn big data into narratives that people can relate to. "By embracing the power of the machine, we can automatically generate stories from the data that bridge the gap between numbers and knowing."

And the narratives are not supposed to sound robotic. Jonathan Morris, COO of a financial analysis firm called Data Explorers, which set up a securities newswire using Narrative Science technology, was quoted in a Wired article as saying, "You can get anything, from something that sounds like a breathless financial reporter screaming from a trading floor to a dry sell-side researcher pedantically walking you through it."

 
danielcawrey
User Rank: Ninja
6/2/2014 | 3:58:09 PM
Re: Interesting
I have read a few Narrative Science articles, and I could see an error or two. But this is the beginning. These bugs will get worked out, and this will likely become a trend. There is so much information out there today to be written about that there will be a need for algorithms to put some stories together.

But other than mining data, I'm not sure how these software programs will write more complex stories for the time being. 
Lorna Garey
User Rank: Author
6/2/2014 | 3:34:23 PM
Interesting
Can a reader typically tell that a story was generated by a robot? It seems likely that after some time, regular subscribers to a sports site will notice that their newsletters (or whatever the output) all have exactly the same structure and lack (presumably) the flourishes that a human writer might add.

And, if they do notice, do they perceive the quality as better, worse or on par with a human-written article?