How Amazon Kinesis Adds Speed, Resilience To Analytics

Three Kinesis users explain how they use AWS's real-time, streaming data capture and analytics service to support their businesses.

Charles Babcock, Editor at Large, Cloud

November 19, 2014

(Source: SiliconAngle)

Amazon Web Services' Kinesis, a real-time, streaming data capture and analysis system, is a key example of Amazon's emphasis on more sophisticated, non-commodity services. The Kinesis user creates an application that analyzes the data stream, and, like other data captures in the EC2 cloud, the data is replicated to three facilities, giving any Kinesis-based system durability and potentially great resilience.

Kinesis is remarkable for its ability to scale up to terabytes of data an hour, meaning it could be used with an application to analyze events and click-streams on a large and heavily trafficked website, which might prove tremendously valuable to a financial service or major retailer. And because it's a managed service, capable of synchronously replicating data, more than one analysis application may be applied to the data stream at one time. Furthermore, using Kinesis is free, though users will, of course, incur server and storage charges to capture, process, and store streams of big data.
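To give a rough sense of what "terabytes of data an hour" implies, the arithmetic can be sketched as below. The per-shard write limits used here (1 MB per second and 1,000 records per second) come from Amazon's published Kinesis limits rather than from this article, and the function name and the use of decimal megabytes are assumptions made for illustration.

```python
import math

# Assumed per-shard write limits (from AWS's published Kinesis limits,
# not from this article); decimal megabytes are used for simplicity.
SHARD_BYTES_PER_SEC = 1_000_000
SHARD_RECORDS_PER_SEC = 1_000


def shards_needed(bytes_per_hour: int, records_per_hour: int) -> int:
    """Estimate how many shards a stream needs to absorb an hourly load."""
    bytes_per_sec = bytes_per_hour / 3600
    records_per_sec = records_per_hour / 3600
    by_bytes = math.ceil(bytes_per_sec / SHARD_BYTES_PER_SEC)
    by_records = math.ceil(records_per_sec / SHARD_RECORDS_PER_SEC)
    # Whichever limit is reached first sets the shard count; keep at least one.
    return max(by_bytes, by_records, 1)


if __name__ == "__main__":
    # One terabyte an hour, made up of one billion records of about 1 KB each.
    print(shards_needed(10**12, 10**9))
```

Under these assumptions, a sustained terabyte an hour works out to a few hundred shards, which is the scale at which capacity planning starts to matter.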

All of this makes Kinesis, and no doubt additional Amazon services to come, a test product for how well newly fledged cloud services can mesh with the way enterprises want to do things. At Amazon's third annual Re:Invent conference, held recently in Las Vegas, InformationWeek asked three enterprise users why they had chosen Kinesis and how it was working out.

[Want to learn more about Kinesis? See Amazon Kinesis: Fast Analytics On Streaming Data.]

DataXu is a marketing cloud, a new mechanism designed to assess in near real time where potential buyers may be on the web and to decide, through its bid process, whether to put ads in front of them. Its bidding system tells a site what DataXu is willing to pay at that moment to place an ad there, and the bid is instantly accepted or rejected.

The process has to happen in a few milliseconds, so the DataXu bid engine sits on bare metal in the IBM SoftLayer cloud. But the background analysis of what media opportunities are out there and where a particular brand's buyers will be found is done by DataXu applications tied into the AWS Kinesis service.

Figure 1: (Source: SiliconAngle)

One of the things that DataXu applications have to do is assess which ads are generating click-throughs by viewers and in what media on the web they're located. DataXu wins customers by getting favorable results. It needs a lot of intelligence on what users are doing and which appeals work where. Co-founder and CTO Bill Simmons says his three-year-old data-handling firm, started in Boston by MIT grads, has 300 employees and 500 customers in 50 countries.

"I believe the next wave of big data technology will be to take action based on the data available," not just collecting and analyzing it, he said in an interview at Re:Invent.

Simmons founded DataXu -- originally intended to be DataZoo, but that name was already taken -- to be on the leading edge of "taking action" on big data streams, and the spot he chose was placing ads in appropriate venues in near real time.

"Amazon has phenomenal support for automated systems. If a server goes down, if a hard drive fails, a new one pops up. That's a huge benefit operationally," he said.

DataXu has had to build systems on Amazon Kinesis that allow it to know what audience characteristics its advertiser customers are seeking, then match them to what's being regularly offered at an online ad exchange. DataXu collects streams of data from its customers' websites and other "edge" servers to learn everything it can about potential customers. Intelligence from the Kinesis data is fed into the bid engine, which has to determine within 10 milliseconds what a good price would be to gain the available ad space and bid that price. The type of ad, such as a banner ad on a home page, versus a display ad amidst text, or an ad buried in a mobile app, helps determine the bid. Ads for videos and social networking systems are also gaining currency.

DataXu's bidding engine is competing with MediaMath, Turn, The Trade Desk, AppNexus, and other online ad placement companies that are also trying to gain the ad impression at a favorable price. The supplier of the ad space has usually selected a buyer within 100 milliseconds of submitting its availability, Simmons said.

This real-time bidding market is growing at a rate of 41% a year and will amount to a $42 billion market by 2018, according to online market researcher

DataXu plans to grow with the market, and to do so it must feed constant streams of data into Kinesis. After initial analysis, the data is refactored for storage in AWS's S3 object storage, AWS's NoSQL system DynamoDB, or AWS's Redshift data warehouse for reuse by longer-term analysis systems.
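As a minimal sketch of what feeding one log event into a Kinesis stream might look like, the function below builds the arguments for the `put_record` call in the AWS SDK for Python (boto3), which takes `StreamName`, `Data`, and `PartitionKey`. The stream name, the event fields, and the choice of `server_id` as the partition key are hypothetical and not taken from DataXu's system. The actual network call appears only as a comment so the example runs without AWS credentials.

```python
import json


def build_put_record_args(stream_name: str, log_event: dict) -> dict:
    """Build keyword arguments for a Kinesis put_record call from one event."""
    return {
        "StreamName": stream_name,
        # Kinesis treats the payload as an opaque blob, so serialize to bytes.
        "Data": json.dumps(log_event, sort_keys=True).encode("utf-8"),
        # Records that share a partition key land on the same shard;
        # keying on the (hypothetical) server_id groups each server's events.
        "PartitionKey": str(log_event["server_id"]),
    }


if __name__ == "__main__":
    event = {"server_id": "edge-17", "path": "/landing", "status": 200}
    args = build_put_record_args("edge-logs", event)
    print(args["StreamName"], args["PartitionKey"])
    # With AWS credentials configured, the record could then be sent with:
    #   import boto3
    #   boto3.client("kinesis").put_record(**args)
```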

"All the edge server logs, all the real-time data flows are downloaded into Kinesis," Simmons said. In addition, a data stream from the bidding engine allows a Kinesis application to look at bids that won versus those that lost and look for events in the bidding stream that may be recurring. Perhaps DataXu can better position its bid the next time around.

Amazon's ability to supply "the durability, the availability of the data, and the throughput performance [in processing real-time data streams] are very critical to us," said Simmons.
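The won-versus-lost comparison Simmons describes could, in its simplest form, be a win rate computed per ad type. The sketch below assumes each bid record is a dictionary with hypothetical `ad_type` and `won` fields; it illustrates the idea only and does not reflect DataXu's actual analysis.

```python
from collections import defaultdict


def win_rate_by_ad_type(bids):
    """Return {ad_type: fraction of bids won} for an iterable of bid records."""
    won = defaultdict(int)
    total = defaultdict(int)
    for bid in bids:
        total[bid["ad_type"]] += 1
        if bid["won"]:
            won[bid["ad_type"]] += 1
    return {ad_type: won[ad_type] / total[ad_type] for ad_type in total}


if __name__ == "__main__":
    # Hypothetical bid records, as they might arrive from the bidding stream.
    sample = [
        {"ad_type": "banner", "won": True},
        {"ad_type": "banner", "won": False},
        {"ad_type": "video", "won": True},
    ]
    for ad_type, rate in sorted(win_rate_by_ad_type(sample).items()):
        print(ad_type, rate)
```

A persistently low win rate for one ad type would be one signal that bids in that category are being priced too low.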

Suhas Kulkani, VP of engineering and chief architect at Gree, the Japan-based builder of mobile games, such as Casino, Crime City, War of Nations, and Knights and Dragons, said his firm streams online game activity through Kinesis to examine the individual player experience, responses of thousands of players to specific events within a game, analysis of when players become buyers of the wares sold within games, and so forth.

Gree is a 1,800-employee company with studios in San Francisco, Vancouver, B.C., and Japan. While it started out on PCs, 90% of its user traffic now comes from smartphones. It offers free versions of its games or makes them available as $1.99 downloads for iPhone and Android users.

"Gree's long-term success relies on game insights and game optimization. What is the load time, what is the experience of going from one screen to another, what is the response to a promotion within the game? We collect tons of information," said Kulkani.

Gree wants to know how new users fare in a Gree game and how quickly they progress into it. How do users respond to the new figures in the game or new twists to the game's flow? With such information, "we try to improve the overall experience," he said in an interview.

Gree has used these insights to maintain the popularity of its games in a competitive mobile gaming market. Modern War is two years old, long in the tooth for a mobile game, but Kulkani says "it's still doing extremely well performance-wise." To accomplish that, the game must keep a healthy balance between the challenge it presents and "the win/lose experience of the player. We want a good win/lose ratio. It's very important for continued engagement," he said, a balance Gree tracks by analyzing player activity on Kinesis.
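One way to picture the win/lose balance Kulkani describes is a per-player ratio computed from match events, with players outside a target band flagged for attention. The event shape, the function names, and the 0.3 to 0.7 band below are hypothetical choices for illustration, not Gree's actual metrics.

```python
from collections import Counter


def win_ratios(events):
    """Return {player_id: wins / matches} from (player_id, outcome) events."""
    wins, matches = Counter(), Counter()
    for player_id, outcome in events:
        matches[player_id] += 1
        if outcome == "win":
            wins[player_id] += 1
    return {player: wins[player] / matches[player] for player in matches}


def outside_band(ratios, low=0.3, high=0.7):
    """List players whose win ratio falls outside the assumed healthy band."""
    return sorted(p for p, ratio in ratios.items() if ratio < low or ratio > high)


if __name__ == "__main__":
    # Hypothetical match results, as they might arrive from the game stream.
    stream = [("a", "win"), ("a", "lose"), ("b", "lose"), ("b", "lose"), ("c", "win")]
    print(outside_band(win_ratios(stream)))
```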

Before moving its analytics onto Amazon Web Services, Gree relied on an in-house analytics platform, which was difficult to maintain as a highly available, highly resilient system. In addition to analyzing the player experience, Gree needs analytics to be available to marketing staff, game designers, and other groups in the company. Moving to Kinesis and other Amazon analytical services made results available throughout the company at all times.

Kinesis's near-real-time data analysis shows what happens when a game crashes, usually due to a hidden bug in the game logic. If it occurs with an early user, all the energy and attention that the user has invested in the game "is gone, and that player is not likely coming back," he said. The analysis provided by Gree applications on game activity illustrates that fact to game developers and acts as a spur to better programming and more thorough testing, he said.

Omnifone is another Kinesis user, though it first attempted to build systems in its own data centers to provide its Music Station platform to a wide variety of streaming music customers. It is a B2B music platform provider, with Sony Music Unlimited, Sirius XM, Guvera, and Rara among its customers.

Figure 2: Omnifone's Phil Sant, speaking at Re:Invent.
(Source: Amazon video)

Phil Sant, founder and chief engineer, said he and his partners attempted to build a global firm without realizing the difficulty of meeting data center requirements for such an operation. Sony Music, in particular, lobbied Omnifone to improve its operations. If Omnifone was going to be Sony's supplier of streaming services, it needed to build a new data center with high reliability -- and a second data center as a disaster-recovery site. "I had to commit $15 million in capital expenses for those two data centers," Sant recalls.

Re-engineering a music service while streaming music in a rapidly growing business was like "rebuilding an airplane while it's in flight," he said. Omnifone sampled Amazon services in 2008 and began a steady transition into the cloud. It now uses the Kinesis service for analytical applications and other parts of Amazon for compute and storage. "We sit on this cloud like a rising tide. We use 21 of their services," Sant said.

Omnifone is streaming data from its customers' music sites to see what new types of music are gaining popularity, what people prefer to hear on mobile devices, how musicians gain acceptance in different cultures, and what's being played in different parts of the world.

Sant has concluded from his experience that companies should rely on a scalable data capture and analytics service, such as Kinesis, rather than build their own. "There is nothing you should be running by yourself... If you're not rebuilding on Amazon, you will be killed by those who are."

About the Author(s)

Charles Babcock

Editor at Large, Cloud

Charles Babcock is an editor-at-large for InformationWeek and author of Management Strategies for the Cloud Revolution, a McGraw-Hill book. He is the former editor-in-chief of Digital News, former software editor of Computerworld and former technology editor of Interactive Week. He is a graduate of Syracuse University where he obtained a bachelor's degree in journalism. He joined the publication in 2003.
