How Amazon Kinesis Adds Speed, Resilience To Analytics - InformationWeek


News | 11/19/2014 10:55 AM

Three Kinesis users explain how they use AWS's real-time, streaming data capture and analytics service to support their businesses.

Amazon Web Services' Kinesis, a real-time, streaming data capture and analysis system, is a key example of Amazon's emphasis on more sophisticated, non-commodity services. Kinesis users build applications that analyze the data stream, and, as with other data captured in the EC2 cloud, the data is replicated across three facilities, giving any Kinesis-based system durability and potentially great resilience.
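
To make "an application that analyzes the data stream" concrete, here is a minimal Python sketch of the analysis side: it counts page clicks in fixed one-minute windows. It assumes records shaped like the dictionaries boto3's Kinesis `get_records` call returns (a `SequenceNumber`, a `PartitionKey`, and a bytes `Data` payload). The JSON event fields `page` and `ts`, and the helper names, are hypothetical, and the records are built in memory rather than read from a live stream.

```python
import json
from collections import Counter, defaultdict


def make_record(seq: int, page: str, ts: float) -> dict:
    """Build a hypothetical click-event record shaped like boto3 get_records output."""
    payload = {"page": page, "ts": ts}  # hypothetical event fields
    return {
        "SequenceNumber": str(seq),
        "PartitionKey": page,
        "Data": json.dumps(payload).encode("utf-8"),
    }


def count_clicks_per_window(records, window_seconds: int = 60) -> dict:
    """Tumbling-window click counts: {window_start_seconds: Counter({page: clicks})}."""
    windows = defaultdict(Counter)
    for record in records:
        event = json.loads(record["Data"].decode("utf-8"))
        # Align each event to the start of the fixed-size window it falls in.
        window_start = int(event["ts"] // window_seconds) * window_seconds
        windows[window_start][event["page"]] += 1
    return dict(windows)


records = [
    make_record(1, "/home", 5.0),
    make_record(2, "/cart", 30.0),
    make_record(3, "/home", 61.0),
]
print(count_clicks_per_window(records))
```

In a real deployment the records would arrive through a shard-iterator loop or the Kinesis Client Library rather than from an in-memory list.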

Kinesis is remarkable for its ability to scale to terabytes of data an hour, meaning it could be paired with an application that analyzes events and click-streams on a large, heavily trafficked website, which might prove tremendously valuable to a financial services firm or major retailer. And because it is a managed service that synchronously replicates data, more than one analysis application can work on the same data stream at the same time. Kinesis itself is billed by usage, by the shard-hour and by the number of PUT transactions, and users will, of course, also incur server and storage charges to capture, process, and store streams of big data.
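
Kinesis reaches terabytes-an-hour scale by spreading a stream across shards, and the arithmetic of sizing one shows why running several analysis applications on the same stream matters. The Python sketch below assumes the per-shard limits AWS documents for a standard stream (1 MB per second and 1,000 records per second in, 2 MB per second out, with that read capacity shared by every application reading the shard) and uses decimal megabytes. The function and constant names are illustrative; check current AWS documentation before relying on these figures.

```python
import math

# Assumed per-shard limits for a standard (shared-throughput) Kinesis stream,
# in decimal units; confirm against current AWS documentation.
WRITE_BYTES_PER_SEC = 1_000_000    # 1 MB/s ingest per shard
WRITE_RECORDS_PER_SEC = 1_000      # 1,000 records/s ingest per shard
READ_BYTES_PER_SEC = 2_000_000     # 2 MB/s egress per shard, shared by all readers


def shards_needed(bytes_per_hour: float, avg_record_bytes: float, consumer_apps: int) -> int:
    """Smallest shard count that satisfies write bytes, write records, and shared reads."""
    ingest_bps = bytes_per_hour / 3600
    records_per_sec = ingest_bps / avg_record_bytes
    # Every consuming application reads the whole stream, so egress grows with app count.
    egress_bps = ingest_bps * consumer_apps
    return max(
        1,
        math.ceil(ingest_bps / WRITE_BYTES_PER_SEC),
        math.ceil(records_per_sec / WRITE_RECORDS_PER_SEC),
        math.ceil(egress_bps / READ_BYTES_PER_SEC),
    )


ONE_TB = 1_000_000_000_000
print(shards_needed(ONE_TB, 1_000, consumer_apps=1))  # 278
print(shards_needed(ONE_TB, 1_000, consumer_apps=3))  # 417
```

For 1 TB an hour of 1 KB records, one consuming application needs 278 shards under these assumptions, but three applications reading the same stream push the requirement to 417, because shared read bandwidth, not ingest, becomes the bottleneck.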

All of this makes Kinesis, and no doubt additional Amazon services to come, a test case for how well newly fledged cloud services can mesh with the way enterprises want to do things. At Amazon's third annual Re:Invent conference, held recently in Las Vegas, InformationWeek asked three enterprise users why they had chosen Kinesis and how it was working out.

[Want to learn more about Kinesis? See Amazon Kinesis: Fast Analytics On Streaming Data.]

DataXu
DataXu is a marketing cloud, a new mechanism designed to assess in near real time where potential buyers may be on the web and to decide, through its bid process, whether to put ads in front of them. A bidding system tells a site what DataXu is willing to pay at that moment to place an ad there, and the bid is instantly accepted or rejected.

The process has to happen in a few milliseconds, so the DataXu bid engine runs on bare metal in the IBM SoftLayer cloud. But the background analysis of which media opportunities are available and where a particular brand's buyers are likely to be found is done by DataXu applications tied into the AWS Kinesis service.

(Source: SiliconAngle)

One of the things that DataXu applications have to do is assess which ads are generating click-throughs by viewers and in what media on the web they're located. DataXu wins customers by getting favorable results. It needs a lot of intelligence on what users are doing and which appeals work where. Co-founder and CTO Bill Simmons says his three-year-old data-handling firm, started in Boston by MIT grads, has 300 employees and 500 customers in 50 countries.

"I believe the next wave of big data technology will be to take action based on the data available," not just collecting and analyzing it, he said in an interview at Re:Invent.

Simmons co-founded DataXu (originally intended to be DataZoo, but that name was already taken) to be on the leading edge of "taking action" on big data streams, and the niche he chose was placing ads in appropriate venues in near real time.

"Amazon has phenomenal support for automated systems. If a server goes down, if a hard drive fails, a new one pops up. That's a huge benefit operationally," he said.

DataXu has had to build systems on Amazon Kinesis that let it know what audience characteristics its advertiser customers are seeking, then match them to what is regularly offered at an online ad exchange. DataXu collects streams of data from its customers' websites and other "edge" servers to learn everything it can about potential customers. Intelligence from the Kinesis data is fed into the bid engine, which has to determine within 10 milliseconds what a good price would be to win the available ad space, and then bid that price. The type of ad, whether a banner ad on a home page, a display ad amid text, or an ad buried in a mobile app, helps determine the bid. Video ads and ads on social networks are also gaining currency.
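
The article doesn't describe DataXu's pricing logic, so what follows is only a generic illustration of the idea. It is a minimal Python sketch of expected-value bidding: stream analysis is assumed to have already produced click-through-rate (CTR) estimates per audience segment and ad type, and the bid engine turns one table lookup into a price per thousand impressions (CPM = CTR × value per click × 1,000). The segment names, CTR figures, and function names are hypothetical, and the 10-millisecond budget is checked simply by timing the call.

```python
import time
from typing import Dict, Optional, Tuple

# Hypothetical click-through rates keyed by (audience segment, ad type), standing
# in for the "intelligence" a stream-analysis job might publish to the bid engine.
CTR_TABLE: Dict[Tuple[str, str], float] = {
    ("sports_fans", "banner_home"): 0.004,
    ("sports_fans", "display_in_text"): 0.002,
    ("sports_fans", "mobile_in_app"): 0.003,
}


def bid_cpm(segment: str, ad_type: str, value_per_click: float,
            budget_ms: float = 10.0,
            ctr_table: Dict[Tuple[str, str], float] = CTR_TABLE) -> Optional[float]:
    """Return a CPM bid in dollars, or None to skip the impression."""
    start = time.perf_counter()
    ctr = ctr_table.get((segment, ad_type))
    if ctr is None:
        return None  # no precomputed estimate for this pairing: don't bid
    # Expected value of 1,000 impressions = CTR x value per click x 1,000.
    price = round(ctr * value_per_click * 1000, 2)
    elapsed_ms = (time.perf_counter() - start) * 1000
    # A bid that misses the time budget is useless, so drop it.
    return price if elapsed_ms <= budget_ms else None


print(bid_cpm("sports_fans", "banner_home", value_per_click=1.50))
print(bid_cpm("sports_fans", "video_preroll", value_per_click=1.50))
```

Keeping the heavy analysis upstream, in the stream-processing tier, is what lets the bid path in this sketch shrink to a dictionary lookup and a multiplication.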

DataXu's bidding engine is competing with MediaMath, Turn, The Trade Desk, AppNexus, and other online ad placement companies that are also trying to gain the ad impression at a favorable price. The supplier of the ad space has usually selected a buyer within 100 milliseconds of submitting its availability, Simmons said.
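
The exchange side of that race can be modeled just as simply. This Python sketch, with hypothetical bidder names and figures, keeps only bids that arrive within the 100-millisecond window Simmons describes and picks the highest. It assumes a plain highest-bid-wins rule and ignores how the clearing price is set, which varies by exchange and isn't covered in the article.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Bid:
    bidder: str        # hypothetical bidder name
    cpm: float         # offered price per thousand impressions, in dollars
    arrival_ms: float  # milliseconds after the exchange announced the impression


def select_winner(bids: List[Bid], cutoff_ms: float = 100.0) -> Optional[Bid]:
    """Highest bid among those that arrived by the cutoff; None if none did."""
    on_time = [b for b in bids if b.arrival_ms <= cutoff_ms]
    if not on_time:
        return None
    return max(on_time, key=lambda b: b.cpm)


bids = [
    Bid("bidder_a", 6.00, arrival_ms=12.0),
    Bid("bidder_b", 6.50, arrival_ms=140.0),  # higher offer, but too late
    Bid("bidder_c", 5.00, arrival_ms=40.0),
]
print(select_winner(bids).bidder)  # bidder_a
```

The late, higher offer from `bidder_b` loses outright, which is why the bid engine's speed matters as much as its pricing.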

The real-time bidding market is growing at 41% a year and will amount to $42 billion by 2018, according to online market researcher ReportsnReports.com.

DataXu plans to grow with the market, and to do so it must feed constant streams of data into Kinesis. After initial analysis, it is refactored for

Charles Babcock is an editor-at-large for InformationWeek and author of Management Strategies for the Cloud Revolution, a McGraw-Hill book. He is the former editor-in-chief of Digital News, former software editor of Computerworld and former technology editor of Interactive ...

Comments
Charlie Babcock, User Rank: Author, 11/19/2014 2:58:12 PM
Remember the 24-hour limit on data retention
One limitation: Amazon Kinesis retains the data streamed into it for only 24 hours. If you want to keep it longer, be sure to designate a target repository, such as S3 or DynamoDB.
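
That advice can be sketched in Python as a periodic archiving pass. This is a toy model under stated assumptions: the records list holds everything ever written to the stream (including records the real service would already have dropped), the dictionaries are shaped like boto3 `get_records` output, and a plain dict stands in for an S3 bucket or DynamoDB table. The function name is hypothetical. Any record not copied out within the 24-hour window is reported as lost.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List

RETENTION = timedelta(hours=24)  # the retention limit cited in the comment


def archive_pending(records: List[dict], archive: Dict[str, bytes],
                    now: datetime, retention: timedelta = RETENTION) -> List[str]:
    """Copy still-retained, not-yet-archived records into `archive`.

    `archive` is a plain dict standing in for S3 or DynamoDB. Returns the
    sequence numbers of records that aged out before they were saved.
    """
    lost = []
    for record in records:
        seq = record["SequenceNumber"]
        if seq in archive:
            continue  # already saved on an earlier pass
        age = now - record["ApproximateArrivalTimestamp"]
        if age < retention:
            archive[seq] = record["Data"]
        else:
            lost.append(seq)  # past the window: the stream no longer has it
    return lost


now = datetime(2014, 11, 19, 15, 0, tzinfo=timezone.utc)
records = [
    {"SequenceNumber": "1", "ApproximateArrivalTimestamp": now - timedelta(hours=25), "Data": b"old"},
    {"SequenceNumber": "2", "ApproximateArrivalTimestamp": now - timedelta(hours=1), "Data": b"recent"},
]
archive: Dict[str, bytes] = {}
print(archive_pending(records, archive, now))  # ['1']
print(archive)  # {'2': b'recent'}
```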