The Weather Company, which owns a slew of weather-reporting and weather-prediction businesses, has no shortage of data and no shortage of data management tools at its disposal. But why does it need three different NoSQL databases?
I recently put this question to Bryson Koehler, CIO at The Weather Company, parent of The Weather Channel, WeatherFX, Weather Underground, and Intellicast, among other business units. The company ingests and processes some 20 terabytes of data per day to serve up current global weather conditions and predictions for airlines, emergency services, shippers, utilities, insurers, and millions of users of online weather sites and mobile weather apps. That demand adds up to billions of requests for weather data each day, with expected response times as short as 10 milliseconds.
Riak is the NoSQL database behind The Weather Company's massive transactional Storage Utility Network (SUN) data-ingestion platform, which runs on several Amazon Web Services (AWS) availability zones and captures more than 2 billion weather data points 15 times per hour. So Riak clearly handles scale, yet the company also uses Cassandra and recently added MongoDB to serve data to iOS and Android mobile apps for Weather.com.
The Weather Company is simply using different products, Koehler explains, because "different tools have different strengths."
On Cassandra, which is used to serve up API data to Weather Company and third-party weather apps used by consumers around the globe: "Our data-distribution platform handles hundreds of thousands of transactions per second, and we find Cassandra to be a great way to distribute data globally and provide high availability on the [database] read side of the house."
On MongoDB, which is used as a middle-layer caching capability that feeds the Weather.com website and mobile apps: "We don't yet have all the Weather.com content coming off of our core APIs, so MongoDB is the recipient and distribution point for Weather.com and the Weather.com mobile application on Android and iOS. Mongo has a lot of benefits in terms of flexibilities within the schema and JSON capabilities built in."
On Riak, which is used to consume weather data and observations, including images and videos, from throughout the world: "We love Riak for its data-ingestion capabilities, and doing that in a globally distributed way. It's a really solid choice for inbound [database] writes and doing it from many locations on a globally distributed platform."
I've heard DataStax, Basho, and Couchbase executives disparage the scalability of MongoDB, yet MongoDB points to massive deployments at Facebook supporting mobile apps on more than 200 million devices, and at eHarmony, where it processes billions of potential dating matches per day. MongoDB handles "a billion transactions per day" for Weather.com and the Weather.com mobile app, according to Koehler, and "there's no question that you can configure and deploy Mongo to be able to handle insanely high volumes of transactions."
Nonetheless, Koehler admits he would "love to see MongoDB continue to make global clustering and multiple-location [capabilities] more seamless and easy to use." These are the kinds of globally distributed clustering, replication, and load-balancing features that Cassandra and Riak are known for.
To put the scale discussion in perspective, few companies operate at The Weather Company's scale. MongoDB's ease of development, schema flexibility, and JSON data handling have made it the world's most popular NoSQL database. That's why Microsoft and IBM are going after MongoDB's turf, not that of Cassandra and Riak, with Microsoft Azure DocumentDB and IBM Cloudant, respectively.
The Weather Company may well consolidate from three NoSQL standards down to two, Koehler says, but the company is not quite ready to make that call.
"We have an overly complicated environment today that stems from us casting the nets pretty wide to build a lot of different data solutions," he notes. "We wanted to give the team some freedom so we could understand the pros and cons of all our choices, but you will see some consolidation from us."
When the time comes, migration won't be difficult because "the great thing about NoSQL databases is that you don't really get that locked into it," Koehler explains. "If you've architected yourself and coded yourself correctly, moving from one to the other is not that hard. With schema freedom and the ability to just dump data in, whether it's a key-value store or whatever, it's much easier to swap things in and out."
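One common reading of "architected and coded correctly" is that application code talks to storage only through a small interface the team owns, so that changing databases means writing one new adapter rather than rewriting the application. The Python sketch below illustrates that pattern under stated assumptions: `KeyValueStore`, `InMemoryStore`, `JsonFileStore`, and `record_observation` are hypothetical names invented for illustration, and the two backends are stdlib stand-ins, not Riak, Cassandra, or MongoDB drivers. It is not a description of The Weather Company's actual code.

```python
import json
import os
import tempfile
from abc import ABC, abstractmethod
from typing import Optional


class KeyValueStore(ABC):
    """Hypothetical vendor-neutral interface; application code depends only on this."""

    @abstractmethod
    def put(self, key: str, value: dict) -> None: ...

    @abstractmethod
    def get(self, key: str) -> Optional[dict]: ...


class InMemoryStore(KeyValueStore):
    """Stand-in backend (hypothetical) that keeps documents in a Python dict."""

    def __init__(self) -> None:
        self._data: dict[str, dict] = {}

    def put(self, key: str, value: dict) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[dict]:
        return self._data.get(key)


class JsonFileStore(KeyValueStore):
    """Second stand-in backend (hypothetical): one JSON file per key in a directory."""

    def __init__(self, directory: str) -> None:
        self._dir = directory

    def _path(self, key: str) -> str:
        return os.path.join(self._dir, f"{key}.json")

    def put(self, key: str, value: dict) -> None:
        with open(self._path(key), "w", encoding="utf-8") as f:
            json.dump(value, f)

    def get(self, key: str) -> Optional[dict]:
        try:
            with open(self._path(key), encoding="utf-8") as f:
                return json.load(f)
        except FileNotFoundError:
            return None  # Missing key behaves the same as in InMemoryStore.


def record_observation(store: KeyValueStore, station_id: str, temp_c: float) -> None:
    """Application logic: it never learns which backend it is writing to."""
    store.put(f"obs-{station_id}", {"station": station_id, "temp_c": temp_c})


if __name__ == "__main__":
    # The same application function runs unchanged against either backend.
    for store in (InMemoryStore(), JsonFileStore(tempfile.mkdtemp())):
        record_observation(store, "KATL", 27.5)
        print(type(store).__name__, store.get("obs-KATL"))
```

The design choice is deliberately narrow: by keeping the interface to plain `put` and `get` on JSON-like documents, the sketch avoids depending on any one product's query language or special features, which is the kind of lock-in Koehler warns against. A real adapter for a production database would sit behind the same interface.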
Gone are the days of complex stored procedures custom-coded against particular products, Koehler says, but what else does "architected and coded correctly" entail? The idea is to avoid special single-vendor tools and features that might lock you in. He cites the messaging service from AWS as an example.
"You don't have to use that service to run in the cloud," he explains. "You could just deploy your own RabbitMQ environment and not be locked in, so you could move an app from AWS and go deploy it on the Google Compute Cloud. Whether it's a data platform, a storage environment, or a cloud computing environment, be careful about allowing yourself to get locked in to shiny bells and whistles that are custom or only provided by one vendor."