Hailo isn't the only taxi-hailing smartphone app around (see Uber and TaxiMagic, to name just two), but the two-year-old startup launched with an ambitious plan for big data-powered global growth.
Hailo is taking advantage of location-aware smartphones to unite riders with cabs in London, Dublin, Toronto, Chicago and Boston. The London-based company says it has served more than 5 million passengers and reached an annual run rate of $100 million in fares. Now it's moving into New York, Tokyo, Washington, D.C., Cork (Ireland), Madrid and Barcelona. Hailo says it's ready for the increased workload because it runs on Apache Cassandra, an open-source NoSQL database that can span multiple data centers around the world.
"We started off as a MySQL, PHP and Java shop, but we switched to Cassandra in anticipation of scaling the business," explained Dave Gardner, senior engineer at Hailo. "We knew we wanted to launch to consumers in the U.S. and we wanted to ensure availability and resilience."
Gardner acknowledges it's possible to set up global replication with conventional relational databases "if you have enough money," but he says it has been dead easy to run a single Cassandra cluster across the company's data centers in the U.S. and U.K. (and soon, Singapore).
"We want Hailo to be a utility service -- something that just works, all the time, around the world," Gardner explained. "Cassandra lets us replicate our data and keep copies in each data center. If there are errors, Cassandra will just carry on because it's designed to run in that fashion."
Practitioners are learning, however, that highly scalable NoSQL databases don't always live up to their promise of "just carrying on," and standards for testing and benchmarking them are only beginning to emerge.
As Joe Masters Emison reports, tests by NoSQL investigator Kyle Kingsbury recently revealed "unexpected behavior" from some products, though Cassandra has yet to go under his microscope.
Performance is one topic to explore when considering NoSQL; analytical capability is another. What you give up with Cassandra, as with many other NoSQL databases, is the deep and broad query capability of SQL. NoSQL fans might insist that the "No" stands for "not only," but SQL imitations such as Cassandra Query Language (CQL) are no substitute for full SQL.
Gardner, a big fan of the database who runs a Cassandra meetup group in London, confirms the point. "You gain availability, but you lose the degree and flexibility of query that you have with relational databases," he said.
The need for fast and robust querying led Hailo to Acunu, an analytics vendor that specializes in streaming data ingest and analysis, data visualization and sub-second querying on top of Cassandra. Acunu works with Cassandra's event counters. As the database stores events -- someone hailing a cab from a particular location, a driver taking that fare, the journey being completed and the fare being paid -- Acunu's software aggregates the information for analysis.
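To make the aggregation idea concrete, here is a minimal in-memory Python sketch of rolling raw events up into per-minute counts, the kind of pre-aggregation the paragraph describes. It is not Acunu's implementation and does not use Cassandra's counter API; the event names (`hail`, `accept`, `complete`) and the one-minute bucket size are assumptions chosen for illustration.

```python
from collections import Counter
from datetime import datetime


def minute_bucket(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its minute."""
    return ts.replace(second=0, microsecond=0)


def aggregate(events):
    """Count events per (event type, minute) pair.

    Each event is an (event_type, timestamp) tuple. The event type
    names used in the example below are hypothetical.
    """
    counts = Counter()
    for event_type, ts in events:
        counts[(event_type, minute_bucket(ts))] += 1
    return counts


# Hypothetical sample events, not real Hailo data.
events = [
    ("hail", datetime(2013, 1, 1, 18, 1, 5)),
    ("hail", datetime(2013, 1, 1, 18, 1, 40)),
    ("accept", datetime(2013, 1, 1, 18, 1, 50)),
    ("complete", datetime(2013, 1, 1, 18, 20, 3)),
]
counts = aggregate(events)
print(counts[("hail", datetime(2013, 1, 1, 18, 1))])  # 2
```

Incrementing a count as each event arrives means a dashboard read becomes a lookup rather than a scan over every raw event, which is the usual trade-off behind counter-based designs.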
Acunu has its own query language, Acunu Query Language (AQL), which one suspects is also limited compared to SQL, but Gardner insists it's easy to use and offers many more options than CQL. "I can execute an AQL query on how many people have requested cabs in the last eight hours, and I can actually write, 'where time equals between eight hours ago and now' as text and Acunu figures it out," Gardner said.
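AQL's syntax isn't shown here, so rather than guess at it, the following Python sketch expresses the logic of Gardner's example query (cab requests between eight hours ago and now) as a plain filter over (event type, timestamp) pairs. The function name, the `hail` label and the inclusive window bounds are assumptions for illustration.

```python
from datetime import datetime, timedelta


def count_in_window(events, event_type, now, hours=8):
    """Count events of one type with a timestamp in [now - hours, now].

    Hypothetical helper: mirrors "between eight hours ago and now".
    """
    start = now - timedelta(hours=hours)
    return sum(
        1
        for etype, ts in events
        if etype == event_type and start <= ts <= now
    )


now = datetime(2013, 1, 1, 20, 0)
events = [
    ("hail", datetime(2013, 1, 1, 11, 59)),    # just outside the window
    ("hail", datetime(2013, 1, 1, 12, 30)),
    ("hail", datetime(2013, 1, 1, 19, 45)),
    ("accept", datetime(2013, 1, 1, 19, 46)),  # different event type
]
print(count_in_window(events, "hail", now))  # 2
```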
Every time a Hailo user hits the "get me a cab" button, Hailo offers the job out to drivers in that area. Drivers using the app see available fares in their area based on GPS mapping, and they can accept, decline or ignore the call. Using a continuously running Acunu query, Hailo counts the accepts, declines and ignores across the system.
"That gives the people who run the business side information to optimize the network," Gardner explained. "They can see how acceptance rates vary by time of day, and if they see that certain areas of the city are underserved, they can go on a recruitment push for new drivers."
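Here is a minimal sketch of the metric Gardner describes: tally each driver response by hour of day and divide accepts by total offers. The outcome labels and the (outcome, timestamp) tuple format are assumptions, not Hailo's actual schema, and counting an ignored offer as a non-acceptance is a choice made here for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical labels for a driver's response to an offered job.
OUTCOMES = ("accept", "decline", "ignore")


def acceptance_rate_by_hour(responses):
    """Return {hour_of_day: accepts / total offers} from (outcome, timestamp) pairs."""
    tallies = defaultdict(lambda: {o: 0 for o in OUTCOMES})
    for outcome, ts in responses:
        if outcome not in OUTCOMES:
            raise ValueError(f"unknown outcome: {outcome!r}")
        tallies[ts.hour][outcome] += 1
    # Every tallied hour has at least one response, so the total is never zero.
    return {
        hour: t["accept"] / sum(t.values())
        for hour, t in sorted(tallies.items())
    }


responses = [
    ("accept", datetime(2013, 1, 1, 8, 5)),
    ("decline", datetime(2013, 1, 1, 8, 20)),
    ("accept", datetime(2013, 1, 1, 8, 40)),
    ("ignore", datetime(2013, 1, 1, 8, 55)),
    ("ignore", datetime(2013, 1, 1, 23, 10)),
    ("accept", datetime(2013, 1, 1, 23, 15)),
]
print(acceptance_rate_by_hour(responses))  # {8: 0.5, 23: 0.5}
```

Keeping the raw accept, decline and ignore tallies, rather than storing only the rate, lets the business side slice the same data differently later, for example by area instead of by hour.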
Acunu's user interface has a data explorer view that lets you construct ad hoc queries, and there are also standardized line and bar charts and tables. A new geospatial feature that Acunu has in development will enable Hailo to show heat maps of customer demand in real time. Every event supported by the Hailo app relates to location, so all events include a latitude, a longitude and a time stamp.
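Because every event carries a latitude, a longitude and a time stamp, a demand heat map can be built by snapping each hail to a grid cell and counting per cell. The sketch below assumes a simple fixed grid of 0.01 degrees and a hypothetical `Event` record; it is an illustration of the general technique, not Acunu's geospatial feature.

```python
import math
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    """A location-stamped app event (hypothetical schema)."""
    kind: str
    lat: float
    lon: float
    ts: datetime


def grid_cell(lat, lon, cell_deg=0.01):
    """Map a coordinate to the integer indices of a fixed lat/lon grid cell."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def demand_heat_map(events, kind="hail", cell_deg=0.01):
    """Count events of one kind per grid cell; each count is one heat-map tile."""
    return Counter(
        grid_cell(e.lat, e.lon, cell_deg) for e in events if e.kind == kind
    )


# Hypothetical events in central London.
events = [
    Event("hail", 51.5074, -0.1278, datetime(2013, 1, 1, 18, 0)),
    Event("hail", 51.5079, -0.1271, datetime(2013, 1, 1, 18, 2)),
    Event("hail", 51.5155, -0.0922, datetime(2013, 1, 1, 18, 3)),
    Event("accept", 51.5074, -0.1278, datetime(2013, 1, 1, 18, 1)),
]
heat = demand_heat_map(events)
print(heat[(5150, -13)])  # 2
```

A fixed degree grid is the simplest choice, but its cells narrow east to west away from the equator; a real system might prefer geohashes or a projected grid, and would typically also restrict events to a recent time window.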
"We don't have to build anything; we just add the data," said Gardner. "When you find queries and visualizations that work well, you can add them to a dashboard. We track jobs with lots of charts and graphs, and it's a handy way to see what's going on."
It's common to hear about improvements in analytics capabilities from the various NoSQL communities, and the same is true for Cassandra. Just-released Cassandra 2.0, for instance, includes CQL enhancements such as cursors and improved index support. Many Cassandra users are also rolling their own queries and reports from scratch, but it's easy to imagine that Acunu will have plenty of room to bring companies the kind of query, analysis and data visualization capabilities they're used to having on relational database platforms.