When the box from HelloFresh arrives at your doorstep with recipes and ingredients for 3 to 5 well-balanced gourmet meals for you to make at home, it may feel like this service was designed just for you. And that's probably the goal of this 5-year-old company based in Berlin, Germany.
But in reality, the company is delivering more than 7.5 million meals per month to more than 800,000 subscribers in the UK, Germany, Switzerland, the Netherlands, the US, Canada, Austria, Belgium, and Australia.
To deliver that high level of customer service and ensure customer satisfaction, HelloFresh has relied upon an internally developed business intelligence system built on PHP and backed by a mix of a relational database and key-value storage for pre-calculated data.
"It was definitely not a good idea, but at the time it was the technology we were most comfortable with," HelloFresh CTO Nuno Simaria told InformationWeek in an interview. This internally developed system started running into some serious limitations.
First, it didn't offer the kind of flexibility that the company's analysts wanted. While it tracked KPIs (key performance indicators) and told analysts what was going on, it couldn't answer their next question: Why? Why was a KPI rising or falling? It didn't enable the analysts to drill down into the data. HelloFresh wanted a system that could give analysts more flexibility, and do so through self-service options.
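The gap between a pre-calculated KPI and a drill-down can be sketched in a few lines of Python. This is only an illustration, not HelloFresh's system: the order records, field names, and both functions are hypothetical. The first function stores one number per week, which shows that the KPI moved. The second breaks that week down by a chosen dimension, which is what lets an analyst ask why.

```python
from collections import defaultdict

# Hypothetical order records; every field and value is invented for illustration.
orders = [
    {"week": 1, "country": "DE", "box": "classic", "meals": 3},
    {"week": 1, "country": "DE", "box": "veggie", "meals": 5},
    {"week": 1, "country": "UK", "box": "classic", "meals": 4},
    {"week": 2, "country": "DE", "box": "classic", "meals": 3},
    {"week": 2, "country": "DE", "box": "veggie", "meals": 5},
]


def weekly_meals(rows):
    """Pre-calculated KPI: total meals per week, with no other dimensions kept."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["week"]] += row["meals"]
    return dict(totals)


def drill_down(rows, week, dimension):
    """Break one week's total down by a chosen dimension, e.g. "country"."""
    totals = defaultdict(int)
    for row in rows:
        if row["week"] == week:
            totals[row[dimension]] += row["meals"]
    return dict(totals)


kpi = weekly_meals(orders)                       # shows that the weekly total fell
week1_by_country = drill_down(orders, 1, "country")
week2_by_country = drill_down(orders, 2, "country")
```

In this made-up data, the weekly total alone shows a drop, while comparing the two per-country breakdowns shows that the drop comes entirely from one country having no orders in the second week.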
But the second problem with the existing home-grown BI system was a bit more urgent. HelloFresh's BI system was running out of capacity, given the growth of the company and the growth of the data collected. The clock was ticking. It was time to migrate to something else. But what?
"We set up our own Tiger Team to wrap our head around the big data challenge," Simaria said. "At the time we didn't have a big data problem, but we could see it coming from miles away. We kept adding more and more metrics that we wanted to collect for business insights."
The company initially looked at a few different alternatives, including newer relational database management systems such as MemSQL. HelloFresh also briefly considered Oracle and SAP's HANA, but quickly ruled those out because systems at the scale needed to accommodate HelloFresh's huge transactional needs would have been too expensive.
HelloFresh also looked at Hadoop. Simaria told InformationWeek that Hadoop's low cost compared to alternatives made it a strong contender. The technology could offer high performance on inexpensive commodity hardware. There was no need to invest in expensive specialty hardware.
And even though Simaria's team wasn't familiar with the Hadoop stack, they were familiar with some of the technologies around it, such as Hive and a little bit of Spark.
"We were very curious about the technology and we already had some prior knowledge," Simaria said. "But mainly the budget led us to search for a Hadoop vendor we could partner with."
But choosing Hadoop came with challenges of its own. Among the biggest initial challenges was finding professionals skilled in a technology that was relatively new to the market.
"It is very hard to find data engineers in the market," Simaria said. But HelloFresh has an approach it uses when it hits a challenge like that. "We empower our tech team to do what we call 'Figure it out.'" In this case that meant giving two of the company's really good engineers who wanted to learn more about engineering time to learn -- a month and a half -- and the budget that they needed to set up a Hadoop cluster.
"We'll give you the budget, and we'll give you the time," Simaria said of the approach. "This is something we've done with other technologies as well. If it is not easy for us to access talent in the market in the short term, we will empower our developers and our engineers who are interested in problem solving, and we will let them discover the complexities of that technology."
At the end of the project the engineers had to answer three questions:
- Is Hadoop the right technology?
- How do we go about migrating what we have to our future solution?
- Which Hadoop distribution should we use going forward?
In the end, HelloFresh partnered with MapR, which offers its own big data stack in the form of its Converged Data Platform. Simaria said that HelloFresh chose MapR for two reasons. First, the company's implementation of HDFS (Hadoop Distributed File System) was easier to work with than the standard Apache version of the technology. Second, MapR offers a snapshot feature that lets HelloFresh quickly roll back to a previous version of its data source in the case of a catastrophic failure. There's no need to rebuild the entire data warehouse in case of disaster, Simaria told InformationWeek.
MapR senior director of industry solutions Dale Kim told InformationWeek in an interview that the need for real-time capabilities also played a part in HelloFresh's goals for its new analytics system.
"We used to think that a report that was a day old or an hour old was real time," Kim said. "People now want to know about what happened in the last minute."
And the need for real-time analytics capabilities dovetails with the move towards providing self-service options to data analysts.
"The traditional model for BI is about waiting for information," Kim said. "Now, as soon as information is available analysts can start querying it."
And that's what HelloFresh has achieved.
"This technology has allowed us to spread data-driven decision making to anyone in the organization, from local teams to global finance to whoever needs to use data insights to make decisions," Simaria said.
And ultimately this new technology enabled HelloFresh to be more precise in its forecasting so it can know exactly the right number of tomatoes it will need in one month's time to serve all its customers and avoid the costs of spoilage.