Connected data in the cloud is driving a bit of a sea change, says Nick Caldwell, chief product officer for Looker, a provider of business intelligence and big data analytics. He spoke with InformationWeek about data engineering and data-driven workforces. He previously worked as vice president of engineering at Reddit and, prior to that, was general manager for Microsoft’s Power BI business intelligence and analytics service.
He says his exposure at Reddit opened him to new ways of approaching data challenges using Amazon AWS and Google BigQuery for a site with users that generate massive amounts of data. Now at Looker, he sees activity in the market around how people use data and ways infrastructure is evolving with an expectation for data to be integrated into the tools they use. This goes beyond data professionals and includes factory workers, school teachers, and students. He sees this trend leading to growth in SaaS applications that might sit atop large datasets.
How do trends in connected data and business intelligence affect cloud infrastructure?
“Modern cloud infrastructure, in massively parallel data warehouses, allows you to dump enormous amounts of data at low cost without losing any sort of performance. Increasingly, the data stores are baking analytics directly into the data store. Google BigQuery has a language called BQML, BigQuery machine learning, where you can dump your data in and run TensorFlow machine learning jobs directly within the database. The databases are cheaper, faster, and very, very powerful. That trend is something Looker has latched onto.
“It means that no matter how many of these new SaaS apps or data sources are going to pop up, you can push them into one of these massively parallel data lakes. Then, rather than take the older-generation approach of running ETL (extract, transform, load) jobs, creating data marts, and building aggregate tables, you just push the data into one massively parallel warehouse and use a technique called schema on read.”
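To make "schema on read" concrete, here is a minimal sketch in Python. It is an illustration only, not Looker's or BigQuery's implementation: the raw records, the field names, and the `read_with_schema` helper are all hypothetical. The point it shows is that raw data is stored untouched and a schema is applied only when someone reads it.

```python
import json

# Raw events are stored exactly as they arrive; nothing is validated or
# reshaped at write time. (Hypothetical sample records.)
RAW_EVENTS = [
    '{"source": "zendesk", "ticket_id": 17, "opened": "2019-03-02"}',
    '{"source": "marketo", "lead": "a@example.com", "score": "42"}',
    '{"source": "zendesk", "ticket_id": "18", "opened": "2019-03-05", "priority": "high"}',
]

# The schema lives with the reader, not the store: field name -> type converter.
TICKET_SCHEMA = {"ticket_id": int, "opened": str, "priority": str}


def read_with_schema(raw_lines, source, schema):
    """Parse raw JSON lines and project them onto `schema` at read time."""
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        if record.get("source") != source:
            continue  # other sources stay in the store, untouched
        row = {}
        for field, cast in schema.items():
            value = record.get(field)
            # Missing fields become None instead of failing the load.
            row[field] = cast(value) if value is not None else None
        rows.append(row)
    return rows


tickets = read_with_schema(RAW_EVENTS, "zendesk", TICKET_SCHEMA)
```

Because the schema is applied at read time, asking a new question means writing a new schema dictionary rather than reloading or transforming the stored data.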
What are the advantages of such consolidation of data?
“After you’ve pushed all the data into one spot, you can use a semantic layer that describes what you think the data should look like. Given that I have Marketo, Zendesk, and other software and tools in one spot, what are the tables I actually care about? What are the business metrics I care about? Tell the semantic layer how to compute and calculate those things. What Looker does is take that semantic description and convert it into SQL queries that are optimized for whatever the underlying data store is.
“If you’ve got all your data in BigQuery, Looker’s going to know how to take your semantic understanding of the data and convert it into the actual SQL queries to run against BigQuery in the most efficient way. This has a lot of advantages from an infrastructure perspective in terms of time to value. If I am a data engineering team who previously spent all my time maintaining complex ETLs from these different data stores, updating data marts, and responding to my end users who want to ask new business questions, that’s a very costly cycle to iterate. I have to change how the data is getting transformed at multiple steps.”
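The idea of a semantic layer that is compiled into SQL can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not LookML and not how Looker generates queries: the `MODEL` dictionary, the table and column names, and the `compile_query` function are hypothetical, and real dialect-specific optimization is left out.

```python
# A toy semantic model: business names mapped to SQL expressions, defined once.
# (Hypothetical table and column names.)
MODEL = {
    "table": "support.tickets",
    "dimensions": {"priority": "priority", "agent": "assigned_agent"},
    "measures": {
        "ticket_count": "COUNT(*)",
        "avg_hours_to_close": "AVG(hours_to_close)",
    },
}


def compile_query(model, dimensions, measures):
    """Turn a request for named dimensions and measures into a SQL string."""
    select = [f"{model['dimensions'][d]} AS {d}" for d in dimensions]
    select += [f"{model['measures'][m]} AS {m}" for m in measures]
    sql = f"SELECT {', '.join(select)} FROM {model['table']}"
    if dimensions:
        # Group by the positions of the dimension columns in the SELECT list.
        positions = ", ".join(str(i) for i in range(1, len(dimensions) + 1))
        sql += f" GROUP BY {positions}"
    return sql


query = compile_query(MODEL, ["priority"], ["ticket_count"])
```

A dashboard or application asks for `ticket_count` by `priority` and never writes SQL itself. If the definition of a metric changes, only the entry in `MODEL` is edited, and every consumer picks up the new definition on its next query.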
How can you simplify that process?
“With Looker, you just change the semantic layer. There is one place that multiple developers can edit at the same time using tools that allow for large-scale collaboration. We have customers such as Square, whose developers all work on the semantic model at the same time and then let other departments within Square use that semantic model to deliver dashboards, build custom applications, and create other sorts of experiences.
“Looker is corralling this ever-exploding mountain of data and SaaS applications and putting in place a governed, well-understood API for that data, like a central layer that tells you ‘this is the truth.’ On top of that you can build different experiences, from exploratory dashboards to something like Deliveroo, which has an app that delivery drivers use. I saw a demo where the API was being used to optimize bid campaigns for marketing spend. You can use it in all sorts of different ways, but the fundamental thing is they are now all trusted and there is one place where you can rapidly iterate on what the definition of data truth is.”
How do you deal with friction that arises if customers are used to handling data a certain way?
“There’s friction because we’re a fundamentally different architecture. It is very different from how the majority of companies architect a data warehouse. Typically, when we go into an account, they have legacy systems on premises, and they are trying to figure out how to join the modern cloud revolution. In those cases, they quickly realize all of the trends that have been underway. They discover for themselves, ‘Given all of these new capabilities, maybe I don’t need to do things the old way.’ Then they look for a solution that works using the modern approach and hit upon Looker.
“It’s a different architecture and a different approach that was built from the ground up with a cloud-first governance layer in mind. The previous generation was workbook chaos: just give everyone in the organization a Tableau workbook and they can edit however they want. You were giving up things to accommodate for speed or convenience. With Looker, you still get to see all of the data; you still get the performance because you’re using a modern data store, but you don’t get the chaos.”