Guinness, an Aircraft Carrier, and the History of Data
The chief data scientist for the NY Times and a professor of history from Princeton U took a deep dive into data’s past and potential future at Own’s rebrand event.
At a Glance
- The computer era began 75 years ago, but data collection goes back some 2000 years.
- After its IPO in the 19th century, Guinness was the first in industry to put statisticians and data to work.
- The Netflix Prize competition in the early 2000s contributed to the development of modern AI.
Onboard the Intrepid aircraft carrier museum ship, forever berthed along the Hudson River in New York City, there was a discussion last week on data’s history and its potential future.
Chris Wiggins, chief data scientist for The New York Times, and Matthew L. Jones, professor of history at Princeton University (previously at Columbia University), presented some of their findings gleaned from their book, “How Data Happened: A History from the Age of Reason to the Age of Algorithms.”
The discussion on data was part of an event hosted by Own (previously OwnBackup) to introduce a company rebrand and a new addition to its SaaS data platform. Own, known for backup and recovery, is expanding its platform with Own Discover, which will let companies draw upon their respective historical SaaS data for insights and to make AI-driven decisions.
Data, with its ability to inform decisions and actions, obviously puts tremendous power in the hands of those who possess and control it. “Today, data rules. We’re in a world in which data is on top,” said Wiggins. “You can look at the top companies at the NASDAQ and see that the companies that are driving our political and professional and personal realities are companies that know how to make sense of data and are shaping our world through data.”
He pointed out that the era of computers processing data spans some 75 years, but that era built upon a 200-year tradition of people gathering and using data to better understand the natural world. “That builds on a much longer tradition of people trying to collect data and just store data and put it to work for 2,000 years,” Wiggins said.
One of the earliest examples of writing, cuneiform etched into stone tablets, recorded elements of daily life and preserved data from millennia past. “Taxes, tariffs, crop yields, these sorts of things, but also observations of the stars and planets as well as observations of things like animal entrails,” said Jones. “This data, though thousands of years old, remains salient in understanding the development of the solar system and the movement of the stars.”
Chris Wiggins (l) and Matthew L. Jones (r) take the stage at Own's rebranding event within the Intrepid Museum. (photo by Joao-Pierre S. Ruth)
Keeping track of Mesopotamian data, Wiggins said, can help develop an understanding of long time scales. A change came 200 years ago when people realized they could gather data from multiple sources on a subject -- such as measurements of the locations of planets -- to create consensus models. “That really changed the relationship between data and truth,” he said. “It wasn’t just mathematical detail. It was actually looking at and comparing numbers.” In effect, Wiggins said, that approach to data changed the way humanity understood nature and the universe.
Such observations on data and its use tie into one of the biggest IPOs of its time. “Guinness, when it IPO’d in the late 19th century, went to a valuation that’s crazy,” Wiggins said. “It was like a half trillion in today dollars valuation. And what did they do with all that capital? They hired statisticians.”
One of the first uses of data within industry was to make beer.
“Guinness knew that somehow they had to put these data to work,” he said. “They hired some of the best statisticians around, but they knew that it was a trade secret.” So the beer maker would not let the statisticians publish under their own names, in order to protect its statistical methods and the data Guinness had access to.
Wiggins further detailed data’s history, which included the importance of codebreakers in World War II and access to abundant data along with computation to make sense of that data. Nowadays, he said, companies are awash with data. “They not only have engineering challenges to store it and to make sense of it, but they have to think through, just like Guinness did, ‘How are you going to actually learn from those data? How do you build models of those data?’”
The deluge of data in the early 2000s, Jones said, saw business intelligence and other tools come into play to make sense of it. “It was such a powerful challenge that a company like Netflix thought that was worth putting big money behind competitions,” he said, “to bring together diverse people with all kinds of different approaches to data to try to figure out how they could make use of that data in better ways for profit, for science, and for enjoyment of movies.”
The $1 million Netflix Prize competition, launched in 2006 and won in 2009, was a search for an algorithm that could predict users’ ratings of movies. It brought together different kinds of algorithms, Jones said, and became a key component in the overall transformation from what was called machine learning to what is now regarded as artificial intelligence. “AI for much of the 20th century, remarkably for us, was not about data. It was about trying to emulate the great chess players, or a great mathematician, or a logician and trying to build a robot on those grounds,” he said.
Netflix’s efforts to crowdsource the discovery of new ways to use data for profit, Jones said, reflected the promise of the accumulation of data, the demand for new kinds of analytic technologies, and certain dangers. Wiggins said Netflix’s competition led to lessons in the way value is created with data. There came a realization, he said, that the ratings data from Netflix could be combined with discussions on IMDb to discover more about the users. “Part of how we create value together with data, particularly when it’s the customer’s data, involves a lot of care about the way we protect those data,” Wiggins said.
This points to data privacy concerns, Jones said, found in academia, the corporate world, and the world of intelligence. “This is a conversation that has been going on for a little while,” he said. “It takes us back to the era of Watergate -- a moment of deep suspicion about the nature of the US state and states around the world.”
Rumors of a federal, centralized data repository, Jones said, led to an eruption of worries in the press and Congress, which led to a bipartisan movement concerned about privacy. This understanding that data, and metadata for that matter, is valuable raised awareness about potential privacy dangers for individuals and organizations. “The end product was actually a fairly narrow form of legislation that restricted the federal government, and not other sorts of entities, but it was a transformative moment,” he said, referring to the Privacy Act of 1974.
Protection of data, Wiggins said, was soon understood to also be an infrastructure concern. The cloud, Jones said, presented a solution as well as a problem in regard to the protection of large-scale data. “Consumers found they were resting on a cloud that didn’t always have their privacy in mind,” he said, “when many corporations had a business model that pushed against that privacy.” Meanwhile, corporations learned the clouds they relied on to house their data rested in the hands of third parties that may have different interests.
The foundational aspects and understandings of data, privacy, and reliability inform some of the ways modern AI is developed and might speak to concerns and guardrails that may arise in response to the further advancement of the technology. “It’s clear today that artificial intelligence is coming from the way we make sense of the world through data,” Wiggins said. “We derived data value not only from putting it together in tables, but from looking at, for example, the progression of data, and that allows us to build models that not only tell us how the world is but to predict how things will be in the future.”