Consider for a minute whether the best way to collect important data is to mail 125 million (or so) paper forms, often to "Current Occupant," and to then follow up with humans carrying clipboards and ringing doorbells. You probably would conclude that it's a lot of work and a process likely to result in the collection of incomplete or inaccurate data.
Then, you'll update that data only every 10 years, even though a lot can change in 10 years. Yet, you will use the collected data to determine things like how your congressional representatives will be elected, how federal funds are allocated to local schools, even where new roads will be built and public transportation offered.
Is there a better way to do the US Census than how it has been done for 228 years?
A group of university researchers believes that the data gathered and analyzed by the US Census Bureau can be found in existing sources without sending any forms or people out into the field. In fact, the researchers argue that the government can collect much more data, and more timely data, using sources like tax returns, state websites, even Google search data.
"The costs of a census are pretty large, $17.5 billion. That's based on these paper forms. That's really the driver behind our research," says Murray Jennex, a professor focused on knowledge management at San Diego State University. "The Census Bureau has spent a lot of money for technology to analyze data, but very little on collecting data," he added during a recent interview.
[Estimates are based on past census projects. The Census Bureau says that the 2020 census will make better use of the Internet and cost less.]
Jennex was part of the team that included San Diego State professors James Kelly (lead author), Kaveh Abhari and Eric Frost, along with Alexandra Durcikova of the University of Oklahoma. Together, they authored a research paper titled, "Data in the Wild: A KM Approach to doing a Census Without Asking Anyone and the Issue of Privacy." That paper will be presented in January at the Hawaii International Conference on System Sciences.
While the cost of paper census surveys -- including the one scheduled for 2020 -- is a key consideration in the team's research, there are several other major factors.
One such consideration is the growing abundance of data in the public sphere, such as that collected by federal agencies (the Internal Revenue Service, Department of Education, and Department of Labor, for example), state and municipal agencies, and academic research organizations. Add in the trend data that can be gleaned from search engines such as Google, from public utility records, and from commercial data services such as the major consumer credit bureaus. Together they represent a wealth of data, highlighting how many people live where, areas where poverty is most challenging, ethnic trends, and the need for elderly, healthcare, and educational support.
In addition, that data can be updated and analyzed in what Jennex calls "not quite real time." "The data we would be using could be refreshed every year, and could be used to guide public policies," he said.
The limiting factor, however, is privacy: how the Census Bureau could protect personally identifiable information (PII). Jennex notes that data can be anonymized by stripping off PII, which would be effective protection when the data analysis covers large areas, even five-digit ZIP codes. But it might not take a lot of work for someone to identify unique individuals or families at a neighborhood level, particularly those who stand out in the neighborhood by income, size of household, or ethnic background.
So, protections would have to be put in place.
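To make the neighborhood-level risk concrete, here is a minimal Python sketch. It is not the researchers' method. It assumes a k-anonymity-style check, in which any combination of attributes shared by fewer than k records is treated as potentially identifying. The field names (`zip5`, `block`, `household_size`, `income_band`) and all of the records are hypothetical, invented for illustration.

```python
from collections import Counter

# Hypothetical records with direct identifiers (names, addresses) already
# stripped. Every field name and value here is invented for illustration.
records = [
    {"zip5": "92101", "block": "A", "household_size": 2, "income_band": "50-75k"},
    {"zip5": "92101", "block": "A", "household_size": 2, "income_band": "50-75k"},
    {"zip5": "92101", "block": "B", "household_size": 2, "income_band": "50-75k"},
    {"zip5": "92101", "block": "B", "household_size": 7, "income_band": "200k+"},
    {"zip5": "92101", "block": "A", "household_size": 4, "income_band": "200k+"},
]


def group_sizes(rows, keys):
    """Count how many records share each combination of the given fields."""
    return Counter(tuple(row[key] for key in keys) for row in rows)


def undersized_groups(rows, keys, k=2):
    """Return the field combinations shared by fewer than k records.

    The threshold k is an assumption; a real release policy would set it.
    """
    counts = group_sizes(rows, keys)
    return sorted(combo for combo, n in counts.items() if n < k)


# Coarse view: analysis at the five-digit ZIP level.
zip_level = undersized_groups(records, ("zip5", "income_band"))

# Fine view: analysis at a hypothetical neighborhood ("block") level.
block_level = undersized_groups(
    records, ("zip5", "block", "household_size", "income_band")
)

print("Undersized groups at ZIP level:", zip_level)
print("Undersized groups at block level:", block_level)
```

In this toy data, the ZIP-level view has no undersized groups, while the block-level view flags three households that are the only ones with their combination of attributes. One possible protection, under the same assumptions, would be to suppress or coarsen any group that falls below the threshold before the data is released.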
Another hurdle that the researchers acknowledge is that "government is actually very bad at sharing data." For decades, government agencies have tended to keep their data siloed, despite attempts by some government leaders to move to an open data approach. Jennex cited the IRS as a particularly rich data source, not only for basic financial data but also for insight into household size, health issues, employment trends, and even transportation planning as more Americans work out of home offices.
Existing data, such as that from the IRS, can actually be more accurate than the data currently collected through Census Bureau surveys such as the American Community Survey. In their paper, the researchers cited how "household income" can be misleading, depending on whether household members are married or unrelated. Also, the income questions focus on what someone made in a single year, without factoring in whether that year's earnings were significantly higher or lower than what the person earns in a more typical year.
However, don't expect the paper questionnaire to go away in the year and a half before you expect to find one in your mail. The changes that the researchers suggest are much further down the road.
[Editor's note: After this article was posted the Census Bureau provided these estimates for the cost of the upcoming 2020 census. Life cycle cost of the 2010 Census: $10.2 billion. Cost of technology and IT in 2010: roughly 25% of the $10.2 billion. Number of questionnaires mailed out: 142,353,933.]

Jim Connolly is a versatile and experienced technology journalist who has reported on IT trends for more than two decades. As editorial director of InformationWeek and Network Computing, he oversees the day-to-day planning and editing on the site.