Citizen Data Scientists: 7 Ways To Harness Talent
A new role is emerging to deal with the ongoing shortage of data scientists. Learn more about these new power users and find out how organizations can cultivate more of them.
The worldwide shortage of data scientists won't end anytime soon. To compensate for the shortage, data discovery solutions are automating tasks that have traditionally been done manually by a data scientist, statistician, or other analytics expert. The confluence of these two trends is giving rise to a new role that Gartner calls a "citizen data scientist."
A recent Gartner report defines a citizen data scientist as "a person who creates or generates models that leverage predictive or prescriptive analytics but whose primary job function is outside of the field of statistics and analytics." It could be a line-of-business role, a business analyst, or a member of the business intelligence or IT team. The defining trait is that statistics and analytics are secondary in the role.
Not everyone in an organization will become a citizen data scientist -- at least by Gartner's definition. By that standard, citizen data scientists are power users. The new role does not threaten those of data scientists, data analysts, or business analysts; it complements them. In fact, citizen data scientists have to work with those other roles to derive the most value from analytics.
Like anyone else in an organization, citizen data scientists need the right technology to do their jobs. In this case, that's one of the data-discovery offerings that automate parts of complex processes such as data preparation and pattern identification.
As advanced analytics capabilities become available to more people, companies will have to ensure they have the governance in place to make it work, which includes software enforcement of governance policies. According to Gartner, by 2018 the multiple styles of data discovery available today -- smart, governed, Hadoop-based, search-based, visual-based, and graph-based -- will converge as their unique capabilities become requirements. And the convergence is already under way.
Here are seven ways companies can prepare for the coming wave of citizen data scientists.
Organizations are becoming more agile. To accomplish this, they need to automate certain tasks and processes that have historically been done manually. With automation, organizations are able to achieve dramatically higher levels of scale and speed. And, in the case of data-discovery platforms and solutions, the automation can lead to insights that might not have been uncovered otherwise.
"We're seeing the beginning of automation in each of the components. As we discussed in the report, there are a number of specialist vendors focusing on the data preparation, automating the pattern detection, and [enabling the use of] natural language. Some others are beginning to offer pieces of all of them," Rita Sallam, a research VP at Gartner, said in an interview. "I think we have the beginning of a next-generation set of capabilities."
Automation does not completely remove humans from the equation, however. It speeds and simplifies what has historically been time-consuming and difficult.
"I don't think you'll ever fully automate the job of an analyst, but what you'll be able to do is automate enough on the data preparation side of things [to improve] the time, cost, and accuracy of preparing your data, which is a big problem for data discovery," Sallam said. "As we can automate finding patterns more, we'll reduce the time it takes to build a production-grade model, so, to the extent we can automate some of the exploration, data scientists can find things that are significant."
Universities are expanding their data science and analytics programs to better align with what's happening in the real world. They're offering new executive education programs, MBA classes, and undergraduate classes aimed at people whose career focus is not primarily data science, machine learning, or statistics. The primary goal is to educate business leaders and line-of-business managers so they can use data and work with data science teams more effectively.
"I'm not sure [citizen data scientists] need to take masters-level classes in statistics and machine learning. In fact, on the contrary, they may just need to be trained in sort of basic statistics," says Gartner research VP Rita Sallam. "It's more of training on how you would work with a data scientist to make sure that we're not misinterpreting findings in the data. That is really part of a broader process to automate the exploration phase and push forward those hypotheses that truly need further investigation by a specialist."
A two-week vendor training course may be a wise investment, but it is not enough on its own. Citizen data scientists' skill sets need to be enhanced on an ongoing basis to get the most value out of the tools and approaches their companies have adopted, which continue to evolve. As Sallam notes, they must also learn to work cooperatively with the data science team.
"As companies shift to more of the self-service model for analytics, whether it be visual-based discovery or citizen data science, the need for training and agile training methods increases. The idea for BI in the past, where we had a competence center centralized in IT, and IT was building content, [it was probably OK to give] those developers vendor training and then 'off you go.' Now … we're trying to enable more regular users to incorporate analytics as part of their job, meaning their primary job is not analytics, but they use analytics as a tool."
According to McKinsey, there is a shortage of 1.5 million managers and analysts with the skills to understand and make decisions based on the analysis of big data. Training is not going to solve the problem, according to BeyondCore CEO Arijit Sengupta.
"You need the technology to be so simple that everyone can use it," says Sengupta. "Analytics is so important to society, it can't be something that's the domain of experts."
The citizen data scientist role will gain traction only gradually, because the role is new and not everyone is (or cares to become) a power user. Gartner recommends a slow approach.
"I think companies have to start small. It makes sense for them, and they see how this would fit in the continuum of analytics that they provide, from basic discovery to data science," Rita Sallam explained. "We absolutely recommend using that as an opportunity to build trust in what is essentially a black box, because that's often what companies and people in general have an issue with. I think you're going to have to pilot, you're going to have to try. You're going to have missteps, and it will take time to build trust in this kind of capability."
The good news is that the results of a black box calculation usually can be exported by a more advanced user to another tool for verification.
Effective data governance is essential. As more users access data and use analytics, organizations need to ensure they have appropriate governance in place that is enforced by software.
"As more business users gain the ability to run calculations and discover findings themselves, you run the risk of people using the same data and coming up with different results," according to Sallam. "You want to allow more sophisticated users to go in and make sure [the results are trustworthy], but then there are governance processes on the data side [that define] who can access what data, who can promote models, and who can share models without having them be checked by a more sophisticated data scientist."
Sallam says a lot of companies have an internal certification program that provides training on processes and tools as well as the rules for usage.
"We usually suggest creating an internal program that business users are required to take in exchange for access to the tool. Companies are struggling with governance as more people in the company use advanced analytics. As we're shifting, as we're going through this major shift from analytics being IT-centric to business-centric, governance is the biggest challenge."
There is often resistance to governance among line-of-business staff members who believe fast access to data will be impeded by governance. They have to understand the need for governance and why it is important. Enforcing governance through software is an effective way -- but not the only way -- to improve compliance.
"Every analysis, every step of the analysis, every interaction with it should be logged and stored," said BeyondCore CEO Arijit Sengupta. "[That way, if another] party asks if you included a variable, you should be able to see what was included and rerun the analysis. Governance has to be enforced by the software, not by the people running around trying to do things."
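As a rough illustration of what software-enforced logging might look like, here is a minimal Python sketch. It is not taken from BeyondCore or any other product: the `AnalysisLog` class, its method names, and the sample users, steps, and variables are all hypothetical. It assumes each analysis step can be described by a name, the variables it used, and its parameters, and it records enough to answer "was this variable included?" later.

```python
import hashlib
import json
from datetime import datetime, timezone


class AnalysisLog:
    """Append-only record of analysis steps (hypothetical helper, not a real product API)."""

    def __init__(self):
        self.entries = []

    def record(self, user, step, variables, params=None):
        """Store who ran which step, with which variables and parameters."""
        entry = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "step": step,
            "variables": sorted(variables),
            "params": params or {},
        }
        # Fingerprint the step definition only (no user or timestamp), so two
        # people running the same analysis produce the same fingerprint.
        definition = json.dumps(
            {k: entry[k] for k in ("step", "variables", "params")},
            sort_keys=True,
        )
        entry["fingerprint"] = hashlib.sha256(definition.encode("utf-8")).hexdigest()
        self.entries.append(entry)
        return entry

    def included(self, variable):
        """Return every logged step that used the given variable."""
        return [e for e in self.entries if variable in e["variables"]]

    def dump(self):
        """Serialize the log as one JSON object per line for storage."""
        return "\n".join(json.dumps(e, sort_keys=True) for e in self.entries)


# Illustrative usage with made-up users, steps, and variables.
log = AnalysisLog()
log.record("analyst_a", "fit_churn_model", ["tenure", "region"], {"model": "logistic"})
log.record("analyst_b", "segment_customers", ["region"])
steps_using_region = [e["step"] for e in log.included("region")]
```

Because each entry keeps the variables and parameters of the step, a reviewer could look up whether a variable was included and rerun the step with the same inputs. A real deployment would also need tamper-resistant storage and access controls, which this sketch leaves out.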
Citizen data scientists can't be effective if they're working in a vacuum. The same is true of data scientists and business analysts. In fact, the three roles have to work together.
"I think that citizen data scientists will sit in between data scientists and business analysts. Business analysts probably have more domain expertise than they have data science expertise, but they nonetheless need and can benefit from the insights advanced analytics can provide," said Gartner's Sallam.
"I think we're likely going to see some portion of analysts evolving into citizen data scientists as they gain more knowledge and basic skills around statistics and advanced analytics concepts, but I still think we're always going to have a need for a data scientist, because ultimately, once we explore data and have a hypothesis, the data scientist can focus in on a more targeted set of findings that need to be further explored or operationalized. [Unlike a data scientist,] the citizen data scientist isn't going to have the skill set to build a production-grade model, to maintain it, and to operationalize it."
The emergence of any new role requires the support of the organization, whether that role is a chief data officer or a citizen data scientist. The wisest way to start is to achieve a small success and build upon it, because that allows the organization -- and those who might become citizen data scientists -- to understand the role and what it requires to be successful.
"We definitely recommend looking internally. See if you can find business analysts who are interested in evolving their skills," said Sallam. "I would start there because [business analysts] know the business, and many of them have the interest and aptitude to learn more advanced skill sets."