Data scientist positions are expected to grow 19% over the next decade, according to a recent report from Burning Glass, which collects and analyzes millions of job postings from across the country. With data increasingly becoming the lifeblood of all organizations, data scientists need not only the right technical skills but also a robust dose of business acumen.
Machine Learning/Neural Networks
In 2021, techniques such as transfer learning and transformer models are drawing a lot of attention because they are driving rapid innovation across many domains. For building and training neural networks, PyTorch has considerable momentum behind it, and Keras and TensorFlow are also widely used.
There is also a rich ecosystem of software libraries, many open source, that can help accelerate machine learning and data science applications.
“Data scientists can make themselves attractive by demonstrating deep intuition into why and how machine learning algorithms work, which is important for working through challenges that inevitably arise during training and testing,” said Matthew Silver, senior director of data science at Vectra, a company specializing in AI threat detection and response. “ONNX, a neural networks standard that facilitates platform, library, and language independent model deployment, helped us streamline our use of AI in production and accelerate our modeling work.”
“Data scientists who are able to walk on the job and start using common software libraries to build models right away are the most competitive, and strong software development skills are a plus in almost all cases,” Silver said.
Data scientists who understand cloud engineering principles and cloud infrastructure are attractive to many employers. That means getting comfortable with at least one of the big three public cloud providers -- Microsoft Azure, Amazon Web Services, or Google Cloud. Each offers a comprehensive set of tools for data extraction, data cleansing, visualization, and machine learning.
“I personally look for data scientists familiar with cloud infrastructure, CI/CD pipelines, and automation,” said Phillip Gates-Idem, chief architect at JupiterOne, a provider of cyber asset management and governance solutions. “Data scientists need to have a firm understanding of how to build and utilize tools with cloud infrastructure.”
Statistics, the branch of mathematics concerned with collecting, analyzing, and interpreting quantitative data through models and summary representations, is at the core of data science. It includes concepts such as probability, variability, regression, and central tendency.
“If you don’t have an in-depth knowledge of statistics -- the heart of data science -- and how to apply sound mathematical reasoning to the problems you’re working on, then I don’t care how many platforms or languages you can list on your resume,” said Lars Kemmann, principal architect at IT consulting firm Netrix. “I think that’s a challenge in the industry right now -- we get lots of resumes from people who haven’t done the hard work to internalize the scientific method.”
Because data science projects can involve long exploration phases, as well as unknowns that persist late into the game, project management is another key skill for data scientists. Adopting an agile methodology, for example, allows data scientists to prioritize work and build roadmaps around requirements and goals.
“It’s often very difficult to predict how long it will take to develop and train a machine learning model, and businesses waiting on updated models or results will often have timelines and planning that suffer due to this unpredictability,” Silver explained. “Data scientists who are able to take ownership over major modeling efforts by understanding limitations from the outset, conveying project status as efforts progress, and predicting when they’ll be able to offer the next meaningful readout, play an important role in our team.”
An organization’s data may hold enormous potential value, but that value is realized only when someone uncovers the insights it contains and translates them into actions or business outcomes. Plotly, Tableau, and D3 are among the most in-demand data visualization and storytelling tools today.
“When your client doesn’t understand what you are doing, it’s easy for them to undervalue the work you are putting in, especially in the data prep phase,” Kemmann said. “Clearly explaining the process and the benefits of each step, in a language that your audience can relate to, and supported where possible by appropriate data visualizations, is a key part of your role.”
Data scientists now have more opportunities than ever before to be “hands on” with the data, but that requires a strong understanding of business objectives and the ability to translate technical jargon into plain language. The data scientists who can translate data into useful terms are the ones who will add that extra value.
“Being able to translate that data into clean, digestible business information is going to be a huge skill, and data scientists don’t always have those soft skills, or the experience of sitting in a room of executives and be able to clarify their decision-making process,” said Joshua Drew, regional manager at IT staffing firm Robert Half Technology.