Help Wanted: Data Engineers Who Fill Enterprise Need
Data engineers build the infrastructure and tools that data scientists and business users need to perform analysis and create machine learning models. Maybe that's why demand is high for this emerging category of IT pro.
Data science as a career has generated incredible buzz over the past several years as enterprises have sought to harvest insights from the massive amounts of data available to them and stay competitive in a new era. But given the shortage of talent, hiring data scientists is just one of many possible ways to meet the demand for the kinds of insights that come from a data science and machine learning practice.
Unable to invest in teams of expensive, hard-to-hire data scientists, many organizations have sought alternative means to get value out of their data and analytics initiatives. For instance, companies have been pursuing self-service analytics -- a way to make the process easy enough for business users to access and consume, even if they don't have PhDs in statistics. Tech companies have also invested in making their analytics and data visualization platforms easier to use so they can be deployed to the masses. Think Tableau or Microsoft's Power BI.
Amid these strategies is the growth of another job title in enterprise organizations. Data engineers, or data science engineers, are the technology professionals in charge of the platforms, systems, and infrastructure that enable data analysis, machine learning, and AI. They create the systems used by enterprise workers performing data and analytics tasks -- everyone from data scientists to business analysts. Data engineers may be the wizards behind the curtain who make these complicated systems work.
David Palaitis, senior vice president of technology at the quantitative hedge fund company Two Sigma, told InformationWeek in an interview that his data-driven company is reliant on the ability to ingest huge quantities of data. Two or three years ago, Two Sigma hired people to build the models to perform this task. But Palaitis said the company quickly learned that these models, while mathematically sound, just couldn't scale to analyze the massive amounts of data the firm needed to input. Data engineers to the rescue.
"We've seen a specialization in our field of data science engineers who can work closely with the predictive modeler to build systems that can perform optimally in order to compute these predictions and scale quickly in a way that can meet the demands of our business," he said. "The data science engineer is a hybrid role -- someone who can understand the language of machine learning and predictive modeling, but also understands distributed systems and the basics of computer science."
These tech professionals may straddle the IT and data science departments. They typically have skills from both areas -- strong knowledge of and experience with IT infrastructure, along with knowledge of development languages used for machine learning, such as Python.
"There's no standard skill set," Tobi Knaup, CTO of hybrid cloud startup Mesosphere, told InformationWeek in an interview. Because it's an emerging role, there's also no university program where you can study to be a data engineer.
"I always like to compare the role to what DevOps engineers do for software developers," Knaup said. "They build sophisticated tools to help their internal customers."
Data engineers also must evangelize the tools they create inside the company, according to Knaup. They need to share best practices and provide instructions on things like how to get access to notebooks on a cluster.
Data engineers also play a role at Experian DataLabs, the internal R&D applied research group within the larger credit reporting firm, according to Eric Haller, executive VP and global head of the organization. DataLabs' entry-level hires are called data engineers, Haller told InformationWeek.
"That's somebody who is going to learn about how to clean data and how to troubleshoot the data to see if there are any glitches that need to be fixed before we can use it," Haller said. "Over time they get exposure to the modeling side."
Data engineers at DataLabs can pursue a career path to become data scientists. Or they can stay in the role of data engineer.
For IT pros looking to get into the data analytics or machine learning area of the enterprise, the career path of data engineer offers a great option. The job builds on infrastructure skills they may already possess, and it's a job that looks to be in demand in the near future.