It's been said many times that the key to a successful big data strategy is to hire a team of data-savvy individuals, each with a particular skill set, rather than trying to find a single data scientist who is adept at multiple disciplines, including computer science, mathematics, and domain expertise.
Make sure one of your hires is a data engineer, said Sean Kandel, cofounder and chief technical officer of Trifacta, a big data software startup whose data transformation platform enables less technical users to quickly visualize and analyze large and varied data sets. Trifacta's business partners in the Hadoop space include Cloudera, Hortonworks, and Tableau Software.
In a phone interview with InformationWeek, Kandel explained why the data engineer's role is essential to the success of an organization's big data effort. To understand what a data engineer does, it's important to distinguish the position from that of the data scientist.
"In many companies, the data engineer is responsible for setting up systems and processes that other data workers -- including in many cases data scientists -- need to use and rely on to be successful to work with data," said Kandel.
Much of the data engineer's work is focused on building out systems, architectures, and platforms.
"The data engineer will look at [ways] to take insights and operationalize them so that you can have day-to-day impacts on your business," Kandel said.
He added: "In a lot of organizations, data engineers are oftentimes responsible for finding data that's relevant for analysis... in a way that's meaningful and suitable for that specific task." In addition, they're in charge of integrating data from a variety of sources.
Data scientists often have engineering backgrounds, too, but their work is generally geared toward discovering new insights or building models. A data scientist sometimes fills the role of data engineer as well, although that approach may not deliver the best ROI.
On a data science team, however, individual roles aren't always set in stone. Team members may perform duties based on their individual skills, background, availability, and other factors.
"A lot of times it's fairly fluid," Kandel said. "You see teams of people working together" rather than performing rigidly defined tasks.
Trifacta's data transformation tools are designed to help simplify the data engineer's job of culling relevant data from a number of different sources, he added.
"A lot of times that's still done today through writing code and scripting languages like Pig, Hive, or Python. Our tools enable data engineers to quickly perform those types of data transformations in a much more visual, graphical interface, while still getting all of the benefits, such as scale, that you'd get by writing code by hand."
Then again, a data scientist may be handling these tasks instead.
"In some organizations you'll see data scientists perform things that might be done [elsewhere] by data engineers," Kandel said. "But when you look across an organization -- at all of the different use cases and how quickly use cases for data are popping up -- it really requires a dedicated team or role that's focused on enabling multiple end users to work with data quickly."