Aspiring data scientists and machine learning engineers may be researching the best skills and languages to learn for their future careers. But plenty of practitioners are already in the marketplace, using their favorite tools every day. So which tool do data scientists and analytics professionals prefer: Python, R, or SAS? A new survey from quantitative executive recruiting firm Burtch Works provides some insight.
Released earlier this week, the new survey is the firm's fifth annual report on tool preference. The survey originally covered only R, an open source tool, and SAS, a commercial tool from the company of the same name. In 2016, Burtch Works added another open source tool, Python, to the survey.
This year's survey shows something that hasn't happened before -- there isn't a single clear leader. The 1,196 professionals surveyed split almost exactly evenly three ways with Python at 33%, R at 33%, and SAS at 34%.
"This is the first year that we’ve seen SAS, R, and Python all at the same level of preference," said Linda Burtch, managing director at Burtch Works and a quantitative recruiting expert.
To reach that virtual three-way tie, R declined slightly from last year and SAS remained relatively flat. Python continued the rise it has shown since it was first included in the survey two years ago.
"The most noticeable trend from the 2018 data was Python’s ascension, and how Python’s growing popularity has been eroding support for R," Burtch told InformationWeek. "Data scientists have typically strongly preferred Python, but predictive analytics professionals working primarily with structured data are shifting that way as well."
What accounts for Python's rise? One explanation is that Python is often considered the stronger language for machine learning, while R is considered strong for statistical applications and data visualization.
The Burtch Works survey breaks down the results into some demographic segments that show each tool's strength with particular groups.
For instance, professionals with 16 or more years of experience preferred SAS at 47% compared to R at 26% and Python at 27%.
For those with 6 to 15 years of experience, the split was close to even again with SAS at 33%, R at 36% and Python at 31%.
Those with 5 or fewer years of experience were more likely to favor Python, at 48%, compared to 38% for R and 14% for SAS.
"Open source tools like R and Python are overwhelmingly favored by professionals with 5 or less years' experience," Burtch Works said in the blog post announcing the survey results. Nevertheless, R use by this group has fallen from over 50% in 2016 to just under 40% in this year's survey. Meanwhile Python has grown from just over 20% in 2016 to just under 50% this year.
"Python gained support in almost every category we examined this year and has especially taken hold at the early career level, with professionals who have five or less years of work experience," Burtch told InformationWeek.
Burtch Works also looked at whether particular tools were more popular with professionals at different education levels -- bachelor's, master's, and PhD.
SAS was at the top for those with Bachelor's and Master's degrees at 39% and 37% respectively. Meanwhile, those with PhDs preferred Python at 43% compared to 33% for R and 24% for SAS.
Tool preferences also differed by region. Burtch Works noted that the largest proportion of Python supporters is on the West Coast, and Python also topped the list in the Mountain states and the Northeast. Not surprisingly, SAS is most dominant in the Southeast. (SAS is headquartered in North Carolina.) Burtch Works also noted that Python gained support in every region compared to 2017.
When broken down by industry vertical, SAS led in finance and healthcare while Python was the top choice in tech/telecom and consulting. R was the top choice in the retail/consumer products space and in the "other" category that lumps together those who didn't fit into the named verticals.
Comparing data scientists with predictive analytics pros, data scientists overwhelmingly preferred Python at 69%, followed by R at 29%. The split was less dramatic among predictive analytics pros, of whom 40% preferred SAS, followed by 34% for R and 26% for Python.
Again, the most notable trend in this year's study is probably the ascent of Python, particularly among those just entering the industry.
"The three-year trend at the early career level clearly shows that Python is gaining at the expense of R, and we’ve often found that shifts at the early career level can foreshadow changes in the market," Burtch said. "Anecdotally, this lines up with what we’re hearing, that many R users are quickly moving towards Python."
Burtch Works' full blog post includes the complete results along with data visualizations.