2017-09-20


By Gregory Piatetsky, KDnuggets.


My recent analysis of KDnuggets Poll results (Python overtakes R, becomes the leader in Data Science, Machine Learning platforms) has attracted a lot of attention and generated a tremendous number of comments, much discussion, and the inevitable critique from proponents of both languages.

Some have complained that the poll is not scientific and that voters represent a self-selected sample. That is obviously true. But KDnuggets has conducted polls since 2001 and reaches a large audience of several hundred thousand visitors each month. In our experience, KDnuggets polls have been a good indicator of trends and developments in Data Mining and Data Science. We have tracked the R vs Python debate for several years, so unlike other sites we can compare the latest poll results with those of several previous years.

Let's examine other measures of Python vs R popularity among Data Scientists.

First, we analyze Google Trends (this was also done by DSC after the publication of our poll results).

Python is a much more popular language overall, and it is the IEEE Spectrum No. 1 language of 2017 (thanks to Martin Skarzynski @marskar for the link), so it is unfair to compare Python and R searches directly. We can, however, compare Google Trends for the search terms "Python data science" vs "R data science".

Here is the chart since Jan 1, 2012. Note that if you select the range that includes full months, and start in 2012, then you get smoothed monthly trends, rather than more chaotic weekly trends.

Fig. 1: Google Trends, Jan 2012 - Aug 2017, "Python data science" vs "R data science".

We note that R was slightly ahead in 2014 and 2015, as Data Science was gaining popularity, but "Python data science" searches moved ahead of "R data science" in late 2016 and have been clearly ahead since January 2017.

Note: the statistics are the same regardless of how Data Science is capitalized: "Data Science" or "data science", but Google autocomplete suggests "data science" for both Python and R.

However, recently Machine Learning has become very popular - see my post Machine Learning overtaking Big Data? (May 2017), so let's examine Python vs R for "Machine Learning" in Google Trends.


Fig. 2: Google Trends, Jan 2012 - Aug 2017, "Python Machine Learning", "R Machine Learning", "Python data science", and "R data science".

We see that "Python Machine Learning" is way ahead of "Python data science", and both are significantly ahead of "R data science" and "R Machine Learning".

Relative search volume for Aug 2017 is:
  • Python Machine Learning: 100
  • Python data science: 49
  • R data science: 33
  • R Machine Learning: 32
(Note: while Google autocomplete suggests search term "Python data science", with lower-case "data science", it suggests Capitalized search term "Python Machine Learning". There is probably some deep meaning here ... )
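
To make the gaps in the Aug 2017 figures easier to read, here is a minimal sketch that turns the four values listed above into Python-to-R ratios. The variable and function names are illustrative, not from the original post. Note also that Google Trends values are a relative index scaled to the peak term rather than absolute search counts, so the ratios are only indicative.

```python
# Relative search volume for Aug 2017, copied from the list above.
# Google Trends scales the most popular term in the comparison to 100.
volumes = {
    "Python Machine Learning": 100,
    "Python data science": 49,
    "R data science": 33,
    "R Machine Learning": 32,
}


def ratio(term_a, term_b):
    """Return how many times larger term_a's index is than term_b's."""
    return volumes[term_a] / volumes[term_b]


# Python vs R within each topic.
ml_ratio = ratio("Python Machine Learning", "R Machine Learning")
ds_ratio = ratio("Python data science", "R data science")

# Combined index per language (a rough aggregate, since all four terms
# share one scale within the same chart).
python_total = volumes["Python Machine Learning"] + volumes["Python data science"]
r_total = volumes["R Machine Learning"] + volumes["R data science"]
combined_ratio = python_total / r_total

print(f"Machine Learning: Python/R = {ml_ratio:.2f}")
print(f"data science:     Python/R = {ds_ratio:.2f}")
print(f"combined:         Python/R = {combined_ratio:.2f}")
```

On these figures the Python lead is much wider for "Machine Learning" searches than for "data science" searches.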


Fig. 3: Snapshot of indeed.com Data Scientist job ads in the USA that also include Python and/or R, Sep 2017
Next, let's look at job ads on indeed.com. All numbers below are for jobs in the USA as of Sep 11, 2017. We represent this relationship in the Venn diagram in Fig. 3.

Indeed job trends below also show that demand for Data Scientists who know Python and for those who know R was very close until very recently, and that together these represent a significant portion of all Data Scientist jobs.

Fig. 4: Indeed "Data Scientist", "Data Scientist" Python, and "Data Scientist" R Job Trends, 2014-2017

These job ad counts suggest that current employers see most Data Scientists as able to use both Python and R as needed, but Python has a small advantage at the moment.

Google Trends results suggest that Python's advantage will grow and that Python-related Data Science and Machine Learning jobs will grow faster than those related to R.

Note: with indeed.com you need to specify the search string carefully. A search for [Data Scientist Python] will include many jobs that have either Data or Scientist but not necessarily both, so the job title should be quoted as a phrase, as in ["Data Scientist" Python].
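
As a rough illustration of the quoting point, the sketch below builds a search string that keeps "Data Scientist" together as an exact phrase and then URL-encodes it. The helper name `build_query` and the query parameter name `q` are assumptions made for this example; they are not taken from the original post or from any documented indeed.com API.

```python
from urllib.parse import urlencode


def build_query(phrase, *terms):
    """Quote the phrase so it is matched as a unit, then append extra terms."""
    return " ".join([f'"{phrase}"', *terms])


# Quoted phrase plus language keyword, as in the Fig. 4 caption.
python_query = build_query("Data Scientist", "Python")
r_query = build_query("Data Scientist", "R")

# URL-encode the query; "q" is an assumed parameter name.
encoded = urlencode({"q": python_query})

print(python_query)
print(r_query)
print(encoded)
```

Without the quotes, the three words would be sent as separate terms, which is the case the note above warns about.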

Finally, among the many comments on my original post, Python overtakes R in Data Science, I want to highlight two observations:
  • Stanislav Seltser notes that among the top 15 languages on GitHub (https://octoverse.github.com), Python is no. 3 while R is not on the list.
  • Stanislav also noted the Kaggle 2016 Year Summary, which says:
    In past years, R was the language of choice on Kaggle, but 2016 has seen Python emerge as a clear winner when it came to the number of kernels written.




All replies
2017-9-20 08:14:32
oliyiyi posted on 2017-9-20 08:07
**** This content was hidden by the author ****
Both are very powerful.
2017-9-20 08:16:21
Let's see what the R supporters think.
2017-9-20 08:16:39
Thanks for sharing.
2017-9-20 08:22:36
Thanks
2017-9-20 08:23:56
Taking a look, thanks!