The top 10 algorithms and their share of voters are:
Fig. 1: Top 10 algorithms used by Data Scientists.
See the full table of all algorithms at the end of the post.
The average respondent used 8.1 algorithms, a big increase compared with a similar poll in 2011.
Comparing with the 2011 poll "Algorithms for data analysis / data mining", we note that the top methods are still Regression, Clustering, Decision Trees/Rules, and Visualization. The biggest relative increases, measured by (pct2016 / pct2011 - 1), are for
Boosting, up 40% to 32.8% share in 2016 from 23.5% share in 2011
Text Mining, up 30% to 35.9% from 27.7%
Visualization, up 27% to 48.7% from 38.3%
Time series/Sequence analysis, up 25% to 37.0% from 29.6%
Anomaly/Deviation detection, up 19% to 19.5% from 16.4%
Ensemble methods, up 19% to 33.6% from 28.3%
SVM, up 18% to 33.6% from 28.6%
Regression, up 16% to 67.1% from 57.9%
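The relative-increase measure (pct2016 / pct2011 - 1) can be recomputed from the shares listed above. This is a minimal Python sketch, not the analysis code behind the post; the names SHARES, relative_change, and ranked_changes are hypothetical. Because it starts from shares rounded to one decimal, a result can differ by a point from the list (SVM comes out at 17% rather than 18%), which may reflect rounding in the published shares.

```python
# Shares (%) copied from the list above as (2011 share, 2016 share).
# SHARES is a hypothetical name used only for this sketch.
SHARES = {
    "Boosting": (23.5, 32.8),
    "Text Mining": (27.7, 35.9),
    "Visualization": (38.3, 48.7),
    "Time series/Sequence analysis": (29.6, 37.0),
    "Anomaly/Deviation detection": (16.4, 19.5),
    "Ensemble methods": (28.3, 33.6),
    "SVM": (28.6, 33.6),
    "Regression": (57.9, 67.1),
}


def relative_change(pct_2016: float, pct_2011: float) -> float:
    """Relative change pct2016 / pct2011 - 1 (0.40 means 'up 40%')."""
    return pct_2016 / pct_2011 - 1


def ranked_changes(shares):
    """Return (name, whole-percent change) pairs, largest increase first."""
    changes = [
        (name, round(100 * relative_change(new, old)))
        for name, (old, new) in shares.items()
    ]
    # sorted() is stable, so ties keep the order of the input list.
    return sorted(changes, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, pct in ranked_changes(SHARES):
        print(f"{name}: up {pct}%")
```

Applied to the decline list, the same function returns negative values (for example, Association rules gives about -0.47).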
The most popular of the new options added in 2016 are
K-nearest neighbors, 46% share
PCA, 43%
Random Forests, 38%
Optimization, 24%
Neural networks - Deep Learning, 19%
Singular Value Decomposition, 16%
The biggest declines are for
Association rules, down 47% to 15.3% from 28.6%
Uplift modeling, down 36% to 3.1% from 4.8% (a surprise, given the strong results that have been published)
Factor Analysis, down 24% to 14.2% from 18.6%
Survival Analysis, down 15% to 7.9% from 9.3%
The following table shows usage of different algorithm types (Supervised, Unsupervised, Meta, and Other) by employment type. We excluded the NA (4.5%) and Other (3%) employment types.
Table 1: Algorithm usage by Employment Type
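| Employment Type       | % Voters | Avg. Number of Algorithms Used | % Used Supervised | % Used Unsupervised | % Used Meta | % Used Other Methods |
|-----------------------|----------|--------------------------------|-------------------|---------------------|-------------|----------------------|
| Industry              | 59%      | 8.4                            | 94%               | 81%                 | 55%         | 83%                  |
| Government/Non-profit | 4.1%     | 9.5                            | 91%               | 89%                 | 49%         | 89%                  |
| Student               | 16%      | 8.1                            | 94%               | 76%                 | 47%         | 77%                  |
| Academia              | 12%      | 7.2                            | 95%               | 81%                 | 44%         | 77%                  |
| All                   |          | 8.3                            | 94%               | 82%                 | 48%         | 81%                  |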
We note that almost everyone uses supervised learning algorithms.
Government and Industry Data Scientists used more algorithms on average (9.5 and 8.4) than students (8.1) or academic researchers (7.2), and Industry Data Scientists were the most likely to use meta-algorithms (55%).
Next, we analyzed the usage of the top 10 algorithms plus Deep Learning by employment type.
Table 2: Top 10 Algorithms + Deep Learning usage by Employment Type
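| Algorithm            | Industry | Government/Non-profit | Academia | Student | All |
|----------------------|----------|-----------------------|----------|---------|-----|
| Regression           | 71%      | 63%                   | 51%      | 64%     | 67% |
| Clustering           | 58%      | 63%                   | 51%      | 58%     | 57% |
| Decision Trees/Rules | 59%      | 63%                   | 38%      | 57%     | 55% |
| Visualization        | 55%      | 71%                   | 28%      | 47%     | 49% |
| K-NN                 | 46%      | 54%                   | 48%      | 47%     | 46% |
| PCA                  | 43%      | 57%                   | 48%      | 40%     | 43% |
| Statistics           | 47%      | 49%                   | 37%      | 36%     | 43% |
| Random Forests       | 40%      | 40%                   | 29%      | 36%     | 38% |
| Time series          | 42%      | 54%                   | 26%      | 24%     | 37% |
| Text Mining          | 36%      | 40%                   | 33%      | 38%     | 36% |
| Deep Learning        | 18%      | 9%                    | 24%      | 19%     | 19% |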
To make the differences easier to see, we compute the bias of an algorithm for a particular employment type relative to its overall usage (the All column) as Bias(Alg,Type) = Usage(Alg,Type) / Usage(Alg,All) - 1.
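The bias formula can be applied directly to the usage figures in Table 2. The following is a minimal Python sketch; the names TYPES, USAGE, bias, and bias_table are hypothetical and not from the original analysis. Because the inputs are whole-percent usage figures, the resulting biases are approximate. For example, Visualization in Academia gives 28/49 - 1 ≈ -0.43, and Deep Learning in Government/Non-profit gives 9/19 - 1 ≈ -0.53.

```python
# Employment types in the column order of Table 2 (the All column is kept separately).
TYPES = ("Industry", "Government/Non-profit", "Academia", "Student")

# Usage (%) from Table 2: (Industry, Government/Non-profit, Academia, Student, All).
USAGE = {
    "Regression": (71, 63, 51, 64, 67),
    "Clustering": (58, 63, 51, 58, 57),
    "Decision Trees/Rules": (59, 63, 38, 57, 55),
    "Visualization": (55, 71, 28, 47, 49),
    "K-NN": (46, 54, 48, 47, 46),
    "PCA": (43, 57, 48, 40, 43),
    "Statistics": (47, 49, 37, 36, 43),
    "Random Forests": (40, 40, 29, 36, 38),
    "Time series": (42, 54, 26, 24, 37),
    "Text Mining": (36, 40, 33, 38, 36),
    "Deep Learning": (18, 9, 24, 19, 19),
}


def bias(usage_type: float, usage_all: float) -> float:
    """Bias(Alg, Type) = Usage(Alg, Type) / Usage(Alg, All) - 1."""
    return usage_type / usage_all - 1


def bias_table(usage):
    """Map each algorithm to a dict of {employment type: bias}."""
    table = {}
    for alg, row in usage.items():
        *by_type, overall = row  # the last value is the All column
        table[alg] = {t: bias(u, overall) for t, u in zip(TYPES, by_type)}
    return table


if __name__ == "__main__":
    for alg, row in bias_table(USAGE).items():
        # "+.0%" prints a signed whole percentage, e.g. +6% or -43%.
        cells = ", ".join(f"{t} {b:+.0%}" for t, b in row.items())
        print(f"{alg}: {cells}")
```

A positive bias means the group uses the algorithm more often than respondents overall, and a negative bias means less often.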