(This article was first published on Jan Gorecki - R, and kindly contributed to R-bloggers)
I was recently browsing stackoverflow.com (often called SO) for the most voted questions under the R tag.
To my surprise, many questions on the first page were already well addressed with the data.table package. I found a few other questions that could benefit from a data.table answer, so I went ahead and answered them.
In this post, I’d like to summarise them, along with benchmarks (where possible) and my comments, if any.
Many answers to highly voted questions were posted a while ago. data.table is actively developed and has received many improvements in speed and memory usage over recent years, so some of those answers may well perform even better by now.
50 highest voted questions under the R tag

Here’s the list of the top 50 questions. I’ve marked those for which a data.table answer is available (such answers are usually quite performant).
| # | Number of votes | Question title | Use data.table solution |
|---|---|---|---|
| 1 | 1153 | How to make a great R reproducible example? | |
| 2 | 621 | How to sort a dataframe by column(s)? | TRUE |
| 3 | 496 | R Grouping functions: sapply vs. lapply vs. apply. vs. tappl | TRUE |
| 4 | 429 | How can we make xkcd style graphs? | |
| 5 | 396 | How to join (merge) data frames (inner, outer, left, right)? | TRUE |
| 6 | 330 | What statistics should a programmer (or computer scientist) | |
| 7 | 314 | Drop columns in R data frame | TRUE |
| 8 | 290 | Tricks to manage the available memory in an R session | |
| 9 | 280 | Remove rows with NAs in data.frame | TRUE |
| 10 | 279 | Quickly reading very large tables as dataframes in R | TRUE |
| 11 | 263 | How to properly document S4 class slots using Roxygen2? | |
| 12 | 250 | Assignment operators in R: '=' and '<-' | |
| 13 | 236 | Drop factor levels in a subsetted data frame | TRUE |
| 14 | 234 | Plot two graphs in same plot in R | |
| 15 | 225 | What is the difference between require() and library()? | |
| 16 | 221 | data.table vs dplyr: can one do something well the other can | |
| 17 | 216 | In R, why is [ better than subset? | |
| 18 | 212 | R function for testing if a vector contains a given element | |
| 19 | 201 | Expert R users, what's in your .Rprofile? | |
| 20 | 197 | R list to data frame | TRUE |
| 21 | 197 | Rotating and spacing axis labels in ggplot2 | |
| 22 | 197 | How to Correctly Use Lists in R? | |
| 23 | 192 | How to convert a factor to an integer\numeric without a loss | |
| 24 | 184 | How can I read command line parameters from an R script? | |
| 25 | 184 | How to unload a package without restarting R? | |
| 26 | 182 | Tools for making latex tables in R | |
| 27 | 181 | In R, what is the difference between the [] and [[]] notatio | |
| 28 | 180 | How can I view the source code for a function? | |
| 29 | 171 | Cluster analysis in R: determine the optimal number of clust | |
| 30 | 170 | How do I install an R package from source? | |
| 31 | 162 | How do I replace NA values with zeros in R? | |
| 32 | 152 | Counting the number of elements with the values of x in a ve | |
| 33 | 152 | Write lines of text to a file in R | |
| 34 | 151 | Standard library function in R for finding the mode? | |
| 35 | 150 | How to trim leading and trailing whitespace in R? | |
| 36 | 143 | How to save a plot as image on the disk? | |
| 37 | 139 | Most underused data visualization | |
| 38 | 137 | Convert data.frame columns from factors to characters | TRUE |
| 39 | 136 | How to find the length of a string in R? | |
| 40 | 134 | Workflow for statistical analysis and report writing | |
| 41 | 132 | Create an empty data.frame | |
| 42 | 130 | adding leading zeros using R | |
| 43 | 129 | Check existence of directory and create if doesn't exist | |
| 44 | 127 | Run R script from command line | |
| 45 | 125 | Changing column names of a data frame in R | TRUE |
| 46 | 120 | How to set limits for axes in ggplot2 R plots? | |
| 47 | 114 | How to find out which package version is loaded in R? | |
| 48 | 112 | How to plot two histograms together in R? | |
| 49 | 112 | How can 2 strings be concatenated in R | |
| 50 | 112 | How to organize large R programs? | |
Below are the selected answers where data.table can be applied. Each one is supplied with the usage and timings copied from the linked answer. Click on a question title to view the SO question, or follow the answer link for a reproducible example and benchmark details.