knitr: Elegant, flexible and fast dynamic report generation with R

oliyiyi


2016-06-18

Overview

The knitr package was designed to be a transparent engine for dynamic report generation with R, to solve some long-standing problems in Sweave, and to combine features of other add-on packages into one package (knitr ≈ Sweave + cacheSweave + pgfSweave + weaver + animation::saveLatex + R2HTML::RweaveHTML + highlight::HighlightWeaveLatex + 0.2 * brew + 0.1 * SweaveListingUtils + more).

Transparency means that the user has full access to every piece of the input and output. For example, 1 + 2 produces [1] 3 in an R terminal, and knitr lets the user decide whether to put 1 + 2 between \begin{verbatim} and \end{verbatim} or between <div class="rsource"> and </div>, and whether to put [1] 3 in \begin{Routput} and \end{Routput}; see the hooks page for details.
knitr tries to be consistent with users’ expectations by running R code as if it were pasted into an R terminal, e.g., qplot(x, y) directly produces the plot (no need to print() it), and all the plots in a code chunk are written to the output by default.
Packages like pgfSweave and cacheSweave have added useful features to Sweave (high-quality tikz graphics and caching), and knitr has simplified their implementation.
The design of knitr allows any input language (e.g. R, Python and awk) and any output markup language (e.g. LaTeX, HTML, Markdown, AsciiDoc and reStructuredText).
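For the LaTeX case in the first point, the wrapping can be changed with output hooks set in the document's first chunk. A minimal sketch, assuming an .Rnw document whose preamble defines the Routput environment itself (for instance with fancyvrb's \DefineVerbatimEnvironment{Routput}{Verbatim}{}); the helper end_with_newline and the hook bodies are illustrative, not knitr's own defaults.

```r
library(knitr)

# Drop the default "##" prefix so printed results read exactly "[1] 3"
opts_chunk$set(comment = NA)

# Hypothetical helper: end the text with exactly one newline, so that the
# closing \end{...} always starts on its own line
end_with_newline <- function(x) sub("\n*$", "\n", paste(x, collapse = "\n"))

knit_hooks$set(
  # echoed code such as 1 + 2 goes into a verbatim environment
  source = function(x, options) {
    paste0("\\begin{verbatim}\n", end_with_newline(x), "\\end{verbatim}\n")
  },
  # printed results such as [1] 3 go into the user-defined Routput environment
  output = function(x, options) {
    paste0("\\begin{Routput}\n", end_with_newline(x), "\\end{Routput}\n")
  }
)
```

Each hook receives the text knitr is about to write plus the chunk options, and whatever string it returns is what ends up in the .tex file.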

This package is developed on GitHub; for installation instructions and FAQs, see the README. This website serves as the full documentation of knitr, and you can find the main manual, the graphics manual and other demos / examples here. For a more organized reference, see the knitr book.

All replies

oliyiyi

2016-6-18 15:16:22

Motivation
One of the difficulties with extending Sweave is that we have to copy a large amount of code from the utils package (the file SweaveDrivers.R has more than 700 lines of R code), and this is what the two packages mentioned above (pgfSweave and cacheSweave) have done. Once the code is copied, the package authors have to pay close attention to what is changing in the version in official R, which is clearly an extra burden. The knitr package tries to modularize the whole process of weaving a document into small, manageable functions, so it is hopefully easier to maintain and extend (e.g. easy to support HTML output); on the other hand, knitr has many built-in features, so users should rarely need to hack at the core components of this package. Several FAQs in the Sweave manual are also solved directly in knitr.

Let us change our traditional attitude to the construction of programs: Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to humans what we want the computer to do.

oliyiyi

2016-6-18 15:16:43

Features
The ideas are borrowed from other packages, and some of them are re-implemented in a different way (like cache). A selected list of features includes:

faithful output: using the evaluate package as the backend to evaluate R code, knitr writes everything that you see in an R terminal into the output by default, including printed results, plots and even warnings, messages and errors (these should not be ignored in serious computations, especially warnings)
a minor issue is that for grid-based graphics packages like ggplot2 or lattice, users often forget to print() the plot objects, because they get the output in an R terminal without explicitly calling print(); in knitr, what you get is what you expect
built-in cache: the idea is similar to cacheSweave, but knitr uses base R functions directly to implement caching and lazy loading; another significant difference is that a cached chunk can still have output (in cacheSweave, cached chunks no longer produce any output, even if you explicitly print() an object; knitr caches the chunk output as well)
formatting R code: the formatR package is used to reformat R code automatically (wrap long lines, add spaces and indentation, etc.) without sacrificing comments, as keep.source=FALSE does
more than 20 graphics devices are directly supported: with dev='CairoPNG' in the chunk options, you can switch to the CairoPNG() device in the Cairo package in a second; with dev='tikz', the tikz() device in tikzDevice is used. Could anything be easier than that? These built-in devices (strictly speaking, wrappers) use inches as units, even for bitmap devices (pixels are converted to inches by the option dpi, which defaults to 72)
even more flexibility on graphics:
the width and height of plots in the output document can be specified separately from the device size (the fig.width option is for the graphics device, and out.width is for the output document; think out.width='.8\\textwidth')
locations of plots can be rearranged: they can either appear exactly in the place where they are created, or go to the end of a chunk together (option fig.show='hold')
multiple plots per code chunk are recorded, unless you really want to keep the last plot only (option fig.keep='last')
R code can come not only from code chunks in the input document but also from an external R script, which makes it easier to run the code as you write the document (this especially benefits LyX)
for power users, further customization is still possible:
the regular expressions used to parse R code can be defined, i.e., you do not have to use <<>>= and @ or \Sexpr{}; if you like, you can use any patterns, e.g., %% begin.rcode and %% end.rcode
hooks can be defined to control the output; e.g. you may want to put errors in red bold text, or make the source code italic; hooks can also be defined to be executed before or after a code chunk, and they open up endless possibilities to extend the power of this package (e.g. animations, rgl 3D plots, …)
A lot of effort has gone into producing beautiful output and enhancing readability by default. For example, code chunks are highlighted and put in a shaded environment in LaTeX with a very light gray background (the framed package), so they stand out a little from the surrounding text. The reading experience is hopefully better than with the verbatim or Verbatim environments. The leading characters > and + (called prompts) are not added to the output by default (you can bring them back with prompt=TRUE, though). I find them really annoying when reading the output document, because they make it inconvenient to copy and run the code.
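Several of the features above are switched on through chunk options, read_chunk() and hooks. A minimal sketch of the R code of a setup chunk that enables them for a whole .Rnw document, assuming the preamble loads fancyvrb and xcolor; the script labels (simulate, summarise), the red bold error style and the hook name timer are made up for illustration.

```r
library(knitr)

# Defaults for all later chunks; any chunk can still override them,
# e.g. <<plot1, dev='tikz', fig.keep='last'>>=
opts_chunk$set(
  cache = TRUE,                   # cache results and output until the chunk changes
  dev = "CairoPNG",               # the CairoPNG() device from the Cairo package
  dpi = 150,                      # pixels per inch for bitmap devices
  fig.width = 6, fig.height = 4,  # size of the graphics device, in inches
  out.width = "0.8\\textwidth",   # size of the plot in the LaTeX document
  fig.show = "hold",              # collect all plots at the end of the chunk
  prompt = TRUE                   # bring back the > and + prompts
)

# Code from an external script: "## ---- label ----" lines split it into
# labelled chunks. With a real file this would be read_chunk("analysis.R");
# inline lines keep the sketch self-contained.
read_chunk(lines = c(
  "## ---- simulate ----",
  "x <- rnorm(100)",
  "## ---- summarise ----",
  "summary(x)"
))
# An empty chunk labelled summarise in the document now runs summary(x).

# Output hook: print errors in red bold (needs fancyvrb and xcolor)
knit_hooks$set(error = function(x, options) {
  x <- sub("\n*$", "\n", paste(x, collapse = "\n"))  # one trailing newline
  paste0("\\begin{Verbatim}[formatcom=\\color{red}\\bfseries]\n", x,
         "\\end{Verbatim}\n")
})

# Chunk hook: called before and after every chunk with timer=TRUE, e.g.
# <<fit, timer=TRUE>>=; a character result is written to the output
local({
  start <- NULL
  knit_hooks$set(timer = function(before, options, envir) {
    if (before) {
      start <<- Sys.time()
      NULL
    } else {
      secs <- as.numeric(difftime(Sys.time(), start, units = "secs"))
      sprintf("\n\nThis chunk took %.2f seconds.\n\n", secs)
    }
  })
})
```

Keeping the start time inside local() gives the timer hook private state without leaving a variable in the global environment.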

oliyiyi

2016-6-18 15:17:19

Objects
Objects to manipulate options, patterns and hooks

The knitr package uses a special object to control options and settings (denoted as obj below); it has the following methods:

obj$get(name): returns the option named name, or a list of several options if name is a character vector of length greater than 1; it returns all the options if name is not provided
obj$set(...): permanently changes options; the argument ... can be of the form tag = value or a list of options list(opt1 = value1, opt2 = value2)
obj$merge(values): temporarily merges a list of new options into the current list and returns the merged list (original list not changed)
obj$restore(): restores the object to its initial (default) values
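As an illustration of these four methods, here is a short sketch using opts_chunk, one of the objects listed below; echo, fig.width, fig.height and dev are standard chunk options, and the values chosen here are arbitrary.

```r
library(knitr)

opts_chunk$get("echo")                  # one option (TRUE by default)
opts_chunk$get(c("echo", "fig.width"))  # a named list of two options
length(opts_chunk$get())                # number of chunk options in total

opts_chunk$set(echo = FALSE, fig.width = 5)        # tag = value form
opts_chunk$set(list(fig.height = 4, dev = "pdf"))  # list form

# merge() returns a merged copy; the stored options are not changed
merged <- opts_chunk$merge(list(echo = TRUE))
merged$echo                            # TRUE in the merged copy
stored_echo <- opts_chunk$get("echo")  # still FALSE

opts_chunk$restore()                   # back to the default chunk options
opts_chunk$get("echo")                 # TRUE again
```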
These objects are visible to users in knitr:

opts_chunk and opts_current: manage options for code chunks (opts_current holds the options of the chunk currently being processed)
opts_knit: manages options for the knitr package
knit_hooks: manages hook functions
knit_patterns: manages regular expressions to extract R code from the input document
knit_engines: manages the functions (engines) that process code in other languages
Except for knit_patterns, all these objects are initialized with default values; knit_patterns is determined automatically from the type of input document if it is not set. The knit_hooks object is likely to be used most frequently, while the others are usually not manipulated directly. For example, opts_chunk is usually set in the input document rather than from the command line.
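Because patterns decide how knitr finds the chunks in the first place, they have to be set from the command line before knit() runs, not inside the document. A minimal sketch using pat_tex(), which selects knitr's built-in patterns for the %% begin.rcode / %% end.rcode style mentioned in the feature list; report.Rnw is a hypothetical file name.

```r
library(knitr)

# Select the built-in patterns for chunks delimited by
# "%% begin.rcode" and "%% end.rcode" (instead of <<>>= and @)
pat_tex()

# Inspect the regular expression that now marks the start of a chunk
begin_re <- knit_patterns$get("chunk.begin")
print(begin_re)

# It recognizes the LaTeX-comment style, but not the Sweave style
grepl(begin_re, "%% begin.rcode")
grepl(begin_re, "<<setup>>=")

# knit("report.Rnw")     # hypothetical input written with %% begin.rcode chunks
knit_patterns$restore()  # drop the patterns so knitr detects them again
```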

knitr’s settings must be made in a chunk that comes before any chunk relying on them. It is recommended to create a knitr configuration chunk as the first chunk in the document, with the options cache = FALSE and include = FALSE. This chunk must not contain any commands that expect its own settings to be in effect when they run. The configuration chunk could look something like this:

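<<setup, cache=FALSE, include=FALSE>>=
library(knitr)
# package-level options: pass plot files to imgur_upload() (e.g. for Markdown
# output), keep style definitions out of the output document, and evaluate
# all chunks with ~/R/project as the working directory
opts_knit$set(upload.fun = imgur_upload, self.contained = FALSE,
              root.dir = '~/R/project')
@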
On a technical note, these objects are similar to closures: each consists of a list of functions returned by a function. For details, see the unexported function knitr:::new_defaults. The chunk options are also managed by closures.
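To make the closure idea concrete, here is a simplified, hypothetical re-implementation in base R. new_options is a made-up name, and knitr's real new_defaults has more methods and argument handling, so read this as a sketch of the pattern rather than knitr's source.

```r
# Hypothetical sketch of an options object built from a closure
new_options <- function(value = list()) {
  current <- value                       # state captured by the closure

  get <- function(name) {
    if (missing(name)) return(current)   # all options
    if (length(name) == 1) current[[name]] else current[name]
  }
  set <- function(...) {
    dots <- list(...)
    # accept both set(a = 1) and set(list(a = 1))
    if (length(dots) == 1 && is.null(names(dots)) && is.list(dots[[1]])) {
      dots <- dots[[1]]
    }
    current[names(dots)] <<- dots         # permanent change to the shared state
    invisible(current)
  }
  merge <- function(values) {
    out <- current                       # modify a copy only
    out[names(values)] <- values
    out
  }
  restore <- function() {
    current <<- value                    # back to the initial values
    invisible(current)
  }
  list(get = get, set = set, merge = merge, restore = restore)
}

opts <- new_options(list(echo = TRUE, fig.width = 7))
opts$set(echo = FALSE)
opts$get("echo")                           # FALSE
opts$merge(list(fig.width = 5))$fig.width  # 5 in the copy...
opts$get("fig.width")                      # ...but still 7 in the state
opts$restore()
opts$get("echo")                           # TRUE
```

All four functions share the same enclosing environment, so `<<-` in set() and restore() changes the one `current` they all see, while merge() only edits a local copy.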

oliyiyi

2016-6-18 15:17:41