2012-04-07
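Google's R Style Guide

R is a high-level programming language used primarily for statistical computing and graphics. The goal of this R style guide is to make our R code easier to read, share, and verify. The rules below were designed in collaboration with the R user community at Google.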


  • Notation and Naming

    • File Names
      File names should end in .R (uppercase), and the name itself should be meaningful.
      GOOD: predict_ad_revenue.R
      BAD: foo.R
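    • Identifiers
      Do not use underscores ( _ ) or hyphens ( - ) in identifiers. Identifiers should be named according to the following conventions. Variable names should be all lowercase, with words separated by dots (.); function names capitalize the first letter of each word and contain no dots; constants are named like functions but with a leading k.
      • variable.name
        GOOD: avg.clicks
        BAD: avg_Clicks , avgClicks
      • FunctionName
        GOOD: CalculateAvgClicks
        BAD: calculate_avg_clicks , calculateAvgClicks
        Function names should be verbs or verb phrases.
        Exception: when creating an object with a class attribute, the function name (the constructor) and the class name should match (for example, lm).
      • kConstantName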
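The three naming patterns above can be sketched together as follows. The identifiers and sample values here are made up for illustration and do not come from the guide itself.

```r
# Constant: named like a function, with a leading k.
kMaxClicksPerDay <- 1000

# Function: a verb phrase, each word capitalized, no dots.
CalculateAvgClicks <- function(clicks) {
  return(mean(clicks))
}

# Variables: all lowercase, words separated by dots.
daily.clicks <- c(120, 80, 100)
avg.clicks <- CalculateAvgClicks(daily.clicks)
print(avg.clicks <= kMaxClicksPerDay)
```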
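  • Syntax

    • Line Length
      The maximum line length is 80 characters.
    • Indentation
      Indent code with two spaces. Never use tabs, and never mix tabs and spaces.
      Exception: when a line break occurs inside parentheses, align the wrapped line with the first character inside the parentheses.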
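A small sketch of both indentation rules, assuming a hypothetical helper named SummarizeClicks: block bodies are indented by two spaces, while the wrapped argument lines up with the first character inside the parenthesis.

```r
SummarizeClicks <- function(clicks) {
  # Block bodies are indented by two spaces per level.
  if (length(clicks) == 0) {
    return(NA)
  }
  # The wrapped argument aligns with the first character after "sum(".
  total.clicks <- sum(clicks[clicks > 0],
                      na.rm = TRUE)
  return(total.clicks)
}

print(SummarizeClicks(c(1, -2, 3, NA)))
```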

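    • Spacing
      Place spaces around all binary operators (=, +, -, <-, etc.).
      Exception: spaces around = are optional when passing arguments in a function call.
      Do not place a space before a comma, but always place one after a comma.

      GOOD:
      tabPrior <- table(df[df$daysFromOpt < 0, "campaignid"])
      total <- sum(x[, 1])
      total <- sum(x[1, ])
      BAD:
      tabPrior <- table(df[df$daysFromOpt<0, "campaignid"])  # Needs spaces around '<'
      tabPrior <- table(df[df$daysFromOpt < 0,"campaignid"])  # Needs a space after the comma
      tabPrior<- table(df[df$daysFromOpt < 0, "campaignid"])  # Needs a space before <-
      tabPrior<-table(df[df$daysFromOpt < 0, "campaignid"])  # Needs spaces around <-
      total <- sum(x[,1])  # Needs a space after the comma
      total <- sum(x[ ,1])  # Needs a space after the comma, not before
      Place a space before a left parenthesis, except in a function call.
      GOOD:
      if (debug)
      BAD:
      if(debug)
      Extra spacing (i.e., more than one space in a row) is okay if it improves the alignment of equals signs or arrows (<-).
      plot(x    = xCoord,
           y    = dataMat[, makeColName(metric, ptiles[1], "roiOpt")],
           ylim = ylim,
           xlab = "dates",
           ylab = metric,
           main = (paste(metric, " for 3 samples ", sep="")))
      Do not place spaces around code inside parentheses or square brackets.
      Exception: always place a space after a comma.
      GOOD:
      if (debug)
      x[1, ]
      BAD:
      if ( debug )  # No spaces around debug
      x[1,]  # Needs a space after the comma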
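    • Curly Braces
      An opening curly brace should never go on its own line; a closing curly brace should always go on its own line. You may omit curly braces when a block consists of a single statement; however, you must then consistently either use or not use curly braces for all such single-statement blocks.
      if (is.null(ylim)) {
        ylim <- c(0, 0.06)
      }
      or (but not both)
      if (is.null(ylim))
        ylim <- c(0, 0.06)
      Always begin the body of a block on a new line.
      BAD:
      if (is.null(ylim)) ylim <- c(0, 0.06)
      if (is.null(ylim)) {ylim <- c(0, 0.06)}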
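    • Assignment
      Use <-, not =, for assignment.
      GOOD:
      x <- 5
      BAD:
      x = 5
    • Semicolons
      Do not terminate lines with semicolons, and do not use semicolons to put more than one command on the same line. (Semicolons are unnecessary, and are omitted here for consistency with other Google style guides.)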
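  • Organization

    • General Layout and Ordering
      If everyone arranges code in the same order, we can read and understand each other's scripts faster and more easily.
      • Copyright statement comment
      • Author comment
      • File description comment, including the purpose of the program, its inputs, and its outputs
      • source() and library() statements
      • Function definitions
      • Executed statements, if applicable (e.g., print, plot)
      Unit tests should go in a separate file named originalfilename_unittest.R.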
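A file laid out in this order might look like the skeleton below. The copyright holder, author address, and function are placeholders invented for this sketch; under the naming rule above, its tests would live in a separate file such as calculate_avg_clicks_unittest.R (also a hypothetical name).

```r
# Copyright <year> <copyright holder> (placeholder notice)
#
# Author: someuser@example.com (placeholder author)
#
# Computes the average number of clicks across campaigns.
# Input: a numeric vector of click counts.
# Output: the mean click count, printed to the console.

library(stats)  # source() and library() statements go here

# Function definitions
CalculateAvgClicks <- function(clicks) {
  # Computes the mean of a vector of click counts.
  return(mean(clicks))
}

# Executed statements
avg.clicks <- CalculateAvgClicks(c(10, 20, 30))
print(avg.clicks)
```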

    • Commenting Guidelines
      Comment your code. Entire commented lines should begin with # and one space.
      Short comments placed after code should be preceded by two spaces, #, and then one space.
      # Create histogram of frequency of campaigns by pct budget spent.
      hist(df$pctSpent,
           breaks = "scott",  # method for choosing number of buckets
           main   = "Histogram: fraction budget spent by campaignid",
           xlab   = "Fraction of budget spent",
           ylab   = "Frequency (count of campaignids)")
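    • Function Definitions and Calls
      Function definitions should first list arguments without default values, followed by those with default values.
      In both function definitions and function calls, multiple arguments per line are allowed; line breaks are only allowed between assignments.
      GOOD:
      PredictCTR <- function(query, property, numDays,
                             showPlot = TRUE)
      BAD:
      PredictCTR <- function(query, property, numDays, showPlot =
                             TRUE)
      Ideally, unit tests should serve as sample function calls (for routines in a package).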
    • Function Documentation
      Functions should contain a comments section immediately below the function definition line. These comments should consist of a one-sentence description of the function; a list of the function's arguments, denoted by Args:, with a description of each (including the data type); and a description of the return value, denoted by Returns:. The comments should be descriptive enough that a caller can use the function without reading any of the function's code.
    • Example Function
      CalculateSampleCovariance <- function(x, y, verbose = TRUE) {
        # Computes the sample covariance between two vectors.
        #
        # Args:
        #   x: One of two vectors whose sample covariance is to be calculated.
        #   y: The other vector. x and y must have the same length, greater than one,
        #      with no missing values.
        #   verbose: If TRUE, prints sample covariance; if not, not. Default is TRUE.
        #
        # Returns:
        #   The sample covariance between x and y.
        n <- length(x)
        # Error handling
        if (n <= 1 || n != length(y)) {
          stop("Arguments x and y have invalid lengths: ",
               length(x), " and ", length(y), ".")
        }
        if (TRUE %in% is.na(x) || TRUE %in% is.na(y)) {
          stop(" Arguments x and y must not have missing values.")
        }
        covariance <- var(x, y)
        if (verbose)
          cat("Covariance = ", round(covariance, 4), ".\n", sep = "")
        return(covariance)
      }
    • TODO Style
      Use a consistent style for TODOs throughout your code.
      TODO(username): Explicit description of action to be taken
  • Language

    • Attach
      The possibilities for creating errors when using attach are numerous. Avoid it.
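One way to follow this advice is to reference columns explicitly, or to scope the lookup with with(). This is a minimal sketch; ad.data and its columns are made-up illustration data rather than anything from the guide.

```r
ad.data <- data.frame(clicks = c(10, 20, 40), cost = c(1, 2, 4))

# Explicit column references make it clear where each name comes from.
cost.per.click <- ad.data$cost / ad.data$clicks

# with() evaluates the expression inside the data frame without
# changing the search path the way attach() would.
cost.per.click.scoped <- with(ad.data, cost / clicks)

print(all.equal(cost.per.click, cost.per.click.scoped))
```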
    • Functions
      Errors should be raised using stop().
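As a brief sketch of this rule, the hypothetical function below raises an error with stop(), and the caller handles it with tryCatch(). The function name and message are assumptions made for the example.

```r
CalculateClickRate <- function(clicks, impressions) {
  # Computes clicks / impressions, raising an error on invalid input.
  if (impressions <= 0) {
    stop("Argument impressions must be positive: ", impressions, ".")
  }
  return(clicks / impressions)
}

# A caller can intercept the error and inspect its message.
result <- tryCatch(CalculateClickRate(5, 0),
                   error = function(e) conditionMessage(e))
print(result)
```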
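    • Objects and Methods
      The S language has two object systems, S3 and S4, both of which are available in R. S3 methods are more interactive and flexible, whereas S4 methods are more formal and rigorous. (For an illustration of the two systems, see Thomas Lumley's "Programmer's Niche: A Simple Class, in S3 and S4" in R News 4/1, 2004, pp. 33 - 36: http://cran.r-project.org/doc/Rnews/Rnews_2004-1.pdf.)
      Use S3 objects and methods unless there is a strong reason to use S4 objects or methods. A primary justification for an S4 object is to use objects directly in C++ code. A primary justification for an S4 generic/method is to dispatch on two arguments.
      Avoid mixing S3 and S4: S4 methods ignore S3 inheritance, and vice versa.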
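For readers new to S3, a minimal sketch of an S3 class might look like this. The class name AdCampaign and its fields are hypothetical; the constructor shares its name with the class, as the naming section suggests, and the method extends the base generic format().

```r
# Constructor: the function name matches the class name.
AdCampaign <- function(name, clicks) {
  campaign <- list(name = name, clicks = clicks)
  class(campaign) <- "AdCampaign"
  return(campaign)
}

# S3 method for the base generic format(), dispatched on class "AdCampaign".
format.AdCampaign <- function(x, ...) {
  return(paste0(x$name, ": ", sum(x$clicks), " clicks"))
}

spring.campaign <- AdCampaign("spring", c(10, 20, 30))
print(format(spring.campaign))
```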
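  • Exceptions
    The coding conventions described above should be followed unless there is good reason to do otherwise. Exceptions include legacy code and modifying third-party code.
  • Parting Words
    Use common sense and be consistent. If you are editing existing code, take a few minutes to look at the code around you and determine its style. If others use spaces around their if clauses, you should, too. If their comments have little boxes of stars around them, make your comments have little boxes of stars around them, too.
    The point of having style guidelines is to have a common vocabulary of coding, so people can concentrate on what you are saying rather than on how you are saying it. We present global style rules here so people know the vocabulary, but local style is also important. If the code you add to a file looks drastically different from the existing code around it, readers are thrown out of their rhythm. Try to avoid this. OK, enough writing about writing code; the code itself is much more interesting. Have fun!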
  • References
    http://www.maths.lth.se/help/R/RCC/ - R Coding Conventions
    http://ess.r-project.org/ - For emacs users. This runs R in your emacs and has an emacs mode.


All replies
2012-4-7 14:06:00
What......
2012-4-7 20:25:21
Very interested~~ OP, please tell us more.
2012-4-7 21:39:42
It is a matter of personal preference; being consistent is what counts.
2012-4-7 22:30:01
R Coding Conventions (RCC)
- a draft

Version 0.9, January 2009
(since 2002)

Henrik Bengtsson

Dept of Statistics, University of California, Berkeley


Table of Contents

Introduction
Layout of the Recommendations
Recommendation Importance
General Recommendations
Naming Conventions
General Naming Conventions
Specific Naming Conventions
Files
Statements
Generic functions (under S3)
Variables
Constants
Loops
Conditionals
Miscellaneous
Layout and Comments
Layout
White Space
Comments
Acknowledgments
References

Introduction

Please note that this document has been under construction since mid October 2002 and should still be seen as a first rough draft.  There are no well-defined coding recommendations for the R language [1], nor is there a de facto standard. This document gives some recommendations, which are very similar to the ones in the Java programming style [2][3], and which have been found to be helpful for both the developer and the end user of packages and functions written in R.

Layout of the Recommendations

The recommendations are grouped by topic and each recommendation is numbered to make it easier to refer to during reviews.  Layout for the recommendations is as follows:

Guideline short description
Example if applicable
Motivation, background and additional information.

The motivation section is important. Coding standards and guidelines tend to start "religious wars", and it is important to state the background for the recommendation.

Recommendation Importance

In the guideline sections the terms must, should and can have special meaning. A must requirement must be followed, a should is a strong recommendation, and a can is a general guideline.

General Recommendations

Any violation of the guide is allowed if it enhances readability.

The main goal of the recommendation is to improve readability and thereby the understanding and the maintainability and general quality of the code. It is impossible to cover all the specific cases in a general guide and the programmer should be flexible.


Naming Conventions

General Naming Conventions

Names representing classes must be nouns and written in mixed case starting with upper case ("CamelCase").
Line
FilePrefix        # NOT:  File.Prefix
Even if it is legal to have . (period) in a class name, it is highly recommended not to have it, since S3 method declarations separate the method name from the class name where a . (period) occurs, cf. the following ambiguous method definition:
a.b.c <- function(x) {
  # ...
}
Is the method above meant to be method a.b() for class c or method a() for class b.c?
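The ambiguity can be reproduced directly. In this sketch the generics a and a.b and both objects are hypothetical, defined only to show that S3 dispatch will accept either reading of the same definition.

```r
# Two generics whose names differ only in where the period falls.
a <- function(x, ...) UseMethod("a")
a.b <- function(x, ...) UseMethod("a.b")

# One definition that S3 dispatch can read in two ways.
a.b.c <- function(x, ...) "a.b.c was called"

objectOfClassBC <- structure(list(), class = "b.c")
objectOfClassC <- structure(list(), class = "c")

print(a(objectOfClassBC))   # dispatched as method a() for class b.c
print(a.b(objectOfClassC))  # dispatched as method a.b() for class c
```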


Variable (field, attribute) names must be in mixed case starting with lower case ("camelCase").
line                    
filePrefix        # NOT:  file.prefix
Makes variables easy to distinguish from types, e.g. Line vs line.
Avoid using . (period) in variable names to make names more consistent with other naming conventions.


Names representing constants must be all uppercase using period to separate words.
MAX.ITERATIONS, COLOR.RED
Since the R language does not support (final) constants, regular variables must be used, and it is up to the programmer to make sure that such variables keep the same value throughout their lifetime. This rule will help the programmer to identify which variables can be modified and which cannot.
Note that this does not follow the general suggestions of avoiding . (period) in names. In other languages, it is common to use _, but since this was previously used as a "shortcut" for assignment in R, we choose not to use this in order to avoid problems. In the future, we might update this guideline to make use of _ instead/also.
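As a side note that is not part of RCC itself: base R can approximate a constant by locking a binding after it has been assigned. The sketch below uses base::lockBinding(); the value 100 is an arbitrary choice for illustration.

```r
MAX.ITERATIONS <- 100

# Lock the binding in the current environment so later assignments fail.
lockBinding("MAX.ITERATIONS", environment())

attempt <- tryCatch({
  MAX.ITERATIONS <- 200
  "modified"
}, error = function(e) "assignment rejected")

print(attempt)
print(MAX.ITERATIONS)
```

The naming convention is still useful with this approach, since the lock only takes effect where it is applied explicitly.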


Names representing methods (functions) must be verbs and written in mixed case starting with lower case ("camelCase").
getName()                   # NOT:  get.name()
computeTotalWidth()         # NOT:  compute.total.width()
This is identical to variable names, but methods in R are already distinguishable from variables by their specific form.
Do not use . (period) in the method name as it is ambiguous in the context of object oriented code, cf. "Names representing classes must be nouns and written in mixed case starting with upper case." above.


Names representing arguments should be in mixed case starting with lower case.
normalizeScale <- function(x, newSd=1) {
  # ...
}
For backward compatibility with historical functions, it is alright to also use . for separating words, e.g.
normalizeScale <- function(x, new.sd=1) {
  # ...
}


Names representing constructors should be identical to the class name.
Line <- function(x0, y0, x1, y1) {
  line <- list(x=c(x0,y0), y=c(x1,y1));
  class(line) <- "Line";
  line;
}
This makes it easy to remember the name of a function for creating a new instance of a class. It also makes the constructor stand out from the methods.


Abbreviations and acronyms should not be uppercase when used as name.
exportHtmlSource();  # NOT: exportHTMLSource();
openDvdPlayer();     # NOT: openDVDPlayer();
Using all uppercase for the base name will give conflicts with the naming conventions given above. A variable of this type would have to be named dVD, hTML etc., which obviously is not very readable. Another problem is illustrated in the examples above: when the name is connected to another, the readability is seriously reduced, because the word following the acronym does not stand out as it should.


Private variables and class fields should have . prefix.
.lastErrorValue <- 0;

SomeClass <- function() {
  object <- list(
    .length = NA
  )
  class(object) <- "SomeClass";
  object;
}
Apart from its name and its type, the scope of a variable is its most important feature. Indicating class scope by using . makes it easy to distinguish class variables from local scratch variables. This is important because class variables are considered to have higher significance than method variables, and should be treated with special care by the programmer.

This naming convention makes it easy to exclude these objects from the ones exported in the name spaces, e.g.
# NAMESPACE file for package not exporting private objects:
exportPattern("^[^\\.]")

A side effect of the . naming convention is that it nicely resolves the problem of finding reasonable variable names for setter methods:
setDepth.SomeClass <- function(this, depth) {
  this$.depth <- depth;
  this;
}

An issue is whether the . should be added as a prefix or as a suffix. Both practices are commonly used, but the former is recommended because it is also consistent with how ls() works, which lists R objects with a . prefix only if the argument all.names=TRUE is given.

It should be noted that scope identification in variables has been a controversial issue for quite some time. It seems, though, that this practice is now gaining acceptance and that it is becoming more and more common as a convention in the professional development community.
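Both effects of the . prefix discussed above can be checked in a scratch environment. In this sketch packageEnv merely stands in for a package namespace, and the object names are borrowed from the examples above or invented for illustration.

```r
# A scratch environment standing in for a package namespace (an assumption
# made for illustration only).
packageEnv <- new.env()
assign(".lastErrorValue", 0, envir = packageEnv)
assign("getName", function() "name", envir = packageEnv)

# By default ls() hides names that start with a period.
visibleNames <- ls(packageEnv)
allNames <- ls(packageEnv, all.names = TRUE)
print(visibleNames)
print(allNames)

# The exportPattern() regular expression keeps only names that do not
# start with a period.
isExported <- grepl("^[^\\.]", allNames)
print(allNames[isExported])
```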


Private functions and class methods should have . prefix.
.anInternalUtilityFunction <- function(x, y) {
  # ...
}

.calculateIntermediateValue.SomeClass <- function(this) {
  # ...
}
The rationale for this rule is the same as the one for the above rule about private variables and private class fields.


Arguments and generic variables should have the same name as their type.
setTopic(topic)        # NOT: setTopic(value)
                       # NOT: setTopic(aTopic)
                       # NOT: setTopic(x)

connect(database)      # NOT: connect(db)
                       # NOT: connect(oracleDB)
Reduce complexity by reducing the number of terms and names used. Also makes it easy to deduce the type given a variable name only.
If for some reason this convention doesn't seem to fit it is a strong indication that the type name is badly chosen.
Non-generic variables have a role. These variables can often be named by combining role and type:
Point  startingPoint, centerPoint;
Name   loginName;


All names should be written in English.
fileName;   # NOT:  filNamn
English is the preferred language for international development.


Variables with a large scope should have long names, variables with a small scope can have short names [2].

Scratch variables used for temporary storage or indices are best kept short. A programmer reading such a variable should be able to assume that its value is not used outside a few lines of code. Common scratch variables for integers are i, j, k (or ii, jj, kk), m, n and for characters ch (c is not recommended since it is used for concatenating vectors, e.g. c(1,2)).


The name of the object is implicit and should be avoided in a method name.
getLength(line);   # NOT:  getLineLength(line);
The latter seems natural in the class declaration, but proves superfluous in use, as shown in the example.
2012-4-7 22:30:50

Specific Naming Conventions

The is prefix should be used for boolean variables and methods.
isSet, isVisible, isFinished, isFound, isOpen
Using the is prefix solves a common problem of choosing bad boolean names like status or flag. isStatus or isFlag simply doesn't fit, and the programmer is forced to choose more meaningful names.
There are a few alternatives to the is prefix that fit better in some situations. These are the has, can, should, and was prefixes:
hasLicense();
canEvaluate();
shouldAbort <- FALSE;


The term find can be used in methods where something is looked up.
findNearestVertex(vertex);  findMinElement(matrix);
In cases where get is inappropriate or awkward, find can be used as a prefix. It gives the reader the immediate clue that this is a simple look up method with a minimum of computations involved. Consistent use of the term enhances readability.

Moreover, find may indicate that there is a lookup that may take some computing time, whereas get indicates a more direct action of low cost. Contrary to, say, searchFor, both find and get indicate that there will be a unique answer.


The term initialize can be used where an object or a concept is established.
initializeFontSet(printer);
The American initialize should be preferred over the English initialise. Abbreviation init must be avoided.


Tcl/Tk (GUI) variables should be suffixed by the element type.
okButton, bgImage, mainWindow, leftScrollbar, nameEntry
Enhances readability since the name gives the user an immediate clue of the type of the variable and thereby the available resources of the object.


The List suffix can be used on names representing a list of objects.
vertex      # one vertex
vertexList  # a list of vertices
Enhances readability since the name gives the user an immediate clue of the type of the variable and the operations that can be performed on the object.
Simply using the plural form of the base class name for a list (matrixElement (one matrix element), matrixElements (list of matrix elements)) should be avoided since the two only differ in a single character and are thereby difficult to distinguish.
A list in this context is the compound data type that can be traversed backwards, forwards, etc. (typically a Vector). A plain array is simpler. The suffix Array can be used to denote an array of objects.


The n prefix should be used for variables representing a number of objects.
nPoints, nLines
The notation is taken from mathematics where it is an established convention for indicating a number of objects.
In addition to n the prefix nbrOf or the prefix numberOf can also be used. A num prefix must not be used.


The No suffix should be used for variables representing an entity number.
tableNo, employeeNo
The notation is taken from mathematics where it is an established convention for indicating an entity number.
An elegant alternative is to prefix such variables with an i: iTable, iEmployee. This effectively makes them named iterators.


Iterator variables should be called i, j, k etc.
for (i in seq(nTables)) {
  # ...
}
The notation is taken from mathematics where it is an established convention for indicating iterators.
Variables named j, k etc. should be used for nested loops only.
Some prefer to use "doubled" variable names, e.g. ii, jj, kk, etc., because they are much easier to find using the editor's search functions.
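One caveat about iterating with seq(), added here as a side note rather than as part of RCC: when the count is zero, seq() counts down from 1 to 0 instead of returning an empty sequence. Base R's seq_len() avoids this, as the sketch below shows with an empty nTables.

```r
nTables <- 0

# seq(0) counts down from 1 to 0, so a loop over it would run twice.
print(seq(nTables))

# seq_len(0) is empty, so the loop body is skipped entirely.
visited <- integer(0)
for (i in seq_len(nTables)) {
  visited <- c(visited, i)
}
print(length(visited))
```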


Complement names must be used for complement entities [2].
get/set, add/remove, create/destroy, start/stop, insert/delete,
increment/decrement, old/new, begin/end, first/last, up/down, min/max,
next/previous, open/close, show/hide
Reduce complexity by symmetry.


Abbreviations in names should be avoided.
computeAverage();  # NOT:  compAvg();
There are two types of words to consider. First are the common words listed in a language dictionary. These must never be abbreviated. Never write:
cmd   instead of   command
cp    instead of   copy
pt    instead of   point
comp  instead of   compute
init  instead of   initialize
etc.
Then there are domain specific phrases that are more naturally known through their acronym or abbreviations. These phrases should be kept abbreviated. Never write:
HypertextMarkupLanguage  instead of   html
CentralProcessingUnit    instead of   cpu
PriceEarningRatio        instead of   pe
etc.


Negated boolean variable names must be avoided.
isError;  # NOT:  isNotError
isFound;  # NOT:  isNotFound
The problem arises when the logical not operator is used and a double negative results. It is not immediately apparent what !isNotError means.


Associated constants (final variables) should be prefixed by a common type name.
COLOR.RED   <- 1;
COLOR.GREEN <- 2;
COLOR.BLUE  <- 3;
This indicates that the constants belong together, and what concept the constants represent.


Functions (methods returning an object) should be named after what they return and procedures (void methods) after what they do.

Increase readability. Makes it clear what the unit should do and especially all the things it is not supposed to do. This again makes it easier to keep the code clean of side effects.
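The guideline comes without an example, so here is one possible reading, sketched with hypothetical names: totalWidth() returns a value and is named after it, while printWidthReport() exists only for its side effect and is named after the action.

```r
# Function: returns an object, so it is named after what it returns.
totalWidth <- function(widths) {
  sum(widths)
}

# Procedure: called for its side effect, so it is named after what it does.
printWidthReport <- function(widths) {
  cat("Total width:", totalWidth(widths), "\n")
  invisible(NULL)
}

columnWidths <- c(10, 20, 12)
printWidthReport(columnWidths)
```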