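SAS and R are two of today's mainstream tools for statistics and data analysis. SAS is authoritative and comprehensive; R is willing to challenge tradition, and its flexibility, convenience, and power are in no way inferior to those of SAS. What do you get when you combine the two: a statistical juggernaut?
The attached book is "SAS and R: Data Management, Statistical Analysis, and Graphics", newly published in 2010. It introduces these two statistical heavyweights side by side through worked examples, covers every aspect of common statistical analysis, and is extremely practical.
The book has six main chapters plus four appendices. The table of contents is as follows (the second half is omitted because of the post length limit):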
Contents
List of Figures
List of Tables
Preface
1 Data management
1.1 Input
1.1.1 Native dataset
1.1.2 Fixed format text files
1.1.3 Reading more complex text files
1.1.4 Comma separated value (CSV) files
1.1.5 Reading datasets in other formats
1.1.6 URL
1.1.7 XML (extensible markup language)
1.1.8 Data entry
1.2 Output
1.2.1 Save a native dataset
1.2.2 Creating files for use by other packages
1.2.3 Creating datasets in text format
1.2.4 Displaying data
1.2.5 Number of digits to display
1.2.6 Creating HTML formatted output
1.2.7 Creating XML datasets and output
1.3 Structure and meta-data
1.3.1 Access variables from a dataset
1.3.2 Names of variables and their types
1.3.3 Values of variables in a dataset
1.3.4 Rename variables in a dataset
1.3.5 Add comment to a dataset or variable
1.4 Derived variables and data manipulation
1.4.1 Create string variables from numeric variables
1.4.2 Create numeric variables from string variables
1.4.3 Extract characters from string variables
1.4.4 Length of string variables
1.4.5 Concatenate string variables
1.4.6 Find strings within string variables
1.4.7 Remove spaces around string variables
1.4.8 Upper to lower case
© 2010 by Taylor and Francis Group, LLC
1.4.9 Create categorical variables from continuous variables
1.4.10 Recode a categorical variable
1.4.11 Create a categorical variable using logic
1.4.12 Formatting values of variables
1.4.13 Label variables
1.4.14 Account for missing values
1.4.15 Observation number
1.4.16 Unique values
1.4.17 Lagged variable
1.4.18 SQL
1.4.19 Perl interface
1.5 Merging, combining, and subsetting datasets
1.5.1 Subsetting observations
1.5.2 Random sample of a dataset
1.5.3 Convert from wide to long (tall) format
1.5.4 Convert from long (tall) to wide format
1.5.5 Concatenate datasets
1.5.6 Sort datasets
1.5.7 Merge datasets
1.5.8 Drop variables in a dataset
1.6 Date and time variables
1.6.1 Create date variable
1.6.2 Extract weekday
1.6.3 Extract month
1.6.4 Extract year
1.6.5 Extract quarter
1.6.6 Create time variable
1.7 Interactions with the operating system
1.7.1 Timing commands
1.7.2 Execute command in operating system
1.7.3 Find working directory
1.7.4 Change working directory
1.7.5 List and access files
1.8 Mathematical functions
1.8.1 Basic functions
1.8.2 Trigonometric functions
1.8.3 Special functions
1.8.4 Integer functions
1.8.5 Comparisons of floating point variables
1.8.6 Derivative
1.8.7 Optimization problems
1.9 Matrix operations
1.9.1 Create matrix
1.9.2 Transpose matrix
1.9.3 Invert matrix
1.9.4 Create submatrix
1.9.5 Create a diagonal matrix
1.9.6 Create vector of diagonal elements
1.9.7 Create vector from a matrix
1.9.8 Calculate determinant
1.9.9 Find eigenvalues and eigenvectors
1.9.10 Calculate singular value decomposition
1.10 Probability distributions and random number generation
1.10.1 Probability density function
1.10.2 Quantiles of a probability density function
1.10.3 Uniform random variables
1.10.4 Multinomial random variables
1.10.5 Normal random variables
1.10.6 Multivariate normal random variables
1.10.7 Exponential random variables
1.10.8 Other random variables
1.10.9 Setting the random number seed
1.11 Control flow, programming, and data generation
1.11.1 Looping
1.11.2 Conditional execution
1.11.3 Sequence of values or patterns
1.11.4 Referring to a range of variables
1.11.5 Perform an action repeatedly over a set of variables
1.12 Further resources
1.13 HELP examples
1.13.1 Data input and output
1.13.2 Data display
1.13.3 Derived variables and data manipulation
1.13.4 Sorting and subsetting datasets
1.13.5 Probability distributions
2 Common statistical procedures
2.1 Summary statistics
2.1.1 Means and other summary statistics
2.1.2 Means by group
2.1.3 Trimmed mean
2.1.4 Five-number summary
2.1.5 Quantiles
2.1.6 Centering, normalizing, and scaling
2.1.7 Mean and 95% confidence interval
2.1.8 Bootstrapping a sample statistic
2.1.9 Proportion and 95% confidence interval
2.2 Bivariate statistics
2.2.1 Epidemiologic statistics
2.2.2 Test characteristics
2.2.3 Correlation
2.2.4 Kappa (agreement)
2.3 Contingency tables
2.3.1 Display cross-classification table
2.3.2 Pearson chi-square statistic
2.3.3 Cochran–Mantel–Haenszel test
2.3.4 Fisher’s exact test