Introductory Statistics with R, 2E, Peter

1 Basics

1.1 First steps
1.1.1 An overgrown calculator
1.1.2 Assignments
1.1.3 Vectorized arithmetic
1.1.4 Standard procedures
1.1.5 Graphics

1.2 R language essentials
1.2.1 Expressions and objects
1.2.2 Functions and arguments
1.2.3 Vectors
1.2.4 Quoting and escape sequences
1.2.5 Missing values
1.2.6 Functions that create vectors
1.2.7 Matrices and arrays
1.2.8 Factors
1.2.9 Lists
1.2.10 Data frames
1.2.11 Indexing
1.2.12 Conditional selection
1.2.13 Indexing of data frames
1.2.14 Grouped data and data frames
1.2.15 Implicit loops
1.2.16 Sorting

1.3 Exercises

2 The R environment

2.1 Session management
2.1.1 The workspace
2.1.2 Textual output
2.1.3 Scripting
2.1.4 Getting help
2.1.5 Packages
2.1.6 Built-in data
2.1.7 attach and detach
2.1.8 subset, transform, and within

2.2 The graphics subsystem
2.2.1 Plot layout
2.2.2 Building a plot from pieces
2.2.3 Using par
2.2.4 Combining plots

2.3 R programming
2.3.1 Flow control
2.3.2 Classes and generic functions

2.4 Data entry
2.4.1 Reading from a text file
2.4.2 Further details on read.table
2.4.3 The data editor
2.4.4 Interfacing to other programs

2.5 Exercises

3 Probability and distributions

3.1 Random sampling
3.2 Probability calculations and combinatorics
3.3 Discrete distributions
3.4 Continuous distributions

3.5 The built-in distributions in R
3.5.1 Densities
3.5.2 Cumulative distribution functions
3.5.3 Quantiles
3.5.4 Random numbers

3.6 Exercises

4 Descriptive statistics and graphics

4.1 Summary statistics for a single group

4.2 Graphical display of distributions
4.2.1 Histograms
4.2.2 Empirical cumulative distribution
4.2.3 Q–Q plots
4.2.4 Boxplots

4.3 Summary statistics by groups

4.4 Graphics for grouped data
4.4.1 Histograms
4.4.2 Parallel boxplots
4.4.3 Stripcharts

4.5 Tables
4.5.1 Generating tables
4.5.2 Marginal tables and relative frequency

4.6 Graphical display of tables
4.6.1 Barplots
4.6.2 Dotcharts
4.6.3 Piecharts

4.7 Exercises

5 One- and two-sample tests

5.1 One-sample t test
5.2 Wilcoxon signed-rank test
5.3 Two-sample t test
5.4 Comparison of variances
5.5 Two-sample Wilcoxon test
5.6 The paired t test
5.7 The matched-pairs Wilcoxon test
5.8 Exercises

6 Regression and correlation

6.1 Simple linear regression
6.2 Residuals and fitted values
6.3 Prediction and confidence bands

6.4 Correlation
6.4.1 Pearson correlation
6.4.2 Spearman’s ρ
6.4.3 Kendall’s τ

6.5 Exercises

7 Analysis of variance and the Kruskal–Wallis test

7.1 One-way analysis of variance
7.1.1 Pairwise comparisons and multiple testing
7.1.2 Relaxing the variance assumption
7.1.3 Graphical presentation
7.1.4 Bartlett’s test

7.2 Kruskal–Wallis test

7.3 Two-way analysis of variance
7.3.1 Graphics for repeated measurements

7.4 The Friedman test
7.5 The ANOVA table in regression analysis
7.6 Exercises

8 Tabular data

8.1 Single proportions
8.2 Two independent proportions
8.3 k proportions, test for trend
8.4 r × c tables
8.5 Exercises

9 Power and the computation of sample size

9.1 The principles of power calculations
9.1.1 Power of one-sample and paired t tests
9.1.2 Power of two-sample t test
9.1.3 Approximate methods
9.1.4 Power of comparisons of proportions

9.2 Two-sample problems
9.3 One-sample problems and paired tests
9.4 Comparison of proportions
9.5 Exercises

10 Advanced data handling

10.1 Recoding variables
10.1.1 The cut function
10.1.2 Manipulating factor levels
10.1.3 Working with dates
10.1.4 Recoding multiple variables

10.2 Conditional calculations

10.3 Combining and restructuring data frames
10.3.1 Appending frames
10.3.2 Merging data frames
10.3.3 Reshaping data frames

10.4 Per-group and per-case procedures
10.5 Time splitting
10.6 Exercises

11 Multiple regression

11.1 Plotting multivariate data
11.2 Model specification and output
11.3 Model search
11.4 Exercises

12 Linear models

12.1 Polynomial regression
12.2 Regression through the origin
12.3 Design matrices and dummy variables
12.4 Linearity over groups
12.5 Interactions
12.6 Two-way ANOVA with replication

12.7 Analysis of covariance
12.7.1 Graphical description
12.7.2 Comparison of regression lines

12.8 Diagnostics
12.9 Exercises

13 Logistic regression

13.1 Generalized linear models

13.2 Logistic regression on tabular data
13.2.1 The analysis of deviance table
13.2.2 Connection to test for trend

13.3 Likelihood profiling
13.4 Presentation as odds-ratio estimates
13.5 Logistic regression using raw data
13.6 Prediction
13.7 Model checking
13.8 Exercises

14 Survival analysis

14.1 Essential concepts
14.2 Survival objects
14.3 Kaplan–Meier estimates
14.4 The log-rank test
14.5 The Cox proportional hazards model
14.6 Exercises

15 Rates and Poisson regression

15.1 Basic ideas
15.1.1 The Poisson distribution
15.1.2 Survival analysis with constant hazard

15.2 Fitting Poisson models
15.3 Computing rates
15.4 Models with piecewise constant intensities
15.5 Exercises

16 Nonlinear curve fitting

16.1 Basic usage
16.2 Finding starting values
16.3 Self-starting models
16.4 Profiling
16.5 Finer control of the fitting algorithm
16.6 Exercises

A Obtaining and installing R and the ISwR package

B Data sets in the ISwR package

C Compendium

D Answers to exercises

Bibliography

Index