Learn R for Applied Statistics, Eric
About the Author
About the Technical Reviewer
Acknowledgments
Introduction
Chapter 1: Introduction
What Is R?
High-Level and Low-Level Languages
What Is Statistics?
What Is Data Science?
What Is Data Mining?
Business Understanding
Data Understanding
Data Preparation
Modeling
Evaluation
Deployment
What Is Text Mining?
Data Acquisition
Text Preprocessing
Modeling
Evaluation/Validation
Applications
Natural Language Processing
Three Types of Analytics
Descriptive Analytics
Predictive Analytics
Prescriptive Analytics
Big Data
Volume
Velocity
Variety
Why R?
Conclusion
References
Chapter 2: Getting Started
What Is R?
The Integrated Development Environment
RStudio: The IDE for R
Installation of R and RStudio
Writing Scripts in R and RStudio
Conclusion
References
Chapter 3: Basic Syntax
Writing in R Console
Using the Code Editor
Adding Comments to the Code
Variables
Data Types
Vectors
Lists
Matrix
Data Frame
Logical Statements
Loops
For Loop
While Loop
Break and Next Keywords
Repeat Loop
Functions
Create Your Own Calculator
Conclusion
References
Chapter 4: Descriptive Statistics
What Is Descriptive Statistics?
Reading Data Files
Reading a CSV File
Writing a CSV File
Reading an Excel File
Writing an Excel File
Reading an SPSS File
Writing an SPSS File
Reading a JSON File
Basic Data Processing
Selecting Data
Sorting
Filtering
Removing Missing Values
Removing Duplicates
Some Basic Statistics Terms
Types of Data
Mode, Median, Mean
Interquartile Range, Variance, Standard Deviation
Normal Distribution
Binomial Distribution
Conclusion
References
Chapter 5: Data Visualizations
What Are Data Visualizations?
Bar Chart and Histogram
Line Chart and Pie Chart
Scatterplot and Boxplot
Scatterplot Matrix
Social Network Analysis Graph Basics
Using ggplot2
What Is the Grammar of Graphics?
The Setup for ggplot2
Aesthetic Mapping in ggplot2
Geometry in ggplot2
Labels in ggplot2
Themes in ggplot2
ggplot2 Common Charts
Bar Chart
Histogram
Density Plot
Scatterplot
Line Chart
Boxplot
Interactive Charts with Plotly and ggplot2
Conclusion
References
Chapter 6: Inferential Statistics and Regressions
What Are Inferential Statistics and Regressions?
apply(), lapply(), sapply()
Sampling
Simple Random Sampling
Stratified Sampling
Cluster Sampling
Correlations
Covariance
Hypothesis Testing and P-Value
T-Test
Types of T-Tests
Assumptions of T-Tests
Type I and Type II Errors
One-Sample T-Test
Two-Sample Independent T-Test
Two-Sample Dependent T-Test
Chi-Square Test
Goodness of Fit Test
Contingency Test
ANOVA
Grand Mean
Hypothesis
Assumptions
Between Group Variability
Within Group Variability
One-Way ANOVA
Two-Way ANOVA
MANOVA
Nonparametric Test
Wilcoxon Signed Rank Test
Wilcoxon-Mann-Whitney Test
Kruskal-Wallis Test
Linear Regressions
Multiple Linear Regressions
Conclusion
References
Index