Learn R for Applied Statistics, Eric

About the Author ix

About the Technical Reviewer xi

Acknowledgments xiii

Introduction xv

Chapter 1: Introduction 1

What Is R? 1

High-Level and Low-Level Languages 2

What Is Statistics? 3

What Is Data Science? 4

What Is Data Mining? 6

Business Understanding 8

Data Understanding 8

Data Preparation 8

Modeling 9

Evaluation 9

Deployment 9

What Is Text Mining? 9

Data Acquisition 10

Text Preprocessing 10

Modeling 11

Evaluation/Validation 11

Applications 11

Natural Language Processing 11

Three Types of Analytics 12

Descriptive Analytics 12

Predictive Analytics 13

Prescriptive Analytics 13

Big Data 13

Volume 13

Velocity 14

Variety 14

Why R? 15

Conclusion 16

References 18

Chapter 2: Getting Started 19

What Is R? 19

The Integrated Development Environment 20

RStudio: The IDE for R 22

Installation of R and RStudio 22

Writing Scripts in R and RStudio 30

Conclusion 36

References 37

Chapter 3: Basic Syntax 39

Writing in R Console 39

Using the Code Editor 42

Adding Comments to the Code 46

Variables 47

Data Types 48

Vectors 50

Lists 53

Matrix 58

Data Frame 63

Logical Statements 67

Loops 69

For Loop 69

While Loop 71

Break and Next Keywords 72

Repeat Loop 74

Functions 75

Create Your Own Calculator 80

Conclusion 83

References 84

Chapter 4: Descriptive Statistics 87

What Is Descriptive Statistics? 87

Reading Data Files 88

Reading a CSV File 89

Writing a CSV File 91

Reading an Excel File 92

Writing an Excel File 93

Reading an SPSS File 94

Writing an SPSS File 96

Reading a JSON File 96

Basic Data Processing 97

Selecting Data 97

Sorting 99

Filtering 101

Removing Missing Values 102

Removing Duplicates 103

Some Basic Statistics Terms 104

Types of Data 104

Mode, Median, Mean 105

Interquartile Range, Variance, Standard Deviation 110

Normal Distribution 115

Binomial Distribution 121

Conclusion 124

References 125

Chapter 5: Data Visualizations 129

What Are Data Visualizations? 129

Bar Chart and Histogram 130

Line Chart and Pie Chart 137

Scatterplot and Boxplot 142

Scatterplot Matrix 146

Social Network Analysis Graph Basics 147

Using ggplot2 150

What Is the Grammar of Graphics? 151

The Setup for ggplot2 151

Aesthetic Mapping in ggplot2 152

Geometry in ggplot2 152

Labels in ggplot2 155

Themes in ggplot2 156

ggplot2 Common Charts 158

Bar Chart 158

Histogram 160

Density Plot 161

Scatterplot 161

Line Chart 162

Boxplot 163

Interactive Charts with Plotly and ggplot2 166

Conclusion 169

References 170

Chapter 6: Inferential Statistics and Regressions 173

What Are Inferential Statistics and Regressions? 173

apply(), lapply(), sapply() 175

Sampling 178

Simple Random Sampling 178

Stratified Sampling 179

Cluster Sampling 179

Correlations 183

Covariance 185

Hypothesis Testing and P-Value 186

T-Test 187

Types of T-Tests 187

Assumptions of T-Tests 188

Type I and Type II Errors 188

One-Sample T-Test 188

Two-Sample Independent T-Test 190

Two-Sample Dependent T-Test 193

Chi-Square Test 194

Goodness of Fit Test 194

Contingency Test 196

ANOVA 198

Grand Mean 198

Hypothesis 198

Assumptions 199

Between Group Variability 199

Within Group Variability 201

One-Way ANOVA 202

Two-Way ANOVA 204

MANOVA 206

Nonparametric Test 209

Wilcoxon Signed Rank Test 209

Wilcoxon-Mann-Whitney Test 213

Kruskal-Wallis Test 216

Linear Regressions 218

Multiple Linear Regressions 223

Conclusion 229

References 231

Index 237