Table of Contents
Chapter 1: An Introduction to Data Analysis
Data Analysis
Knowledge Domains of the Data Analyst
Computer Science
Mathematics and Statistics
Machine Learning and Artificial Intelligence
Professional Fields of Application
Understanding the Nature of the Data
When the Data Become Information
When the Information Becomes Knowledge
Types of Data
The Data Analysis Process
Problem Definition
Data Extraction
Data Preparation
Data Exploration/Visualization
Predictive Modeling
Model Validation
Deployment
Quantitative and Qualitative Data Analysis
Open Data
Python and Data Analysis
Conclusions
Chapter 2: Introduction to the Python World
Python—The Programming Language
The Interpreter and the Execution Phases of the Code
Installing Python
Python Distributions
Using Python
Writing Python Code
IPython
PyPI—The Python Package Index
The IDEs for Python
SciPy
NumPy
Pandas
matplotlib
Conclusions
Chapter 3: The NumPy Library
NumPy: A Little History
The NumPy Installation
ndarray: The Heart of the Library
Create an Array
Types of Data
The dtype Option
Intrinsic Creation of an Array
Basic Operations
Arithmetic Operators
The Matrix Product
Increment and Decrement Operators
Universal Functions (ufunc)
Aggregate Functions
Indexing, Slicing, and Iterating
Indexing
Slicing
Iterating an Array
Conditions and Boolean Arrays
Shape Manipulation
Array Manipulation
Joining Arrays
Splitting Arrays
General Concepts
Copies or Views of Objects
Vectorization
Broadcasting
Structured Arrays
Reading and Writing Array Data on Files
Loading and Saving Data in Binary Files
Reading Files with Tabular Data
Conclusions
Chapter 4: The pandas Library—An Introduction
pandas: The Python Data Analysis Library
Installation of pandas
Installation from Anaconda
Installation from PyPI
Getting Started with pandas
Introduction to pandas Data Structures
The Series
The Dataframe
The Index Objects
Other Functionalities on Indexes
Reindexing
Dropping
Arithmetic and Data Alignment
Operations Between Data Structures
Flexible Arithmetic Methods
Operations Between Dataframes and Series
Function Application and Mapping
Functions by Element
Functions by Row or Column
Statistics Functions
Sorting and Ranking
Correlation and Covariance
“Not a Number” Data
Assigning a NaN Value
Filtering Out NaN Values
Filling in NaN Occurrences
Hierarchical Indexing and Leveling
Reordering and Sorting Levels
Summary Statistics with groupby Instead of with Level
Conclusions
Chapter 5: pandas: Reading and Writing Data
I/O API Tools
CSV and Textual Files
Reading Data in CSV or Text Files
Using Regexp to Parse TXT Files
Reading TXT Files Into Parts
Writing Data in CSV
Reading and Writing HTML Files
Writing Data in HTML
Reading Data from an HTML File
Reading Data from XML
Reading and Writing Data on Microsoft Excel Files
JSON Data
The HDF5 Format
Pickle—Python Object Serialization
Serialize a Python Object with cPickle
Pickling with pandas
Interacting with Databases
Loading and Writing Data with SQLite3
Loading and Writing Data with PostgreSQL in a Docker Container
Reading and Writing Data with a NoSQL Database: MongoDB
Conclusions
Chapter 6: pandas in Depth: Data Manipulation
Data Preparation
Merging
Concatenating
Combining
Pivoting
Removing
Data Transformation
Removing Duplicates
Mapping
Discretization and Binning
Detecting and Filtering Outliers
Permutation
Random Sampling
String Manipulation
Built-in Methods for String Manipulation
Regular Expressions
Data Aggregation
GroupBy
A Practical Example
Hierarchical Grouping
Group Iteration
Chain of Transformations
Functions on Groups
Advanced Data Aggregation
Conclusions
Chapter 7: Data Visualization with matplotlib and Seaborn
The matplotlib Library
Installation
The matplotlib Architecture
Backend Layer
Artist Layer
Scripting Layer (pyplot)
pylab and pyplot
pyplot
The Plotting Window
Data Visualization with Jupyter Notebook
Set the Properties of the Plot
matplotlib and NumPy
Using kwargs
Working with Multiple Figures and Axes
Adding Elements to the Chart
Adding Text
Adding a Grid
Adding a Legend
Saving Your Charts
Saving the Code
Saving Your Notebook as an HTML File or as Other File Formats
Saving Your Chart Directly as an Image
Handling Date Values
Chart Typology
Line Charts
Line Charts with pandas
Histograms
Bar Charts
Horizontal Bar Charts
Multiserial Bar Charts
Multiseries Bar Charts with a pandas Dataframe
Multiseries Stacked Bar Charts
Stacked Bar Charts with a pandas Dataframe
Other Bar Chart Representations
Pie Charts
Pie Charts with a pandas Dataframe
Advanced Charts
Contour Plots
Polar Charts
The mplot3d Toolkit
3D Surfaces
Scatter Plots in 3D
Bar Charts in 3D
Multipanel Plots
Display Subplots Within Other Subplots
Grids of Subplots
The Seaborn Library
Conclusions
Chapter 8: Machine Learning with scikit-learn
The scikit-learn Library
Machine Learning
Supervised and Unsupervised Learning
Training Set and Testing Set
Supervised Learning with scikit-learn
The Iris Flower Dataset
The PCA Decomposition
K-Nearest Neighbors Classifier
Diabetes Dataset
Linear Regression: The Least Square Regression
Support Vector Machines (SVMs)
Support Vector Classification (SVC)
Nonlinear SVC
Plotting Different SVM Classifiers Using the Iris Dataset
Support Vector Regression (SVR)
Conclusions
Chapter 9: Deep Learning with TensorFlow
Artificial Intelligence, Machine Learning, and Deep Learning
Artificial Intelligence
Machine Learning Is a Branch of Artificial Intelligence
Deep Learning Is a Branch of Machine Learning
The Relationship Between Artificial Intelligence, Machine Learning, and Deep Learning
Deep Learning
Neural Networks and GPUs
Data Availability: Open Data Source, Internet of Things, and Big Data
Python
Deep Learning Python Frameworks
Artificial Neural Networks
How Artificial Neural Networks Are Structured
Single Layer Perceptron (SLP)
Multilayer Perceptron (MLP)
Correspondence Between Artificial and Biological Neural Networks
TensorFlow
TensorFlow: Google’s Framework
TensorFlow: Data Flow Graph
Start Programming with TensorFlow
TensorFlow 2x vs TensorFlow 1x
Installing TensorFlow
Programming with the Jupyter Notebook
Tensors
Loading Data Into a Tensor from a pandas Dataframe
Loading Data in a Tensor from a CSV File
Operation on Tensors
Developing a Deep Learning Model with TensorFlow
Model Building
Model Compiling
Model Training and Testing
Prediction Making
Practical Examples with TensorFlow 2x
Single Layer Perceptron with TensorFlow
Multilayer Perceptron (with One Hidden Layer) with TensorFlow
Multilayer Perceptron (with Two Hidden Layers) with TensorFlow
Conclusions
Chapter 10: An Example—Meteorological Data
A Hypothesis to Be Tested: The Influence of the Proximity of the Sea
The System in the Study: The Adriatic Sea and the Po Valley
Finding the Data Source
Data Analysis on Jupyter Notebook
Analysis of Processed Meteorological Data
The RoseWind
Calculating the Mean Distribution of the Wind Speed
Conclusions
Chapter 11: Embedding the JavaScript D3 Library in the IPython Notebook
The Open Data Source for Demographics
The JavaScript D3 Library
Drawing a Clustered Bar Chart
The Choropleth Maps
The Choropleth Map of the US Population in 2022
Conclusions
Chapter 12: Recognizing Handwritten Digits
Handwriting Recognition
Recognizing Handwritten Digits with scikit-learn
The Digits Dataset
Learning and Predicting
Recognizing Handwritten Digits with TensorFlow
Learning and Predicting with an SLP
Learning and Predicting with an MLP
Conclusions
Chapter 13: Textual Data Analysis with NLTK
Text Analysis Techniques
The Natural Language Toolkit (NLTK)
Import the NLTK Library and the NLTK Downloader Tool
Search for a Word with NLTK
Analyze the Frequency of Words
Select Words from Text
Bigrams and Collocations
Preprocessing Steps
Use Text on the Network
Extract the Text from the HTML Pages
Sentiment Analysis
Conclusions
Chapter 14: Image Analysis and Computer Vision with OpenCV
Image Analysis and Computer Vision
OpenCV and Python
OpenCV and Deep Learning
Installing OpenCV
First Approaches to Image Processing and Analysis
Before Starting
Load and Display an Image
Work with Images
Save the New Image
Elementary Operations on Images
Image Blending
Image Analysis
Edge Detection and Image Gradient Analysis
Edge Detection
The Image Gradient Theory
A Practical Example of Edge Detection with the Image Gradient Analysis
A Deep Learning Example: Face Detection
Conclusions
Appendix A: Writing Mathematical Expressions with LaTeX
Appendix B: Open Data Sources
Index