Syllabus
========

Chapter 1: An Introduction to Data Analysis
-------------------------------------------

Data Analysis
=============

Knowledge Domains of the Data Analyst
=====================================

Computer Science
Mathematics and Statistics
Machine Learning and Artificial Intelligence
Professional Fields of Application

Understanding the Nature of the Data
====================================

When the Data Become Information
When the Information Becomes Knowledge
Types of Data

The Data Analysis Process
=========================

Problem Definition
Data Extraction
Data Preparation
Data Exploration/Visualization
Predictive Modeling
Model Validation
Deployment

Quantitative and Qualitative Data Analysis
==========================================

Open Data
=========

Python and Data Analysis
========================

Conclusions
===========

Chapter 2: Introduction to the Python World
-------------------------------------------

Python—The Programming Language
The Interpreter and the Execution Phases of the Code
Installing Python
Python Distributions
Using Python
Writing Python Code
IPython
PyPI—The Python Package Index
The IDEs for Python
SciPy
NumPy
Pandas
matplotlib
Conclusions

Chapter 3: The NumPy Library
----------------------------

NumPy: A Little History
The NumPy Installation
ndarray: The Heart of the Library
Create an Array
Types of Data
The dtype Option
Intrinsic Creation of an Array
Basic Operations
Arithmetic Operators
The Matrix Product
Increment and Decrement Operators
Universal Functions (ufunc)
Aggregate Functions
Indexing, Slicing, and Iterating
Indexing
Slicing
Iterating an Array
Conditions and Boolean Arrays
Shape Manipulation
Array Manipulation
Joining Arrays
Splitting Arrays
General Concepts
Copies or Views of Objects
Vectorization
Broadcasting
Structured Arrays
Reading and Writing Array Data on Files
Loading and Saving Data in Binary Files
Reading Files with Tabular Data
Conclusions

Chapter 4: The pandas Library—An Introduction
----------------------------------------------

pandas: The Python Data Analysis Library
Installation of pandas
Installation from Anaconda
Installation from PyPI
Getting Started with pandas
Introduction to pandas Data Structures
The Series
The Dataframe
The Index Objects
Other Functionalities on Indexes
Reindexing
Dropping
Arithmetic and Data Alignment
Operations Between Data Structures
Flexible Arithmetic Methods
Operations Between Dataframes and Series
Function Application and Mapping
Functions by Element
Functions by Row or Column
Statistics Functions
Sorting and Ranking
Correlation and Covariance
“Not a Number” Data
Assigning a NaN Value
Filtering Out NaN Values
Filling in NaN Occurrences
Hierarchical Indexing and Leveling
Reordering and Sorting Levels
Summary Statistics with groupby Instead of with Level
Conclusions

Chapter 5: pandas: Reading and Writing Data
-------------------------------------------

I/O API Tools
CSV and Textual Files
Reading Data in CSV or Text Files
Using Regexp to Parse TXT Files
Reading TXT Files Into Parts
Writing Data in CSV
Reading and Writing HTML Files
Writing Data in HTML
Reading Data from an HTML File
Reading Data from XML
Reading and Writing Data on Microsoft Excel Files
JSON Data
The HDF5 Format
Pickle—Python Object Serialization
Serialize a Python Object with cPickle
Pickling with pandas
Interacting with Databases
Loading and Writing Data with SQLite3
Loading and Writing Data with PostgreSQL in a Docker Container
Reading and Writing Data with a NoSQL Database: MongoDB
Conclusions

Chapter 6: pandas in Depth: Data Manipulation
---------------------------------------------

Data Preparation
Merging
Concatenating
Combining
Pivoting
Removing
Data Transformation
Removing Duplicates
Mapping
Discretization and Binning
Detecting and Filtering Outliers
Permutation
Random Sampling
String Manipulation
Built-in Methods for String Manipulation
Regular Expressions
Data Aggregation
GroupBy
A Practical Example
Hierarchical Grouping
Group Iteration
Chain of Transformations
Functions on Groups
Advanced Data Aggregation
Conclusions

Chapter 7: Data Visualization with matplotlib and Seaborn
---------------------------------------------------------

The matplotlib Library
Installation
The matplotlib Architecture
Backend Layer
Artist Layer
Scripting Layer (pyplot)
pylab and pyplot
pyplot
The Plotting Window
Data Visualization with Jupyter Notebook
Set the Properties of the Plot
matplotlib and NumPy
Using kwargs
Working with Multiple Figures and Axes
Adding Elements to the Chart
Adding Text
Adding a Grid
Adding a Legend
Saving Your Charts
Saving the Code
Saving Your Notebook as an HTML File or as Other File Formats
Saving Your Chart Directly as an Image
Handling Date Values
Chart Typology
Line Charts
Line Charts with pandas
Histograms
Bar Charts
Horizontal Bar Charts
Multiserial Bar Charts
Multiseries Bar Charts with a pandas Dataframe
Multiseries Stacked Bar Charts
Stacked Bar Charts with a pandas Dataframe
Other Bar Chart Representations
Pie Charts
Pie Charts with a pandas Dataframe
Advanced Charts
Contour Plots
Polar Charts
The mplot3d Toolkit
3D Surfaces
Scatter Plots in 3D
Bar Charts in 3D
Multipanel Plots
Display Subplots Within Other Subplots
Grids of Subplots
The Seaborn Library
Conclusions

Chapter 8: Machine Learning with scikit-learn
---------------------------------------------

The scikit-learn Library
Machine Learning
Supervised and Unsupervised Learning
Training Set and Testing Set
Supervised Learning with scikit-learn
The Iris Flower Dataset
The PCA Decomposition
K-Nearest Neighbors Classifier
Diabetes Dataset
Linear Regression: The Least Square Regression
Support Vector Machines (SVMs)
Support Vector Classification (SVC)
Nonlinear SVC
Plotting Different SVM Classifiers Using the Iris Dataset
Support Vector Regression (SVR)
Conclusions

Chapter 9: Deep Learning with TensorFlow
----------------------------------------

Artificial Intelligence, Machine Learning, and Deep Learning
Artificial Intelligence
Machine Learning Is a Branch of Artificial Intelligence
Deep Learning Is a Branch of Machine Learning
The Relationship Between Artificial Intelligence, Machine Learning, and Deep Learning
Deep Learning
Neural Networks and GPUs
Data Availability: Open Data Source, Internet of Things, and Big Data
Python
Deep Learning Python Frameworks
Artificial Neural Networks
How Artificial Neural Networks Are Structured
Single Layer Perceptron (SLP)
Multilayer Perceptron (MLP)
Correspondence Between Artificial and Biological Neural Networks
TensorFlow
TensorFlow: Google’s Framework
TensorFlow: Data Flow Graph
Start Programming with TensorFlow
TensorFlow 2x vs TensorFlow 1x
Installing TensorFlow
Programming with the Jupyter Notebook
Tensors
Loading Data Into a Tensor from a pandas Dataframe
Loading Data in a Tensor from a CSV File
Operation on Tensors
Developing a Deep Learning Model with TensorFlow
Model Building
Model Compiling
Model Training and Testing
Prediction Making
Practical Examples with TensorFlow 2x
Single Layer Perceptron with TensorFlow
Multilayer Perceptron (with One Hidden Layer) with TensorFlow
Multilayer Perceptron (with Two Hidden Layers) with TensorFlow
Conclusions

Chapter 10: An Example—Meteorological Data
------------------------------------------

A Hypothesis to Be Tested: The Influence of the Proximity of the Sea
The System in the Study: The Adriatic Sea and the Po Valley
Finding the Data Source
Data Analysis on Jupyter Notebook
Analysis of Processed Meteorological Data
The RoseWind
Calculating the Mean Distribution of the Wind Speed
Conclusions

Chapter 11: Embedding the JavaScript D3 Library in the IPython Notebook
-----------------------------------------------------------------------

The Open Data Source for Demographics
The JavaScript D3 Library
Drawing a Clustered Bar Chart
The Choropleth Maps
The Choropleth Map of the US Population in 2022
Conclusions

Chapter 12: Recognizing Handwritten Digits
------------------------------------------

Handwriting Recognition
Recognizing Handwritten Digits with scikit-learn
The Digits Dataset
Learning and Predicting
Recognizing Handwritten Digits with TensorFlow
Learning and Predicting with an SLP
Learning and Predicting with an MLP
Conclusions

Chapter 13: Textual Data Analysis with NLTK
-------------------------------------------

Text Analysis Techniques
The Natural Language Toolkit (NLTK)
Import the NLTK Library and the NLTK Downloader Tool
Search for a Word with NLTK
Analyze the Frequency of Words
Select Words from Text
Bigrams and Collocations
Preprocessing Steps
Use Text on the Network
Extract the Text from the HTML Pages
Sentiment Analysis
Conclusions

Chapter 14: Image Analysis and Computer Vision with OpenCV
----------------------------------------------------------

Image Analysis and Computer Vision
OpenCV and Python
OpenCV and Deep Learning
Installing OpenCV
First Approaches to Image Processing and Analysis
Before Starting
Load and Display an Image
Work with Images
Save the New Image
Elementary Operations on Images
Image Blending
Image Analysis
Edge Detection and Image Gradient Analysis
Edge Detection
The Image Gradient Theory
A Practical Example of Edge Detection with the Image Gradient Analysis
A Deep Learning Example: Face Detection
Conclusions

Appendix A: Writing Mathematical Expressions with LaTeX
-------------------------------------------------------

Appendix B: Open Data Sources
-----------------------------

Index
-----