Machine learning is an interdisciplinary field that draws on statistics, probability theory, algebra, computer science, and much more. These disciplines come together in algorithms capable of learning iteratively from data and finding hidden insights that can be used to create intelligent applications. Despite the immense possibilities offered by machine learning, a solid mathematical grounding in many of these disciplines is necessary to understand the inner workings of the algorithms and to get good results.
There are many reasons why statistics and algebra are important for building a machine learning system. Among other things, they help us to:
- Select the right algorithm in terms of accuracy, training time, number of parameters, number of features, and complexity of the model
- Correctly set the parameters and choose validation strategies
- Recognize underfitting and overfitting
- Estimate appropriate confidence intervals and quantify uncertainty
MATLAB offers several functions that allow us to perform statistical analyses and algebraic operations on our data. For example, computing descriptive statistics from sample data in MATLAB is straightforward. We can measure central tendency, dispersion, shape, correlation, covariance, quantiles, percentiles, and much more. In addition, we can tabulate and cross-tabulate data, and compute summary statistics for grouped data. When the data contains missing (NaN) values, MATLAB arithmetic functions return NaN. To address this, the Statistics and Machine Learning Toolbox provides functions that ignore the missing values and return a numerical value calculated from the remaining ones.
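The behavior with missing values can be illustrated with a minimal sketch. The data vector below is hypothetical, and the missing entry is removed by hand with `isnan` so that the example needs no toolbox; the toolbox function `nanmean`, or the `'omitnan'` flag of `mean` in recent releases, should give the same mean, depending on what is installed.

```matlab
% Hypothetical sample data with one missing value (for illustration only)
x = [2 4 4 5 NaN 7 9];

% Ordinary arithmetic propagates the missing value
plainMean = mean(x);            % NaN, because one element is NaN

% Remove the missing entries by hand, then compute descriptive statistics
valid = x(~isnan(x));
m  = mean(valid);               % central tendency: arithmetic mean
md = median(valid);             % central tendency: median
s  = std(valid);                % dispersion: sample standard deviation
r  = max(valid) - min(valid);   % dispersion: range
```

Filtering with `isnan` keeps the sketch independent of any toolbox, at the cost of an extra line; with the toolbox available, the dedicated NaN-aware functions are usually more convenient.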
Furthermore, we can use statistical visualization to understand how data is distributed and how it compares to other datasets and distributions. In MATLAB, we can explore single-variable distributions using univariate plots such as box plots and histograms. We can also discover the relationships between variables by applying bivariate plots such as grouped scatter plots and bivariate histograms. We visualize the relationship between multiple variables using multivariate plots such as Andrews and glyph plots. Finally, we can customize our plots by adding case names, least-squares lines, and reference curves.
As with statistical analysis, MATLAB offers many ready-to-use solutions for linear algebra. As a reminder, linear algebra is the study of linear equations and their properties, and it is based on numerical matrices. MATLAB offers many tools for manipulating matrices that are easy to use even for non-experts, which makes computations with vectors and matrices straightforward.
Figure 1.22: Histogram of the sample data with a normal density fit
MATLAB provides several functions for:
- Matrix operations and transformations
- Linear equations
- Matrix factorization and decomposition
- Eigenvalues and eigenvectors
- Matrix analysis and vector calculus
- Normal forms and special matrices
- Matrix functions
With the help of these functions, performing linear algebra tasks in the MATLAB environment is easy.
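A few of the categories listed above can be sketched with core MATLAB functions. The matrix `A` and vector `b` below are hypothetical values chosen only for illustration; the calls shown (`\`, `lu`, `eig`, `det`) are a small sample, not a complete tour of what MATLAB offers.

```matlab
% Hypothetical 3-by-3 system A*x = b (values chosen only for illustration)
A = [4 1 0; 1 3 1; 0 1 2];
b = [1; 2; 3];

% Linear equations: solve A*x = b with the backslash operator
x = A \ b;
residual = norm(A*x - b);   % should be close to zero

% Matrix factorization: LU decomposition with pivoting, so that P*A = L*U
[L, U, P] = lu(A);

% Eigenvalues and eigenvectors: columns of V are eigenvectors,
% and the diagonal of D holds the corresponding eigenvalues
[V, D] = eig(A);

% Matrix operations and analysis: transpose and determinant
At = A';
d  = det(A);
```

The backslash operator is used here instead of `inv(A)*b` because it solves the system directly without forming the inverse explicitly, which is generally the preferred approach in MATLAB.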