Learning SciPy for Numerical and Scientific Computing

By Francisco J. Blanco-Silva
Overview of this book

Creating fast and effective algorithms for complex problems in science and engineering requires incorporating workflow data and code from various sources. Data arrives faster, dirtier, and in ever greater volume. There is no longer any need for difficult-to-maintain code or expensive mathematical engines to handle your numerical computations: SciPy offers fast, accurate, and easy-to-code solutions for numerical and scientific computing applications.

"Learning SciPy for Numerical and Scientific Computing" unveils some of the most critical mathematical and scientific computing problems and will play an instrumental role in supporting your research. The book teaches you how to use the different modules and routines of the SciPy library quickly and efficiently, covering the vast scope of numerical mathematics with a simple, practical approach that is easy to follow.

The book starts with a brief description of the SciPy libraries, including practical demonstrations of how to acquire and install them on your system. The second chapter is a fun, fast-paced primer on array creation, manipulation, and problem solving based on these techniques.

The remaining chapters describe the use of the different modules and routines of the SciPy libraries through the scope of different branches of numerical mathematics. Each major field is represented: numerical analysis, linear algebra, statistics, signal processing, and computational geometry. For each of these fields, the possibilities are illustrated with clear syntax and plenty of examples. The book then combines all of these techniques to solve research problems in real-life scenarios from different sciences and engineering, from image compression, biological classification of species, and control theory to wing design and the structural analysis of oxides.

SciPy organization


SciPy is organized as a family of modules. We like to think of each module as a different field of mathematics, and as such, each has its own particular techniques and tools. The following is a list of the different modules in SciPy:

scipy.constants

scipy.cluster

scipy.fftpack

scipy.integrate

scipy.interpolate

scipy.io

scipy.lib

scipy.linalg

scipy.misc

scipy.optimize

scipy.signal

scipy.sparse

scipy.spatial

scipy.special

scipy.stats

scipy.weave

The names of the modules are mostly self-explanatory. For instance, the field of statistics deals with the study of the collection, organization, analysis, interpretation, and presentation of data. The objects that statisticians deal with in their research are usually represented as arrays of multiple dimensions. The result of certain operations on these arrays then offers information about the objects they represent (for example, the mean and standard deviation of a dataset). A well-known set of applications is based upon these operations: confidence intervals for the mean, hypothesis testing, and data mining, for instance. When facing any research problem that needs a tool from this branch of mathematics, we access the corresponding functions from the scipy.stats module.
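For example, several descriptive statistics of a dataset can be obtained with a single call to scipy.stats.describe; the small dataset below is made up for illustration:

```python
import numpy as np
from scipy import stats

# A small made-up dataset, purely for illustration
data = np.array([2.5, 3.1, 2.8, 3.6, 2.9, 3.3, 3.0, 2.7])

# describe() bundles several summary statistics in one result object:
# number of observations, min/max, mean, variance, skewness, kurtosis
summary = stats.describe(data)
print(summary.nobs)      # number of observations
print(summary.mean)      # sample mean
print(summary.variance)  # sample variance (computed with ddof=1)
```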

Let us use some of its functions to solve a simple problem.

The following table shows the IQ test scores of 31 individuals:

114  100  104   89  102   91  114  114
103  105  108  130  120  132  111  128
118  119   86   72  111  103   74  112
107  103   98   96  112  112   93

A stem plot of the distribution of these 31 scores shows that there are no major departures from normality, and thus we assume the distribution of the scores to be close to normal. Estimate the mean IQ score for this population, using a 99 percent confidence interval.
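The visual check can also be backed by a numerical test of normality. The following is a minimal sketch using the Shapiro-Wilk test from scipy.stats; this test is an illustration added here, not part of the original walkthrough:

```python
import numpy as np
from scipy import stats

scores = np.array([114, 100, 104, 89, 102, 91, 114, 114, 103, 105,
                   108, 130, 120, 132, 111, 128, 118, 119, 86, 72,
                   111, 103, 74, 112, 107, 103, 98, 96, 112, 112, 93])

# Shapiro-Wilk: the null hypothesis is that the data come from a
# normal distribution; a large p-value gives no evidence against it
statistic, pvalue = stats.shapiro(scores)
print(statistic, pvalue)
```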

We start by loading the data into memory, as follows:

>>> scores=numpy.array([114, 100, 104, 89, 102, 91, 114, 114, 103, 105, 108, 130, 120, 132, 111, 128, 118, 119, 86, 72, 111, 103, 74, 112, 107, 103, 98, 96, 112, 112, 93])

At this point, if we type scores followed by a dot [.] and press the Tab key, the system offers us all the methods inherited by the data from the NumPy library, as is customary in Python. Technically, we could at this point compute the required mean, xmean, and the corresponding confidence interval according to the formula xmean ± zcrit * sigma / sqrt(n), where sigma and n are, respectively, the standard deviation and size of the data, and zcrit is the critical value corresponding to the desired confidence level. In this case, we could look up a table in any statistics book to obtain a crude approximation of its value, zcrit = 2.576. The remaining values may be computed in our session and properly combined, as follows:

>>> xmean = numpy.mean(scores)
>>> sigma = numpy.std(scores)
>>> n = numpy.size(scores)
>>> xmean, xmean - 2.576*sigma/numpy.sqrt(n), \
... xmean + 2.576*sigma/numpy.sqrt(n)
(105.83870967741936, 99.343223715529746, 112.33419563930897)

We have thus computed the estimated mean IQ score (with value 105.83870967741936) together with a 99 percent confidence interval around it. We have done so using purely NumPy-based operations, while following a known formula. But instead of carrying out all these computations by hand and looking up critical values in tables, we could ask SciPy for assistance directly.
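For instance, the critical value itself need not come from a printed table. As a small illustration (not part of the original session), it can be obtained from the standard normal distribution in scipy.stats:

```python
from scipy import stats

# The 99% two-sided critical value is the 99.5th percentile of the
# standard normal distribution (0.5% of probability in each tail)
zcrit = stats.norm.ppf(0.995)
print(zcrit)  # approximately 2.5758
```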

Note how the scipy.stats module needs to be imported before we use any of its functions or request help on them. Note also that bayes_mvs computes 90 percent intervals by default, so we pass the desired confidence level of 0.99 explicitly:

>>> from scipy import stats
>>> result = stats.bayes_mvs(scores, alpha=0.99)

The variable result contains the solution of our problem and some additional information. Note first that result is a tuple with three entries, one each for the mean, the variance, and the standard deviation, as the help documentation explains:

>>> help(stats.bayes_mvs)

Each entry consists of a point estimate together with the corresponding confidence interval. The solution to our problem is then the first entry of the tuple result. To show the contents of this entry, we request it as usual:

>>> result[0]
(105.83870967741936, (98.789863768428674, 112.88755558641004))

Note how this output gives us the same mean, but a slightly wider confidence interval. This interval is more trustworthy than the one we computed by hand, since bayes_mvs accounts for the fact that the standard deviation itself is estimated from the sample (its interval for the mean is based on the Student's t distribution rather than on a normal approximation).
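As a cross-check (an illustration added here, not part of the original session), the classical confidence interval for the mean based on the Student's t distribution reproduces these endpoints:

```python
import numpy as np
from scipy import stats

scores = np.array([114, 100, 104, 89, 102, 91, 114, 114, 103, 105,
                   108, 130, 120, 132, 111, 128, 118, 119, 86, 72,
                   111, 103, 74, 112, 107, 103, 98, 96, 112, 112, 93])

# 99% interval for the mean: t distribution with n-1 degrees of
# freedom, centered at the sample mean, scaled by the standard error
low, high = stats.t.interval(0.99, df=len(scores) - 1,
                             loc=np.mean(scores),
                             scale=stats.sem(scores))
print(low, high)  # close to the bayes_mvs endpoints above
```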