Book Image

The Art of Data-Driven Business

By : Alan Bernardo Palacio
Book Image

The Art of Data-Driven Business

By: Alan Bernardo Palacio

Overview of this book

One of the most valuable contributions of data science is toward helping businesses make the right decisions. Understanding this complicated confluence of two disparate worlds, as well as a fiercely competitive market, calls for all the guidance you can get. The Art of Data-Driven Business is your invaluable guide to gaining a business-driven perspective, as well as leveraging the power of machine learning (ML) to guide decision-making in your business. This book provides a common ground of discussion for several profiles within a company. You’ll begin by looking at how to use Python and its many libraries for machine learning. Experienced data scientists may want to skip this short introduction, but you’ll soon get to the meat of the book and explore the many and varied ways ML with Python can be applied to the domain of business decisions through real-world business problems that you can tackle by yourself. As you advance, you’ll gain practical insights into the value that ML can provide to your business, as well as the technical ability to apply a wide variety of tried-and-tested ML methods. By the end of this Python book, you’ll have learned the value of basing your business decisions on data-driven methodologies and have developed the Python skills needed to apply what you’ve learned in the real world.
Table of Contents (17 chapters)
1
Part 1: Data Analytics and Forecasting with Python
4
Part 2: Market and Customer Insights
9
Part 3: Operation and Pricing Optimization

Using NumPy for statistics and algebra

NumPy is a Python library used for working with arrays. Additionally, it provides functions for working with matrices, the Fourier transform, and the area of linear algebra. Large, multi-dimensional arrays and matrices are now supported by NumPy, along with a wide range of sophisticated mathematical operations that may be performed on these arrays. They use a huge number of sophisticated mathematical functions to process massive multidimensional arrays and matrices, as well as basic scientific computations in machine learning, which makes them highly helpful. It gives the n-dimensional array, a straightforward yet effective data structure. Learning NumPy is the first step on every Python data scientist’s path because it serves as the cornerstone on which nearly all of the toolkit’s capabilities are constructed.

The array, which is a grid of values all of the same type that’s indexed by a tuple of nonnegative integers, is the fundamental building block utilized by NumPy. Similar to how the dimensions of a matrix are defined in algebra, the array’s rank is determined by its number of dimensions. A tuple of numbers indicating the size of the array along each dimension makes up the shape of an array:

import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr)
print(type(arr))

A NumPy array is a container that can house a certain number of elements, all of which must be of the same type, as was previously specified. The majority of data structures employ arrays to carry out their algorithms. Similar to how you can slice a list, you can also slice a NumPy array, but in more than one dimension. Similar to indexing, slicing a NumPy array returns an array that is a view of the original array.

Slicing in Python means taking elements from one given index to another given index. We can select certain elements of an array by slicing the array using [start:end], where we reference the elements of the array from where we can start and where we want to finish. We can also define the step using [start:end:step]:

print('select elements by index:',arr[0])
print('slice elements of the array:',arr[1:5])
print('ending point of the array:',arr[4:])
print('ending point of the array:',arr[:4])

There are three different sorts of indexing techniques: field access, fundamental slicing, and advanced indexing. Basic slicing is the n-dimensional extension of Python’s fundamental slicing notion. By passing start, stop, and step parameters to the built-in slice function, a Python slice object is created. Writing understandable, clear, and succinct code is made possible through slicing. An iterable element is referred to by its position within the iterable when it is “indexed.” Getting a subset of elements from an iterable, depending on their indices, is referred to as “slicing.”

To combine (concatenate) two arrays, we must copy each element in both arrays to result by using the np.concatenate() function:

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
arr = np.concatenate((arr1, arr2))
print(arr)

Arrays can be joined using NumPy stack methods as well. We can combine two 1D arrays along the second axis to stack them on top of one another, a process known as stacking. The stack() method receives a list of arrays that we wish to connect with the axis:

arr = np.stack((arr1, arr2), axis=1)
print(arr)

The axis parameter can be used to reference the axis over which we want to make the concatenation:

arr = np.stack((arr1, arr2), axis=0)
print(arr)

The NumPy mean() function is used to compute the arithmetic mean along the specified axis:

np.mean(arr,axis=1)

You need to use the NumPy mean() function with axis=0 to compute the average by column. To compute the average by row, you need to use axis=1:

np.mean(arr,axis=0)

In the next section, we will introduce pandas, a library for data analysis and manipulation. pandas is one of the most extensively used Python libraries in data science, much like NumPy. It offers high-performance, simple-to-use data analysis tools. In contrast to the multi-dimensional array objects provided by the NumPy library, pandas offers an in-memory 2D table object called a DataFrame.