Mastering Predictive Analytics with Python
From quarterly financial projections to customer surveys, analytics help businesses to make decisions and plan for the future. While data visualizations such as pie charts and trend lines using spreadsheet programs have been used for decades, recent years have seen a growth in both the volume and diversity of data sources available to the business analyst and the sophistication of tools used to interpret this information.
The rapid growth of the Internet, through e-commerce and social media platforms, has generated a wealth of data, which is available faster than ever before for analysis. Photographs, search queries, and online forum posts are all examples of unstructured data that can't be easily examined in a traditional spreadsheet program. With the proper tools, these kinds of data offer new insights, in conjunction with or beyond traditional data sources.
Traditionally, data such as historical customer records appear in a structured, tabular form that is stored in an electronic data warehouse and easily imported into a spreadsheet program. Even in the case of such tabular data, the volume of records and the rate at which they are available are increasing in many industries. While the analyst might have historically transformed raw data through interactive manipulation, robust analytics increasingly requires automated processing that can scale with the volume and velocity of data being received by a business.
Along with the data itself, the methods used to examine it have become more powerful and complex. Beyond summarizing historical patterns or projecting future events using trend lines derived from a few key input variables, advanced analytics emphasizes the use of sophisticated predictive modeling (see the section The goals of predictive analytics, later in this chapter) to understand the present and forecast near-term and long-term outcomes.
Diverse methods for generating such predictions typically require the following common elements:
While predictive modeling techniques can be used in powerful analytic applications to discover complex relationships between seemingly unrelated inputs, they also present a new set of challenges to the business analyst:
In this book, we will show you how to address these challenges by developing analytic solutions that transform data into powerful insights for you and your business. The main tasks involved in building these applications are:
Throughout this volume, we will use open-source tools written in the Python programming language to build these sorts of applications. Why Python? The Python language strikes an attractive balance between robust compiled languages such as Java, C++, and Scala, and pure statistical packages such as R, SAS, or MATLAB. We can work interactively with Python from the command line (or, as we will in subsequent chapters, in browser-based notebook environments), plotting data and prototyping commands. Python also provides extensive libraries that allow us to turn this exploratory work into web applications (using Flask, CherryPy, and Celery, as we will see in Chapter 8, Sharing Models with Prediction Services) or to scale it to large datasets (using PySpark, as we will explore in future chapters). Thus we can both analyze data and develop software applications within the same language.
Before diving into the technical details of these tools, let's take a high-level look at the concepts behind these applications and how they are structured. In this chapter, we will:
The goals of predictive analytics
The term predictive analytics, along with others such as data mining and machine learning, is often used to describe the techniques used in this book to build analytic solutions. However, it is important to keep in mind that these methods can address two distinct goals.
Inference involves building models in order to evaluate the significance of a parameter's effect on an outcome, and it emphasizes interpretation and transparency over predictive performance. For example, the coefficients of a regression model (Chapter 4, Connecting the Dots with Models – Regression Methods) can be used to estimate the effect of variation in a particular model input (for example, customer age or income) on an output variable (for example, sales). The predictions from a model developed for inference may be less accurate than those from other techniques, but the model provides valuable conceptual insights that may guide business decisions.
Conversely, prediction emphasizes the accuracy of the estimated outcome, even if the model itself is a black box in which the connection between an input and the resulting output is not always clear. For example, deep learning (Chapter 7, Learning from the Bottom Up – Deep Networks and Unsupervised Features) can produce state-of-the-art models and extremely accurate predictions from complex sets of inputs, but the connection between the input parameters and the prediction may be hard to interpret.
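To make the distinction concrete, the following is a minimal sketch rather than code from the book. It assumes a synthetic dataset in which sales depend linearly on customer age and income; the variable names, the generating coefficients, and the helper knn_predict are all hypothetical. A least-squares regression illustrates inference, because its coefficients can be read directly. A simple nearest-neighbor predictor stands in for a prediction-oriented model: it is far simpler than the deep networks mentioned above, but like them it returns an estimate without any coefficients to interpret.

```python
import numpy as np

# Synthetic data (an assumption for illustration only): sales depend
# linearly on customer age and income, plus random noise.
rng = np.random.default_rng(0)
n = 200
age = rng.uniform(20, 70, n)
income = rng.uniform(20, 120, n)  # in thousands
sales = 5.0 + 0.5 * age + 2.0 * income + rng.normal(0, 1.0, n)

# Inference: fit ordinary least squares and read off the coefficients.
# Each coefficient estimates the change in sales per unit change in an input.
X = np.column_stack([np.ones(n), age, income])
coef, *_ = np.linalg.lstsq(X, sales, rcond=None)
intercept, age_effect, income_effect = coef
print(f"estimated effect of age on sales:    {age_effect:.3f}")
print(f"estimated effect of income on sales: {income_effect:.3f}")


# Prediction: a nearest-neighbor predictor (hypothetical helper) returns an
# estimate but exposes no coefficients that explain how inputs drive it.
def knn_predict(X_train, y_train, x_new, k=5):
    """Average the outcomes of the k training rows closest to x_new."""
    dists = np.linalg.norm(X_train - x_new, axis=1)
    nearest = np.argsort(dists)[:k]
    return y_train[nearest].mean()


features = np.column_stack([age, income])
new_customer = np.array([40.0, 60.0])  # age 40, income 60 (thousands)

linear_estimate = intercept + age_effect * 40.0 + income_effect * 60.0
knn_estimate = knn_predict(features, sales, new_customer)
print(f"regression estimate for new customer:       {linear_estimate:.1f}")
print(f"nearest-neighbor estimate for new customer: {knn_estimate:.1f}")
```

Both approaches produce a sales estimate for the new customer, but only the regression yields quantities (age_effect, income_effect) that can be discussed as the effect of an input on the outcome. On real data, the relative accuracy of the two approaches depends on the problem and is not implied by this sketch.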