Training Systems Using Python Statistical Modeling

By: Curtis Miller

Overview of this book

Python's ease of use and multi-purpose nature have made it one of the most popular tools for data scientists and machine learning developers. Its rich libraries are widely used for data analysis, and more importantly, for building state-of-the-art predictive models. This book is designed to guide you through using these libraries to implement effective statistical models for predictive analytics. You’ll start by delving into classical statistical analysis, where you will learn to compute descriptive statistics using pandas. You will focus on supervised learning, which will help you explore the principles of machine learning and train different machine learning models from scratch. Next, you will work with binary prediction models, such as data classification using k-nearest neighbors, decision trees, and random forests. The book will also cover algorithms for regression analysis, such as ridge and lasso regression, and their implementation in Python. In later chapters, you will learn how neural networks can be trained and deployed for more accurate predictions, and understand which Python libraries can be used to implement them. By the end of this book, you will have the knowledge you need to design, build, and deploy enterprise-grade statistical models for machine learning using Python and its rich ecosystem of libraries for predictive analytics.
Table of Contents (9 chapters)

Diving into Bayesian analysis

Welcome to the first section on Bayesian analysis. This section introduces the basic concepts used in Bayesian statistics. This branch of statistics often draws on classical statistics and requires more knowledge of mathematics and probability, and it is widely used in computer science. This section will get you up to speed with what you need to know to understand and perform Bayesian statistics.

All Bayesian statistics are based on Bayes' theorem. In Bayesian statistics, we treat an event or parameter as a random variable. For example, suppose that we're talking about a parameter: we give the parameter a prior distribution, and we specify a likelihood of observing a certain outcome given the value of the parameter. Bayes' theorem lets us compute the posterior distribution of the parameter, which we can use to reach conclusions about it. Writing θ for the parameter and D for the observed data, Bayes' theorem can be stated as follows:

$$P(\theta \mid D) \propto P(D \mid \theta)\, P(\theta)$$

All Bayesian statistics are an exercise in applying this theorem. The ∝ symbol means "proportional to", that is, the two sides differ only by a multiplicative factor that does not depend on the parameter.

How Bayesian analysis works

Suppose that we are interested in the value of a parameter, such as a mean or a proportion. We start by giving this parameter a prior distribution that quantifies what we believe about its location before collecting data. There are lots of ways to pick the prior; for example, we could pick an uninformative prior that says little about the parameter's value. Alternatively, we could use a prior that encodes beliefs based on, say, previous studies, thereby pulling our conclusions about the parameter toward the values those studies suggest.

Then, we collect data and use it to compute the posterior distribution of the parameter, which is our updated belief about its location after seeing the new evidence. This posterior distribution is then used to answer all our questions about the parameter's location. Note that the posterior distribution answers every question with a probability. This means that we don't say whether or not the parameter is in a particular region; instead, we state the probability that it is located in that region. In general, the posterior distribution is difficult to compute. Often, we need to rely on computationally intensive methods such as Monte Carlo simulation to estimate posterior quantities. So, let's examine a very simple example of Bayesian analysis.
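Before turning to the hit-and-run, the prior-to-posterior update described in this section can be sketched in a few lines of Python. This is a minimal illustration, not a method taken from the book: it approximates the posterior of a proportion on a grid of candidate values, using a uniform (uninformative) prior and a binomial likelihood. The data (7 successes in 10 trials), the grid resolution, and all variable names are assumptions chosen only for illustration.

```python
from math import comb

# Hypothetical data, chosen only for illustration: 7 successes in 10 trials.
successes, trials = 7, 10

# Candidate values for the unknown proportion p, on an evenly spaced grid.
grid = [i / 100 for i in range(101)]

# Uninformative (uniform) prior: every candidate value is equally plausible.
prior = [1 / len(grid)] * len(grid)

# Binomial likelihood of the observed data at each candidate value of p.
likelihood = [
    comb(trials, successes) * p**successes * (1 - p) ** (trials - successes)
    for p in grid
]

# Bayes' theorem: the posterior is proportional to likelihood times prior.
unnormalized = [lik * pri for lik, pri in zip(likelihood, prior)]
total = sum(unnormalized)
posterior = [u / total for u in unnormalized]

# The posterior answers questions with probabilities, such as P(p > 0.5 | data).
prob_above_half = sum(w for p, w in zip(grid, posterior) if p > 0.5)
posterior_mean = sum(p * w for p, w in zip(grid, posterior))

print(f"P(p > 0.5 | data) = {prob_above_half:.3f}")
print(f"Posterior mean    = {posterior_mean:.3f}")
```

Dividing by `total` is what turns "proportional to" into an actual probability distribution. A grid works here because there is only one parameter; with many parameters this approach becomes impractical, which is one reason simulation methods are used instead.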

Using Bayesian analysis to solve a hit-and-run

In this example, we're going to solve a hit-and-run case. In a certain city, 95% of cabs are owned by the Yellow Cab Company, and 5% are owned by Green Cab, Inc. Recently, a cab was involved in a hit-and-run accident, injuring a pedestrian. A witness saw the accident and claimed that the cab that hit the pedestrian was a green cab. Tests by investigators revealed that, under similar circumstances, this witness correctly identifies a green cab 90% of the time and correctly identifies a yellow cab 85% of the time. This means that the witness incorrectly calls a yellow cab a green cab 15% of the time, and incorrectly calls a green cab a yellow cab 10% of the time. So, the question is, should we pursue Green Cab, Inc.?

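The following formula shows Bayes' theorem:

$$P(H \mid G) = \frac{P(G \mid H)\, P(H)}{P(G \mid H)\, P(H) + P(G \mid H^c)\, P(H^c)}$$

Here, H represents the event that a green cab hit the pedestrian, while G represents the event that the witness claims to have seen a green cab. H^c is the complement of H, that is, the event that a yellow cab hit the pedestrian.

So, let's encode these probabilities, as follows:

$$P(H) = 0.05, \quad P(H^c) = 0.95, \quad P(G \mid H) = 0.90, \quad P(G \mid H^c) = 0.15$$

Now that we have these probabilities, we can use Bayes' theorem to compute the posterior probability, which is given as follows:

$$P(H \mid G) = \frac{0.90 \times 0.05}{0.90 \times 0.05 + 0.15 \times 0.95} = \frac{0.045}{0.1875} = 0.24$$

So, the result is that the prior probability that the cab was actually green was 0.05, which was very low. The posterior probability, that is, the probability that the cab that hit the pedestrian was green, given that the witness said the cab was green, is now 24%, which is higher than the prior, but it is still less than 50%.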
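The calculation can be checked with a short Python sketch. The probabilities below are taken directly from the problem statement (5% green cabs, a 90% chance of correctly identifying a green cab, and a 15% chance of calling a yellow cab green); the variable names are illustrative assumptions. With these figures, the posterior probability works out to 0.045 / 0.1875 = 0.24.

```python
# Probabilities taken from the problem statement.
p_green = 0.05                    # P(H): prior probability that the cab was green
p_yellow = 1 - p_green            # P(H^c): prior probability that the cab was yellow
p_say_green_given_green = 0.90    # P(G | H): witness correctly identifies a green cab
p_say_green_given_yellow = 0.15   # P(G | H^c): witness calls a yellow cab green

# Total probability that the witness says "green", P(G).
p_say_green = (
    p_say_green_given_green * p_green
    + p_say_green_given_yellow * p_yellow
)

# Bayes' theorem: P(H | G) = P(G | H) P(H) / P(G).
posterior_green = p_say_green_given_green * p_green / p_say_green

print(f"P(G)     = {p_say_green:.4f}")      # prints 0.1875
print(f"P(H | G) = {posterior_green:.2f}")  # prints 0.24
```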

Because this city has only yellow cabs and green cabs, the remaining probability belongs to the yellow cabs. Even though the witness claimed to have seen a green cab, green cabs are so rare that the witness's accuracy is not high enough to outweigh the low prior. This means that it's still more likely that the pedestrian was hit by a yellow cab and that the witness made a mistake.
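This intuition can also be checked by simulation, in the spirit of the Monte Carlo methods mentioned earlier. The sketch below is an illustration under stated assumptions rather than anything from the book: it draws many hypothetical accidents using the probabilities from the problem statement, then looks only at the cases where the witness says "green" and counts how often the cab really was green. The number of trials and the seed are arbitrary choices.

```python
import random

random.seed(0)  # fixed seed so that the run is repeatable

n_trials = 200_000          # arbitrary number of simulated accidents
said_green = 0              # times the witness reports a green cab
green_and_said_green = 0    # times that report is actually correct

for _ in range(n_trials):
    # Draw the true cab color: green with probability 0.05.
    cab_is_green = random.random() < 0.05

    # Draw the witness report, given the true color.
    if cab_is_green:
        says_green = random.random() < 0.90   # correct identification
    else:
        says_green = random.random() < 0.15   # yellow cab mistaken for green

    if says_green:
        said_green += 1
        green_and_said_green += cab_is_green  # True counts as 1

# Fraction of "green" reports in which the cab really was green.
estimate = green_and_said_green / said_green
print(f"Estimated P(green cab | witness says green) = {estimate:.3f}")
```

Most of the "green" reports in the simulation come from yellow cabs, simply because there are so many more of them. The estimate should land close to the exact value of 0.045 / 0.1875 = 0.24 computed from Bayes' theorem, with small run-to-run variation if the seed is changed.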

Now, let's take a look at some useful applications of Bayesian analysis. We will go through topics similar to those seen previously, but from the Bayesian perspective.
