Chapter 9: Outlier Detection: Just Because They're Odd Doesn't Mean They're Unimportant

Data Smart

By: John W. Foreman

Overview of this book

Data science gets thrown around in the press like it's magic. Major retailers are predicting everything from when their customers are pregnant to when they want a new pair of Chuck Taylors. It's a brave new world where seemingly meaningless data can be transformed into valuable insight to drive smart business decisions. But how exactly does one do data science? Do you have to hire one of these priests of the dark arts, the "data scientist," to extract this gold from your data? Nope. Data science is little more than using straightforward steps to process raw data into actionable insight. And in Data Smart, author and data scientist John Foreman will show you how that's done within the familiar environment of a spreadsheet. Why a spreadsheet? It's comfortable! You get to look at the data every step of the way, building confidence as you learn the tricks of the trade. Plus, spreadsheets are a vendor-neutral place to learn data science without the hype. But don't let the Excel sheets fool you. This is a book for those serious about learning the analytic techniques, the math and the magic, behind big data.

Cover

Credits

About the Author

About the Technical Editors

Acknowledgments

Introduction

What Am I Doing Here?

A Workable Definition of Data Science

But Wait, What about Big Data?

Who Am I?

Who Are You?

No Regrets. Spreadsheets Forever

Conventions

Let's Get Going

Chapter 1: Everything You Ever Needed to Know about Spreadsheets but Were Too Afraid to Ask

Some Sample Data

Moving Quickly with the Control Button

Copying Formulas and Data Quickly

Formatting Cells

Paste Special Values

Inserting Charts

Locating the Find and Replace Menus

Formulas for Locating and Pulling Values

Using VLOOKUP to Merge Data

Filtering and Sorting

Using PivotTables

Using Array Formulas

Solving Stuff with Solver

OpenSolver: I Wish We Didn't Need This, but We Do

Wrapping Up

Chapter 2: Cluster Analysis Part I: Using K-Means to Segment Your Customer Base

Girls Dance with Girls, Boys Scratch their Elbows

Getting Real: K-Means Clustering Subscribers in E-mail Marketing

K-Medians Clustering and Asymmetric Distance Measurements

Wrapping Up

Chapter 3: Naïve Bayes and the Incredible Lightness of Being an Idiot

When You Name a Product Mandrill, You're Going to Get Some Signal and Some Noise

The World's Fastest Intro to Probability Theory

Using Bayes Rule to Create an AI Model

Let's Get This Excel Party Started

Wrapping Up

Chapter 4: Optimization Modeling: Because That “Fresh Squeezed” Orange Juice Ain't Gonna Blend Itself

Why Should Data Scientists Know Optimization?

Starting with a Simple Trade-Off

Fresh from the Grove to Your Glass…with a Pit Stop through a Blending Model

Modeling Risk

Wrapping Up

Chapter 5: Cluster Analysis Part II: Network Graphs and Community Detection

What Is a Network Graph?

Visualizing a Simple Graph

Brief Introduction to Gephi

Building a Graph from the Wholesale Wine Data

How Much Is an Edge Worth? Points and Penalties in Graph Modularity

Let's Get Clustering!

There and Back Again: A Gephi Tale

Wrapping Up

Chapter 6: The Granddaddy of Supervised Artificial Intelligence—Regression

Wait, What? You're Pregnant?

Don't Kid Yourself

Predicting Pregnant Customers at RetailMart Using Linear Regression

Predicting Pregnant Customers at RetailMart Using Logistic Regression

For More Information

Wrapping Up

Chapter 7: Ensemble Models: A Whole Lot of Bad Pizza

Using the Data from Chapter 6

Bagging: Randomize, Train, Repeat

Boosting: If You Get It Wrong, Just Boost and Try Again

Wrapping Up

Chapter 8: Forecasting: Breathe Easy, You Can't Win

The Sword Trade Is Hopping

Getting Acquainted with Time Series Data

Starting Slow with Simple Exponential Smoothing

You Might Have a Trend

Holt's Trend-Corrected Exponential Smoothing

Multiplicative Holt-Winters Exponential Smoothing

Wrapping Up

Chapter 9: Outlier Detection: Just Because They're Odd Doesn't Mean They're Unimportant

Outliers Are (Bad?) People, Too

The Fascinating Case of Hadlum v. Hadlum

Terrible at Nothing, Bad at Everything

Wrapping Up

Chapter 10: Moving From Spreadsheets into R

Getting Up and Running with R

Doing Some Actual Data Science

Wrapping Up

Conclusion

Where Am I? What Just Happened?

Before You Go-Go

Get Creative and Keep in Touch!

End User License Agreement
