Mastering SQL Server 2014 Data Mining

By: Amarpreet Singh Bassan, Debarchan Sarkar

Overview of this book

Whether you are new to data mining or are a seasoned expert, this book will provide you with the skills you need to successfully create, customize, and work with the Microsoft Data Mining Suite. Starting with the basics, this book will cover how to clean the data, design the problem, and choose a data mining model that will give you the most accurate prediction.

Next, you will be taken through the various classification models such as the decision tree model, the neural network model, and the Naïve Bayes model. Following this, you'll learn about the clustering and association algorithms, along with the sequencing and regression algorithms, and understand the data mining expressions associated with each algorithm. With ample screenshots that offer a step-by-step account of how to build a data mining solution, this book will ensure your success with this cutting-edge data mining system.

The feature selection


Feature selection deserves special attention because it helps select the attributes that will be effective in the analysis and reject those that are either too noisy or insignificant to it. Hence, feature selection mostly involves choosing the appropriate attributes. A default feature selection is performed automatically based on the model selected, the data types of the attributes, and any parameters that might be set while designing the model. Every attribute designated to be a part of the model is assigned a score, and the score threshold is also modifiable. Feature selection comprises various methods, depending on whether the data is continuous or discrete. For continuous data, we use the interestingness score to select the columns that are more closely related, whereas for discrete data, we use Shannon's entropy, the Bayesian algorithm with K2 Prior, and the Bayesian...
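To make the scoring idea more concrete, the following is a minimal Python sketch of Shannon's entropy, one of the discrete-data scores mentioned above. It only illustrates the measure itself; the column names and sample values are hypothetical, and this is not the code that SQL Server Analysis Services runs internally when it scores attributes.

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Shannon entropy (in bits) of a discrete attribute.

    A column whose values are nearly all the same has low entropy and
    carries little information; a feature-selection step could use a
    score like this to rank attributes against a threshold.
    """
    counts = Counter(values)
    total = len(values)
    entropy = 0.0
    for count in counts.values():
        p = count / total            # relative frequency of one state
        entropy -= p * math.log2(p)  # accumulate -p * log2(p)
    return entropy

# Hypothetical discrete columns from a customer table.
occupation = ["Clerical", "Clerical", "Management", "Skilled", "Clerical"]
region = ["North", "South", "East", "West", "North"]

for name, column in [("Occupation", occupation), ("Region", region)]:
    print(f"{name}: entropy = {shannon_entropy(column):.3f}")
```

In this toy example, the column with the more even spread of states produces the higher entropy, which is the kind of signal a scoring-based feature selection can compare against a modifiable threshold.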