Mastering SQL Server 2014 Data Mining

By: Amarpreet Singh Bassan, Debarchan Sarkar

Overview of this book

Whether you are new to data mining or a seasoned expert, this book will provide you with the skills you need to successfully create, customize, and work with the Microsoft Data Mining Suite. Starting with the basics, this book covers how to clean the data, design the problem, and choose the data mining model that will give you the most accurate prediction.

Next, you will be taken through the various classification models, such as the decision tree, neural network, and Naïve Bayes models. Following this, you'll learn about the clustering and association algorithms, along with the sequencing and regression algorithms, and understand the data mining expressions associated with each algorithm. With ample screenshots that offer a step-by-step account of how to build a data mining solution, this book will ensure your success with this cutting-edge data mining system.
Table of Contents (17 chapters)
Mastering SQL Server 2014 Data Mining
Credits
About the Authors
About the Reviewers
www.PacktPub.com
Preface
Index

The Microsoft Decision Tree algorithm


The Decision Tree algorithm is a classification and regression algorithm that can predict both discrete and continuous attributes. The model works by building a tree through a series of splits, each of which is represented as a node. A new split or node is added every time an input column is found to be significantly related to the predictable column.
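To make the idea of "a column significantly related to the predictable column" more concrete, the following is a minimal sketch of how a single split could be chosen. It assumes entropy-based information gain as a generic stand-in for the relatedness score; the score that the Microsoft algorithm actually computes may differ. The toy rows and the column names (`commute`, `owns_car`, `buyer`) are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, column, target):
    """Entropy reduction in `target` after partitioning `rows` by `column`."""
    base = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[column], []).append(row[target])
    # Weighted average entropy of the partitions created by the split.
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


def best_split(rows, inputs, target):
    """Return the input column with the highest information gain."""
    return max(inputs, key=lambda col: information_gain(rows, col, target))


# Hypothetical toy data: `commute` separates buyers perfectly, `owns_car` does not.
rows = [
    {"commute": "short", "owns_car": "no", "buyer": "yes"},
    {"commute": "short", "owns_car": "yes", "buyer": "yes"},
    {"commute": "long", "owns_car": "no", "buyer": "no"},
    {"commute": "long", "owns_car": "yes", "buyer": "no"},
]

split_column = best_split(rows, ["commute", "owns_car"], "buyer")
print(split_column)
```

A full tree builder would repeat this choice recursively on each partition until no remaining column is related strongly enough to justify another node.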

The feature selection process, as explained in the previous section, is used to select the most useful attributes, because including too many attributes can degrade performance while the algorithm is processing and may eventually cause it to run out of memory. When performance is paramount, the following methods can help:

  • Increase the value of the COMPLEXITY_PENALTY parameter to limit the tree growth

  • Limit the number of items in the association models to limit the number of trees built

  • Increase the value of the MINIMUM_SUPPORT parameter to avoid overfitting

  • Restrict the number of discrete values for any attribute to...
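The effect of the first and third methods in this list can be pictured with a toy acceptance rule for candidate splits. This is only a sketch under stated assumptions, not the way SQL Server evaluates splits: the `min_cases` and `penalty` arguments are hypothetical stand-ins that mirror MINIMUM_SUPPORT and COMPLEXITY_PENALTY by analogy, and the candidate figures are made up for illustration.

```python
def accept_split(child_sizes, gain, min_cases, penalty):
    """Toy rule: keep a split only if every child node retains at least
    `min_cases` cases and the gain exceeds the `penalty` threshold."""
    enough_cases = all(size >= min_cases for size in child_sizes)
    worth_it = gain > penalty
    return enough_cases and worth_it


def surviving(candidates, min_cases, penalty):
    """Names of the candidate splits that pass the toy rule."""
    return [
        c["name"]
        for c in candidates
        if accept_split(c["child_sizes"], c["gain"], min_cases, penalty)
    ]


# Hypothetical candidate splits over 100 cases.
candidates = [
    {"name": "commute", "child_sizes": [60, 40], "gain": 0.30},
    {"name": "owns_car", "child_sizes": [95, 5], "gain": 0.25},
    {"name": "region", "child_sizes": [50, 50], "gain": 0.05},
]

loose = surviving(candidates, min_cases=5, penalty=0.01)
strict = surviving(candidates, min_cases=10, penalty=0.10)
print(loose)
print(strict)
```

Under the stricter settings, the split that isolates only five cases and the split with a very small gain are both rejected, so fewer nodes are created and the tree stays smaller.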