Mastering SQL Server 2014 Data Mining

By: Amarpreet Singh Bassan, Debarchan Sarkar

Overview of this book

Whether you are new to data mining or a seasoned expert, this book will give you the skills you need to create, customize, and work with the Microsoft Data Mining Suite. Starting with the basics, it covers how to clean the data, define the problem, and choose the data mining model that gives the most accurate predictions.

Next, you will be taken through the classification models, such as the decision tree, neural network, and Naïve Bayes models. Following this, you'll learn about the clustering and association algorithms, along with the sequencing and regression algorithms, and the Data Mining Extensions (DMX) associated with each algorithm. With ample screenshots that offer a step-by-step account of how to build a data mining solution, this book will ensure your success with this cutting-edge data mining system.
Table of Contents (17 chapters)
Mastering SQL Server 2014 Data Mining
Credits
About the Authors
About the Reviewers
www.PacktPub.com
Preface
Index

Troubleshooting the data mining structure performance


There can be many reasons why processing a data mining structure takes a long time. In our case, if we mark all the columns as input and then process the data mining structure, processing will take a long time, because a large number of trees or clusters has to be generated, depending on the algorithm we are using. We will discuss a few of the commonly used algorithms and the parameters that can be altered to improve processing performance.

The Decision Tree algorithm

The amount of information required to classify data increases with the number of input values, so processing performance needs to be optimized. Performance can be optimized in the following ways:

  • Reducing the number of inputs

  • When grouping values into bins, group only those values that provide the most information
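To make the two bullets above more concrete, here is a minimal Python sketch, assuming the training rows are stored as a list of dictionaries with discrete input columns. It reduces the inputs by ranking each column on information gain (the drop in Shannon entropy of the target) and keeping only the top few, and it bins a numeric column by choosing the single cut point that provides the most information. The function names (`select_inputs`, `best_threshold`) and the sample columns are hypothetical. This uses a generic textbook scoring method and is not the scoring that the Microsoft Decision Trees algorithm uses internally.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a non-empty list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """Drop in target entropy when rows are grouped by one discrete input."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def select_inputs(rows, target, max_inputs):
    """Reduce the inputs: keep the max_inputs columns with the highest gain."""
    labels = [row[target] for row in rows]
    candidates = [col for col in rows[0] if col != target]
    scores = {col: information_gain([row[col] for row in rows], labels)
              for col in candidates}
    return sorted(candidates, key=scores.get, reverse=True)[:max_inputs]


def best_threshold(values, labels):
    """Bin a numeric input into low/high at the cut that maximizes gain.

    Returns (cut, gain); cut is None if the input has a single distinct value.
    """
    best_cut, best_gain = None, -1.0
    for cut in sorted(set(values))[:-1]:
        bins = ["low" if v <= cut else "high" for v in values]
        gain = information_gain(bins, labels)
        if gain > best_gain:
            best_cut, best_gain = cut, gain
    return best_cut, best_gain


if __name__ == "__main__":
    # Hypothetical sample data: Region separates buyers perfectly, Gender not at all.
    rows = [
        {"Region": "North", "Gender": "M", "BikeBuyer": 1},
        {"Region": "North", "Gender": "F", "BikeBuyer": 1},
        {"Region": "South", "Gender": "M", "BikeBuyer": 0},
        {"Region": "South", "Gender": "F", "BikeBuyer": 0},
    ]
    print(select_inputs(rows, "BikeBuyer", max_inputs=1))  # ['Region']
    ages = [25, 30, 45, 50]
    labels = [row["BikeBuyer"] for row in rows]
    print(best_threshold(ages, labels))  # (30, 1.0)
```

Scoring every column independently like this favors high-cardinality inputs: a raw Age column with a distinct value in every row would also score a perfect gain on this sample. This is one reason why grouping numeric values into a few informative bins helps.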

We want to limit tree growth without losing the consistency and accuracy of the model. The following parameters help...