Mastering SQL Server 2014 Data Mining

By: Amarpreet Singh Bassan, Debarchan Sarkar

Overview of this book

Whether you are new to data mining or a seasoned expert, this book will provide you with the skills you need to successfully create, customize, and work with the Microsoft Data Mining Suite. Starting with the basics, this book covers how to clean the data, design the problem, and choose the data mining model that will give you the most accurate prediction.

Next, you will be taken through the various classification models, such as the decision tree model, the neural network model, and the Naïve Bayes model. Following this, you'll learn about the clustering and association algorithms, along with the sequencing and regression algorithms, and understand the data mining expressions associated with each algorithm. With ample screenshots that offer a step-by-step account of how to build a data mining solution, this book will ensure your success with this cutting-edge data mining system.

Getting the real-world data


Many publicly available datasets can be downloaded and mined for information and trends. We will use the Housing Affordability Data System (HADS) dataset to get a glimpse of average housing conditions in 2011. The data contains details such as affordability, income, and fair market rent, which will help us derive some deeper observations.

We can download the data from http://www.huduser.org/portal/datasets/hads/hads2011(ASCII).zip. The archive contains a single file, thads2011.txt. We will create a database named HousingAffordabilityData in the SQL Server instance and import this file into it as a table named thads2011, using a flat file source. The following screenshot shows a portion of the data in a few columns:

Sample portion of the HADS data imported into SQL Server
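
For readers who prefer a scripted load over the flat file source used here, the following is a minimal sketch of how the equivalent T-SQL could be generated from the file's header row. It is an alternative technique, not the procedure this chapter follows. Several things are assumptions: that thads2011.txt is comma-delimited with column names on its first line, that the table lives in the dbo schema, that every column can be staged as NVARCHAR(255), and that the file sits at the hypothetical path C:\data\thads2011.txt. The helper names staging_table_sql and bulk_insert_sql and the header COLUMN_A,COLUMN_B,COLUMN_C are placeholders for illustration only.

```python
import csv
import io


def staging_table_sql(header_line, table="thads2011", delimiter=","):
    """Build T-SQL that creates an all-text staging table from a header row."""
    # Parse the header with the csv module so quoted names are handled.
    names = next(csv.reader(io.StringIO(header_line), delimiter=delimiter))
    # Assumption: stage every column as NVARCHAR(255) and convert types later.
    columns = ",\n".join(f"    [{name.strip()}] NVARCHAR(255) NULL" for name in names)
    return f"CREATE TABLE dbo.[{table}] (\n{columns}\n);"


def bulk_insert_sql(path, table="thads2011", delimiter=","):
    """Build a BULK INSERT statement that skips the header row."""
    return (
        f"BULK INSERT dbo.[{table}]\n"
        f"FROM '{path}'\n"
        f"WITH (FIRSTROW = 2, FIELDTERMINATOR = '{delimiter}', ROWTERMINATOR = '\\n');"
    )


# Hypothetical header; replace it with the first line of thads2011.txt.
print(staging_table_sql("COLUMN_A,COLUMN_B,COLUMN_C"))
# Hypothetical path; point it at wherever the file was extracted.
print(bulk_insert_sql(r"C:\data\thads2011.txt"))
```

The generated statements would be run inside the HousingAffordabilityData database after it has been created. Staging everything as text keeps the load from failing on unexpected values; the columns can then be cast to numeric types once the data has been inspected.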

Although we used only one year's data, that is...