Mastering .NET Machine Learning

By: Jamie Dixon, Damian R Mingle

Overview of this book

.NET is one of the most widely used platforms for developing applications. With the meteoric rise of machine learning, developers are keen to find out how they can make their .NET applications smarter. .NET developers are also interested in moving into the world of devices and in applying machine learning techniques to, well, machines. This book is packed with real-world examples that show how to use machine learning techniques in your business applications. You will begin with an introduction to F# and prepare yourself for machine learning using the .NET Framework. You will write a simple linear regression model using an example that predicts sales of a product. With the regression model as a base, you will start using machine learning libraries available for the .NET Framework, such as Math.NET, Numl.NET, and Accord.NET, with the help of a sample application. You will then move on to writing multiple linear regressions and logistic regressions. You will learn what open data is and the awesomeness of type providers. Next, you will address some of the issues that have been glossed over up to that point and take a deep dive into obtaining, cleaning, and organizing data. You will compare the utility of building a k-NN and a Naïve Bayes model to achieve the best possible results. Implementation of k-means and PCA using the Accord.NET and Numl.NET libraries is covered with the help of an example application. You will then look at many of the issues that confront the creation of real-world machine learning models, such as overfitting, and how to combat them using confusion matrices, scaling, normalization, and feature selection. You will then enter the world of neural networks and move your line of business application to a hybrid scientific application.
After you have covered all of the above machine learning models, you will see how to deal with very large datasets using MBrace and how to deploy machine learning models to Internet of Things (IoT) devices so that the machine can learn and adapt on the fly.
Table of Contents (18 chapters)
Mastering .NET Machine Learning
Credits
About the Author
Acknowledgments
About the Reviewers
www.PacktPub.com
Preface
Index

Preface

The .NET Framework is one of the most successful application frameworks in history. Literally billions of lines of code have been written on the .NET Framework, with billions more to come. For all of its success, it can be argued that the .NET Framework is still underrepresented for data science endeavors. This book attempts to help address this issue by showing how machine learning can be rapidly injected into the common .NET line of business applications. It also shows how typical data science scenarios can be addressed using the .NET Framework. This book quickly builds upon an introduction to machine learning models and techniques in order to build real-world applications using machine learning. While by no means a comprehensive study of predictive analytics, it does address some of the more common issues that data scientists encounter when building their models.

Many books about machine learning are written with every chapter centering around a dataset and how to implement a model on that dataset. While this is a good way to build a mental blueprint (as well as some code boilerplate), this book is going to take a slightly different approach. This book centers around one recurring application for the line of business developer and one common open data dataset for the scientific programmer. We will then introduce different machine learning techniques, depending on the business scenario. This means you will be putting on different hats for each chapter. If you are a line of business software engineer, Chapters 2, 3, 6, and 9 will seem like old hat. If you are a research analyst, Chapters 4, 7, and 10 will be very familiar to you. I encourage you to try all chapters, regardless of your background, as you will perhaps gain a new perspective that will make you more effective as a data scientist. As a final note, one word you will not find in this book is "simply". It drives me nuts when I read a tutorial-based book and the author says "it is simply this" or "simply do that". If it was simple, I wouldn't need the book. I hope you find each of the chapters accessible and the code samples interesting, and that these two factors can help you immediately in your career.

What this book covers

Chapter 1, Welcome to Machine Learning Using the .NET Framework, contextualizes machine learning in the .NET stack, introduces some of the libraries that we will use throughout the book, and provides a brief primer on F#.

Chapter 2, AdventureWorks Regression, introduces the business that we will use in this book: the AdventureWorks Bicycle company. We will then look at a business problem where customers are dropping orders based on reviews of the product. It looks at creating a linear regression by hand, with Math.NET, and with Accord.NET to solve this business problem. It then adds this regression to the line of business application.

Chapter 3, More AdventureWorks Regression, looks at creating a multiple linear regression and a logistic regression to solve different business problems at AdventureWorks. It will look at different factors that affect bike sales and then categorize potential customers into potential sales or potential lost leads. It will then implement the models to help our website convert potential lost leads into potential sales.

Chapter 4, Traffic Stops – Barking Up the Wrong Tree?, takes a break from AdventureWorks. You will put on your data scientist hat, use an open dataset of traffic stops, and see if we can understand why some people get a verbal warning and why others get a ticket at a traffic stop. We will use basic summary statistics and decision trees to help in understanding the results.

Chapter 5, Time Out – Obtaining Data, pauses the introduction of new datasets and machine learning models and concentrates on one of the hardest parts of machine learning: obtaining and cleaning the data. We will look at F# type providers, a very powerful language feature that can vastly speed up this process of "data munging".

Chapter 6, AdventureWorks Redux – k-NN and Naïve Bayes Classifiers, goes back to AdventureWorks and looks at a business problem of how to improve cross sales. We will implement two popular machine learning classification models, k-NN and Naïve Bayes, to see which is better at solving this problem.

Chapter 7, Traffic Stops and Crash Locations – When Two Datasets Are Better Than One, returns to the traffic stop data and adds two other open datasets that can be used to improve the predictions and gain new insights. The chapter will introduce two common unsupervised machine learning techniques: k-means and PCA.

Chapter 8, Feature Selection and Optimization, takes another break from introducing new machine learning models and looks at another key part of building them: selecting the right data for the model, preparing that data, and applying some common techniques to deal with outliers and other data abnormalities.

Chapter 9, AdventureWorks Production – Neural Networks, goes back to AdventureWorks and looks at how to improve bike production by using a popular machine learning technique called neural networks.

Chapter 10, Big Data and IoT, wraps up by looking at a more recent problem—how to build machine learning models on top of data that is characterized by massive volume, variability, and velocity. We will then look at how IoT devices can generate this big data and how to deploy machine learning models onto these devices so that they become self-learning.

What you need for this book

You will need Visual Studio 2013 (any edition) or later installed on your computer. You can also use VS Code or MonoDevelop. The examples in this book use Visual Studio 2015 Update 1.

Who this book is for

The lines between business computing and scientific computing are becoming increasingly blurred. Indeed, an argument can be made that the distinction was never really as clear as it has been made out to be in the past. With that, machine learning principles and models are making their way into mainstream computing applications. Consider the Uber app that shows how far Uber drivers are from you, and product recommendations built into online retail sites such as Jet.

Also, the nature of the .NET software developer's job is changing. Earlier, when the cliché "ours is a changing industry" was being thrown around, it was about languages (the need to know JavaScript, C#, and T-SQL) and frameworks (Angular, MVC, WPF, and EF). Now, the cliché means that software developers need to know how to make sure their code is correct (test-driven development), how to get their code off their machine and onto the customer's machine (DevOps), and how to make their applications smarter (machine learning).

The same forces that are pushing the business developer to retool are also pushing the research analyst into unfamiliar territory. Earlier, analysts focused on data collection, exploration, and visualization in the context of an application (Excel, Power BI, and SAS) for point-in-time analysis. The analyst would start with a question, grab some data, build some models, and then present the findings. Any kind of continuous analysis was done via report writing or just re-running the models. Today, analysts are being asked to sift through massive amounts of data (IoT telemetry, user exhaust, and NoSQL data lakes), where the questions may not be known beforehand. Also, once models are created, they are pushed into production applications where they are continually being re-trained in real time. No longer just a decision aid for humans, research is being done by computers to impact users immediately.

The newly minted data scientist title is at the confluence of these forces. Typically, no one person can be an expert on both sides of the divide, so the data scientist is a bit of a "jack of all trades, master of none": someone who knows machine learning a little bit better than all of the other software engineers on the team and knows software engineering a little bit better than any researcher on the team. The goal of this book is to help you move from either software engineer or business analyst to data scientist.

Conventions

In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.

Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "The Script1.fsx file is then added to the project."

A block of code is set as follows:

let multipliedAndIsEven = 
    ints
    |> Array.map (fun i -> multiplyByTwo i)
    |> Array.map (fun i -> isEven i)

Any command-line input or output is written as follows:

val multipliedAndIsEven : string [] =
  [|"even"; "even"; "even"; "even"; "even"; "even"|]

New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: "When the Add New Item dialog box appears, select Script File."

Note

Warnings or important notes appear in a box like this.

Tip

Tips and tricks appear like this.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.

To send us general feedback, simply e-mail , and mention the book's title in the subject of your message.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code

You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

You can download the code files by following these steps:

  1. Log in to or register on our website using your e-mail address and password.

  2. Hover the mouse pointer over the SUPPORT tab at the top.

  3. Click on Code Downloads & Errata.

  4. Enter the name of the book in the Search box.

  5. Select the book for which you're looking to download the code files.

  6. Choose from the drop-down menu where you purchased this book.

  7. Click on Code Download.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

  • WinRAR / 7-Zip for Windows

  • Zipeg / iZip / UnRarX for Mac

  • 7-Zip / PeaZip for Linux

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.

To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.

Piracy

Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us at with a link to the suspected pirated material.

We appreciate your help in protecting our authors and our ability to bring you valuable content.

Questions

If you have a problem with any aspect of this book, you can contact us at , and we will do our best to address the problem.