Machine Learning With Go

Overview of this book

The mission of this book is to turn readers into productive, innovative data analysts who leverage Go to build robust and valuable applications. To this end, the book clearly introduces the technical aspects of building predictive models in Go, but it also helps the reader understand how machine learning workflows are applied in real-world scenarios. Machine Learning with Go shows readers how to be productive in machine learning while also producing applications that maintain a high level of integrity. It also gives readers patterns to overcome challenges that are often encountered when trying to integrate machine learning into an engineering organization. Readers will begin by gaining a solid understanding of how to gather, organize, and parse real-world data from a variety of sources. Readers will then develop a solid statistical toolkit that will allow them to quickly gain intuition about the content of a dataset. Finally, readers will get hands-on experience implementing essential machine learning techniques (regression, classification, clustering, and so on) with the relevant Go packages. By the end of the book, the reader will have a solid machine learning mindset and a powerful Go toolkit of techniques, packages, and example implementations.
Validation

So, now we know some ways to measure how well our model is performing. In fact, if we wanted to, we could create a super sophisticated, complicated model that predicts every observation without error. For example, we could create a model that takes the row index of each observation and returns the exact observed value for that row. It might be a really big function with a lot of parameters, but it would return the correct answers.
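To make this concrete, here is a minimal Go sketch of such a memorizing "model". The memorizedModel type, its predict method, and the training values are hypothetical names and data used only for illustration; they are not from any particular Go package:

```go
package main

import "fmt"

// memorizedModel is a toy "model" that simply stores the observed target
// value for every training row. By construction, it achieves zero error
// on the training data.
type memorizedModel struct {
	answers []float64
}

// predict returns the stored answer for a given training row index.
// It has no sensible way to predict a row it has never seen.
func (m memorizedModel) predict(rowIndex int) float64 {
	return m.answers[rowIndex]
}

func main() {
	// Hypothetical training targets.
	training := []float64{1.2, 3.4, 2.2, 5.1}

	model := memorizedModel{answers: training}

	// Perfect "predictions" on the training rows...
	for i := range training {
		fmt.Printf("row %d: predicted %.1f, observed %.1f\n", i, model.predict(i), training[i])
	}
	// ...but the model is useless for any new observation.
}
```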

So, what's the problem with this? Well, the problem is that it would not generalize to new data. Our complicated model would predict really well on the data it was trained on, but once we try some new input data (that isn't part of our training dataset), the model would likely perform poorly.
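One common way to expose this kind of failure is to hold out part of the data for evaluation before training. The following is a minimal sketch of a holdout split using only the standard library; the holdoutSplit function, its parameters, and the example observations are assumptions for illustration, not the API of a specific Go machine learning package:

```go
package main

import (
	"fmt"
	"math/rand"
)

// holdoutSplit shuffles the observations and splits them into training and
// test sets, holding out testFraction of the data for evaluation.
// (Illustrative helper, not from a particular package.)
func holdoutSplit(data []float64, testFraction float64) (train, test []float64) {
	shuffled := make([]float64, len(data))
	copy(shuffled, data)
	rand.Shuffle(len(shuffled), func(i, j int) {
		shuffled[i], shuffled[j] = shuffled[j], shuffled[i]
	})

	testSize := int(float64(len(shuffled)) * testFraction)
	return shuffled[testSize:], shuffled[:testSize]
}

func main() {
	// Hypothetical observations.
	observations := []float64{1.2, 3.4, 2.2, 5.1, 4.8, 0.9, 3.3, 2.7}

	train, test := holdoutSplit(observations, 0.25)
	fmt.Printf("training set (%d): %v\n", len(train), train)
	fmt.Printf("test set (%d): %v\n", len(test), test)
}
```

Training only on the train slice and measuring error on the held-out test slice gives a more honest estimate of how the model will behave on data it has never seen.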

We call this type of model (that doesn't generalize) a model that has been overfit. That is, our process of...