Go Machine Learning Projects

By: Xuanyi Chew

Overview of this book

Go is the perfect language for machine learning; it helps to clearly describe complex algorithms, and also helps developers to understand how to run efficient optimized code. This book will teach you how to implement machine learning in Go to make programs that are easy to deploy and code that is easy to understand and debug, and whose performance is easy to measure. The book begins by guiding you through setting up your machine learning environment with Go libraries and capabilities. You will then plunge into regression analysis of a real-life house pricing dataset and build a classification model in Go to classify emails as spam or ham. Using Gonum, Gorgonia, and STL, you will explore time series analysis along with decomposition and clean up your personal Twitter timeline by clustering tweets. In addition to this, you will learn how to recognize handwriting using neural networks and convolutional neural networks. Lastly, you'll learn how to choose the most appropriate machine learning algorithms to use for your projects with the help of a facial detection project. By the end of this book, you will have developed a solid machine learning mindset, a strong hold on the powerful Go toolkit, and a sound understanding of the practical implementations of machine learning algorithms in real-world projects.

Variables

This is a variable declaration:

var a int

It says a is an int. That means a can contain any value that has the type int. Typical int values are 0, 1, 2, and so on. The word typical may look odd in that sentence, but it is used correctly: all values of type int are, typically (that is, with respect to their type), int.

This is a variable declaration, followed by putting a value into the variable:

s := "Hello World"

Here, we're saying: define s as a string, and let the value be "Hello World". The type is inferred from the value. The := syntax can only be used within a function body. It exists mainly to save the programmer from having to type var s string = "Hello World".
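As a minimal sketch of the two declaration forms side by side, the following program can be run as is. The function name declare is made up for illustration and does not appear elsewhere in the book.

```go
package main

import "fmt"

// declare is a hypothetical helper that shows both declaration forms.
func declare() (int, string) {
	var a int          // declared with var; holds the zero value 0 until assigned
	s := "Hello World" // short declaration; the type string is inferred
	return a, s
}

func main() {
	a, s := declare()
	fmt.Println(a, s) // prints "0 Hello World"
}
```

Note that a was never assigned: a variable declared with var and no initial value holds the zero value of its type, which is 0 for int.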

A note about the use of variables: variables in Go should be thought of as buckets with a name on them, in that they hold values. The names are important insofar as they inform the readers about the values they are supposed to hold. However, names do not necessarily have to cross barriers. I frequently name my return values retVal, but give the returned value a different name at the call site. A concrete example is shown:

func foo(...) (retVal int) { ... return retVal }
func main() {
    something := foo()
    ...
}

I have taught programming and ML for a number of years now, and I believe that this is a hump every programmer has to get over. Students or junior team members frequently get confused by the difference in naming. They would prefer something like this:

func foo(...) (something int) { ... return something }
func main() {
    something := foo()
    ...
}

This is fine. However, speaking strictly from experience, it tends to dampen the ability to think abstractly, which is a useful skill to have, especially in ML. My advice is to get used to using different names; it makes you think more abstractly.

In particular, names do not persist past what my friend James Koppel calls an abstraction barrier. What is an abstraction barrier? A function is an abstraction barrier. Whatever happens inside the function body, happens inside the function body and cannot be accessed by other things in the language. Therefore if you name a value fooBar inside the function body, the meaning of fooBar is only valid within the function.

Later we will see another form of abstraction barrier—the package.
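To make the function-as-barrier point concrete, here is a small runnable sketch. The function double and the variable twice are invented for this example; only retVal follows the naming habit described above.

```go
package main

import "fmt"

// double names its result retVal; that name exists only inside double.
func double(x int) (retVal int) {
	retVal = x * 2
	return retVal
}

func main() {
	// The caller picks its own name for the same value.
	twice := double(21)
	fmt.Println(twice) // prints 42

	// Uncommenting the next line stops the program from compiling,
	// because retVal is not defined outside the body of double.
	// fmt.Println(retVal)
}
```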

Values

A value is what a program deals with. If you wrote a calculator program, then the values of the program are numbers. If you wrote a text search program, then the values are strings.

The programs we deal with nowadays as programmers are much more complicated than calculators. We deal with different types of values, ranging from number types (int, float64, and so on) to text (string).

A variable holds a value:

var a int = 1

The preceding line indicates that a is a variable that holds an int with the value 1. We've seen previous examples with the "Hello World" string.

Types

Like all major programming languages (yes, including Python and JavaScript), Go has typed values. Unlike Python or JavaScript, however, Go also types its functions and variables, and strongly so. This means that the following code will not compile:

var a int
a = "Hello World"

This sort of behavior is known outside the academic world as strongly-typed. Within academic circles, the term strongly-typed has no generally agreed meaning.

Go allows programmers to define their own types too:

 type email string

Here, we're defining a new type email. The underlying kind of data is a string.

Why would you want to do this? Consider this function:

 func emailSomeone(address, person string) { ... }

If both are string, it would be very easy to make a mistake—we might accidentally do something like this:

var address, person string
address = "John Smith"
person = "[email protected]"
emailSomeone(address, person)

In fact, you could even do this: emailSomeone(person, address) and the program would still compile correctly!

Imagine, however, if emailSomeone is defined thus:

func emailSomeone(address email, person string) {...}

Then the following will fail to compile:

var address email
var person string
person = "John Smith"
address = "[email protected]"
emailSomeone(person, address)

This is a good thing—it prevents bad things from happening. No more shall be said on this matter.
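The following sketch shows the same idea as a complete program. It is only an illustration: emailSomeone here returns a formatted string instead of sending anything, and the address is a placeholder.

```go
package main

import "fmt"

type email string

// emailSomeone is a stand-in for the function in the text. As an
// assumption for this sketch, it formats a line instead of sending mail.
func emailSomeone(address email, person string) string {
	return fmt.Sprintf("to %s <%s>", person, address)
}

func main() {
	raw := "john@example.com" // placeholder address, held in a plain string
	person := "John Smith"

	// A string variable must be converted to email explicitly.
	address := email(raw)
	fmt.Println(emailSomeone(address, person)) // prints "to John Smith <john@example.com>"

	// Neither of these compiles, which is exactly the protection we want:
	// emailSomeone(person, address) // arguments swapped
	// emailSomeone(raw, person)     // raw is a string, not an email
}
```

The explicit conversion email(raw) is the price of the extra safety: the programmer has to state that a given string is meant to be an address.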

Go also allows programmers to define their own complex types:

type Record struct {
Name string
Age int
}

Here, we defined a new type called Record. It's a struct that contains two values: Name of type string and Age of type int.

What is a struct? Simply put, a struct is a data structure. The Name and Age in Record are called the fields of the struct.

A struct, if you come from Python, is equivalent to a tuple, but acts as a NamedTuple, if you are familiar with those. The closest equivalent in JavaScript is that it's an object. Likewise the closest equivalent in Java is that it's a plain old Java object. The closest equivalent in C# would be a plain old CLR object. In C++, the equivalent would be plain old data.

Note my careful use of the words closest equivalent and equivalent. I have delayed introducing struct because most modern languages that the reader is likely to come from have some form of Java-esque object orientation. A struct is not a class. It's just a definition of how data is arranged in memory. Hence the comparison with Python's tuples instead of Python's classes, or even Python's new data classes.

Given a value of type Record, one might want to extract its inner data. This can be done like so:

r := Record{
    Name: "John Smith",
    Age: 20,
}
r.Name

The snippet here showcases a few things:

  • How to write a struct-kinded value: simply write the name of the type, and then fill in the fields.
  • How to read the fields of a struct—the .Name syntax is used.

Throughout this book, I shall use .FIELDNAME as a notation to refer to a field of a particular data structure. It is expected that the reader is able to understand which data structure I am talking about from context. Occasionally I may use a full term, like r.Name, to make it clear which fields I am talking about.
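Here is a runnable sketch of writing and reading struct fields. The function older is made up for this example; it also illustrates that a struct is plain data, so passing one to a function passes a copy.

```go
package main

import "fmt"

type Record struct {
	Name string
	Age  int
}

// older is a hypothetical helper. It receives a copy of r, so the
// caller's Record is left unchanged.
func older(r Record) Record {
	r.Age++
	return r
}

func main() {
	r := Record{
		Name: "John Smith",
		Age:  20,
	}
	fmt.Println(r.Name) // prints "John Smith"

	o := older(r)
	fmt.Println(r.Age, o.Age) // prints "20 21"
}
```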

Methods

Let's say we wrote these functions, and we have defined email as before:

 type email string

func check(a email) { ... }
func send(a email, msg string) { ... }

Observe that email is always the first type in the function parameters.

Calling the functions looks something like this:

// Convert to email: a plain string variable would not be accepted.
e := email("[email protected]")
check(e)
send(e, "Hello World")

We may want to make that into a method of the email type. We can do so as follows:

type email string

func (e email) check() { ... }
func (e email) send(msg string) { ... }

(e email) is called the receiver of the method.

Having defined the methods thus, we may then proceed to call them:

// e must have type email for the methods to be available.
e := email("[email protected]")
e.check()
e.send("Hello World")

Observe the difference between the functions and methods. check(e) becomes e.check(). send(e, "Hello World") becomes e.send("Hello World"). What's the difference other than syntactic difference? The answer is, not much.

A method in Go is exactly the same as a function in Go, with the receiver of the method as the first parameter of the function. It is unlike methods of classes in object-oriented programming languages.
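One way to see this for yourself is Go's method expression syntax, which turns a method into an ordinary function whose first parameter is the receiver. The sketch below is illustrative only: unlike the check() in the text, this version returns a bool, and its "validation" merely looks for an @ sign.

```go
package main

import (
	"fmt"
	"strings"
)

type email string

// check is a toy validity test, assumed for this sketch:
// it only reports whether the address contains an "@".
func (e email) check() bool {
	return strings.Contains(string(e), "@")
}

func main() {
	e := email("john@example.com") // placeholder address

	// Method call syntax.
	fmt.Println(e.check()) // prints true

	// Method expression: the receiver becomes the first argument.
	fmt.Println(email.check(e)) // prints true

	// The method expression is an ordinary function value
	// of type func(email) bool.
	f := email.check
	fmt.Println(f("not an address")) // prints false
}
```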

So why bother with methods? For one, it solves the expression problem quite neatly. To see how, we'll look at the feature of Go that ties everything together nicely: interfaces.

Interfaces

An interface is a set of methods. We can define an interface by listing out the methods it's expected to support. For example, consider the following interface:

var a interface {
check()
}

Here we are defining a to be a variable that has the type interface{ check() }. What on earth does that mean?

It means that you can put any value into a, as long as the value has a type that has a method called check().

Why is this valuable? It's valuable when considering multiple types that do similar things. Consider the following:

 type complicatedEmail struct {...}

func (e complicatedEmail) check() {...}
func (e complicatedEmail) send(a string) {...}

type simpleEmail string

func (e simpleEmail) check() {...}
func (e simpleEmail) send(a string) {...}

Now we want to write a function do, which does two things:

  • Check that an email address is correct
  • Send "Hello World" to the email

Without interfaces, you would need two do functions:

func doC(a complicatedEmail) {
a.check()
a.send("Hello World")
}

func doS(a simpleEmail) {
a.check()
a.send("Hello World")
}

If that is all the function bodies do, we can instead write a single function:

func do(a interface{
check()
send(a string)
}) {
a.check()
a.send("Hello World")
}

This is quite hard to read. So let's give the interface a name:

type checkSender interface{
check()
send(a string)
}

Then we can simply redefine do to be the following:

func do(a checkSender) {
a.check()
a.send("Hello World")
}

A note on naming interfaces in Go: it is customary to name interfaces with an -er suffix. If an interface consists of the single method check(), it should be called checker. This encourages interfaces to be small. An interface should only define a small number of methods; larger interfaces are signs of poor program design.
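Putting the pieces of this section together gives a program that can actually be run. Several details are assumptions made for the sketch: the fields of complicatedEmail, the outbox slice that stands in for a mail server, the empty check() bodies, and the placeholder addresses.

```go
package main

import "fmt"

// outbox stands in for a real mail server in this sketch.
var outbox []string

type checkSender interface {
	check()
	send(msg string)
}

type simpleEmail string

func (e simpleEmail) check() {} // validation elided

func (e simpleEmail) send(msg string) {
	outbox = append(outbox, string(e)+": "+msg)
}

// complicatedEmail's fields are invented for illustration.
type complicatedEmail struct {
	User, Domain string
}

func (e complicatedEmail) check() {} // validation elided

func (e complicatedEmail) send(msg string) {
	outbox = append(outbox, e.User+"@"+e.Domain+": "+msg)
}

// do accepts any value whose type has both check() and send(string).
func do(a checkSender) {
	a.check()
	a.send("Hello World")
}

func main() {
	do(simpleEmail("john@example.com"))
	do(complicatedEmail{User: "jane", Domain: "example.com"})
	for _, m := range outbox {
		fmt.Println(m)
	}
}
```

Neither simpleEmail nor complicatedEmail declares that it implements checkSender. Having the right methods is enough, which is why a single do serves both types.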

Packages and imports

Finally, we come to the concept of packages and imports. For the majority of the book, the projects described live in something called a main package. The main package is a special package. Compiling a main package will yield an executable file that you can run.

Having said that, it's also often a good idea to organize your code into multiple packages. Packages are another form of the abstraction barrier that we discussed previously with regard to variables and names. Names that begin with an uppercase letter are exported, and exported names are accessible from outside the package. Exported fields of structs are also accessible from outside the package.

To import a package, you need to write an import statement at the top of the file, after the package clause:

package main
import "PACKAGE LOCATION"
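As a concrete instance of the import statement, the sketch below imports two standard library packages, fmt and strings, and calls one function from each. The function shout is made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// shout starts with a lowercase letter, so it is visible
// only inside this package.
func shout(s string) string {
	// ToUpper starts with an uppercase letter, which is why
	// package strings can offer it to other packages.
	return strings.ToUpper(s) + "!"
}

func main() {
	fmt.Println(shout("hello")) // prints "HELLO!"
}
```

When a file imports more than one package, the imports are usually grouped in a single parenthesized import block, as shown here.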

Throughout this book I will be explicit about what to import, especially for external libraries that cannot be found in the Go standard library, since we will be using a number of those.

Go enforces code hygiene. If you import a package and don't use it, your program will not compile. Again, this is a good thing as it makes it less likely to confuse yourself at a later point in time. I personally use a tool called goimports to manage my imports for me. Upon saving my file, goimports adds the import statements for me, and removes any unused packages from my import statements.

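To install goimports, run the following command in your Terminal. With Go 1.16 or later, tools are installed with go install and a version suffix; since Go 1.18, go get no longer installs binaries:

go install golang.org/x/tools/cmd/goimports@latest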