Book Image

Machine Learning With Go

Book Image

Machine Learning With Go

Overview of this book

The mission of this book is to turn readers into productive, innovative data analysts who leverage Go to build robust and valuable applications. To this end, the book clearly introduces the technical aspects of building predictive models in Go, but it also helps the reader understand how machine learning workflows are being applied in real-world scenarios. Machine Learning with Go shows readers how to be productive in machine learning while also producing applications that maintain a high level of integrity. It also gives readers patterns to overcome challenges that are often encountered when trying to integrate machine learning in an engineering organization. The readers will begin by gaining a solid understanding of how to gather, organize, and parse real-work data from a variety of sources. Readers will then develop a solid statistical toolkit that will allow them to quickly understand gain intuition about the content of a dataset. Finally, the readers will gain hands-on experience implementing essential machine learning techniques (regression, classification, clustering, and so on) with the relevant Go packages. Finally, the reader will have a solid machine learning mindset and a powerful Go toolkit of techniques, packages, and example implementations.
Table of Contents (11 chapters)

SQL-like databases

Although there is a good bit of hype around interesting NoSQL databases and key-value stores, SQL-like databases are still ubiquitous. Every data scientist will, at some point, be processing data from an SQL-like database, such as Postgres, MySQL, or SQLite.

For example, we may be required to query one or more tables in a Postgres database to generate a set of features for model training. After using that model to make predictions or identify anomalies, we may send results to another database table that drives a dashboard or other reporting tool.

Go, of course, interacts nicely with all the popular data stores, such as SQL, NoSQL, key-value, and so on, but here, we will focus on SQL-like interactions. We will utilize database/sql for these interactions throughout the book.

Connecting to an SQL database

The first thing we need do before connecting to an SQL-like database is identify the particular database that we will be interacting with and import a corresponding driver. In the following examples, we will be connecting to a Postgres database and will utilize the github.com/lib/pq database driver for database/sql. This driver can be loaded via an empty import (with a corresponding comment):

import (
"database/sql"
"fmt"
"log"
"os"

// pq is the library that allows us to connect
// to postgres with databases/sql.
_ "github.com/lib/pq"
)

Now let's assume that you have exported the Postgres connection string to an environmental variable PGURL. We can easily create an sql.DB value for our connection via the follow code:

// Get the postgres connection URL. I have it stored in
// an environmental variable.
pgURL := os.Getenv("PGURL")
if pgURL == "" {
log.Fatal("PGURL empty")
}

// Open a database value. Specify the postgres driver
// for databases/sql.
db, err := sql.Open("postgres", pgURL)
if err != nil {
log.Fatal(err)
}
defer db.Close()

Note that we need to defer the close method on this value. Also, note that creating this value does not mean that you have made a successful connection to the database. This is merely a value used by database/sql to connect to the database when triggered to do so by certain operations (such as a query).

To ensure that we can make a successful connection to the database, we can use the Ping method:

if err := db.Ping(); err != nil {
log.Fatal(err)
}

Querying the database

Now that we know how to connect to the database, let's see how we can get data out of the database. We won't cover the specifics of SQL queries and statements in this book. If you are not familiar with SQL, I would highly recommend that you learn how to query, insert, and so on, but for our purposes here, you should know that there are basically two types of operations we want to perform as related to SQL databases:

  • A Query operation selects, groups, or aggregates data in the database and returns rows of data to us
  • An Exec operation updates, inserts, or otherwise modifies the state of the database without an expectation that portions of the data stored in the database should be returned

As you might expect, to get data out of our database, we will use a Query operation. To do this, we need to query the database with an SQL statement string. For example, imagine we have a database storing a bunch of iris flower measurements (petal length, petal width, and so on), we could query some of that data related to a particular iris species as follows:

// Query the database.
rows, err := db.Query(`
SELECT
sepal_length as sLength,
sepal_width as sWidth,
petal_length as pLength,
petal_width as pWidth
FROM iris
WHERE species = $1`, "Iris-setosa")
if err != nil {
log.Fatal(err)
}
defer rows.Close()

Note that this returns a pointer to an sql.Rows value, and we need to defer the closing of this rows value. Then we can loop over our rows and parse the data into values of expected type. We utilize the Scan method on rows to parse out the columns returned by the SQL query and print them to standard out:

// Iterate over the rows, sending the results to
// standard out.
for rows.Next() {

var (
sLength float64
sWidth float64
pLength float64
pWidth float64
)

if err := rows.Scan(&sLength, &sWidth, &pLength, &pWidth); err != nil {
log.Fatal(err)
}

fmt.Printf("%.2f, %.2f, %.2f, %.2f\n", sLength, sWidth, pLength, pWidth)
}

Finally, we need to check for any errors that might have occurred while processing our rows. We want to maintain the integrity of our data handling, and we cannot assume that we looped over all the rows without encountering an error:

// Check for errors after we are done iterating over rows.
if err := rows.Err(); err != nil {
log.Fatal(err)
}

Modifying the database

As mentioned earlier, there is another flavor of interaction with the database called Exec. With these types of statements, we are concerned with updating, adding to, or otherwise modifying the state of one or more tables in the database. We use the same type of database connection, but instead of calling db.Query, we will call db.Exec.

For example, let's say we want to update some of the values in our iris database table:

// Update some values.
res, err := db.Exec("UPDATE iris SET species = 'setosa' WHERE species = 'Iris-setosa'")
if err != nil {
log.Fatal(err)
}

But how do we know whether we were successful and changed something? Well, the res function returned here allows us to see how many rows of our table were affected by our update:

// See how many rows where updated.
rowCount, err := res.RowsAffected()
if err != nil {
log.Fatal(err)
}

// Output the number of rows to standard out.
log.Printf("affected = %d\n", rowCount)