Machine Learning With Go

Overview of this book

The mission of this book is to turn readers into productive, innovative data analysts who leverage Go to build robust and valuable applications. To this end, the book introduces the technical aspects of building predictive models in Go, and it also helps the reader understand how machine learning workflows are being applied in real-world scenarios. Machine Learning with Go shows readers how to be productive in machine learning while also producing applications that maintain a high level of integrity. It also gives readers patterns to overcome challenges that are often encountered when trying to integrate machine learning in an engineering organization. Readers will begin by gaining a solid understanding of how to gather, organize, and parse real-world data from a variety of sources. They will then develop a solid statistical toolkit that will allow them to quickly understand and gain intuition about the content of a dataset. Next, they will gain hands-on experience implementing essential machine learning techniques (regression, classification, clustering, and so on) with the relevant Go packages. By the end, the reader will have a solid machine learning mindset and a powerful Go toolkit of techniques, packages, and example implementations.
Table of Contents (11 chapters)

JSON

In a world in which the majority of data is accessed via the web, and most engineering organizations implement some number of microservices, we are going to encounter data in JSON format fairly frequently. We may only need to deal with it when pulling some random data from an API, or it might actually be the primary data format that drives our analytics and machine learning workflows.

Typically, JSON is used when ease of use is the primary goal of data interchange. Since JSON is human readable, it is easy to debug if something breaks. Remember that we want to maintain the integrity of our data handling as we process data with Go, and part of that process is ensuring that, when possible, our data is interpretable and readable. JSON turns out to be very useful in achieving these goals (which is why it is also used for logging, in many cases).

Go offers excellent JSON functionality in its standard library through the encoding/json package. We will use this package throughout the book.

Parsing JSON

To understand how to parse (that is, unmarshal) JSON data in Go, we will use some data published by Citi Bike (https://www.citibikenyc.com/system-data), a bike-sharing service operating in New York City. Citi Bike provides frequently updated operational information about its network of bike-sharing stations in JSON format at https://gbfs.citibikenyc.com/gbfs/en/station_status.json:

{
  "last_updated": 1495252868,
  "ttl": 10,
  "data": {
    "stations": [
      {
        "station_id": "72",
        "num_bikes_available": 10,
        "num_bikes_disabled": 3,
        "num_docks_available": 26,
        "num_docks_disabled": 0,
        "is_installed": 1,
        "is_renting": 1,
        "is_returning": 1,
        "last_reported": 1495249679,
        "eightd_has_available_keys": false
      },
      {
        "station_id": "79",
        "num_bikes_available": 0,
        "num_bikes_disabled": 0,
        "num_docks_available": 33,
        "num_docks_disabled": 0,
        "is_installed": 1,
        "is_renting": 1,
        "is_returning": 1,
        "last_reported": 1495248017,
        "eightd_has_available_keys": false
      },

      etc...

      {
        "station_id": "3464",
        "num_bikes_available": 1,
        "num_bikes_disabled": 3,
        "num_docks_available": 53,
        "num_docks_disabled": 0,
        "is_installed": 1,
        "is_renting": 1,
        "is_returning": 1,
        "last_reported": 1495250340,
        "eightd_has_available_keys": false
      }
    ]
  }
}

To import and parse this type of data in Go, we first need to import encoding/json (along with a couple of other packages from the standard library, such as net/http, because we are going to pull this data from the previously mentioned URL). We will also define a struct that mimics the structure of the JSON shown in the preceding code:

import (
	"encoding/json"
	"fmt"
	"io/ioutil"
	"log"
	"net/http"
)

// citiBikeURL provides the station statuses of CitiBike bike sharing stations.
const citiBikeURL = "https://gbfs.citibikenyc.com/gbfs/en/station_status.json"

// stationData is used to unmarshal the JSON document returned from citiBikeURL.
type stationData struct {
	LastUpdated int `json:"last_updated"`
	TTL         int `json:"ttl"`
	Data        struct {
		Stations []station `json:"stations"`
	} `json:"data"`
}

// station is used to unmarshal each of the station documents in stationData.
type station struct {
	ID                string `json:"station_id"`
	NumBikesAvailable int    `json:"num_bikes_available"`
	NumBikesDisabled  int    `json:"num_bikes_disabled"`
	NumDocksAvailable int    `json:"num_docks_available"`
	NumDocksDisabled  int    `json:"num_docks_disabled"`
	IsInstalled       int    `json:"is_installed"`
	IsRenting         int    `json:"is_renting"`
	IsReturning       int    `json:"is_returning"`
	LastReported      int    `json:"last_reported"`
	HasAvailableKeys  bool   `json:"eightd_has_available_keys"`
}

Note a couple of things here: (i) we have followed Go idioms by avoiding underscores in the struct field names, and (ii) we have used json struct tags to map each struct field to the corresponding field in the JSON data.

Note that, to properly parse JSON data, the struct fields need to be exported. That is, the field names need to begin with a capital letter. encoding/json relies on reflection and cannot set fields unless they are exported.
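
The effect of exported versus unexported fields can be seen in a small, self-contained sketch. The type names exportedStation and unexportedStation and the helper decodeBoth are hypothetical and exist only for this illustration; the JSON document is a cut-down version of one station from the sample shown earlier:

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// exportedStation has exported fields, so encoding/json can populate them.
type exportedStation struct {
	ID                string `json:"station_id"`
	NumBikesAvailable int    `json:"num_bikes_available"`
}

// unexportedStation carries the same tags, but its fields are unexported
// (lowercase), so encoding/json skips them without reporting an error.
type unexportedStation struct {
	id                string `json:"station_id"`
	numBikesAvailable int    `json:"num_bikes_available"`
}

// decodeBoth (hypothetical helper) unmarshals the same document into both types.
func decodeBoth(doc []byte) (exportedStation, unexportedStation, error) {
	var e exportedStation
	var u unexportedStation
	if err := json.Unmarshal(doc, &e); err != nil {
		return e, u, err
	}
	if err := json.Unmarshal(doc, &u); err != nil {
		return e, u, err
	}
	return e, u, nil
}

func main() {
	doc := []byte(`{"station_id": "72", "num_bikes_available": 10}`)

	e, u, err := decodeBoth(doc)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Printf("exported:   %+v\n", e)
	fmt.Printf("unexported: %+v\n", u)
}
```

Both calls to json.Unmarshal succeed, but only the first struct ends up with data; the second keeps its zero values. Because no error is returned, a lowercase field name is an easy mistake to miss.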

Now we can get the JSON data from the URL and unmarshal it into a new stationData value. This will produce a struct value whose fields are populated from the correspondingly tagged JSON fields. We can check it by printing out some data associated with one of the stations:

// Get the JSON response from the URL.
response, err := http.Get(citiBikeURL)
if err != nil {
	log.Fatal(err)
}
defer response.Body.Close()

// Read the body of the response into []byte.
body, err := ioutil.ReadAll(response.Body)
if err != nil {
	log.Fatal(err)
}

// Declare a variable of type stationData.
var sd stationData

// Unmarshal the JSON data into the variable.
if err := json.Unmarshal(body, &sd); err != nil {
	log.Fatal(err)
}

// Print the first station.
fmt.Printf("%+v\n\n", sd.Data.Stations[0])

When we run this, we can see that our struct contains the parsed data from the URL:

$ go build
$ ./myprogram
{ID:72 NumBikesAvailable:11 NumBikesDisabled:0 NumDocksAvailable:25 NumDocksDisabled:0 IsInstalled:1 IsRenting:1 IsReturning:1 LastReported:1495252934 HasAvailableKeys:false}
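
Because the live feed changes constantly and requires network access, it can be handy to exercise the same parsing logic against a fixed document. The following is a minimal sketch under that assumption: the constant sample is an abbreviated copy of the first two stations from the JSON shown earlier, the structs are trimmed to a few fields, and parseStations is a hypothetical helper that also guards against an empty station list before any indexing takes place:

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"log"
)

// station holds a subset of the per-station fields; JSON keys without a
// matching struct field are simply ignored by encoding/json.
type station struct {
	ID                string `json:"station_id"`
	NumBikesAvailable int    `json:"num_bikes_available"`
	NumBikesDisabled  int    `json:"num_bikes_disabled"`
	NumDocksAvailable int    `json:"num_docks_available"`
}

// stationData mirrors the top-level structure of the document.
type stationData struct {
	LastUpdated int `json:"last_updated"`
	TTL         int `json:"ttl"`
	Data        struct {
		Stations []station `json:"stations"`
	} `json:"data"`
}

// sample is an abbreviated copy of the document shown earlier in the text.
const sample = `{
  "last_updated": 1495252868,
  "ttl": 10,
  "data": {
    "stations": [
      {"station_id": "72", "num_bikes_available": 10, "num_bikes_disabled": 3, "num_docks_available": 26, "is_installed": 1},
      {"station_id": "79", "num_bikes_available": 0, "num_bikes_disabled": 0, "num_docks_available": 33, "is_installed": 1}
    ]
  }
}`

// parseStations (hypothetical helper) unmarshals doc and returns its stations,
// or an error if the JSON is invalid or the station list is empty.
func parseStations(doc []byte) ([]station, error) {
	var sd stationData
	if err := json.Unmarshal(doc, &sd); err != nil {
		return nil, err
	}
	if len(sd.Data.Stations) == 0 {
		return nil, errors.New("no stations in document")
	}
	return sd.Data.Stations, nil
}

func main() {
	stations, err := parseStations([]byte(sample))
	if err != nil {
		log.Fatal(err)
	}

	// Safe to index: parseStations guarantees at least one station.
	fmt.Printf("%+v\n", stations[0])
}
```

Returning an error for an empty list is one possible design choice; it keeps a later Stations[0] access from panicking when a feed is temporarily empty.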

JSON output

Now let's say that we have the Citi Bike station data in our stationData struct value and we want to save that data out to a file. We can do this with json.Marshal:

// Marshal the data.
outputData, err := json.Marshal(sd)
if err != nil {
	log.Fatal(err)
}

// Save the marshalled data to a file.
if err := ioutil.WriteFile("citibike.json", outputData, 0644); err != nil {
	log.Fatal(err)
}