
Building a web spider using goroutines and channels


Let's take the largely useless capitalization application and do something practical with it. Here, our goal is to build a rudimentary spider. In doing so, we'll accomplish the following tasks:

  • Read five URLs

  • Read those URLs and save the contents to a string

  • Write that string to a file when all URLs have been scanned and read

These kinds of applications are written every day, and they're the ones that benefit the most from concurrency and non-blocking code.

It probably goes without saying, but this is not a particularly elegant web scraper. For starters, it only knows a few start points—the five URLs that we supply it. Also, it's neither recursive nor thread-safe in terms of data integrity.
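If data integrity did matter, one common remedy is to guard shared state, such as the accumulated page text, behind a mutex. The following is a minimal, self-contained sketch of that idea; the scrapedText type and its append method are hypothetical and are not part of the spider we build in this section:

package main

import (
  "fmt"
  "sync"
)

// scrapedText is a hypothetical holder for shared state; the mutex ensures
// that only one goroutine appends to the string at any given time.
type scrapedText struct {
  mu   sync.Mutex
  text string
}

func (s *scrapedText) append(chunk string) {
  s.mu.Lock()
  defer s.mu.Unlock()
  s.text += chunk
}

func main() {
  s := &scrapedText{}
  var wg sync.WaitGroup

  for i := 0; i < 5; i++ {
    wg.Add(1)
    go func(n int) {
      defer wg.Done()
      // In a real spider, this chunk would be a page body.
      s.append(fmt.Sprintln("chunk", n))
    }(i)
  }

  wg.Wait()
  fmt.Println(s.text)
}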

That said, the following code works and demonstrates how we can use channels and the select statement:

package main

import (
  "fmt"
  "io/ioutil"
  "net/http"
  "sync"
  "time"
)

var applicationStatus bool
var urls []string
var urlsProcessed int
var foundUrls []string
var fullText string
var totalURLCount int
var wg sync.WaitGroup

var v1 int

First, we have our most basic global variables that we'll use for the application state. The applicationStatus variable tells us that our spider process has begun and urls is our slice of simple string URLs. The rest are idiomatic data storage variables and/or application flow mechanisms. The following code snippet is our function to read the URLs and pass them across the channel:

func readURLs(statusChannel chan int, textChannel chan string) {

  time.Sleep(time.Millisecond * 1)
  fmt.Println("Grabbing", len(urls), "urls")
  for i := 0; i < totalURLCount; i++ {

    fmt.Println("Url", i, urls[i])
    resp, err := http.Get(urls[i])
    if err != nil {
      fmt.Println("Could not fetch", urls[i])
      statusChannel <- 0
      continue
    }

    text, err := ioutil.ReadAll(resp.Body)
    resp.Body.Close()
    if err != nil {
      fmt.Println("No HTML body")
      statusChannel <- 0
      continue
    }

    textChannel <- string(text)

    statusChannel <- 0

  }

}

The readURLs function takes statusChannel and textChannel for communication and loops through the urls slice, sending each page's text on textChannel and a simple ping on statusChannel. Next, let's look at the function that will append scraped text to the full text:

func addToScrapedText(textChannel chan string, processChannel chan bool) {

  for {
    select {
    case pC := <-processChannel:
      if pC == true {
        // hang on
      }
      if pC == false {
        // kill signal: close both channels and stop this goroutine
        close(textChannel)
        close(processChannel)
        return
      }
    case tC := <-textChannel:
      fullText += tC

    }

  }

}

We use the addToScrapedText function to accumulate processed text and add it to a master text string. We also close our two primary channels when we get a kill signal on our processChannel. Let's take a look at the evaluateStatus() function:

func evaluateStatus(statusChannel chan int, textChannel chan string, processChannel chan bool) {

  for {
    select {
    case status := <-statusChannel:

      fmt.Print(urlsProcessed, totalURLCount)
      urlsProcessed++
      if status == 0 {

        fmt.Println("Got url")

      }
      if status == 1 {

        close(statusChannel)
      }
      if urlsProcessed == totalURLCount {
        fmt.Println("Read all top-level URLs")
        processChannel <- false
        applicationStatus = false

      }
    }

  }
}

At this juncture, all that the evaluateStatus function does is determine what's happening in the overall scope of the application. When we send a 0 (our aforementioned ping) through this channel, we increment our urlsProcessed variable. When we send a 1, it's a message that we can close the channel. Finally, let's look at the main function:

func main() {
  applicationStatus = true
  statusChannel := make(chan int)
  textChannel := make(chan string)
  processChannel := make(chan bool)
  totalURLCount = 0

  urls = append(urls, "http://www.mastergoco.com/index1.html")
  urls = append(urls, "http://www.mastergoco.com/index2.html")
  urls = append(urls, "http://www.mastergoco.com/index3.html")
  urls = append(urls, "http://www.mastergoco.com/index4.html")
  urls = append(urls, "http://www.mastergoco.com/index5.html")

  fmt.Println("Starting spider")

  urlsProcessed = 0
  totalURLCount = len(urls)

  go evaluateStatus(statusChannel, textChannel, processChannel)

  go readURLs(statusChannel, textChannel)

  go addToScrapedText(textChannel, processChannel)

  for {
    if applicationStatus == false {
      fmt.Println(fullText)
      fmt.Println("Done!")
      break
    }
    select {
    case sC := <-statusChannel:
      fmt.Println("Message on StatusChannel", sC)

    }
  }

}

This is a basic extrapolation of our earlier capitalization application. However, each piece here is responsible for some aspect of reading URLs or appending its respective content to a larger variable.

In the following code, we create a sort of master loop that lets you know when a URL has been grabbed on statusChannel:

  for {
    if applicationStatus == false {
      fmt.Println(fullText)
      fmt.Println("Done!")
      break
    }
    select {
      case sC := <- statusChannel:
        fmt.Println("Message on StatusChannel",sC)

    }
  }

Often, you'll see this loop wrapped in a go func() and coordinated with a sync.WaitGroup, or not wrapped at all, depending on the type of feedback you require.
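As a rough, self-contained sketch of that first variation (and not the code we use in this chapter), the feedback loop could run in its own goroutine while main() blocks on a sync.WaitGroup until the status channel is closed:

package main

import (
  "fmt"
  "sync"
)

func main() {
  statusChannel := make(chan int)
  var wg sync.WaitGroup

  // Wrap the monitoring loop in a goroutine; the WaitGroup lets main()
  // block until the status channel is closed and the loop drains.
  wg.Add(1)
  go func() {
    defer wg.Done()
    for sC := range statusChannel {
      fmt.Println("Message on StatusChannel", sC)
    }
  }()

  // Stand-in for the spider: send a few status pings, then close.
  go func() {
    for i := 0; i < 5; i++ {
      statusChannel <- 0
    }
    close(statusChannel)
  }()

  wg.Wait()
  fmt.Println("Done!")
}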

The control flow, in this case, is evaluateStatus, which works as a channel monitor that lets us know when data crosses each channel and ends execution when it's complete. The readURLs function immediately begins reading our URLs, extracting the underlying data and passing it on to textChannel. At this point, our addToScrapedText function takes each sent HTML file and appends it to the fullText variable. When evaluateStatus determines that all URLs have been read, it sets applicationStatus to false. At this point, the infinite loop at the bottom of main() quits.

As mentioned, a crawler can hardly get more rudimentary than this, but seeing a real-world example of how goroutines can work in concert will set us up for safer and more complex examples in the coming chapters.