Book Image

Hands-On High Performance with Go

By : Bob Strecansky
Book Image

Hands-On High Performance with Go

By: Bob Strecansky

Overview of this book

Go is an easy-to-write language that is popular among developers thanks to its features such as concurrency, portability, and ability to reduce complexity. This Golang book will teach you how to construct idiomatic Go code that is reusable and highly performant. Starting with an introduction to performance concepts, you’ll understand the ideology behind Go’s performance. You’ll then learn how to effectively implement Go data structures and algorithms along with exploring data manipulation and organization to write programs for scalable software. This book covers channels and goroutines for parallelism and concurrency to write high-performance code for distributed systems. As you advance, you’ll learn how to manage memory effectively. You’ll explore the compute unified device architecture (CUDA) application programming interface (API), use containers to build Go code, and work with the Go build cache for quicker compilation. You’ll also get to grips with profiling and tracing Go code for detecting bottlenecks in your system. Finally, you’ll evaluate clusters and job queues for performance optimization and monitor the application for performance regression. By the end of this Go programming book, you’ll be able to improve existing code and fulfill customer requirements by writing efficient programs.
Table of Contents (20 chapters)
Section 1: Learning about Performance in Go
Section 2: Applying Performance Concepts in Go
Section 3: Deploying, Monitoring, and Iterating on Go Programs with Performance in Mind

Understanding performance in computer science

Performance in computer science is a measure of work that can be accomplished by a computer system. Performant code is vital to many different groups of developers. Whether you're part of a large-scale software company that needs to quickly deliver masses of data to customers, an embedded computing device programmer who has limited computing resources available, or a hobbyist looking to squeeze more requests out of the Raspberry Pi that you are using for your pet project, performance should be at the forefront of your development mindset. Performance matters, especially when your scale continues to grow.

It is important to remember that we are sometimes limited by physical bounds. CPU, memory, disk I/O, and network connectivity all have performance ceilings based on the hardware that you either purchase or rent from a cloud provider. There are other systems that may run concurrently alongside our Go programs that can also consume resources, such as OS packages, logging utilities, monitoring tools, and other binaries—it is prudent to remember that our programs are very frequently not the only tenants on the physical machines they run on.

Optimized code generally helps in many ways, including the following:

  • Decreased response time: The total amount of time it takes to respond to a request.
  • Decreased latency: The time delay between a cause and effect within a system.
  • Increased throughput: The rate at which data can be processed.
  • Higher scalability: More work can be processed within a contained system.

There are many ways to service more requests within a computer system. Adding more individual computers (often referred to as horizontal scaling) or upgrading to more powerful computers (often referred to as vertical scaling) are common practices used to handle demand within a computer system. One of the fastest ways to service more requests without needing additional hardware is to increase code performance. Performance engineering acts as a way to help with both horizontal and vertical scaling. The more performant your code is, the more requests you can handle on a single machine. This pattern can potentially result in fewer or less expensive physical hosts to run your workload. This is a large value proposition for many businesses and hobbyists alike, as it helps to drive down the cost of operation and improves the end user experience.

A brief note on Big O notation

Big O notation ( is commonly used to describe the limiting behavior of a function based on the size of the inputs. In computer science, Big O notation is used to explain how efficient algorithms are in comparison to one another—we'll discuss this more in detail in Chapter 2, Data Structures and Algorithms. Big O notation is important in optimizing performance because it is used as a comparison operator in explaining how well algorithms will scale. Understanding Big O notation will help you to write more performant code, as it will help to drive performance decisions in your code as the code is being composed. Knowing at what point different algorithms have relative strengths and weaknesses helps you to determine the correct choice for the implementation at hand. We can't improve what we can't measure—Big O notation helps us to give a concrete measurement to the problem statement at hand.

Methods to gauge long term performance

As we make our performance improvements, we will need to continually monitor our changes to view impact. Many methods can be used to monitor the long-term performance of computer systems. A couple of examples of these methods would be the following:

We will discuss these concepts further in Chapter 15, Comparing Code Quality Across Versions. These paradigms help us to make smart decisions about the performance optimizations in our code as well as avoid premature optimization. Premature optimization plays as a very crucial aspect for many a computer programmers. Very frequently, we have to determine what fast enough is. We can waste our time trying to optimize a small segment of code when many other code paths have an opportunity to improve from a performance perspective. Go's simplicity allows for additional optimization without cognitive load overhead or an increase in code complexity. The algorithms that we will discuss in Chapter 2, Data Structures and Algorithms, will help us to avoid premature optimization.

Optimization strategies overview

In this book, we will also attempt to understand what exactly we are optimizing for. The techniques for optimizing for CPU or memory utilization may look very different than optimizing for I/O or network latency. Being cognizant of your problem space as well as your limitations within your hardware and upstream APIs will help you to determine how to optimize for the problem statement at hand. Optimization also often shows diminishing returns. Frequently the return on development investment for a particular code hotspot isn't worthwhile based on extraneous factors, or adding optimizations will decrease readability and increase risk for the whole system. If you can determine whether an optimization is worth doing early on, you'll be able to have a more narrowly scoped focus and will likely continue to develop a more performant system.

It can be helpful to understand baseline operations within a computer system. Peter Norvig, the Director of Research at Google, designed a table (the image that follows) to help developers understand the various common timing operations on a typical computer (

Having a clear understanding of how different parts of a computer can interoperate with one another helps us to deduce where our performance optimizations should lie. As derived from the table, it takes quite a bit longer to read 1 MB of data sequentially from disk versus sending 2 KBs over a 1 Gbps network link. Being able to have back-of-the-napkin math comparison operators for common computer interactions can very much help to deduce which piece of your code you should optimize next. Determining bottlenecks within your program becomes easier when you take a step back and look at a snapshot of the system as a whole.

Breaking down performance problems into small, manageable sub problems that can be improved upon concurrently is a helpful shift into optimization. Trying to tackle all performance problems at once can often leave the developer stymied and frustrated, and often lead to many performance efforts failing. Focusing on bottlenecks in the current system can often yield results. Fixing one bottleneck will often quickly identify another. For example, after you fix a CPU utilization problem, you may find that your system's disk can't write the values that are computed fast enough. Working through bottlenecks in a structured fashion is one of the best ways to create a piece of performant and reliable software.

Optimization levels

Starting at the bottom of the pyramid in the following image, we can work our way up to the top. This diagram shows a suggested priority for making performance optimizations. The first two levels of this pyramidthe design level and algorithm and data structures levelwill often provide more than ample real-world performance optimization targets. The following diagram shows an optimization strategy that is often efficient. Changing the design of a program alongside the algorithms and data structures are often the most efficient places to improve the speed and quality of code bases:

Design-level decisions often have the most measurable impact on performance. Determining goals during the design level can help to determine the best methodology for optimization. For example, if we are optimizing for a system that has slow disk I/O, we should prioritize lowering the number of calls to our disk. Inversely, if we are optimizing for a system that has limited compute resources, we need to calculate only the most essential values needed for our program's response. Creating a detailed design document at the inception of a new project will help with understanding where performance gains are important and how to prioritize time within the project. Thinking from a perspective of transferring payloads within a compute system can often lead to noticing places where optimization can occur. We will talk more about design patterns in Chapter 3, Understanding Concurrency.

Algorithm and data structure decisions often have a measurable performance impact on a computer program. We should focus on trying to utilize constant O(1), logarithmic O(log n), linear O(n), and log-linear O(n log n) functions while writing performant code. Avoiding quadratic complexity O(n2) at scale is also important for writing scalable programs. We will talk more about O notation and its relation to Go in Chapter 2, Data Structures and Algorithms.