Mastering High Performance with Kotlin

Overview of this book

The ease with which we write applications has been increasing, but with it comes the need to address their performance. Balancing the easy implementation of complex applications against optimal performance is a present-day requirement. In this book, we explore how to achieve this crucial balance while developing and deploying applications with Kotlin. The book starts by analyzing various Kotlin specifications to identify those that have a potentially adverse effect on performance. Then, we move on to monitoring techniques that enable us to identify performance bottlenecks and optimize performance metrics. Next, we look at techniques that help us to achieve high performance: memory optimization, concurrency, multithreading, scaling, and caching. We also look at fault tolerance solutions and the importance of logging, and we cover best practices of Kotlin programming that will help you to improve the quality of your code base. By the end of the book, you will have gained some insight into various techniques and solutions that will help to create high-performance applications in the Kotlin environment.

Memory model

The memory model describes how the JVM interacts with a computer's memory. By computer memory, we mean not only Random Access Memory (RAM) but also registers and cache memory of the CPU. So we consider the memory model as a simplified abstraction of the hardware memory architecture.

We can consider the whole JVM as a model of a computer that provides the ability to run a program on a wide range of processors and operating systems.

An understanding of the Java Memory Model is important because it specifies how different threads interact in memory. Concurrent programming has plenty of pitfalls: access to variables shared between threads has to be synchronized, and a sequence of operations has to stay consistent.

The problem of concurrency and parallelism

Concurrency means executing independent subtasks out of order without affecting the final result, while parallelism means executing subtasks simultaneously. Parallelism involves concurrency, but concurrent subtasks are not necessarily executed in parallel.
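The distinction can be sketched with a thread pool. This is a minimal illustration, not code from the book: the function name sumOfSquares and its poolSize parameter are assumptions made for the example. With a pool of one thread the subtasks are concurrent but never run in parallel; with a larger pool they may run in parallel. Because the subtasks are independent, the result is the same either way.

```kotlin
import java.util.concurrent.Callable
import java.util.concurrent.Executors

// Hypothetical helper: squares the numbers 1..4 as independent subtasks
// and adds up the results. The pool size only affects how the subtasks
// are scheduled, not the final sum.
fun sumOfSquares(poolSize: Int): Int {
    val pool = Executors.newFixedThreadPool(poolSize)
    try {
        // Each subtask is independent of the others.
        val futures = (1..4).map { n -> pool.submit(Callable { n * n }) }
        // get() waits for each subtask to complete.
        return futures.map { it.get() }.sum()
    } finally {
        pool.shutdown()
    }
}

fun main() {
    println(sumOfSquares(1)) // one worker thread: concurrent, not parallel; prints 30
    println(sumOfSquares(4)) // four worker threads: may run in parallel; prints 30
}
```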

The compiler is free to reorder instructions when it optimizes code. This means that the order in which variables are accessed during the execution of a program may differ from the order specified in the code. Data is moved between registers, caches, and RAM all the time. The compiler isn't required to keep threads perfectly synchronized, because this would cost too much in terms of performance. As a result, different threads may read different values from the same shared variable. A simplified example of the case described here may look like this:

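import java.util.concurrent.Executors

fun main(vars: Array<String>) {
    // Captured by both tasks, so these variables are shared between threads
    var sharedVariableA = 0
    var sharedVariableB = 0
    val threadPool = Executors.newFixedThreadPool(10)
    // Writer task
    val threadA = Runnable {
        sharedVariableA = 3
        sharedVariableB = 4
    }
    // Reader task: nothing synchronizes it with the writer
    val threadB = Runnable {
        val localA = sharedVariableA
        val localB = sharedVariableB
    }
    threadPool.submit(threadA)
    threadPool.submit(threadB)
    // Let the submitted tasks finish, then release the pool's threads
    threadPool.shutdown()
}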

If threadB runs after threadA has finished and sees its writes, the value of the localA variable is 3, and the value of the localB variable is 4. But nothing in the code guarantees this. The threadB task may run first and read 0 for both variables, and if the operations are reordered, it may even read 0 for localA and 4 for localB. To get a better understanding of this issue, we need some knowledge of the internal system of the Java Memory Model.
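Before moving on, here is one way to make the outcome of the example above predictable. It is a sketch rather than the book's own solution: the SharedState class and the readAfterWrite function are names invented for this illustration. It relies on the documented behavior of java.util.concurrent that actions taken by a task happen before another thread retrieves the result through Future.get(), and that actions prior to submitting a task happen before the task runs.

```kotlin
import java.util.concurrent.Callable
import java.util.concurrent.Executors

// Hypothetical holder for the two shared fields.
class SharedState {
    var a = 0
    var b = 0
}

// Runs the writer task, waits for it, and only then runs the reader task.
fun readAfterWrite(): Pair<Int, Int> {
    val state = SharedState()
    val threadPool = Executors.newFixedThreadPool(2)
    try {
        val writer = threadPool.submit(Runnable {
            state.a = 3
            state.b = 4
        })
        // get() returns only after the writer has completed, and the
        // writer's changes are visible to this thread afterwards.
        writer.get()
        // The reader is submitted after get(), so it observes both writes.
        val reader = threadPool.submit(Callable { Pair(state.a, state.b) })
        return reader.get()
    } finally {
        threadPool.shutdown()
    }
}

fun main() {
    println(readAfterWrite()) // prints (3, 4)
}
```

The price of this ordering is that the two tasks no longer overlap in time, which is the kind of trade-off between correctness and performance that this section describes.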

Java Memory Model (JMM)

The JMM divides the memory space between thread stacks and the heap. Each application has at least one thread, which is referred to as the main thread. When a new thread starts, a new stack is created. If an application has several threads, the simplified memory model may look like this:

The thread stack is a stack of blocks, which are also known as stack frames. Whenever a thread calls a method, a new block is created and added to the stack. This is also referred to as the call stack. This block contains all the local variables that were created in its scope. The local variables cannot be shared between threads, even if threads are executing the same method. A block stores local variables of primitive types in full, but for objects it stores only references. One thread can only pass copies of local primitive variables or references to another thread:
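The fact that local variables aren't shared can be sketched as follows. The function names sumUpTo and runOnTwoThreads are assumptions made for this illustration. Both threads execute the same method, but each call has its own block on its own thread's stack, so the two sum variables never interfere with each other.

```kotlin
import kotlin.concurrent.thread

// `sum` and `i` live in the block created for this particular call,
// so every thread that calls the method works on its own copies.
fun sumUpTo(limit: Int): Int {
    var sum = 0
    for (i in 1..limit) {
        sum += i
    }
    return sum
}

// Hypothetical driver: two threads execute the same method at the same time.
fun runOnTwoThreads(): List<Int> {
    // The array is an object on the heap; each thread writes to its own slot.
    val results = IntArray(2)
    val threads = List(2) { index ->
        thread { results[index] = sumUpTo(1000) }
    }
    // join() waits for both threads and makes their writes visible here.
    threads.forEach { it.join() }
    return results.toList()
}

fun main() {
    println(runOnTwoThreads()) // prints [500500, 500500]
}
```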

In contrast to Java, Kotlin doesn't have primitive types, but it does compile into the same bytecode as Java. If you don't use a variable in the same way as an object, the generated bytecode will contain a variable of a primitive type:

fun main(vars: Array<String>) {
    val localVariable = 0
}

The simplified generated bytecode will look like this:

public final static main([Ljava/lang/String;)V

LOCALVARIABLE localVariable I L2 L3 1

But if you declare localVariable with a nullable type, as follows:

val localVariable: Int? = null

Then this variable will be represented as an object in the bytecode:

LOCALVARIABLE localVariable Ljava/lang/Integer; L2 L3 1

All objects are stored in the heap, even if they're referenced only by local variables. Local primitive variables are destroyed automatically when the execution point of a program leaves the scope of the method, whereas an object can be destroyed only by the garbage collector (GC). So the use of local primitive variables is preferable. Since the Kotlin compiler applies optimizations to variables that can be primitive, in most cases the bytecode will contain variables of primitive types.
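The two representations can also be observed at runtime without reading bytecode. The following sketch uses the javaPrimitiveType and javaObjectType extension properties from the Kotlin standard library; the function name describeIntTypes is an assumption made for this illustration.

```kotlin
// Hypothetical helper that reports the JVM types behind Kotlin's Int.
fun describeIntTypes(): List<String> {
    // Storing the value as Any forces it to be boxed into an object.
    val boxed: Any = 42
    return listOf(
        // JVM primitive type that backs a non-nullable Int
        Int::class.javaPrimitiveType?.name ?: "none",
        // JVM wrapper class that backs Int? and other boxed uses
        Int::class.javaObjectType.name,
        // Runtime class of the boxed value
        boxed.javaClass.name
    )
}

fun main() {
    println(describeIntTypes()) // prints [int, java.lang.Integer, java.lang.Integer]
}
```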

This diagram illustrates how two threads can share the same object:

Synchronization

As we already know, the JMM is a simplified model of the hardware memory architecture. If a variable is used frequently, it can be copied to the CPU cache. If several threads have a shared variable, then several CPU caches have their own duplicate of this variable. This is needed to increase access speed to variables. The hardware memory architecture has a hierarchy that is illustrated in the following diagram:

When several caches have duplicates of a variable that's stored in the main memory, a problem with the visibility of shared objects may occur. This problem is referred to as a data race. It occurs when two or more threads change values that were copied to caches, and one thread doesn't know about the changes that another thread applied to its copy. When that thread updates the original variable in the main memory, the value that was assigned to the shared object by the other thread can be erased.

The following example clarifies the described case. Two threads run on two CPUs at the same time. And they have a shared object with the count variable that's been copied to caches of both CPUs. Both threads increment the copied values at the same time. But these changes aren't visible to each other because the updates haven't been flushed back to the main memory yet. The following diagram illustrates this:

To solve the problem with synchronization, you can use the volatile keyword, synchronized methods or blocks, and so on. In Kotlin, these are available as the @Volatile and @Synchronized annotations and the synchronized function. But all of these approaches bring overhead and make your code more complex. It's better to avoid shared mutable objects and use only immutable objects in a multithreaded environment. This strategy helps keep your code simple and reliable.
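The count example described earlier can be made safe in more than one way. The sketch below is an illustration under stated assumptions, not the book's implementation: the SynchronizedCounter class and the countWithLock and countWithAtomic functions are invented names, and AtomicInteger is one of the options covered only by "and so on" in the text. Both variants guarantee that no increment is lost.

```kotlin
import java.util.concurrent.atomic.AtomicInteger
import kotlin.concurrent.thread

// Hypothetical counter whose state is guarded by a lock.
class SynchronizedCounter {
    private val lock = Any()
    private var count = 0

    // Only one thread at a time can execute the read-modify-write step.
    fun increment() {
        synchronized(lock) { count++ }
    }

    fun value(): Int = synchronized(lock) { count }
}

// Increments a lock-guarded counter from several threads.
fun countWithLock(threadCount: Int, incrementsPerThread: Int): Int {
    val counter = SynchronizedCounter()
    val workers = List(threadCount) {
        thread { repeat(incrementsPerThread) { counter.increment() } }
    }
    workers.forEach { it.join() }
    return counter.value()
}

// Same workload, using an atomic variable instead of a lock.
fun countWithAtomic(threadCount: Int, incrementsPerThread: Int): Int {
    val counter = AtomicInteger(0)
    val workers = List(threadCount) {
        thread { repeat(incrementsPerThread) { counter.incrementAndGet() } }
    }
    workers.forEach { it.join() }
    return counter.get()
}

fun main() {
    println(countWithLock(4, 10_000))   // prints 40000
    println(countWithAtomic(4, 10_000)) // prints 40000
}
```

With a plain unsynchronized var instead, the same workload could print a smaller number, because concurrent increments may overwrite each other.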