Book Image

F# High Performance

By : Eriawan Kusumawardhono
Book Image

F# High Performance

By: Eriawan Kusumawardhono

Overview of this book

F# is a functional programming language and is used in enterprise applications that demand high performance. It has its own unique trait: it is a functional programming language and has OOP support at the same time. This book will help you make F# applications run faster with examples you can easily break down and take into your own work. You will be able to assess the performance of the program and identify bottlenecks. Beginning with a gentle overview of concurrency features in F#, you will get to know the advanced topics of concurrency optimizations in F#, such as F# message passing agent of MailboxProcessor and further interoperation with .NET TPL. Based on this knowledge, you will be able to enhance the performance optimizations when implementing and using other F# language features. The book also covers optimization techniques by using F# best practices and F# libraries. You will learn how the concepts of concurrency and parallel programming will help in improving the performance. With this, you would be able to take advantage of multi-core processors and track memory leaks, root causes, and CPU issues. Finally, you will be able to test their applications to achieve scalability.
Table of Contents (15 chapters)
F# High Performance
Credits
About the Author
About the Reviewer
www.PacktPub.com
Customer Feedback
Preface

Common samples of misunderstood concurrent problems


Many of us, when dealing with concurrent problems, sometimes try to use a hammer for every nail. There is no silver bullet for all of the problems of implementing concurrency.

It is also recommended to understand concurrency, as concurrency is now becoming more relevant because of the many core models in the releases of modern microprocessors (or simply processors) in the last 7 years. This fact is also becoming a trend as the clock speed of the latest processors has been usually limited to 3.2 GHz for the last 3 years.

Microsoft's Visual C++ architect, Herb Sutter, has written a very thorough article in the form of a whitepaper famously known as The Free Lunch Is Over

http://www.gotw.ca/publications/concurrency-ddj.htm

Let's understand first what concurrency is and the F# supports.

Introduction to concurrency in F#

Before we dive deeper into concurrency in F#, we should understand the definition of concurrency.

Concurrency is one of the main disciplines of computer science and it is still one of the main problems of computations.

Simply defined, concurrency is the composition of the order of independent process units or partially-ordered process units that can be executed in parallel or not in parallel, but not in sequential order. The term order in this context means ordered as sequentially.

The following diagram illustrates the concept of sequential (not concurrent) in action:

Process 1 to Process 4 as shown in the preceding diagram is executed sequentially step by step. Process 2 must wait for Process 1 to be completed first, as do Process 3 and Process 4.

This sequence is also called a synchronous process or is simply referred to as being synchronous.

The following figure is a sample illustration of a parallel concurrency combination of parallel and synchronous processes:

Processes  1A, 2A, and 3A run in parallel, although each parallel lane has its own sequence of processes that are run sequentially.

The term parallel means that it is not just executing simultaneously in parallel, but parallel also means that it may run on many processors or on many cores, as is common in modern processors that have multiple cores.

Defining asynchronous

A simple definition of asynchronous means not synchronous. This means that if we have an asynchronous flow, the process is not run synchronously.

These are the implications of an asynchronous flow:

  • Processes run not sequentially. For example, if the first process is running asynchronously, the next process doesn't have to wait for the first process to be completed.

  • There has to be a way of scheduling and telling the scheduler to inform that the asynchronous process is completed. Typically, the asynchronous process is usually related to blocking I/O or some long computations.

  • At first, the processes may look sequential, but the next process run may not be sequential at all.

This is a sample case of asynchronous: a customer is going to have dinner in a restaurant. The flows are:

  1. Customer A orders some food or drinks, and the order is noted by waiter X. Usually, most restaurants have more than one waiter, but for this illustration, the waiter available currently to serve customer A is waiter X.

  2. Waiter X then gives the list of the customer's order to chef Y.

  3. Chef Y accepts the order, and checks if he is currently occupied or not. If he is occupied, the order is registered as part of his cooking queue. Otherwise, he will start to cook the order.

  4. The waiter does not have to wait for the chef to complete his cooking. He can then serve other customers who have just arrived or there might be customers that want to add more food or drinks as well.

  5. Chef Y finishes his cooking for customer A, and then gives a notification to waiter X to inform that his cooking for customer A is finished. Or he can inform all waiters to tell that the cooking for customer A is finished. This concept of informing to tell a process is finished is commonly called a callback.

  6. Waiter X (or any other waiter) delivers the finished food to customer A.

The asynchronous model that uses a notification to inform that a process is completed is called asynchronous callback.

The result returned at the end of the execution later (or in the future) is called a Future. It is also the future, in a sense, when many processes are executed in parallel, having results later.

This is the official documentation of Future in MSDN Library: 

https://msdn.microsoft.com/en-us/library/ff963556.aspx

For I/O operations, such as printing a document, we cannot determine whether the printing is successful or not, so the notification of the end process is not available. We can implement an asynchronous operation on I/O, and the fact that there is no observable notification of this is why this asynchronous model is called the asynchronous fire and forget model.

Misunderstood concurrency problems

Many developers, even seasoned or experienced developers, still think that concurrency and parallel programming are different. Actually, parallel programming is just one member within the concurrency discipline, together with the differentiation of asynchronous and synchronous processing models.

This is also one of the most misunderstood concurrency concepts or problems, and there are many more regarding how we approach concurrency.

These are some common organized sample cases of misunderstood concurrency problems:

  1. Assuming all concurrent problems can be solved using parallel programming.

    Fact: Not all concurrent problems are easily solved with parallelism.

  2. Assuming all implementation of asynchronous is asynchronous.

    Fact: This depends on how we implement async; sometimes the execution of an async construct is executed synchronously.

  3. Ignoring blocking threads such as I/O.

    Fact: Blocking I/O threads should be handled asynchronously; otherwise, the current thread is always waiting indefinitely until the I/O thread is finished.

  4. The synchronized lock is blocking.

    Fact: The lock is not a blocking thread.

  5. Relying on the CPU speed.

    Fact: The CPU speed increase is becoming less of an issue. The research and development of modern CPUs is focusing on multiple core CPUs.

A few sample cases of concurrent problems are mentioned as follows:

The case samples of the first case are:

  • Ordering or sorting a collection: Ordering is by default a sequential process, and it requires iterating all the elements of the collection. Therefore, it's useless to use parallelism.

  • Grouping data: Grouping data is implicitly one of the sequential processes; it is also quite useless to use parallelism.

  • Printing reports: Printing is part of I/O and I/O is intrinsically without support for parallelism. Unless the I/O is part of I/O parallelism, it is useless to use parallelism in this context.

Sample cases of the second case are listed as follows:

  • Mixing Parallel.For that has F# async in it. The implications of having Parallel.For is by default taking a multiple core or a CPU to run it is not the same as running asynchronously, as it is not guaranteed to run as a combined async in parallel.

  • Using Thread.Sleep instead of Async.Sleep to signify a wait operation. The call to Thread.Sleep will instead make the flow synchronous, as the Sleep method simply puts on hold the current thread as a delay synchronously.

Note

RAID array in the storage I/O is one of the best samples of parallelism in I/O. It stores data in parallel across multiple disks. It is faster than common I/O because data is stored in parts (not whole data to a disk) to several disks in parallel.

The third case is related to all of the I/O operations including sending data to a printer and saving large data into a disk. These operations are always blocking threads.

For the case of lock, Microsoft has issued official statements that lock in .NET used by C# and VB is executed without any interruption, and it only locks an object until it has finished executing the block in the synchronized lock. It's still allowing other threads to run without waiting for the thread that has the lock to finish.

This is the official thread synchronization of C# and VB in MSDN: 

https://msdn.microsoft.com/en-us/library/ms173179.aspx

It is recommended to always check online the MSDN Library of the .NET class library, as this is always updated.

Introduction to concurrency support in .NET and F#

Concurrency support in F# is based on the existing work of concurrency support features in .NET BCL (the Base Class Library). It's also by design, since F# runs on top of .NET CLR and can use .NET BCL. F# also has its unique ways that bring more features other than just language features (for example, asynchronous computations).

The .NET BCL part of concurrency has basic support for the following:

  • Thread

  • Lock

  • Mutex

Beginning with .NET 4.0, we have the Task Parallel Library (TPL). This library makes concurrent support easier. TPL consists of the following:

  • Data parallelism (for example: Parallel.For and ForEach)

  • Task parallelism

  • Asynchronous task (this is also the base foundation of C#/VB's async-await)

  • Parallel LINQ (often abbreviated as PLINQ)

For a more complete reference of concurrency support in .NET, please visit https://msdn.microsoft.com/en-us/library/hh156548(v=vs.110).aspx.

Note

.NET has no support yet for fiber API in Win32 API. Microsoft currently has no definite plan for fiber support.

F# has its own unique features of concurrency supports. They are:

  • Asynchronous workflow or computation

  • MailboxProcessor

  • Parallel async

  • Parallel async combined with I/O

More on concurrency support in F# is available in Chapter 4, Introduction to Concurrency in F# and Chapter 5, Advanced Concurrency Support in F#.

Now it's time to dive more into some codes. To start writing F# code, we can use F# and Visual Studio combined. This includes IDE supports for F#.