Mastering C# Concurrency

Overview of this book

Starting with the traditional approach to concurrency, you will learn how to write multithreaded concurrent programs and how to structure them in ways that don't require locking. You will explore the concepts of parallelism granularity, and fine-grained and coarse-grained parallel tasks, by choosing a concurrent program structure and parallelizing the workload optimally. You will also learn how to use the Task Parallel Library, cancellations, and timeouts, and how to handle errors. You will know how to choose the appropriate data structure for a specific parallel algorithm to achieve scalability and performance. Further, you'll learn about server scalability, asynchronous I/O, and thread pools, and write responsive traditional Windows and Windows Store applications. By the end of the book, you will be able to diagnose and resolve typical problems that can occur in multithreaded applications.

Spin lock


Operating system level synchronization primitives consume a noticeable amount of resources because of context switching and the overhead that comes with it. There is also lock latency: the time it takes for a waiting thread to be notified that the lock it is waiting for has changed state. When a lock is released, some additional time passes before the next waiting thread is signaled. This is why, when locks are held only for a very short time, running the work on a single thread without any locks can be significantly faster than parallelizing it with OS level locking mechanisms.

To avoid unnecessary context switches in such a situation, we can use a loop that checks the lock state on each iteration. Since the lock is held only very briefly, the loop does not burn much CPU time, and we gain a significant performance boost by avoiding operating system resources and keeping lock latency to a minimum.
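The idea of looping on the lock state can be illustrated with a deliberately naive sketch. NaiveSpinLock and NaiveSpinLockDemo are hypothetical teaching types, not part of the .NET Framework, and the sketch assumes .NET Framework 4.5 or later (for Volatile.Write). It has no fairness guarantees and never falls back to blocking, so it is meant for reading, not for production use.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical teaching type; the framework's SpinLock is far more sophisticated.
public sealed class NaiveSpinLock
{
    private int _state; // 0 = free, 1 = held

    public void Enter()
    {
        // Atomically flip 0 -> 1; keep looping while another thread holds the lock.
        while (Interlocked.CompareExchange(ref _state, 1, 0) != 0)
        {
            Thread.SpinWait(1); // short pause between attempts, no context switch
        }
    }

    public void Exit()
    {
        // Release the lock; the volatile write keeps earlier writes from moving past it.
        Volatile.Write(ref _state, 0);
    }
}

public static class NaiveSpinLockDemo
{
    // Increments a shared counter from several parallel workers under the lock.
    public static int Run(int workers, int incrementsPerWorker)
    {
        var spin = new NaiveSpinLock();
        var counter = 0;
        Parallel.For(0, workers, _ =>
        {
            for (int i = 0; i < incrementsPerWorker; i++)
            {
                spin.Enter();
                try { counter++; }
                finally { spin.Exit(); }
            }
        });
        return counter;
    }

    public static void Main()
    {
        Console.WriteLine(Run(4, 100000)); // prints 400000
    }
}
```

The critical section here is a single increment, which is the kind of very short work for which spinning is intended.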

This pattern is not easy to implement correctly, and to be effective it has to use specific CPU instructions. Fortunately, the .NET Framework includes a standard implementation of this pattern; the SpinWait and SpinLock structures described below are available starting with version 4.0. The implementation contains the following methods and classes:

Thread.SpinWait

Thread.SpinWait spins in a tight loop for the number of iterations passed to it. It is like Thread.Sleep, except that the thread is not switched out and keeps consuming CPU time. It is rarely used in common scenarios, but can be useful in some specific cases, such as simulating real CPU work.
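As a small illustration, the following sketch times a call to Thread.SpinWait. ThreadSpinWaitDemo and its Spin method are hypothetical names used only for this example, and the elapsed time depends entirely on the processor, so no particular output should be expected.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class ThreadSpinWaitDemo
{
    // Burns CPU for the given number of spin iterations and returns the elapsed time.
    public static TimeSpan Spin(int iterations)
    {
        var sw = Stopwatch.StartNew();
        Thread.SpinWait(iterations); // busy-waits; the thread stays on the CPU
        sw.Stop();
        return sw.Elapsed;
    }

    public static void Main()
    {
        Console.WriteLine("SpinWait(1000000) took {0}", Spin(1000000));
    }
}
```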

System.Threading.SpinWait

System.Threading.SpinWait is a structure that implements a spin loop with a condition check. It is used internally by the spinlock implementation.
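A minimal sketch of how the structure might be used directly: one thread spins with SpinOnce until another thread sets a flag. SpinWaitDemo, WaitUntilReady, and the 10 ms delay are assumptions made for illustration, and the number of spins reported varies from run to run.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class SpinWaitDemo
{
    private static int _ready; // 0 = not ready, 1 = ready

    // Spins until a background task sets the flag; returns how many times SpinOnce was called.
    public static int WaitUntilReady()
    {
        Volatile.Write(ref _ready, 0);
        var producer = Task.Run(() =>
        {
            Thread.Sleep(10);              // simulate a short piece of work
            Volatile.Write(ref _ready, 1); // signal completion
        });

        var spinner = new SpinWait();
        while (Volatile.Read(ref _ready) == 0)
        {
            // Spins at first, then begins yielding the thread if the wait drags on.
            spinner.SpinOnce();
        }

        producer.Wait();
        return spinner.Count;
    }

    public static void Main()
    {
        Console.WriteLine("SpinOnce calls: {0}", WaitUntilReady());
    }
}
```

For simple conditions like this one, the static SpinWait.SpinUntil method wraps the same loop around a delegate.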

System.Threading.SpinLock

This is the spinlock implementation itself.

Note that it is a structure, which avoids a class instance allocation and reduces GC overhead.

The spinlock can optionally use a memory barrier (a memory fence instruction) when it is released, so that other threads see the release immediately. Using a memory barrier is the default behavior; it prevents the compiler and the hardware from reordering memory access operations and improves the fairness of the lock at the expense of performance. Releasing without a barrier is faster, but can lead to incorrect behavior in some situations.

Using a spinlock directly is generally discouraged unless you are completely sure of what you are doing. Make sure that you have confirmed the performance bottleneck with tests and that your locks are really held for a very short time.

The code inside a spin lock should not do the following:

  • Use regular locks, or call code that uses locks

  • Acquire more than one spinlock at a time

  • Perform dynamically dispatched calls (virtual methods, interface methods, or delegate calls)

  • Call any third-party code that you do not control

  • Perform memory allocation, including new operator usage
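The restrictions above can be sketched with a small type whose critical sections contain nothing but a field update or read. SpinLockedSum and SpinLockedSumDemo are hypothetical names introduced for this example; the delegate passed to Parallel.For runs outside the lock, and only the addition itself happens while the spinlock is held.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical example type: a running total protected by a SpinLock.
public sealed class SpinLockedSum
{
    // Not readonly: SpinLock is a mutable struct, and a readonly field
    // would be copied on every call, so the lock would never really be held.
    private SpinLock _lock = new SpinLock(false); // false = no thread owner tracking
    private long _sum;

    public void Add(long value)
    {
        bool taken = false;
        try
        {
            _lock.Enter(ref taken);
            _sum += value; // no allocation, no locks, no virtual or delegate calls
        }
        finally
        {
            if (taken) _lock.Exit();
        }
    }

    public long Read()
    {
        bool taken = false;
        try
        {
            _lock.Enter(ref taken);
            return _sum;
        }
        finally
        {
            if (taken) _lock.Exit();
        }
    }
}

public static class SpinLockedSumDemo
{
    // Adds the numbers 1..1000 from parallel workers.
    public static long Run()
    {
        var sum = new SpinLockedSum();
        Parallel.For(1, 1001, i => sum.Add(i));
        return sum.Read();
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // prints 500500
    }
}
```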

The following is a sample test for a spinlock:

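using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;

static class Program
{
  // Number of lock acquisitions measured in each test
  private const int _count = 10000000;

  static void Main()
  {
    // Warm up: make sure the types and methods used below are already JIT-compiled
    var map = new Dictionary<double, double>();
    var r = Math.Sin(0.01);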

    // lock
    map.Clear();
    var prm = 0d;
    var lockFlag = new object();
    var sw = Stopwatch.StartNew();
    for (int i = 0; i < _count; i++)
      lock (lockFlag)
      {
        map.Add(prm, Math.Sin(prm));
        prm += 0.01;
      }
    sw.Stop();
    Console.WriteLine("Lock: {0}ms", sw.ElapsedMilliseconds);

    // spinlock with memory barrier
    map.Clear();
    var spinLock = new SpinLock();
    prm = 0;
    sw = Stopwatch.StartNew();
    for (int i = 0; i < _count; i++)
    {
      var gotLock = false;
      try
      {
        spinLock.Enter(ref gotLock);
        map.Add(prm, Math.Sin(prm));
        prm += 0.01;
      }
      finally
      {
        if (gotLock)
          spinLock.Exit(true);
      }
    }
    sw.Stop();
    Console.WriteLine("Spinlock with memory barrier: {0}ms", sw.ElapsedMilliseconds);

    // spinlock without memory barrier
    map.Clear();
    prm = 0;
    sw = Stopwatch.StartNew();
    for (int i = 0; i < _count; i++)
    {
      var gotLock = false;
      try
      {
        spinLock.Enter(ref gotLock);
        map.Add(prm, Math.Sin(prm));
        prm += 0.01;
      }
      finally
      {
        if (gotLock)
          spinLock.Exit(false);
      }
    }
    sw.Stop();
    Console.WriteLine("Spinlock without memory barrier: {0}ms", sw.ElapsedMilliseconds);
  }
}

Executing this code on a Core i7 2600K with an x64 OS in the Release configuration gives the following results:

Lock: 1906ms
Spinlock with memory barrier: 1761ms
Spinlock without memory barrier: 1731ms

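Note that the performance boost is very small even with short duration locks. Keep in mind that the test runs on a single thread, so it measures only the cost of acquiring and releasing an uncontended lock. Also note that starting from .NET Framework 3.5, the Monitor, ReaderWriterLock, and ReaderWriterLockSlim classes use spinning in their implementation.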

Note

The main disadvantage of spinlocks is intensive CPU usage. A spinning loop consumes energy, while a blocked thread does not. However, the standard Monitor class can now spin for a short time and then fall back to a regular blocking wait, so in real-world scenarios the difference would be even less noticeable than in this test.