Haskell High Performance Programming

By : Samuli Thomasson

Overview of this book

Haskell, with its power to optimize the code and its high performance, is a natural candidate for high performance programming. It is especially well suited to stacking abstractions high with a relatively low performance cost. This book addresses the challenges of writing efficient code with lazy evaluation and techniques often used to optimize the performance of Haskell programs. We open with an in-depth look at the evaluation of Haskell expressions and discuss optimization and benchmarking. You will learn to use parallelism and we'll explore the concept of streaming. We’ll demonstrate the benefits of running multithreaded and concurrent applications. Next we’ll guide you through various profiling tools that will help you identify performance issues in your program. We’ll end our journey by looking at GPGPU, Cloud and Functional Reactive Programming in Haskell. At the very end there is a catalogue of robust library recommendations with code samples. By the end of the book, you will be able to boost the performance of any app and prepare it to stand up to real-world punishment.

Compiler code optimizations

Haskell compilers perform aggressive optimizing transformations on code. GHC's optimization passes are highly sophisticated, so much so that one rarely needs to worry about performance. We have seen some of the effects of ghc -O1 in our examples so far; in all cases, -O1 increased performance relative to no optimizations (-O0), and in some cases optimization passes were the difference between constant and exponential complexity.

Inlining and stream fusion

GHC performs aggressive inlining, which simply means replacing a function call with the function's definition. Because all values in Haskell are referentially transparent, any function can be inlined within the scope of its definition. Especially in loops, inlining improves performance drastically. The GHC inliner works within a module, but also to some extent across modules and across packages.

Some rules of thumb regarding inlining:

If a definition is only used once, and isn't exported, it will always be inlined.
When a function body is small, it will almost certainly be inlined no matter where or how often it is used.
Bigger functions may be inlined cross-module. To ask GHC to inline foo at every call site where that is possible, add a {-# INLINE foo #-} pragma near the definition of foo.

With these easy rules, you rarely need to worry about problems from bad inlining. For completeness's sake, there is also a NOINLINE pragma which ensures a definition is never inlined. NOINLINE is mostly used for hacks that would break referential transparency; see Chapter 4, The Devil's in the Detail.
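As a minimal sketch of where these pragmas go, consider the following definitions. The names average, scale, and converge are hypothetical and exist only for illustration; the NOINLINE use here shows the syntax rather than a case where it is actually needed.

```haskell
-- Hypothetical helpers, used only to show pragma placement.

-- Small and used in a hot loop: ask GHC to inline it at call sites.
{-# INLINE average #-}
average :: Double -> Double -> Double
average a b = (a + b) / 2

-- Ask GHC never to inline this definition (syntax demonstration only).
{-# NOINLINE scale #-}
scale :: Double -> Double
scale x = x * 1000

-- A simple loop whose body calls 'average' on every iteration.
converge :: Int -> Double -> Double -> Double
converge 0 x _ = x
converge n x y = converge (n - 1) (average x y) y
```

Loading this in GHCi and evaluating converge 1 1 3 gives the average of 1 and 3. The pragmas do not change results, only how GHC compiles the calls.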

Another powerful technique is stream fusion. Behind that fancy name is just a bunch of equations that are used to perform code rewriting (see Chapter 4, The Devil's in the Detail for the technicalities).

When working with lists, you may be tempted to rewrite code like this, which builds intermediate lists:

map f . map g . map h

into a single traversal:

map (f . g . h)

But there is no reason other than cosmetics to do this, because with optimizations enabled GHC performs stream fusion, after which both expressions are equivalent in time and space. Stream fusion is also performed for structures other than [], which we will take a look at in the next chapter.
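To make the two shapes concrete, here is a small sketch with specific functions substituted for f, g, and h. The names unfused and fused are hypothetical. The sketch only shows that both pipelines compute the same result; whether GHC compiles them to the same code depends on optimizations being enabled.

```haskell
-- Three separate maps; without fusion each one builds an intermediate list.
unfused :: [Int] -> [Int]
unfused = map (+ 1) . map (* 2) . map (subtract 3)

-- One map over the composed function; a single traversal by construction.
fused :: [Int] -> [Int]
fused = map ((+ 1) . (* 2) . subtract 3)
```

If you want to see what the optimizer actually did, compiling with ghc -O -ddump-rule-firings lists the rewrite rules that fired, and -ddump-simpl prints the optimized Core.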

Polymorphism performance

In principle, (ad hoc) polymorphic programs should carry a performance cost. To evaluate a polymorphic function, a dictionary must be passed in, which contains the class method implementations for the type chosen at the call site. However, GHC can almost always fill in the dictionary at compile time, reducing the cost of polymorphism to zero. The big and obvious exception is code that uses reflection (Typeable). Also, some sufficiently complex polymorphic code might defer the dictionary passing to runtime, although most of the time you can expect zero cost.
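To picture what "passing a dictionary" means, here is a hand-written sketch in which the dictionary is an ordinary record passed as an explicit argument. This is an illustration of the idea, not GHC's actual internal representation, and the names StepDict, stepD, doubleDict, and goWithDict are hypothetical.

```haskell
-- A record of method implementations, standing in for a class dictionary.
data StepDict a = StepDict { stepD :: a -> a -> a }

-- The "instance" for Double is just a value of that record type.
doubleDict :: StepDict Double
doubleDict = StepDict { stepD = \a b -> (a + b) / 2 }

-- A "polymorphic" loop: the dictionary is an extra argument, and each
-- iteration looks the method up in it before calling it.
goWithDict :: StepDict a -> Int -> a -> a
goWithDict _ 0 x = x
goWithDict d n x = goWithDict d (n - 1) (stepD d x x)
```

The runtime cost discussed in this section corresponds to the extra argument and the field lookup; when the type is known at compile time, the optimizer can replace both with a direct call.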

Either way, it might ease your mind to have some notion of the cost of dictionary passing at runtime. Let's write a program with both general and specialized versions of the same function, compile it without optimizations, and compare the performance. Our program will just iterate a simple calculation with double-precision values:

-- file: class_performance.hs

class Some a where
    next :: a -> a -> a

instance Some Double where
    next a b = (a + b) / 2

goGeneral :: Some a => Int -> a -> a
goGeneral 0 x = x
goGeneral n x = goGeneral (n-1) (next x x)

goSpecialized :: Int -> Double -> Double
goSpecialized 0 x = x
goSpecialized n x = goSpecialized (n-1) (next' x x)

next' :: Double -> Double -> Double
next' a b = (a + b) / 2

I compiled and ran both versions separately with their own main entry points using the following command lines:

ghc class_performance.hs
time ./class_performance +RTS -s

On my machine, with 5,000,000 iterations, the general version does 1.09 GB of allocation and takes 3.4s. The specialized version does 1.01 GB of allocation and runs in about 3.2s. So dictionary passing cost about 8% more allocation and about 6% more run time, which is considerable. But with optimizations enabled, both versions will have exactly the same performance.
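If you would rather not rely on GHC deciding to specialize on its own, one option is GHC's SPECIALIZE pragma, which requests a monomorphic copy of a polymorphic function. The sketch below repeats the class and the general loop from the example and adds the pragma; runGeneral is a hypothetical helper, and the pragma is an addition for illustration rather than part of the measured program. As far as I know, it only makes a difference when optimizations are enabled.

```haskell
class Some a where
    next :: a -> a -> a

instance Some Double where
    next a b = (a + b) / 2

goGeneral :: Some a => Int -> a -> a
goGeneral 0 x = x
goGeneral n x = goGeneral (n - 1) (next x x)
-- Request a copy of goGeneral specialized to Double, so that calls at
-- this type need no dictionary argument.
{-# SPECIALIZE goGeneral :: Int -> Double -> Double #-}

-- Hypothetical helper: runs the general loop at type Double.
runGeneral :: Int -> Double
runGeneral n = goGeneral n 1.5
```

Note that next x x simply returns x for Double, so the loop is a pure overhead measurement: the result equals the starting value regardless of the iteration count.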

Partial functions

Here's a puzzle: given the following definitions, which is faster, partial or total?

partialHead :: [a] -> a
partialHead (x:_) = x

totalHead :: [a] -> Maybe a
totalHead []    = Nothing
totalHead (x:_) = Just x

partial = print $ partialHead [1..]

total = print $ case totalHead [1..] of
                  Nothing -> 1
                  Just n  -> n

The total variant uses a head that wraps its result inside a new data constructor, whereas the partial one results in a crash when a case is not matched, but in exchange doesn't perform any extra wrapping. Surely the partial variant must be faster, right? Well, almost always it is not. Both functions have exactly the same time and space requirements.

Partial functions are justified in some situations, but performance is rarely if ever one of them. In the example, the Maybe-wrapper of total will have a zero performance cost. The performance cost of the case analysis will be left, however, but a similar analysis is done in the partial variant too; the error case must be handled anyway, so that the program can exit gracefully. Of course, even GHC is not a silver bullet and you should always keep in mind that it might miss some optimizations. If you absolutely need to rely on certain optimizations to take place, you should test your program to confirm the correct results.
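Since the total variant costs nothing extra, it is easy to build safe helpers on top of it. The following sketch repeats totalHead and adds a hypothetical helper, headOr, that supplies a default for the empty list instead of crashing.

```haskell
import Data.Maybe (fromMaybe)

totalHead :: [a] -> Maybe a
totalHead []      = Nothing
totalHead (x : _) = Just x

-- Hypothetical helper: first element of the list, or the given default
-- when the list is empty.
headOr :: a -> [a] -> a
headOr def = fromMaybe def . totalHead
```

Because only the first constructor of the list is inspected, headOr works on infinite lists such as [1..] just as the partial version does.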
