Haskell High Performance Programming

By : Samuli Thomasson

Overview of this book

Haskell's optimizing compiler and strong runtime performance make it a natural candidate for high performance programming. It is especially well suited to stacking abstractions high at a relatively low performance cost. This book addresses the challenges of writing efficient code under lazy evaluation and covers the techniques commonly used to optimize Haskell programs. We open with an in-depth look at the evaluation of Haskell expressions and discuss optimization and benchmarking. You will learn to use parallelism, and we'll explore the concept of streaming. We'll demonstrate the benefits of running multithreaded and concurrent applications. Next, we'll guide you through various profiling tools that help you identify performance issues in your program. We'll end our journey by looking at GPGPU, Cloud, and Functional Reactive Programming in Haskell. At the very end there is a catalogue of robust library recommendations with code samples. By the end of the book, you will be able to improve the performance of your Haskell applications and prepare them to stand up to real-world punishment.

Credits

About the Author

About the Reviewer

www.PacktPub.com

Preface

Identifying Bottlenecks

Meeting lazy evaluation

Memoization and CAFs

Recursion and accumulators

Inspecting time and space usage

Compiler code optimizations

Summary

Choosing the Correct Data Structures

Annotating strictness and unpacking datatype fields

Handling numerical data

Handling binary and textual data

Handling sequential data

Handling tabular data

Handling sparse data

Ephemeral data structures

Working with monads and monad stacks

Summary

Profile and Benchmark to Your Heart's Content

Profiling time and allocations

Heap profiling

Benchmarking using the criterion library

Profile and monitor in real time

Summary

The Devil's in the Detail

The anatomy of a Haskell project

Erroring and handling exceptions

Writing tests for Haskell

Trivia at term-level

Trivia at type-level

Useful GHC extensions

Summary

Parallelize for Performance

Primitive parallelism and the Runtime System

The Eval monad and strategies

The Par monad and schedules

Diagnosing parallelism – ThreadScope

Data parallel programming – Repa

Summary

I/O and Streaming

Reading, writing, and handling resources

Streaming with side-effects

Logging in Haskell

Summary

Concurrency and Performance

Threads and concurrency primitives

Software Transactional Memory

Runtime System and threads

Asynchronous processing

Lifting up from I/O

Summary

Tweaking the Compiler and Runtime System (GHC)

Using GHC like a pro

Tuning GHC's Runtime System

Summary of useful GHC options

Summary of useful RTS options

Summary

GHC Internals and Code Generation

Interpreting GHC's internal representations

Primitive GHC-specific features

Datatype generic programming

Generating Haskell with Haskell

Summary

Foreign Function Interface

From Haskell to C and C to Haskell

Data marshal and stable pointers

Summary

Programming for the GPU with Accelerate

Writing Accelerate programs

Running with the CUDA backend

More Accelerate concepts

Summary

Scaling to the Cloud with Cloud Haskell

Processes and message-passing

Handling failure

Nodes and networking

Summary

Functional Reactive Programming

The tiny discrete-time Elerea

Events and signal functions with Yampa

Reactive-banana – Safe and simple semantics

Combining events and behaviors

Summary

Library Recommendations

Representing data

Functional graphs

Numeric data for special use

Encoding and serialization

Persistent storage, SQL, and NoSQL

Networking and HTTP

Cryptography

Web technologies

Parsing and pretty-printing

Pretty-printing and text formatting

Control and utility libraries

Working with monads and transformers

Handling exceptions

Random number generators

Parallel and concurrent programming

Functional Reactive Programming

Mathematics, statistics, and science

Tools for research and sketching

The HaskellR project

Creating charts and diagrams

Scripting and CLI applications

Testing and benchmarking

Summary

Index

Memoization and CAFs

Memoization is a dynamic programming technique where intermediate results are saved and later reused. Many string and graph algorithms make use of memoization. Calculating the Fibonacci sequence, instances of the knapsack problem, and many bioinformatics algorithms are practical to solve only with dynamic programming. A classic example in Haskell is the algorithm for the nth Fibonacci number, of which one variant is the following:

-- file: fib.hs

fib_mem :: Int -> Integer
fib_mem = (map fib [0..] !!)
  where fib 0 = 1
        fib 1 = 1
        fib n = fib_mem (n-2) + fib_mem (n-1)

Try it with a reasonably large input (10000) to confirm that it does memoize the intermediate numbers. Lookup time grows with the index, though, because (!!) walks the list from its head, so a linked list is not a very appropriate data structure here. But let's ignore that for the time being and focus on what actually enables the values of this function to be memoized.

Looking at the top level, fib_mem looks like a normal function that takes input, does a computation, returns a result, and forgets everything it did with regard to its internal state. But in reality, fib_mem will memoize the results of all inputs it will ever be called with during its lifetime. So if fib_mem is defined at the top level, the results will persist in memory over the lifetime of the program itself!
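One way to watch this happen is to make each list element announce when it is computed. The following is only a sketch: fibTraced and its helper go are hypothetical names, and Debug.Trace is used purely as a debugging aid that writes to stderr.

```haskell
import Debug.Trace (trace)

-- Same shape as fib_mem, but every list element reports (on stderr)
-- the moment it is actually computed. fibTraced is an illustrative name.
fibTraced :: Int -> Integer
fibTraced = (map fib [0 ..] !!)
  where
    fib :: Int -> Integer
    fib n = trace ("computing fib " ++ show n) (go n)

    -- The arithmetic itself, identical to the original listing.
    go 0 = 1
    go 1 = 1
    go n = fibTraced (n - 2) + fibTraced (n - 1)
```

Loaded into GHCi, evaluating fibTraced 5 twice should print the "computing" messages only during the first evaluation, because the second call finds the list elements already evaluated.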

The short story of why memoization takes place in fib_mem is that in Haskell, functions exist at the same level as normal values such as integers and characters; that is, they are all values. Because the parameter of fib_mem does not occur in the function body, the body can be reduced irrespective of the parameter value. Compare fib_mem to this fib_mem_arg:

fib_mem_arg :: Int -> Integer
fib_mem_arg x = map fib [0..] !! x
  where fib 0 = 1
        fib 1 = 1
        fib n = fib_mem_arg (n-2) + fib_mem_arg (n-1)

Running fib_mem_arg with anything but very small arguments, one can confirm that it does no memoization. Even though we can see that map fib [0..] does not depend on the argument and could be memoized, it will not be, because applying a function to an argument creates a new expression that cannot implicitly hold pointers to expressions from previous function applications. The same is true of lambda abstractions, so this fib_mem_lambda is similarly stateless:

fib_mem_lambda :: Int -> Integer
fib_mem_lambda = \x -> map fib [0..] !! x
  where fib 0 = 1
        fib 1 = 1
        fib n = fib_mem_lambda (n-2) + fib_mem_lambda (n-1)

With optimizations, both fib_mem_arg and fib_mem_lambda will get rewritten into a form similar to fib_mem. So in simple cases, the compiler will conveniently fix our mistakes, but sometimes it is necessary to reorder complex computations so that different parts are memoized correctly.
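When you do not want to rely on the optimizer, one possible manual reordering is to name the shared list outside the lambda. The sketch below is an assumption-laden variant of fib_mem_lambda; fibShared and table are hypothetical names. The where clause scopes over the whole right-hand side, so table is bound once rather than once per application.

```haskell
-- A hand-reordered variant of fib_mem_lambda (illustrative names).
-- table is bound outside the lambda, so every application of
-- fibShared indexes into the same, gradually evaluated list.
fibShared :: Int -> Integer
fibShared = \x -> table !! x
  where
    table :: [Integer]
    table = map fib [0 ..]

    fib :: Int -> Integer
    fib 0 = 1
    fib 1 = 1
    fib n = fibShared (n - 2) + fibShared (n - 1)
```

The only change from fib_mem_lambda is where map fib [0..] is bound: inside the lambda it is a fresh expression on every call, outside it is a single shared value.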

Tip

Be wary of memoization and compiler optimizations. GHC performs aggressive inlining (explained in the section Inlining and stream fusion) as a routine optimization, so it's very likely that values (and functions) get recalculated more often than was intended.

Constant applicative form

The formal difference between fib_mem and the others is that fib_mem is something called a constant applicative form, or CAF for short. The compact definition of a CAF is as follows: a supercombinator that is not a lambda abstraction. We already covered the not-a-lambda-abstraction part, but what is a supercombinator?

A supercombinator is either a constant, say 1.5 or ['a'..'z'], or a combinator whose subexpressions are supercombinators. These are all supercombinators:

\n -> 1 + n
\f n -> f 1 n
\f -> f 1 . (\g n -> g 2 n)

But this one is not a supercombinator:

\f g -> f 1 . (\n -> g 2 n)

This is because g is a free variable of the inner lambda abstraction: it is bound by the outer lambda, not by the inner one.
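To make the distinction concrete, the last expression can be turned into a supercombinator by lambda lifting, that is, by passing the outer variable to the inner lambda as an explicit parameter. This is a sketch only: notSuper and lifted are hypothetical names, and the type signatures are one possible choice made so that the definitions compile on their own.

```haskell
-- Not a supercombinator: the inner lambda mentions g, which is bound
-- by the outer lambda rather than by the inner one.
notSuper :: (Int -> b -> c) -> (Int -> a -> b) -> a -> c
notSuper = \f g -> f 1 . (\n -> g 2 n)

-- Lambda-lifted version: the inner lambda takes the function as its own
-- parameter g', so it no longer refers to anything bound outside itself.
lifted :: (Int -> b -> c) -> (Int -> a -> b) -> a -> c
lifted = \f g -> f 1 . (\g' n -> g' 2 n) g
```

Both definitions compute the same function; for instance, applying either to (+), (*), and 5 gives 1 + (2 * 5).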

CAFs are constant in the sense that they contain no free variables, which guarantees that all thunks a CAF references directly are also constants. Actually, the constant subvalues are a part of the value. Subvalues are automatically memoized within the value itself.

A top-level [Int], say, is just as valid a value as the fib_mem function for holding references to other values. You should pay attention to CAFs in your code because memoized values are space leaks when the memoization was unintended. All code that allocates lots of memory should be wrapped in functions that take one or more parameters.
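As a rough illustration of that advice, compare a top-level list with a function that builds an equivalent list on demand. The names bigList and bigListFrom and the size used are hypothetical, chosen only for this sketch.

```haskell
-- A CAF: once evaluated, the list may be retained for as long as
-- the program can still refer to bigList.
bigList :: [Int]
bigList = [1 .. 1000000]

-- Wrapped in a function whose parameter is actually used: each call
-- builds a fresh list, which can be garbage collected as soon as the
-- caller is done with it.
bigListFrom :: Int -> [Int]
bigListFrom start = [start .. start + 999999]
```

The parameter is deliberately used in the body. A dummy parameter that the body ignores may not help, since an optimizing compiler is free to float the constant expression back out to the top level.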
