#### Overview of this book

Delphi is a cross-platform Integrated Development Environment (IDE) that supports rapid application development for Microsoft Windows, Apple Mac OS X, Google Android, iOS, and now Linux with RAD Studio 10.2. This book will be your guide to build efficient high performance applications with Delphi. The book begins by explaining how to find performance bottlenecks and apply the correct algorithm to fix them. It will teach you how to improve your algorithms before taking you through parallel programming. You’ll then explore various tools to build highly concurrent applications. After that, you’ll delve into improving the performance of your code and master cross-platform RTL improvements. Finally, we’ll go through memory management with Delphi and you’ll see how to leverage several external libraries to write better performing programs. By the end of the book, you’ll have the knowledge to create high performance applications with Delphi.
Title Page
Packt Upsell
Contributors
Preface
Free Chapter
Fixing the Algorithm
Fine-Tuning the Code
Memory Management
Getting Started with the Parallel World
Working with Parallel Tools
Exploring Parallel Practices
Using External Libraries
Best Practices
Other Books You May Enjoy
Index

## Algorithm complexity

Before we start with the dirty (and fun) job of improving program speed, I'd like to present a bit of computer science theory, namely the Big O notation.

You don't have to worry, I will not use pages of mathematical formulas and talk about infinitesimal asymptotics. Instead, I will just present the essence of the Big O notation, the parts that are important to every programmer.

In the literature and, of course, on the web, you will see expressions such as O(n), O(n^2), O(1) and similar. This fancy-looking notation hides a really simple story. It tells us how much slower the algorithm will become if we increase the data size by a factor of n.

### Note

The n^2 notation means "n to the power of two", or n2. This notation is frequently used on the internet because it can be written with the standard ASCII characters. This book uses the more readable variant O(n2).

Let's say we have an algorithm with complexity of O(n), which on average takes T seconds to process input data of size N. If we increase the size of the data by a factor of 10 (to 10*N), then the algorithm will (on average) also use 10 times more time (that is, 10*T) to process the data. If we process 1,000 times more data, the program will also run 1,000 times slower.

If the algorithm complexity is O(n2), increasing the size of the data by a factor of 10 will cause the algorithm to run 102 or 100 times longer. If we want to process 1,000 times more data, then the algorithm will take 1,0002 or a million times longer, which is quite a hit. Such algorithms are typically not very useful if we have to process large amounts of data.

### Note

Most of the time, we use the Big O notation to describe how the computation time relates to the input data size. When this is the case, we call the Big O notation time complexity. Nevertheless, sometimes the same notation is used to describe how much storage (memory) the algorithm is using. In that case, we are talking about a space complexity.

You may have noticed that I was using the word average a lot in the last few paragraphs. When talking about the algorithm complexity, we are mostly interested in the average behavior, but sometimes we will also need to know about the worst behavior. We rarely talk about best behavior because users don't really care much if the program is sometimes faster than average.

Let's look at an example. The following function checks whether a string parameter value is present in a string list:

function IsPresentInList(strings: TStrings; const value: string): Boolean;
var
i: Integer;
begin
Result := False;
for i := 0 to strings.Count - 1 do
if SameText(strings[i], value) then
Exit(True);
end;

What can we tell about this function? The best case is really simple—it will find that the value is equal to strings[0] and it will exit. Great! The best behavior for our function is O(1). That, sadly, doesn't tell us much as that won't happen frequently in practice.

The worst behavior is also easy to find. If the value is not present in the list, the code will have to scan all of the strings list before deciding that it should return False. In other words, the worst behavior is O(n), if the n represents the number of elements in the list. Incidentally (and without proof), the average behavior for this kind of search is also O(n).

### Note

The Big O limits don't care about constant factors. If an algorithm would use n/2 steps on average, or even just 0.0001 * n steps, we would still write this down as O(n). Of course, a O(10 * n) algorithm is slower than a O(n) algorithm and that is absolutely important when we fine-tune the code, but no constant factor C will make O(C * n) faster than O(log n) if n gets sufficiently large.

There are better ways to check whether an element is present in some data than searching the list sequentially. We will explore one of them in the next section, Big O and Delphi data structures.

While the function of n inside the O() notation can be anything, there are some O functions that appear constantly in standard programming problems. The following table shows those Big O limits and the most common examples of problems that belong to each class:

 Time complexity Common examples of problems with that time complexity O(1) Accessing array elements O(log n) Search in an ordered list O(n) Linear search O(n log n) Quick sort (average behavior) O(n2) Quick sort (worst behavior), naive sort (bubblesort, insertion sort, selection sort) O(cn) Recursive Fibonacci, travelling salesman problem using dynamic programming (c is some numeric constant)

If we care about program performance, then O(1) algorithms are of special interest to us as they present algorithms which don't get slower (at least not noticeably) when we increase the problem size. We'll see an example of such O(1) algorithms in the next section.

When we deal with algorithms that search in some datasets, we usually try to make them behave as O(log n), not O(n), as the former slows down much, much slower than the latter.

Another big class of problems deals with sorting the data. While the naive approaches sort in O(n2), better algorithms (such as mergesort and quicksort) need on average just O(n log n) steps.

The following image shows how the time complexity for these typical limits (we have used 2n as an example of a more generic cn) grows when we increase the problem size up to 20-fold:

Most frequently encountered Big-O limits

We can see that O(1) and O(log n) grow very slowly. While O(n log n) grows faster than O(n), it also grows much slower than O(n2), which we had to stop plotting when data was increased nine-fold.

The O(2n) starts slowly and looks like a great solution for small data sizes (small n), but then it starts rising terribly fast, much faster than O(n2).

The following table shows how fast O(n log n) and O(n2) are growing if we compare them with O(n) and how quickly O(2n) explodes.

The data column shows the data size increase factor. The number 10 in this column, for example, represents input with 10 times more elements than in the original data:

 Data size O(1) O(log n) O(n) O(n log n) O(n2) O(2n) 1 1 1 1 1 1 1 2 1 2 2 4 4 2 10 1 4 10 43 100 512 20 1 5 20 106 400 524,288 100 1 8 100 764 10,000 1029 300 1 9 300 2,769 90,000 1090

We can see from this table that O(log n) algorithms present a big improvement over O(n) algorithms (8 versus 100 times increase in time when data increases 100-fold). We can also see that the O(2n) quickly becomes completely unmanageable.

The last cell in this table is particularly interesting. There are different estimates for the number of elementary particles (electrons, protons, neutrons, and so on) in the visible universe, but they all lie somewhere around 1090. Suppose we have a computer which can solve an O(2n) in a reasonable time. If we would increase the input data by a factor of just 300, then we would need 1090 computers to solve the new problem in the same time. That is as much as the number of particles in the visible universe!

### Note

Don't use algorithms which have time complexity O(2n). It won't end well.

### Big O and Delphi data structures

Delphi's Run-Time Library (RTL) contains many data structures (classes that are specifically designed to store and retrieve data), mostly stored in System.Classes and System.Generics.Collection units that greatly simplify everyday work. We should, however, be aware of their good and bad sides.

Every data structure in the world is seeking a balance between four different types of data access: accessing the data, inserting the data, searching for data, and deleting data. Some data structures are good in some areas, others in different ones, but no data structure in this world can make all four operations independent of data size.

When designing a program, we should therefore know what our needs are. That will help us select the appropriate data structure for the job.

The most popular data structure in Delphi is undoubtedly TStringList. It can store a large amount of strings and assign an object to each of them. It can—and this is important—work in two modes, unsorted and sorted. The former, which is a default, keeps strings in the same order as they were added while the latter keeps them alphabetically ordered.

This directly affects the speed of some operations. While accessing any element in a string list can always be done in a constant time (O(1)), adding to a list can take O(1) when the list is not sorted and O(log n) when the list is sorted.

Why that big difference? When the list is unsorted, Add just adds a string at its end. If the list is, however, sorted, Add must first find a correct insertion place. It does this by executing a bisection search, which needs O(log n) steps to find the correct place.

The reverse holds true for searching in a string list. If it is not sorted, IndexOf needs to search (potentially) the whole list to find an element. In a sorted list, it can do it much faster (again by using a bisection) in O(log n) steps.

We can see that TStringList offers us two options - either a fast addition of elements or a fast lookup, but not both. In a practical situation, we must look at our algorithm and think wisely about what we really need and what will behave better.

To sort a string list, you can call its Sort method or you can set its Sorted property to True.  There is, however, a subtle difference that you should be aware of. While calling Sort sorts the list, it doesn't set its internal is sorted flag and all operations on the list will proceed as if the list is unsorted. Setting Sorted := True, on the other hand, does both - it sets the internal flag and calls the Sort method to sort the data.

To store any (non-string) data, we can use traditional TList and TObjectList classes or their more modern generic counterparts, TList<T> and TObjectList<T>. They all always work in an unsorted mode and so adding an element takes O(1) while finding and removing an element takes O(n) steps.

All provide a Sort function which sorts the data with a quicksort algorithm (O(n log n) on average) but only generic versions have a BinarySearch method, which searches for an element with a bisection search taking O(log n) steps. Be aware that BinarySearch requires the list to be sorted but doesn't make any checks to assert that. It is your responsibility to sort the list before you use this function.

If you need a very quick element lookup, paired with a fast addition and removal, then TDictionary is the solution. It has methods for adding (Add), removing (Remove) and finding a key (ContainsKey and TryGetValue) that, on average, function in a constant time, O(1). Their worst behavior is actually quite bad, O(n), but that will only occur on specially crafted sets of data that you will never see in practical applications.

I've told you before that there's no free lunch and so we can't expect that TDictionary is perfect. The big limitation is that we can't access the elements it is holding in a direct way. In other words, there is no TDictionary[i]. We can walk over all elements in a dictionary by using a for statement, but we can't access any of its elements directly. Another limitation of TDictionary is that it does not preserve the order in which elements were added.

Delphi also offers two simple data structures that mimic standard queue—TQueue<T>—and stack—TStack<T>. Both have very fast O(1) methods for adding and removing the data, but they don't offer any bells and whistles—there is no direct data access, we cannot search for data, and so on. We can only insert (Enqueue in queue or Push in stack) and remove (Dequeue and Pop) data.

To help you select the right tool for the job, I have put together a table showing the most important data structures and their most important methods, together with average and (when they differ from the average) worst-case time complexities:

 Data structure Operation Average Worst TStringList Direct access O(1) Add O(1) / O(log n) Insert O(1) Delete O(1) IndexOf O(n) / O(log n) Sort O(n log n) O(n2) TList, TObjectList Direct access O(1) Add O(1) Insert O(1) Delete O(1) Remove O(n) IndexOf O(n) Sort O(n log n) O(n2) TList, TObjectList Direct access O(1) Add O(1) Insert O(1) Delete O(1) Remove O(n) IndexOf O(n) BinarySearch O(log n) Sort O(n log n) O(n2) TDictionary Direct access Not possible Add O(1) O(n) Remove O(1) O(n) TryGetValue O(1) O(n) ContainsKey O(1) O(n) ContainsValue O(n) TQueue Direct access Not possible Enqueue O(1) Dequeue O(1) TStack Direct access Not possible Push O(1) Pop O(1)

The table shows the time complexity of the most important operations on built-in Delphi data structures. Complexity for the worst case is only listed if it differs from the average complexity.

### Data structures in practice

Enough with the theory already! I know that you, like me, prefer to talk through the code. As one program explains more than a thousand words could, I have prepared a simple demo project: RandomWordSearch.

This program functions as a very convoluted random word generator. When started, it will load a list of 370,101 English words from a file. It will also prepare three internal data structures preloaded with these words:

Random word generator

The program shows three buttons to the user. All three run basically the same code. The only difference is the test function which is passed to the centralized word generator as a parameter:

procedure TfrmRandomWordSearch.FindGoodWord(const wordTest: TWordCheckDelegate);
var
word: string;
isWordOK: boolean;
time: TStopwatch;
begin
time := TStopwatch.StartNew;
repeat
word := GenerateWord;
isWordOK := wordTest(word);
until isWordOK or (time.ElapsedMilliseconds > 10000);
if isWordOK then
lbWords.ItemIndex := lbWords.Items.Add(Format('%s (%d ms)', [word, time.ElapsedMilliseconds]))
else
end;

The core of the FindGoodWord method can be easily described:

1. Generate a random word by calling GenerateWord.
2. Call the test function wordTest on that word. If this function returns False, repeat Step 1. Otherwise show the word.

The code is a bit more complicated because it also checks that the word generation part runs for at most 10 seconds and reports a timeout if no valid word was found in that time.

The random word generator GenerateWord is incredibly simple. It just appends together lowercase English letters until the specified length (settable in the user interface) is reached:

function TfrmRandomWordSearch.GenerateWord: string;
var
pos: integer;
begin
Result := '';
for pos := 1 to inpWordLength.Value do
Result := Result + Chr(Ord('a') + Random(Ord('z') - Ord('a') + 1));
end;

Let's now check the data preparation phase. The not very interesting (and not shown here) OnCreate handler loads data from a file into a TStringList and calls the LoadWords method:

var
word: string;
begin
FWordsUnsorted := TStringList.Create;
FWordsUnsorted.Assign(wordList);

FWordsSorted := TStringList.Create;
FWordsSorted.Assign(wordList);
FWordsSorted.Sorted := True;

FWordsDictionary := TDictionary<string,boolean>.Create(wordList.Count);
for word in wordList do
end;

The first data structure is an unsorted TStringListFWordsUnsorted. Data is just copied from the list of all words by calling the Assign method.

The second data structure is a sorted TStringListFWordsSorted. Data is firstly copied from the list of all words. The list is then sorted by setting FWordsSorted.Sorted := True.

The last data structure is a TDictionaryFWordsDictionary. A TDictionary always stores pairs of keys and values. In our case, we only need the keys part as there is no data associated with any specific word, but Delphi doesn't allow us to ignore the values part and so the code defines the value as a boolean and always sets it to True.

Although the dictionaries can grow, they work faster if we can initially set the number of elements that will be stored inside. In this case that is simple—we can just use the length of the wordList as a parameter to TDictionary.Create.

The only interesting part of the code left is OnClick handlers for all three buttons. All three call the FindGoodWord method, but each passes in a different test function.

When you click on the Unsorted list button, the test function checks whether the word can be found in the FWordsUnsorted list by calling the IndexOf function. As we will mostly be checking non-English words (remember, they are just random strings of letters), this IndexOf will typically have to compare all 307,101 words before returning -1:

procedure TfrmRandomWordSearch.btnUnsortedListClick(Sender: TObject);
begin
FindGoodWord(
function (const word: string): boolean
begin
Result := FWordsUnsorted.IndexOf(word) >= 0;
end);
end;

When you click on the Sorted list button, the test function calls FWordsSorted.IndexOf. As this TStringList is sorted, IndexOf will use a binary search that will need at most log(307101) = 19 (rounded up) comparisons to find out that a word is not found in the list. As this is much less than 307.101, we can expect that finding words with this approach will be much faster:

procedure TfrmRandomWordSearch.btnSortedListClick(Sender: TObject);
begin
FindGoodWord(
function (const word: string): boolean
begin
Result := FWordsSorted.IndexOf(word) >= 0;
end);
end;

A click on the last button, Dictionary, calls FWordsDictionary.ContainsKey to check if the word can be found in the dictionary, and that can usually be done in just one step. Admittedly, this is a bit of a slower operation than comparing two strings, but still the TDictionary approach should be faster than any of the TStringList methods:

procedure TfrmRandomWordSearch.btnDictionaryClick(Sender: TObject);
begin
FindGoodWord(
function (const word: string): boolean
begin
Result := FWordsDictionary.ContainsKey(word);
end);
end;

If we use the terminology from the last section, we can say that the O(n) algorithm (unsorted list) will run much slower than the O(log n) algorithm (sorted list), and that the O(1) algorithm (dictionary) will be the fastest of them all. Let's check this in practice.

Start the program and click on the Unsorted list button a few times. You'll see that it typically needs few hundred milliseconds to a few seconds to generate a new word. As the process is random and dependent on CPU speed, your numbers may differ quite a lot from mine. If you are only getting timeout messages, you are running on a slow machine and you should decrease the Word length to 3.

If you increment the Word length to 5 and click the button again, you'll notice that the average calculation time will grow up to a few seconds. You may even get an occasional timeout. Increase it to 6 and you'll mostly be getting timeouts. We are clearly hitting the limits of this approach.

Prepare now to be dazed! Click on the Sorted list button (while keeping the Word length at 6) and words will again be calculated blazingly fast. On my computer, the code only needs 10 to 100 milliseconds to find a new word:

Testing with different word length and with first two algorithms

To better see the difference between a sorted list and a dictionary, we have to crank up the word length again. Setting it to 7 worked well for me. The sorted list needed from a few 100 milliseconds to a few seconds to find a new word while the dictionary approach mostly found a new word in under 100 milliseconds.

Increase the Word length to 8 and the sorted list will start to time out while the dictionary will still work. Our O(1) approach is indeed faster than the O(log n) code:

Comparing sorted list (first six words and the two after "timeout") with the dictionary approach (next five and last two).

While the knowledge of algorithms and their complexity is useful, it will not help in every occasion. Sometimes you already use the best possible algorithm but your program is still too slow. At such times, you'll want to find the slow parts of the program and speed them up with whatever trick possible. To do that, you have to find them first, which is the topic of the rest of this chapter.

### Mr. Smith's first program

While I was talking about theory and data structures and best practices, our friend Mr. Smith read everything about polar forests on the internet and then got really bored. In a moment of sheer panic—when he was thinking about watching videos of cute kittens for the whole polar night—he turned to programming. He checked various internet sources and decided that Delphi is the way to go. After all, he could use it to put together a new game for iOS and Android and make lots of money while waiting for the sun to show up!

As he didn't have any programming literature on the Antarctic base, he learned programming from internet resources. Sadly, he found most of his knowledge on Experts Exchange and Yahoo Answers and his programs sometimes reflect that.

As of this moment, he has written one working program (and by working I mean that it compiles), but he is not really sure what the program actually does. He only knows that the program is a bit slow and because of that he named it SlowCode.

His program is a console mode program which upon startup calls the Test method:

procedure Test;
var
data: TArray<Integer>;
highBound: Integer;
begin
repeat
Writeln('How many numbers (0 to exit)?');
Write('> ');
if highBound = 0 then
Exit;

data := SlowMethod(highBound);
ShowElements(data);
until false;
end;

That one, at least, is easy to grasp. It reads some number, passes it to something called SlowMethod (hmm, Mr. Smith really should work on naming techniques), and then passes the result (which is of type TArray<Integer>) to a method called ShowElements. When the user types in 0, the program exits.

Let's check the ShowElements function:

function SlowMethod(highBound: Integer): TArray<Integer>;
var
i: Integer;
temp: TList<Integer>;
begin
temp := TList<Integer>.Create;
try
for i := 2 to highBound do
if not ElementInDataDivides(temp, i) then
Result := Filter(temp);
finally
FreeAndNil(temp);
end;
end;

The code creates a list of integers. Then it iterates from a value 2 to the value that was entered by the user and for each value calls ElementInDataDivides, passing in the list and the current value. If that function returns True, the current value is entered into the list.

After that, SlowMethod calls the Filter method which does something with the list and converts it into an array of integers which is then returned as a function result:

function ElementInDataDivides(data: TList<Integer>; value: Integer): boolean;
var
i: Integer;
begin
Result := True;
for i in data do
if (value <> i) and ((value mod i) = 0) then
Exit;
Result := False;
end;

The ElementInDataDivides function iterates over all the numbers in the list and checks if any element in the list divides the value (with the additional constraint that this element in the list must not be equal to the value).

Let's check the last part of the puzzle—the Filter function:

function Reverse(s: string): string;
var
ch: char;
begin
Result := '';
for ch in s do
Result := ch + Result;
end;
function Filter(list: TList<Integer>): TArray<Integer>;
var
i: Integer;
reversed: Integer;
begin
SetLength(Result, 0);
for i in list do
begin
reversed := StrToInt(Reverse(IntToStr(i)));
if not ElementInDataDivides(list, reversed) then
begin
SetLength(Result, Length(Result) + 1);
Result[High(Result)] := i;
end;
end;
end;

This one again iterates over the list, reverses the numbers in each element (changes 123 to 321, 3341 to 1433, and so on) and calls ElementInDataDivides on the new number. If it returns True, the element is added to the returned result in a fairly inefficient way. I agree with Mr. Smith—it is hard to tell what the program does. Maybe it is easiest to run it and look at the output:

It looks like the program is outputting prime numbers. Not all prime numbers, just some of them. (For example, 19 is missing from the list, and so is 23.) Let's leave it at that for the moment.

### Looking at code through the Big O eyes

We can tell more about the program, about its good and bad parts, if we look at it through the eyes of time complexity, in terms of the Big O notation.

We'll start where the code starts—in the SlowMethod method. It has a loop iterating from 2 to the user-specified upper bound, for i := 2 to highBound do. The size of our data, or n, is therefore equal to highBound and this for loop has time complexity of O(n):

for i := 2 to highBound do
if not ElementInDataDivides(temp, i) then

Inside this loop, the code calls ElementInDataDivides followed by an occasional temp.Add. The latter will execute in O(1), but we can't say anything about the ElementInDataDivides before we examine it.

This method also has a loop iterating over the data list. We can't guess how many elements are in this list, but in the short test that we just performed, we know that the program writes out 13 elements when processing values from 2 to 100:

for i in data do
if (value <> i) and ((value mod i) = 0) then
Exit;

For the purpose of this very rough estimation I'll just guess that the for i in data do loop also has a time complexity of O(n).

In SlowMethod we therefore have an O(n) loop executing another O(n) loop for each element, which gives us a O(n2) performance.

The SlowMethod then calls the Filter method which also contains O(n) for loop calling ElementInDataDivides, which gives us O(n2) complexity for this part:

for i in list do
begin
reversed := StrToInt(Reverse(IntToStr(i)));
if not ElementInDataDivides(list, reversed) then
begin
SetLength(Result, Length(Result) + 1);
Result[High(Result)] := i;
end;
end;

There's also a conversion to string, some operation on that string, and conversion back to the integer StrToInt(Reverse(IntToStr(i))). It works on all elements of the list (O(n)) but in each iteration it processes all characters in the string representation of a number. As a length of the number is proportional to log n, we can say that this part has complexity of O(n log n), which can be ignored as it is much less than the O(n2) complexity of the whole method.

There are also some operations hidden inside SetLength, but at this moment we don't know yet what they are and how much they contribute to the whole program. We'll cover that area in Chapter 4, Memory Management.

The SlowMethod therefore consists of two parts, both with complexity O(n2). Added together, that would give us 2*n2, but as we ignore constant factors (that is, 2) in the Big O notation, we can only say that the time complexity of SlowMethod is O(n2).

So what can we say simply by looking at the code?

• The program probably runs in O(n2) time. It will take around 100 times longer to process 10,000 elements than 1,000 elements.
• There is a conversion from the integer to the string and back (Filter), which has complexity of only O(n log n) but it would still be interesting to know how fast this code really is.
• There's a time complexity hidden behind the SetLength call which we know nothing about.
• We can guess that most probably the ElementInDataDivides is the most time-consuming part of the code and any improvements in this method would probably help.
• Fixing the terrible idea of appending elements to an array with SetLength could probably speed up a program, too.

As the code performance is not everything, I would also like to inform Mr. Smith about a few places where his code is less than satisfactory:

• The prompt How many numbers is misleading. A user would probably expect it to represent the number of numbers outputted, while the program actually wants to know how many numbers to test.
• Appending to TArray<T> in that way is not a good idea. Use TList<T> for temporary storage and call its ToArray method at the end. If you need TArray<T>, that is. You can also do all processing using TList<T>.
• SlowMethod has two distinctive parts - data generation, which is coded as a part of SlowMethod, and data filtering, which is extracted in its own method, Filter. It would be better if the first part is extracted into its own method, too.
• Console program, really? Console programs are good for simple tests, but that is it. Learn VCL or FireMonkey, Mr. Smith!

We can now try and optimize parts of the code (ElementInDataDivides seems to be a good target for that) or, better, we can do some measuring to confirm our suspicions with hard numbers.

In a more complicated program (what we call a real life), it would usually be much simpler to measure the program performance than to do such analysis. This approach, however, proves to be a powerful tool if you are using it while designing the code. Once you hear a little voice nagging about the time complexities all the time while you're writing a code, you'll be on the way to becoming an excellent programmer.