Before we start with the dirty (and fun) job of improving program speed, I'd like to present a bit of computer science theory, namely the Big O notation.
You don't have to worry; I will not use pages of mathematical formulas or talk about infinitesimal asymptotics. Instead, I will present just the essence of the Big O notation, the parts that are important to every programmer.
In the literature and, of course, on the web, you will see expressions such as O(n), O(n^2), O(1), and similar. This fancy-looking notation hides a really simple story: it tells us how the running time of an algorithm grows as we increase the size of the input data, n.
Note
The n^2 notation means "n to the power of two", or n^{2}. This notation is frequently used on the internet because it can be written with the standard ASCII characters. This book uses the more readable variant O(n^{2}).
Let's say we have an algorithm with complexity of O(n), which on average takes T seconds to process input data of size N. If we increase the size of the data by a factor of 10 (to 10*N), then the algorithm will (on average) also use 10 times more time (that is, 10*T) to process the data. If we process 1,000 times more data, the program will also run 1,000 times slower.
If the algorithm complexity is O(n^{2}), increasing the size of the data by a factor of 10 will cause the algorithm to run 10^{2} or 100 times longer. If we want to process 1,000 times more data, then the algorithm will take 1,000^{2} or a million times longer, which is quite a hit. Such algorithms are typically not very useful if we have to process large amounts of data.
Note
Most of the time, we use the Big O notation to describe how the computation time relates to the input data size. When this is the case, we talk about time complexity. Sometimes, though, the same notation is used to describe how much storage (memory) the algorithm uses; in that case, we are talking about space complexity.
You may have noticed that I was using the word average a lot in the last few paragraphs. When talking about the algorithm complexity, we are mostly interested in the average behavior, but sometimes we will also need to know about the worst behavior. We rarely talk about best behavior because users don't really care much if the program is sometimes faster than average.
Let's look at an example. The following function checks whether a string parameter value is present in a string list:
function IsPresentInList(strings: TStrings; const value: string): Boolean;
var
  i: Integer;
begin
  Result := False;
  for i := 0 to strings.Count - 1 do
    if SameText(strings[i], value) then
      Exit(True);
end;
What can we tell about this function? The best case is really simple—it will find that value is equal to strings[0] and it will exit. Great! The best behavior for our function is O(1). That, sadly, doesn't tell us much, as that won't happen frequently in practice.
The worst behavior is also easy to find. If the value is not present in the list, the code will have to scan the entire strings list before deciding that it should return False. In other words, the worst behavior is O(n), where n represents the number of elements in the list. Incidentally (and without proof), the average behavior for this kind of search is also O(n).
Note
The Big O limits don't care about constant factors. If an algorithm uses n/2 steps on average, or even just 0.0001 * n steps, we still write this down as O(n). Of course, an O(10 * n) algorithm is slower than an O(n) algorithm, and that is absolutely important when we fine-tune the code, but no constant factor C will make O(C * n) faster than O(log n) once n gets sufficiently large.
There are better ways to check whether an element is present in some data than searching the list sequentially. We will explore one of them in the next section, Big O and Delphi data structures.
While the function of n inside the O() notation can be anything, there are some O functions that appear constantly in standard programming problems. The following table shows those Big O limits and the most common examples of problems that belong to each class:
| Time complexity | Common examples of problems with that time complexity |
| --- | --- |
| O(1) | Accessing array elements |
| O(log n) | Search in an ordered list |
| O(n) | Linear search |
| O(n log n) | Quick sort (average behavior) |
| O(n^{2}) | Quick sort (worst behavior), naive sorts (bubble sort, insertion sort, selection sort) |
| O(c^{n}) | Recursive Fibonacci, travelling salesman problem using dynamic programming (c is some numeric constant) |
If we care about program performance, then O(1) algorithms are of special interest to us as they present algorithms which don't get slower (at least not noticeably) when we increase the problem size. We'll see an example of such O(1) algorithms in the next section.
When we deal with algorithms that search in some dataset, we usually try to make them behave as O(log n), not O(n), as the former grows much, much more slowly than the latter.
Another big class of problems deals with sorting the data. While the naive approaches sort in O(n^{2}), better algorithms (such as mergesort and quicksort) need on average just O(n log n) steps.
The following image shows how the time complexity for these typical limits (we have used 2^{n} as an example of a more generic c^{n}) grows when we increase the problem size up to 20-fold:
Most frequently encountered Big-O limits
We can see that O(1) and O(log n) grow very slowly. While O(n log n) grows faster than O(n), it also grows much slower than O(n^{2}), which we had to stop plotting when data was increased nine-fold.
The O(2^{n}) starts slowly and looks like a great solution for small data sizes (small n), but then it starts rising terribly fast, much faster than O(n^{2}).
The following table shows how fast O(n log n) and O(n^{2}) are growing if we compare them with O(n) and how quickly O(2^{n}) explodes.
The data column shows the data size increase factor. The number 10 in this column, for example, represents input with 10 times more elements than in the original data:
| Data size | O(1) | O(log n) | O(n) | O(n log n) | O(n^{2}) | O(2^{n}) |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 2 | 1 | 2 | 2 | 4 | 4 | 2 |
| 10 | 1 | 4 | 10 | 43 | 100 | 512 |
| 20 | 1 | 5 | 20 | 106 | 400 | 524,288 |
| 100 | 1 | 8 | 100 | 764 | 10,000 | 10^{29} |
| 300 | 1 | 9 | 300 | 2,769 | 90,000 | 10^{90} |
We can see from this table that O(log n) algorithms present a big improvement over O(n) algorithms (8 versus 100 times increase in time when data increases 100-fold). We can also see that the O(2^{n}) quickly becomes completely unmanageable.
The last cell in this table is particularly interesting. There are different estimates for the number of elementary particles (electrons, protons, neutrons, and so on) in the visible universe, but they all lie somewhere around 10^{90}. Suppose we have a computer which can solve an O(2^{n}) problem in a reasonable time. If we increased the input data by a factor of just 300, then we would need 10^{90} computers to solve the new problem in the same time. That is as much as the number of particles in the visible universe!
Delphi's Run-Time Library (RTL) contains many data structures (classes that are specifically designed to store and retrieve data), mostly found in the System.Classes and System.Generics.Collections units, that greatly simplify everyday work. We should, however, be aware of their good and bad sides.
Every data structure in the world is seeking a balance between four different types of data access: accessing the data, inserting the data, searching for data, and deleting data. Some data structures are good at some of these operations and worse at others, but no data structure in this world can make all four operations independent of data size.
When designing a program, we should therefore know what our needs are. That will help us select the appropriate data structure for the job.
The most popular data structure in Delphi is undoubtedly TStringList. It can store a large number of strings and assign an object to each of them. It can—and this is important—work in two modes, unsorted and sorted. The former, which is the default, keeps strings in the order in which they were added, while the latter keeps them alphabetically ordered.
This directly affects the speed of some operations. While accessing any element in a string list can always be done in a constant time (O(1)), adding to a list can take O(1) when the list is not sorted and O(log n) when the list is sorted.
Why that big difference? When the list is unsorted, Add
just adds a string at its end. If the list is, however, sorted, Add
must first find a correct insertion place. It does this by executing a bisection search, which needs O(log n) steps to find the correct place.
The reverse holds true for searching in a string list. If it is not sorted, IndexOf
needs to search (potentially) the whole list to find an element. In a sorted list, it can do it much faster (again by using a bisection) in O(log n) steps.
We can see that TStringList
offers us two options - either a fast addition of elements or a fast lookup, but not both. In a practical situation, we must look at our algorithm and think wisely about what we really need and what will behave better.
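As a minimal sketch of this trade-off (using only the standard TStringList API; the string values are purely illustrative and error handling is omitted):

```pascal
uses
  System.Classes;

var
  list: TStringList;
begin
  list := TStringList.Create;
  try
    // Unsorted mode (the default): Add appends at the end in O(1),
    // but IndexOf must scan the list sequentially in O(n).
    list.Add('delta');
    list.Add('alpha');
    Writeln(list.IndexOf('alpha')); // linear search

    // Sorted mode: Add must first locate the insertion point with a
    // bisection search, but IndexOf now also runs in O(log n).
    list.Sorted := True;
    Writeln(list.IndexOf('alpha')); // bisection search
  finally
    list.Free;
  end;
end;
```

The same two method calls are thus cheap in one mode and expensive in the other, which is exactly why we must decide up front whether fast addition or fast lookup matters more for our algorithm.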
To sort a string list, you can call its Sort
method or you can set its Sorted
property to True
. There is, however, a subtle difference that you should be aware of. While calling Sort
sorts the list, it doesn't set its internal is sorted flag and all operations on the list will proceed as if the list is unsorted. Setting Sorted := True
, on the other hand, does both - it sets the internal flag and calls the Sort
method to sort the data.
To store any (non-string) data, we can use traditional TList
and TObjectList
classes or their more modern generic counterparts, TList<T>
and TObjectList<T>
. They all work in unsorted mode, so adding an element takes O(1) steps while finding or removing an element takes O(n) steps.
All provide a Sort
function which sorts the data with a quicksort algorithm (O(n log n) on average) but only generic versions have a BinarySearch
method, which searches for an element with a bisection search taking O(log n) steps. Be aware that BinarySearch
requires the list to be sorted but doesn't make any checks to assert that. It is your responsibility to sort the list before you use this function.
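A short sketch of that requirement (using the standard System.Generics.Collections API; the numbers are illustrative). Note that the Sort call must come first, because BinarySearch itself never verifies the ordering:

```pascal
uses
  System.Generics.Collections;

var
  list: TList<Integer>;
  idx: Integer;
begin
  list := TList<Integer>.Create;
  try
    list.AddRange([42, 7, 19]);
    list.Sort;                          // O(n log n); required first
    if list.BinarySearch(19, idx) then  // O(log n) bisection lookup
      Writeln('Found at index ', idx);
  finally
    list.Free;
  end;
end;
```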
If you need a very quick element lookup, paired with a fast addition and removal, then TDictionary
is the solution. It has methods for adding (Add
), removing (Remove
) and finding a key (ContainsKey
and TryGetValue
) that, on average, function in a constant time, O(1). Their worst behavior is actually quite bad, O(n), but that will only occur on specially crafted sets of data that you will never see in practical applications.
I've told you before that there's no free lunch and so we can't expect that TDictionary
is perfect. The big limitation is that we can't access the elements it is holding in a direct way. In other words, there is no TDictionary[i]
. We can walk over all elements in a dictionary by using a for
statement, but we can't access any of its elements directly. Another limitation of TDictionary
is that it does not preserve the order in which elements were added.
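A minimal sketch of these limitations (the key and value types here are only illustrative):

```pascal
uses
  System.Generics.Collections;

var
  dict: TDictionary<string, Integer>;
  pair: TPair<string, Integer>;
  value: Integer;
begin
  dict := TDictionary<string, Integer>.Create;
  try
    dict.Add('apples', 3);
    dict.Add('pears', 5);

    // O(1) on average - lookup by key:
    if dict.TryGetValue('apples', value) then
      Writeln(value);

    // There is no dict[i]; we can only walk over all pairs,
    // and the iteration order is not the insertion order:
    for pair in dict do
      Writeln(pair.Key, ' = ', pair.Value);
  finally
    dict.Free;
  end;
end;
```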
Delphi also offers two simple data structures that mimic standard queue—TQueue<T>
—and stack—TStack<T>
. Both have very fast O(1) methods for adding and removing the data, but they don't offer any bells and whistles—there is no direct data access, we cannot search for data, and so on. We can only insert (Enqueue
in queue or Push
in stack) and remove (Dequeue
and Pop
) data.
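A minimal sketch of both structures (values are only illustrative):

```pascal
uses
  System.Generics.Collections;

var
  queue: TQueue<Integer>;
  stack: TStack<Integer>;
begin
  queue := TQueue<Integer>.Create;
  stack := TStack<Integer>.Create;
  try
    queue.Enqueue(1);
    queue.Enqueue(2);
    Writeln(queue.Dequeue);  // first in, first out: writes 1

    stack.Push(1);
    stack.Push(2);
    Writeln(stack.Pop);      // last in, first out: writes 2
  finally
    stack.Free;
    queue.Free;
  end;
end;
```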
To help you select the right tool for the job, I have put together a table showing the most important data structures and their most important methods, together with average and (when they differ from the average) worst-case time complexities:
| Data structure | Operation | Average | Worst |
| --- | --- | --- | --- |
| TStringList | Direct access | O(1) | |
| | Add | O(1) / O(log n) | |
| | Insert | O(1) | |
| | Delete | O(1) | |
| | IndexOf | O(n) / O(log n) | |
| | Sort | O(n log n) | O(n^{2}) |
| TList, TObjectList | Direct access | O(1) | |
| | Add | O(1) | |
| | Insert | O(1) | |
| | Delete | O(1) | |
| | Remove | O(n) | |
| | IndexOf | O(n) | |
| | Sort | O(n log n) | O(n^{2}) |
| TList<T>, TObjectList<T> | Direct access | O(1) | |
| | Add | O(1) | |
| | Insert | O(1) | |
| | Delete | O(1) | |
| | Remove | O(n) | |
| | IndexOf | O(n) | |
| | BinarySearch | O(log n) | |
| | Sort | O(n log n) | O(n^{2}) |
| TDictionary | Direct access | Not possible | |
| | Add | O(1) | O(n) |
| | Remove | O(1) | O(n) |
| | TryGetValue | O(1) | O(n) |
| | ContainsKey | O(1) | O(n) |
| | ContainsValue | O(n) | |
| TQueue<T> | Direct access | Not possible | |
| | Enqueue | O(1) | |
| | Dequeue | O(1) | |
| TStack<T> | Direct access | Not possible | |
| | Push | O(1) | |
| | Pop | O(1) | |
The table shows the time complexity of the most important operations on built-in Delphi data structures. Complexity for the worst case is only listed if it differs from the average complexity.
Enough with the theory already! I know that you, like me, prefer to talk through the code. As one program explains more than a thousand words could, I have prepared a simple demo project: RandomWordSearch
.
This program functions as a very convoluted random word generator. When started, it will load a list of 370,101 English words from a file. It will also prepare three internal data structures preloaded with these words:
Random word generator
The program shows three buttons to the user. All three run basically the same code. The only difference is the test function which is passed to the centralized word generator as a parameter:
procedure TfrmRandomWordSearch.FindGoodWord(const wordTest: TWordCheckDelegate);
var
  word: string;
  isWordOK: boolean;
  time: TStopwatch;
begin
  time := TStopwatch.StartNew;
  repeat
    word := GenerateWord;
    isWordOK := wordTest(word);
  until isWordOK or (time.ElapsedMilliseconds > 10000);
  if isWordOK then
    lbWords.ItemIndex := lbWords.Items.Add(Format('%s (%d ms)',
      [word, time.ElapsedMilliseconds]))
  else
    lbWords.ItemIndex := lbWords.Items.Add('timeout');
end;
The core of the FindGoodWord
method can be easily described:
1. Generate a random word by calling GenerateWord.
2. Call the test function wordTest on that word.
3. If this function returns False, repeat Step 1. Otherwise, show the word.
The code is a bit more complicated because it also checks that the word generation part runs for at most 10 seconds and reports a timeout if no valid word was found in that time.
The random word generator GenerateWord
is incredibly simple. It just appends together lowercase English letters until the specified length (settable in the user interface) is reached:
function TfrmRandomWordSearch.GenerateWord: string;
var
  pos: integer;
begin
  Result := '';
  for pos := 1 to inpWordLength.Value do
    Result := Result + Chr(Ord('a') + Random(Ord('z') - Ord('a') + 1));
end;
Let's now check the data preparation phase. The not very interesting (and not shown here) OnCreate
handler loads data from a file into a TStringList
and calls the LoadWords
method:
procedure TfrmRandomWordSearch.LoadWords(wordList: TStringList);
var
  word: string;
begin
  FWordsUnsorted := TStringList.Create;
  FWordsUnsorted.Assign(wordList);

  FWordsSorted := TStringList.Create;
  FWordsSorted.Assign(wordList);
  FWordsSorted.Sorted := True;

  FWordsDictionary := TDictionary<string,boolean>.Create(wordList.Count);
  for word in wordList do
    FWordsDictionary.Add(word, True);
end;
The first data structure is an unsorted TStringList
, FWordsUnsorted
. Data is just copied from the list of all words by calling the Assign
method.
The second data structure is a sorted TStringList
, FWordsSorted
. Data is firstly copied from the list of all words. The list is then sorted by setting FWordsSorted.Sorted := True
.
The last data structure is a TDictionary
, FWordsDictionary
. A TDictionary
always stores pairs of keys and values. In our case, we only need the keys part as there is no data associated with any specific word, but Delphi doesn't allow us to ignore the values part and so the code defines the value as a boolean
and always sets it to True
.
Although the dictionaries can grow, they work faster if we can initially set the number of elements that will be stored inside. In this case that is simple—we can just use the length of the wordList
as a parameter to TDictionary.Create
.
The only interesting part of the code left is OnClick
handlers for all three buttons. All three call the FindGoodWord
method, but each passes in a different test function.
When you click on the Unsorted list button, the test function checks whether the word can be found in the FWordsUnsorted list by calling the IndexOf function. As we will mostly be checking non-English words (remember, they are just random strings of letters), this IndexOf will typically have to compare all 370,101 words before returning -1:
procedure TfrmRandomWordSearch.btnUnsortedListClick(Sender: TObject);
begin
  FindGoodWord(
    function (const word: string): boolean
    begin
      Result := FWordsUnsorted.IndexOf(word) >= 0;
    end);
end;
When you click on the Sorted list button, the test function calls FWordsSorted.IndexOf. As this TStringList is sorted, IndexOf will use a binary search that needs at most log2(370,101) = 19 (rounded up) comparisons to find out that a word is not in the list. As this is much less than 370,101, we can expect that finding words with this approach will be much faster:
procedure TfrmRandomWordSearch.btnSortedListClick(Sender: TObject);
begin
  FindGoodWord(
    function (const word: string): boolean
    begin
      Result := FWordsSorted.IndexOf(word) >= 0;
    end);
end;
A click on the last button, Dictionary
, calls FWordsDictionary.ContainsKey
to check if the word can be found in the dictionary, and that can usually be done in just one step. Admittedly, this is a bit of a slower operation than comparing two strings, but still the TDictionary
approach should be faster than any of the TStringList
methods:
procedure TfrmRandomWordSearch.btnDictionaryClick(Sender: TObject);
begin
  FindGoodWord(
    function (const word: string): boolean
    begin
      Result := FWordsDictionary.ContainsKey(word);
    end);
end;
If we use the terminology from the last section, we can say that the O(n) algorithm (unsorted list) will run much slower than the O(log n) algorithm (sorted list), and that the O(1) algorithm (dictionary) will be the fastest of them all. Let's check this in practice.
Start the program and click on the Unsorted list button a few times. You'll see that it typically needs a few hundred milliseconds to a few seconds to generate a new word. As the process is random and dependent on CPU speed, your numbers may differ quite a lot from mine. If you are only getting timeout messages, you are running on a slow machine and should decrease the Word length to 3.
If you increase the Word length to 5 and click the button again, you'll notice that the average calculation time grows to a few seconds. You may even get an occasional timeout. Increase it to 6 and you'll mostly be getting timeouts. We are clearly hitting the limits of this approach.
Prepare now to be dazed! Click on the Sorted list
button (while keeping the Word length
at 6
) and words will again be calculated blazingly fast. On my computer, the code only needs 10 to 100 milliseconds to find a new word:
Testing with different word length and with first two algorithms
To better see the difference between a sorted list and a dictionary, we have to crank up the word length again. Setting it to 7 worked well for me. The sorted list needed from a few hundred milliseconds to a few seconds to find a new word, while the dictionary approach mostly found a new word in under 100 milliseconds.
Increase the Word length
to 8
and the sorted list will start to time out while the dictionary will still work. Our O(1) approach is indeed faster than the O(log n) code:
Comparing sorted list (first six words and the two after "timeout") with the dictionary approach (next five and last two).
While the knowledge of algorithms and their complexity is useful, it will not help in every occasion. Sometimes you already use the best possible algorithm but your program is still too slow. At such times, you'll want to find the slow parts of the program and speed them up with whatever trick possible. To do that, you have to find them first, which is the topic of the rest of this chapter.
While I was talking about theory and data structures and best practices, our friend Mr. Smith read everything about polar forests on the internet and then got really bored. In a moment of sheer panic—when he was thinking about watching videos of cute kittens for the whole polar night—he turned to programming. He checked various internet sources and decided that Delphi is the way to go. After all, he could use it to put together a new game for iOS and Android and make lots of money while waiting for the sun to show up!
As he didn't have any programming literature on the Antarctic base, he learned programming from internet resources. Sadly, he found most of his knowledge on Experts Exchange and Yahoo Answers and his programs sometimes reflect that.
As of this moment, he has written one working program (and by working I mean that it compiles), but he is not really sure what the program actually does. He only knows that the program is a bit slow and because of that he named it SlowCode
.
His program is a console mode program which upon startup calls the Test
method:
procedure Test;
var
  data: TArray<Integer>;
  highBound: Integer;
begin
  repeat
    Writeln('How many numbers (0 to exit)?');
    Write('> ');
    Readln(highBound);
    if highBound = 0 then
      Exit;
    data := SlowMethod(highBound);
    ShowElements(data);
  until false;
end;
That one, at least, is easy to grasp. It reads some number, passes it to something called SlowMethod
(hmm, Mr. Smith really should work on naming techniques), and then passes the result (which is of type TArray<Integer>
) to a method called ShowElements
. When the user types in 0
, the program exits.
Let's check the SlowMethod function:
function SlowMethod(highBound: Integer): TArray<Integer>;
var
  i: Integer;
  temp: TList<Integer>;
begin
  temp := TList<Integer>.Create;
  try
    for i := 2 to highBound do
      if not ElementInDataDivides(temp, i) then
        temp.Add(i);
    Result := Filter(temp);
  finally
    FreeAndNil(temp);
  end;
end;
The code creates a list of integers. Then it iterates from a value of 2 to the value that was entered by the user and for each value calls ElementInDataDivides, passing in the list and the current value. If that function returns False, the current value is entered into the list.
After that, SlowMethod
calls the Filter
method which does something with the list and converts it into an array of integers which is then returned as a function result:
function ElementInDataDivides(data: TList<Integer>; value: Integer): boolean;
var
  i: Integer;
begin
  Result := True;
  for i in data do
    if (value <> i) and ((value mod i) = 0) then
      Exit;
  Result := False;
end;
The ElementInDataDivides
function iterates over all the numbers in the list and checks if any element in the list divides the value
(with the additional constraint that this element in the list must not be equal to the value
).
Let's check the last part of the puzzle—the Filter
function:
function Reverse(s: string): string;
var
  ch: char;
begin
  Result := '';
  for ch in s do
    Result := ch + Result;
end;

function Filter(list: TList<Integer>): TArray<Integer>;
var
  i: Integer;
  reversed: Integer;
begin
  SetLength(Result, 0);
  for i in list do
  begin
    reversed := StrToInt(Reverse(IntToStr(i)));
    if not ElementInDataDivides(list, reversed) then
    begin
      SetLength(Result, Length(Result) + 1);
      Result[High(Result)] := i;
    end;
  end;
end;
This one again iterates over the list, reverses the number in each element (changes 123 to 321, 3341 to 1433, and so on) and calls ElementInDataDivides on the reversed number. If that function returns False, the element is added to the returned result in a fairly inefficient way.
I agree with Mr. Smith—it is hard to tell what the program does. Maybe it is easiest to run it and look at the output:
It looks like the program is outputting prime numbers. Not all prime numbers, just some of them. (For example, 19 is missing from the list, and so is 23.) Let's leave it at that for the moment.
We can tell more about the program, about its good and bad parts, if we look at it through the eyes of time complexity, in terms of the Big O notation.
We'll start where the code starts—in the SlowMethod
method. It has a loop iterating from 2
to the user-specified upper bound, for i := 2 to highBound do
. The size of our data, or n, is therefore equal to highBound
and this for
loop has time complexity of O(n):
for i := 2 to highBound do
  if not ElementInDataDivides(temp, i) then
    temp.Add(i);
Inside this loop, the code calls ElementInDataDivides
followed by an occasional temp.Add
. The latter will execute in O(1), but we can't say anything about the ElementInDataDivides
before we examine it.
The ElementInDataDivides method also has a loop, iterating over the data list. We can't guess in advance how many elements this list holds, but from the short test we just performed, we know that the program writes out 13 elements when processing values from 2 to 100:
for i in data do
  if (value <> i) and ((value mod i) = 0) then
    Exit;
For the purpose of this very rough estimation I'll just guess that the for i in data do
loop also has a time complexity of O(n).
In SlowMethod
we therefore have an O(n) loop executing another O(n) loop for each element, which gives us a O(n^{2}) performance.
SlowMethod then calls the Filter method, which also contains an O(n) for loop calling ElementInDataDivides, which gives us O(n^{2}) complexity for this part as well:
for i in list do
begin
  reversed := StrToInt(Reverse(IntToStr(i)));
  if not ElementInDataDivides(list, reversed) then
  begin
    SetLength(Result, Length(Result) + 1);
    Result[High(Result)] := i;
  end;
end;
There's also a conversion to string, some operation on that string, and conversion back to the integer StrToInt(Reverse(IntToStr(i)))
. It works on all elements of the list (O(n)) but in each iteration it processes all characters in the string representation of a number. As a length of the number is proportional to log n, we can say that this part has complexity of O(n log n), which can be ignored as it is much less than the O(n^{2}) complexity of the whole method.
There are also some operations hidden inside SetLength
, but at this moment we don't know yet what they are and how much they contribute to the whole program. We'll cover that area in Chapter 4, Memory Management.
The SlowMethod
therefore consists of two parts, both with complexity O(n^{2}). Added together, that would give us 2*n^{2}, but as we ignore constant factors (that is, 2) in the Big O notation, we can only say that the time complexity of SlowMethod
is O(n^{2}).
So what can we say simply by looking at the code?
- The program probably runs in O(n^{2}) time. It will take around 100 times longer to process 10,000 elements than 1,000 elements.
- There is a conversion from integer to string and back (in Filter), which has a complexity of only O(n log n), but it would still be interesting to know how fast this code really is.
- There's a time complexity hidden behind the SetLength call which we know nothing about.
- We can guess that ElementInDataDivides is most probably the most time-consuming part of the code and that any improvement in this method would probably help.
- Fixing the terrible idea of appending elements to an array with SetLength could probably speed up the program, too.
As the code performance is not everything, I would also like to inform Mr. Smith about a few places where his code is less than satisfactory:
- The prompt How many numbers is misleading. A user would probably expect it to represent the number of numbers outputted, while the program actually wants to know how many numbers to test.
- Appending to TArray<T> in that way is not a good idea. Use TList<T> for temporary storage and call its ToArray method at the end. If you need a TArray<T>, that is; you can also do all the processing using TList<T>.
- SlowMethod has two distinctive parts - data generation, which is coded as a part of SlowMethod, and data filtering, which is extracted into its own method, Filter. It would be better if the first part were extracted into its own method, too.
- A console program, really? Console programs are good for simple tests, but that is it. Learn VCL or FireMonkey, Mr. Smith!
We can now try and optimize parts of the code (ElementInDataDivides
seems to be a good target for that) or, better, we can do some measuring to confirm our suspicions with hard numbers.
In a more complicated program (what we call real life), it would usually be much simpler to measure the program's performance than to do such an analysis. This approach, however, proves to be a powerful tool if you use it while designing the code. Once you hear a little voice nagging about time complexities while you're writing code, you'll be on your way to becoming an excellent programmer.