Sorting Algorithms in Delphi
A frequently encountered task in programming is that of sorting a sequence of numeric values. Typically, the sequence is provided in the form of an array of values. The download at the bottom of this page provides efficient Delphi implementations for the following sorting algorithms
- Bubble Sort - The granddaddy of all sorting algorithms and the most inefficient of them all. Data elements bubble up to the the top or sink to the bottom of the sequence - slooowly.
- Selection Sort - Better than the bubble sort. Data elements find their rightful position in the sequence considerably faster. As a general rule the Insertion sort is better.
- Insertion Sort - Similar in many ways to the Selection sort. However, it involves fewer data value exchanges and data elements find their rightful position in the sequence in one single bound rather than a sequence of short hops as with the Bubble and Selection sorts.
- Heap Sort - This sort uses a mathematical structure called a binary heap - a heap created using a binary tree. Briefly, the rules used in building the tree are
- The value of a parent node is always greater than the values of its child nodes
- Each node in the tree has 0, 1 or 2 children
- When building the tree we endeavor to assign the maximum number of children to each nod
- In the first phase the elements of the orignal data sequence are reordered to confirm to the rules for a binary heap, described above.
- In the second phase the top of the heap, seq[0] which is the biggest value is placed at the end of the array. This results in two orphaned children being left at the top of the heap and the sequence is no longer a valid binary heap. We rebuild the heap but leave out the biggest value that has been moved to the end of the sequence. This process is repeated until the
heap
is reduced to one single value.
- Merge Sort - This algorithm, and those that follow, use a
divide-and-rule
strategy. Instead of attempting to sort the entire data sequence in a single step it recursively splits the sequence into two halves, sorts the two halves and then merges them back together. There are two downsides to this technique- Part or whole of the data sequence needs to be replicated - i.e. more memory is required. However, unless the data sequence is unusually large this is not a particularly significant constraint.
- The technique is recursive. With large data sets the recursion can get quite deep. Good coding is required to minimize the impact of such recursion. Typically, one should avoid the use of too many/any local variables in the recursive part of the technique.
- Quick Sort - This is also known as the Hoare sort, after its inventor. It works by recursively partitioning the data sequence into two parts such that all elements in the first part are smaller than or equal to those in the second part. The name
Quick Sort
is a little unfortunate. While it is without any doubt one of the fastest sorting algorithms it uses recursion and it is difficult to implement the recursive part without using local variables. As a result with large data sequences it is computationally inefficient. - Shell Sort - Named after its inventor, the Shell sort is the oldest of the
divide-and-rule
algorithms and a particularly elegant one. The technique works by treating the data sequence as a notional two-dimensional array and then sorting the columns in the array. The column sort is done by insertion. The process is repeated with a decreasing number of columns with the sortedness of the sequence increasing with each repetition. It should be noted that what is done is a repetition, not a recursion.
The Best Sorting Algorithm
Developers often spend a great deal of time in an effort to implement the "best" sorting algorithm. By and large such efforts are misguided and pointless. Except for a small number of truly number intensive applications the efficiency of the occasional sort will have no impact on the perceived speed of the application. Even the lowly bubble sort will do a perfectly good job. It is better to spend development time exploring one or more of the following ideas
- Don't Sort - the best sort is the one that does not have to be done. Consider if there is a way to avoid sorting. Alternately, try to improve the sortedness of your data as they are acquired rather than trying to sort a large mass of unordered collated data.
- Use a sort you understand - some of the more efficient sorting algorithms are not easily understood. Consider using the simplest sorting algorithm that provides acceptable performance. The time saved can be better used improving aspects of your application that really do matter to the user.
- Don't recurse - Recursion is computationally expensive and gets increasingly so if you use a large number of local variables inside the recursion and the data sequence to be sorted gets large. Examine means of reducing the number of local variables or, even better, use a non-recursive sort - even if it is not called
quick
.
Much has been written on the subject of sorting. Generally speaking the efficiency of a sorting algorithm is judged based on the relationship between the size of the data sequence, n and the time required for the sort. The sorting algorithms described here belong to two categories
- O(n 2 ) Algorithms - the time required is proportional to the square of n. The first three algorithms described here belong to this category.
- O(n*log 2 n) Algorithms - the others.
Needless to say this measure of efficiency does not fully account for the costs incurred by deep recursion, the excessive use of local variables etc. An excellent discussion of sorting algorithms can be found here.
Download