10.6 Shellsort
Although insertion sort works well for short lists, it tends to slow down
drastically with long lists. This is because of the large numbers of comparisons
that are needed to find the correct positions for insertions of data.
If, however, the elements of a list are almost in order, insertion sort is
very fast, even for long lists because in this case only a few comparisons
are needed to find the correct insertion point for each value. Shellsort, invented
by Donald Shell in 1959, adapts insertion sort to produce a sorting
technique that works well even with long lists that are randomly ordered.
With Shell’s adaptation of insertion sort, data can be moved long distances
with only a few comparisons.
The sort uses the following idea. We say that a list is k-sorted if,
starting at any point in the list, every kth element is in order. A k-sorted
list consists of k interleaved, sorted sublists. To illustrate the idea of a
k-sorted list, let us see how a list can be 3-sorted.
Example 1
Suppose that we are given an unsorted list of integers.
34 21 40 12 27 18 29 13 25 17 11 38
This can be decomposed into three sublists with the first list containing
the elements at positions 0, 3, 6, . . ., the second containing the elements
at positions 1, 4, 7, . . ., and the third containing the elements at positions
2, 5, 8, . . . .
34 12 29 17
21 27 13 11
40 18 25 38
|
|
If we now sort each sublist, we obtain
12 17 29 34
11 13 21 27
18 25 38 40
to give us a 3-sorted list.
12 11 18 17 13 25 29 21 38 34 27 40
What does this achieve? If we examine the original list of values in
Example 1 together with the 3-sorted list and the list as it would be if it
were completely ordered, we can see that in the 3-sorted list, many of the
elements have been moved long distances and most of them are quite close
to their final destination.
Unsorted: 34 21 40 12 27 18 29 13 25 17 11 38
3-sorted: 12 11 18 17 13 25 29 21 38 34 27 40
Sorted: 11 12 13 17 18 21 25 27 29 34 38 40
Without doing very much work to obtain the 3-sorted list, we have gone a
long way toward our final goal.
Shellsort repeatedly uses insertion sort to create sorted sublists. Initially
a k-sorted list is created using a large value of k so that values can
be moved long distances. On subsequent passes, other k-sorted lists are
formed using successively smaller values of k. On the final pass, the value
of k is set to one; a 1-sorted list is completely sorted.
Example 2
The tables show the results of using Shellsort on the values
64 31 10 40 22 49 82 20 29 56 40 18 19 27 26
in which successive passes of the sort creates lists that are 7-sorted, 3-sorted,
and 1-sorted.
Unsorted: 64 31 10 40 22 49 82 20 29 56 40 18 19 27 26
First Pass: 20 26 64
29 31
10 56
40 40
18 22
19 49
27 82
7-Sorted: 20 29 10 40 18 19 27 26 31 56 40 22 49 82 64
Second Pass: 20 27 40 49 56
18 26 29 40 82
10 19 22 31 64
3-Sorted: 20 18 10 27 26 19 40 29 22 49 40 31 56 82 64
Third Pass
1-Sorted: 10 18 19 20 22 26 27 29 31 40 40 49 56 64 82
|
|
In using Shellsort, we noted that the first pass should use a large
value of k so that values that had to be moved long distances could do
so quickly. We also noted that the last pass must use k = 1 in order to
guarantee a completely sorted list. It is natural to ask: what is the best
sequence of values of k? Surprisingly, the answer to that question is not
known, despite the fact that a good deal of work has been done on the
problem. One sequence, however, that has been found to give good results
is . . . , 40, 13, 4, 1. Working from the right end of this sequence, each term
is one more than three times as large as the term on its right.
Exercise 10.6
- Given the following data, show how they would appear after they
have been 5-sorted.
26 37 21 41 63 19 61 72 55 29 47 18 26 22
- Starting with the same set of data given in Question 1, show how
they would appear after they had been 4-sorted.
- In the text, we suggested that a good sequence of values of k ended
with 40, 13, 4, 1. Give the next three smallest terms of this sequence.
- How would you answer the following argument against using Shellsort?
“The last step of Shellsort, using k = 1, is simply a normal
insertion sort. Since Shellsort performs many preliminary steps before
this final one, it must be slower than a single insertion sort.”
- Suppose that you were going to write a version of Shellsort using
the sequence of k-sorts suggested in the text. For a list containing n
elements, the first value of k that should be used is the largest value
in the sequence that is less than n.
- Write a sequence of statements that will initialize k correctly for
a given value of n.
- Write a statement that will, for any value of k in the sequence,
produce the next smaller value of k.
- Write a method shellSort to sort an array of int values in
ascending order. In performing the k-sorts, use the sequence of
values of k suggested in the text. Be sure to use insertion sort
at each stage of the sort.
- Test your method by writing a complete program that first generates
an array of 500 random int values in the range [0, 999],
prints this array (ten values per line), sorts the array using your
shellSort method, and then prints the resulting array (ten values
per line).
- Experiment with using Shellsort with sequences for k other than the
one given in the text, testing your sequences on large arrays of integers
(generated as indicated in the previous question) and noting the time
required by the sort in each case. To measure the time taken by a
sort, you can use the method currentTimeMillis in the System
class. The method has the signature
public static long currentTimeMillis()
It returns, as a long value, the number of milliseconds since midnight
GMT on January 1, 1970. To use the method, you might write
long startTime = System.currentTimeMillis();
just before starting the sort and
int elapsedTime = (int)(System.currentTimeMillis() - startTime);
just after completing the sort. In choosing a sequence, remember
that the last value of k must be one but, beyond this, there are no
restrictions on the sequences. Can you find a sequence that performs
better than the one given in the text? (It is possible to beat the given
sequence; we chose it because it is fairly simple and reliable.)
|
|