Under tight schedules and looming deadlines, C++ programmers tend to ignore the Standard Library algorithms and settle for rather suboptimal handcrafted solutions that get the job done. I know this, because I myself have done it more than I’d care to admit. The excuse is usually that the problem does not fit any of these neat, but somewhat detached, algorithms.

STL algorithms are actually very handy. They give names to commonly occurring (and often cryptic) raw loops1. Until recently, however, they required too much work to use effectively. Thankfully, with the introduction of type inference and lambda expressions in C++11, STL algorithms are now a lot easier to use.

In this post, I’ll be demonstrating the versatility of STL algorithms by implementing common sorting algorithms (insertion sort, in-place merge sort and randomized quicksort) using STL algorithms as building blocks2. Sorting is the kind of problem that requires moving data around, and when implemented using raw loops, they tend to be unreadable. We’ll see how STL algorithms can describe this code more succinctly and possibly more efficiently.

### Insertion Sort

This is a C++ adaptation of insertion sort as it appears in CLRS3:

template <typename I>
void insertion_sort(I first, I last) {
if (first == last) {
return;
}

for (auto&& j = std::next(first); j != last; ++j) {
auto key = *j;
auto i = j;
while (i != first && *std::prev(i) > key) {
*i = *std::prev(i);
--i;
}

*i = key;
}
}

It’s compact and generic, but aside from the function name, it’s completely non-obvious what the nested loops are doing at first sight. Let’s see how to make this more readable.

By examining the inner loop, it looks like it searches backwards in the given range for a value less than or equal to key. It starts from the value just before j down to 0. In doing so, it also shifts all the values it scans one position to the right. Once the value is found (or we reach the beginning of the range), the value at j (i.e. key) is then inserted.

Let’s pretend for a moment that the range is being scanned in the forward rather than the backward direction. In this case, all we need is a function that searches a range for an element (call it k) that satisfies a certain predicate. When k is found (or we reach the end of the range), the function has to insert the first element of the range before k and shift all elements between them one position to the left:

template <typename I, typename P>
void insert_first_before(I first, I last, P pred) {
auto&& second = std::next(first);
std::rotate(first, second,
std::find_if(second, last, pred));
}

If you’re surprised to see std::rotate() being used here, you’re not alone. Sean Parent’s GoingNative 2013 talk was an eye-opener for me in this respect. Turns out std::rotate() is one of the handiest algorithms in the Standard Library. It takes three iterators: first, middle and last. It performs a left-rotate operation such that the elements between middle and last are moved to the beginning of the range, and the elements between first and middle are moved to the end of the range. It also returns an iterator to the beginning of the second range after rotation, which often comes in handy4. Essentially, every time you find yourself manually shifting elements, there’s almost always a better way to achieve the same effect using std::rotate()5.

Going back to our problem, what we originally needed was an algorithm that scans a range backwards not forwards. This can be achieved easily using reverse_iterators:

template <typename I, typename P>
void insert_last_after(I first, I last, P pred) {
insert_first_before(
std::make_reverse_iterator(last),
std::make_reverse_iterator(first), pred);
}

You may notice that the function name isn’t ideal. The word last in the function name refers to the last element in the range, while last the formal parameter, refers to the element one past the end of the range.

In any case, given insert_last_after(), insertion_sort() becomes trivial:

template <typename I>
void insertion_sort(I first, I last) {
if (first == last) {
return;
}

for (auto&& iter = std::next(first); iter != last; ++iter) {
insert_last_after(first, std::next(iter),
[iter] (auto&& value) { return value <= *iter; });
}
}

Notice that the arguments to insert_last_after() are initialized with iterators one position to the right of where the reverse_iterators are supposed to point. This is because reverse_iterators refer to the position right before the one they’re initialized with.

But do we really need a linear search? If we notice the loop invariant that the range from first to j is always sorted, we can simply perform a binary search instead. This can be achieved directly using std::upper_bound(). Once that element is found, we need to perform a rotate to insert the element at iter before it:

template <typename I>
void insertion_sort(I first, I last) {
if (first == last) {
return;
}

for (auto&& iter = std::next(first); iter != last; ++iter) {
std::rotate(std::upper_bound(first, iter, *iter),
iter, std::next(iter));
}
}

std::upper_bound() returns the first element in a range that compares greater than a look-up key (or last if no such element exists). It does that in O(log n) time by requiring the range to be sorted or at least partitioned around the look-up key.

### In-Place Merge Sort

The canonical implementation of merge sort partitions the given range in half and recursively sorts each partition. It then performs the merge step in a separate range from those of the two partitions. In-place merge sort, on the other hand, performs the merge step in-place. The two partitions are kept adjacent in memory constituting one big range. The two elements at the beginning of each partition are compared and the smallest is moved to the beginning of the combined range shifting elements to the right as necessary.

Following is the top-level mergesort() procedure:

template <typename I>
void mergesort(I first, I last) {
auto&& size = std::distance(first, last);
if (size < 2) {
return;
}

auto&& mid = first + size / 2 + size % 2;

mergesort(first, mid);
mergesort(mid, last);
merge(first, mid, last);
}

As described, unless the given range has only one element, mergesort() partitions the given range in half and recursively sorts each partition. It finally calls the in-place merge() procedure to merge the two sorted partitions. The two partitions are adjacent in memory with iterator mid pointing at the beginning of the second partition.

Here’s one way to implement the in-place merge() procedure:

template <typename I>
void merge(I first, I mid, I last) {
while (first != mid && mid != last) {
auto iter = mid;
first = std::upper_bound(first, mid, *mid);
mid = std::upper_bound(mid, last, *first);
std::rotate(first, iter, mid);
}
}

In every iteration, merge() does the following:

1. It skips over the first few elements in the first partition that are less than the first element of the second partition (i.e. *mid). These elements are essentially in their correct final location, so no need to move them. It does that by calling std::upper_bound() on the first partition and assigning the result back to first.

2. Next, it does the same thing for the second partition. It skips over the first few elements in the second partition that are greater than the value pointed at by first. This time it does it because these elements are all going to have to be moved before first, so there’s no point in moving them one-by-one if they can be moved as a batch.

3. Finally, it calls std::rotate() to move those elements before first.

The loop stops when either of the two moving partitions is totally consumed. Neat, isn’t it?!6

### Randomized Quicksort

Quicksort is by far the most widely used sorting algorithm. Even though its worst case running time is O(n2), its expected running time is O(n log n). It works by partitioning the given range around some pivot chosen from the range (usually the first element). It then recursively partitions the two resulting ranges until the original range is fully sorted. The choice of a pivot from a fixed location every time is known to degrade the algorithm’s performance to its worst case when the input range is originally sorted (forwards or backwards). A common way to mitigate this effect is to choose the pivot randomly among the range elements every time. This last variation is commonly known as randomized quicksort.

Another important aspect about quicksort is that it doesn’t guarantee the stability of the resulting range unless the partitioning algorithm is stable. This means that equivalent elements might not end up sorted in the same order they used to be in originally. That’s okay for primitive types, but it’s not always okay for composite types.

Stable partitioning is typically more expensive and is not always needed, so it would be nice to have two versions of quicksort(): a stable and an unstable version. This suggests that the partitioning algorithm should be a parameter to quicksort(), but that would make the interface very inconvenient to use. Luckily, we can hide these petty details behind good looking interfaces:

template <typename I>
void quicksort(I first, I last) {
auto&& engine = std::default_random_engine(
std::chrono::system_clock::now().time_since_epoch().count());

quicksort_impl(first, last, std::move(engine),
quicksort_partition<I, decltype(std::distance(first, last))>);
}

template <typename I>
void stable_quicksort(I first, I last) {
auto&& engine = std::default_random_engine(
std::chrono::system_clock::now().time_since_epoch().count());

quicksort_impl(first, last, std::move(engine),
quicksort_stable_partition<I, decltype(std::distance(first, last))>);
}

Both functions initialize a random number generation engine from a unique seed (the current time) and pass it along with the right partitioning algorithm to quicksort_impl() which does the real work.

It’s tempting to think that std::partition and std::stable_partition could directly be used as partition procedures for quicksort, but that’s not possible.

Both std::partition and std::stable_partition take a range and a unary predicate. They partition the range in-place into two ranges such that every element that satisfies the predicate ends up in the first range and every element that doesn’t satisfy the predicate ends up in the second range. They both return an iterator to the beginning of the second range. Obviously, std::stable_partition does this in a stable manner!

Quicksort partitioning requires that the pivot be chosen from the given range and that it ends up exactly between the two resulting ranges. STL partitioning algorithms do not guarantee that the pivot will end up between the two partitions, even if it exists in the range.

Consider the following range:

[8, 2, 10, 7, 34, 9]

Here’s a perfectly valid partitioning using a “less-than 9” predicate:

[8, 2, 7, 10, 34, 9]

The algorithm would return an iterator to 10. Every element before this iterator satisfies the predicate and every element starting at that iterator does not satisfy the predicate. Even though 9 is in the range, it’s not required to end up between the two partitions.

So we need to implement two quicksort partitioning procedures that take the pivot offset and perform partitioning as required by the quicksort algorithm, possibly making use of STL partitioning algorithms:

template <typename I, typename S>
I quicksort_partition(I first, I last, S offset) {
assert(offset >= 0 && offset < std::distance(first, last));

std::iter_swap(first, std::next(first, offset));

auto&& ret =
std::prev(std::partition(next(first), last,
[first] (auto&& val) { return val < *first; }));

std::iter_swap(first, ret);

return ret;
}

template <typename I, typename S>
I quicksort_stable_partition(I first, I last, S offset) {
assert(offset >= 0 && offset < std::distance(first, last));

auto&& pivot_iter = std::next(first, offset);
std::rotate(first, pivot_iter, std::next(pivot_iter));

auto&& ret =
std::prev(std::stable_partition(std::next(first), last,
[first] (auto&& val) { return val < *first; }));

std::rotate(first, std::next(first), std::next(ret));

return ret;
}

The unstable partitioning function swaps the pivot with first element of the range, performs partitioning over the full range excluding the pivot, swaps the pivot back with the last element in the first range and returns an iterator to that location.

std::iter_swap() swaps range elements by iterators rather than references.

The stable partitioning function however cannot swap elements arbitrarily in the range because this changes their relative order, so it needs to use std::rotate() to bring the pivot to the front and back to its final location.

The remaining piece now is quicksort_impl():

template <typename I, typename E, typename F>
void quicksort_impl(I first, I last, E&& engine, F partitioner) {
auto size = std::distance(first, last);
if (size < 2) {
return;
}

std::uniform_int_distribution<decltype(size)>
distribution(0, size - 1);

auto mid = partitioner(first, last, distribution(engine));
quicksort_impl(first, mid, engine, partitioner);
quicksort_impl(std::next(mid), last, engine, partitioner);
}

It simply takes the random number generation engine and partitioning algorithm from the convenience wrappers, generates a uniformly distributed random index in the given range and passes it as a pivot offset to the right partitioning algorithm. It gets back an iterator to the middle of the partitioned range and recursively sorts the resulting two partitions.

### Conclusion

We saw variations of three common sorting algorithms implemented generically and compactly using STL algorithms. In general, STL algorithms are more applicable than they look. The key to utilizing them is to always ask whether the raw loop I’m about to write (or the one I’m reading) can be replaced by a packaged STL algorithm. You’d be surprised how many times this question can be answered affirmatively.

1. One thing every sorcerer will tell you is if you have the name of a spirit, you have power over it.’’ —Gerald Jay Sussman↩︎

2. Obviously I won’t be using std::sort()!↩︎

3. Introduction to Algorithms, by Cormen et al.↩︎

4. Before C++11, std::rotate() used to return void.↩︎

5. std::rotate() can be implemented in O(n) time using three reverses. In fact, this implementation results in a number of swaps almost exactly equals to n.↩︎

6. I was informed that this merge implementation is simple but not ideal. In-place merging can be done in O(n log n) worst case time but the implementation is more complex.↩︎

Tags: C++, STL, Algorithms.