Under tight schedules and looming deadlines, C++ programmers tend to ignore the Standard Library algorithms and settle for rather suboptimal handcrafted solutions that get the job done. I know this, because I myself have done it more than I’d care to admit. The excuse is usually that the problem does not fit any of these neat, but somewhat detached, algorithms.
STL algorithms are actually very handy. They give names to commonly occurring (and often cryptic) raw loops1. Until recently, however, they required too much work to use effectively. Thankfully, with the introduction of type inference and lambda expressions in C++11, STL algorithms are now a lot easier to use.
In this post, I’ll be demonstrating the versatility of STL algorithms by implementing common sorting algorithms (insertion sort, in-place merge sort and randomized quicksort) using STL algorithms as building blocks2. Sorting is the kind of problem that requires moving data around, and when implemented using raw loops, they tend to be unreadable. We’ll see how STL algorithms can describe this code more succinctly and possibly more efficiently.
Insertion Sort
This is a C++ adaptation of insertion sort as it appears in CLRS3:
template <typename I>
void insertion_sort(I first, I last) {
if (first == last) {
return;
}
for (auto&& j = std::next(first); j != last; ++j) {
auto key = *j;
auto i = j;
while (i != first && *std::prev(i) > key) {
*i = *std::prev(i);
--i;
}
*i = key;
}
}
It’s compact and generic, but aside from the function name, it’s completely non-obvious what the nested loops are doing at first sight. Let’s see how to make this more readable.
By examining the inner loop, it looks like it searches backwards in the given
range for a value less than or equal to key
. It starts from the value just
before j
down to 0
. In doing so, it also shifts all the values it scans one
position to the right. Once the value is found (or we reach the beginning of the
range), the value at j
(i.e. key
) is then inserted.
Let’s pretend for a moment that the range is being scanned in the forward rather
than the backward direction. In this case, all we need is a function that
searches a range for an element (call it k
) that satisfies a certain
predicate. When k
is found (or we reach the end of the range), the function
has to insert the first element of the range before k
and shift all elements
between them one position to the left:
template <typename I, typename P>
void insert_first_before(I first, I last, P pred) {
auto&& second = std::next(first);
std::rotate(first, second,
std::find_if(second, last, pred));
}
If you’re surprised to see std::rotate()
being used here, you’re not alone.
Sean Parent’s GoingNative 2013
talk was an eye-opener for me in
this respect. Turns out std::rotate()
is one of the handiest algorithms in the
Standard Library. It takes three iterators: first
, middle
and last
. It
performs a left-rotate operation such that the elements between middle
and
last
are moved to the beginning of the range, and the elements between first
and middle
are moved to the end of the range. It also returns an iterator to
the beginning of the second range after rotation, which often comes in
handy4. Essentially, every time you find yourself manually
shifting elements, there’s almost always a better way to achieve the same effect
using std::rotate()
5.
Going back to our problem, what we originally needed was an algorithm that scans
a range backwards not forwards. This can be achieved easily using
reverse_iterator
s:
template <typename I, typename P>
void insert_last_after(I first, I last, P pred) {
(
insert_first_beforestd::make_reverse_iterator(last),
std::make_reverse_iterator(first), pred);
}
You may notice that the function name isn’t ideal. The word last
in the
function name refers to the last element in the range, while last
the formal
parameter, refers to the element one past the end of the range.
In any case, given insert_last_after()
, insertion_sort()
becomes trivial:
template <typename I>
void insertion_sort(I first, I last) {
if (first == last) {
return;
}
for (auto&& iter = std::next(first); iter != last; ++iter) {
(first, std::next(iter),
insert_last_after[iter] (auto&& value) { return value <= *iter; });
}
}
Notice that the arguments to insert_last_after()
are initialized with
iterators one position to the right of where the reverse_iterator
s are
supposed to point. This is because reverse_iterator
s refer to the position
right before the one they’re initialized with.
But do we really need a linear search? If we notice the loop invariant that the
range from first
to j
is always sorted, we can simply perform a binary
search instead. This can be achieved directly using std::upper_bound()
. Once
that element is found, we need to perform a rotate to insert the element at
iter
before it:
template <typename I>
void insertion_sort(I first, I last) {
if (first == last) {
return;
}
for (auto&& iter = std::next(first); iter != last; ++iter) {
std::rotate(std::upper_bound(first, iter, *iter),
, std::next(iter));
iter}
}
std::upper_bound()
returns the first element in a range that compares greater
than a look-up key (or last
if no such element exists). It does that in
O(log n) time by requiring the range to be sorted or at least partitioned
around the look-up key.
In-Place Merge Sort
The canonical implementation of merge sort partitions the given range in half and recursively sorts each partition. It then performs the merge step in a separate range from those of the two partitions. In-place merge sort, on the other hand, performs the merge step in-place. The two partitions are kept adjacent in memory constituting one big range. The two elements at the beginning of each partition are compared and the smallest is moved to the beginning of the combined range shifting elements to the right as necessary.
Following is the top-level mergesort()
procedure:
template <typename I>
void mergesort(I first, I last) {
auto&& size = std::distance(first, last);
if (size < 2) {
return;
}
auto&& mid = first + size / 2 + size % 2;
(first, mid);
mergesort(mid, last);
mergesort(first, mid, last);
merge}
As described, unless the given range has only one element, mergesort()
partitions the given range in half and recursively sorts each partition.
It finally calls the in-place merge()
procedure to merge the two sorted
partitions. The two partitions are adjacent in memory with iterator mid
pointing at the beginning of the second partition.
Here’s one way to implement the in-place merge()
procedure:
template <typename I>
void merge(I first, I mid, I last) {
while (first != mid && mid != last) {
auto iter = mid;
= std::upper_bound(first, mid, *mid);
first = std::upper_bound(mid, last, *first);
mid std::rotate(first, iter, mid);
}
}
In every iteration, merge()
does the following:
It skips over the first few elements in the first partition that are less than the first element of the second partition (i.e.
*mid
). These elements are essentially in their correct final location, so no need to move them. It does that by callingstd::upper_bound()
on the first partition and assigning the result back tofirst
.Next, it does the same thing for the second partition. It skips over the first few elements in the second partition that are greater than the value pointed at by
first
. This time it does it because these elements are all going to have to be moved beforefirst
, so there’s no point in moving them one-by-one if they can be moved as a batch.Finally, it calls
std::rotate()
to move those elements beforefirst
.
The loop stops when either of the two moving partitions is totally consumed. Neat, isn’t it?!6
Randomized Quicksort
Quicksort is by far the most widely used sorting algorithm. Even though its worst case running time is O(n2), its expected running time is O(n log n). It works by partitioning the given range around some pivot chosen from the range (usually the first element). It then recursively partitions the two resulting ranges until the original range is fully sorted. The choice of a pivot from a fixed location every time is known to degrade the algorithm’s performance to its worst case when the input range is originally sorted (forwards or backwards). A common way to mitigate this effect is to choose the pivot randomly among the range elements every time. This last variation is commonly known as randomized quicksort.
Another important aspect about quicksort is that it doesn’t guarantee the stability of the resulting range unless the partitioning algorithm is stable. This means that equivalent elements might not end up sorted in the same order they used to be in originally. That’s okay for primitive types, but it’s not always okay for composite types.
Stable partitioning is typically more expensive and is not always needed, so it
would be nice to have two versions of quicksort()
: a stable and an unstable
version. This suggests that the partitioning algorithm should be a parameter to
quicksort()
, but that would make the interface very inconvenient to use.
Luckily, we can hide these petty details behind good looking interfaces:
template <typename I>
void quicksort(I first, I last) {
auto&& engine = std::default_random_engine(
std::chrono::system_clock::now().time_since_epoch().count());
(first, last, std::move(engine),
quicksort_impl<I, decltype(std::distance(first, last))>);
quicksort_partition}
template <typename I>
void stable_quicksort(I first, I last) {
auto&& engine = std::default_random_engine(
std::chrono::system_clock::now().time_since_epoch().count());
(first, last, std::move(engine),
quicksort_impl<I, decltype(std::distance(first, last))>);
quicksort_stable_partition}
Both functions initialize a random number generation engine from a unique seed
(the current time) and pass it along with the right partitioning algorithm to
quicksort_impl()
which does the real work.
It’s tempting to think that std::partition
and std::stable_partition
could
directly be used as partition procedures for quicksort, but that’s not possible.
Both std::partition
and std::stable_partition
take a range and a unary
predicate. They partition the range in-place into two ranges such that every
element that satisfies the predicate ends up in the first range and every
element that doesn’t satisfy the predicate ends up in the second range. They
both return an iterator to the beginning of the second range. Obviously,
std::stable_partition
does this in a stable manner!
Quicksort partitioning requires that the pivot be chosen from the given range and that it ends up exactly between the two resulting ranges. STL partitioning algorithms do not guarantee that the pivot will end up between the two partitions, even if it exists in the range.
Consider the following range:
[8, 2, 10, 7, 34, 9]
Here’s a perfectly valid partitioning using a “less-than 9” predicate:
[8, 2, 7, 10, 34, 9]
The algorithm would return an iterator to 10
. Every element before this
iterator satisfies the predicate and every element starting at that iterator
does not satisfy the predicate. Even though 9
is in the range, it’s not
required to end up between the two partitions.
So we need to implement two quicksort partitioning procedures that take the pivot offset and perform partitioning as required by the quicksort algorithm, possibly making use of STL partitioning algorithms:
template <typename I, typename S>
(I first, I last, S offset) {
I quicksort_partitionassert(offset >= 0 && offset < std::distance(first, last));
std::iter_swap(first, std::next(first, offset));
auto&& ret =
std::prev(std::partition(next(first), last,
[first] (auto&& val) { return val < *first; }));
std::iter_swap(first, ret);
return ret;
}
template <typename I, typename S>
(I first, I last, S offset) {
I quicksort_stable_partitionassert(offset >= 0 && offset < std::distance(first, last));
auto&& pivot_iter = std::next(first, offset);
std::rotate(first, pivot_iter, std::next(pivot_iter));
auto&& ret =
std::prev(std::stable_partition(std::next(first), last,
[first] (auto&& val) { return val < *first; }));
std::rotate(first, std::next(first), std::next(ret));
return ret;
}
The unstable partitioning function swaps the pivot with first element of the range, performs partitioning over the full range excluding the pivot, swaps the pivot back with the last element in the first range and returns an iterator to that location.
std::iter_swap()
swaps range elements by iterators rather than references.
The stable partitioning function however cannot swap elements arbitrarily in the
range because this changes their relative order, so it needs to use
std::rotate()
to bring the pivot to the front and back to its final location.
The remaining piece now is quicksort_impl()
:
template <typename I, typename E, typename F>
void quicksort_impl(I first, I last, E&& engine, F partitioner) {
auto size = std::distance(first, last);
if (size < 2) {
return;
}
std::uniform_int_distribution<decltype(size)>
(0, size - 1);
distribution
auto mid = partitioner(first, last, distribution(engine));
(first, mid, engine, partitioner);
quicksort_impl(std::next(mid), last, engine, partitioner);
quicksort_impl}
It simply takes the random number generation engine and partitioning algorithm from the convenience wrappers, generates a uniformly distributed random index in the given range and passes it as a pivot offset to the right partitioning algorithm. It gets back an iterator to the middle of the partitioned range and recursively sorts the resulting two partitions.
Conclusion
We saw variations of three common sorting algorithms implemented generically and compactly using STL algorithms. In general, STL algorithms are more applicable than they look. The key to utilizing them is to always ask whether the raw loop I’m about to write (or the one I’m reading) can be replaced by a packaged STL algorithm. You’d be surprised how many times this question can be answered affirmatively.
``One thing every sorcerer will tell you is if you have the name of a spirit, you have power over it.’’ —Gerald Jay Sussman↩︎
Obviously I won’t be using
std::sort()
!↩︎Introduction to Algorithms, by Cormen et al.↩︎
Before C++11,
std::rotate()
used to returnvoid
.↩︎std::rotate()
can be implemented in O(n) time using three reverses. In fact, this implementation results in a number of swaps almost exactly equals to n.↩︎I was informed that this merge implementation is simple but not ideal. In-place merging can be done in O(n log n) worst case time but the implementation is more complex.↩︎