Haitham Gad's Website

STL Algorithms in Action

March 25, 2016

Under tight schedules and looming deadlines, C++ programmers tend to ignore the Standard Library algorithms and settle for rather suboptimal handcrafted solutions that get the job done. I know this, because I myself have done it more than I’d care to admit. The excuse is usually that the problem does not fit any of these neat, but somewhat detached, algorithms.

STL algorithms are actually very handy. They give names to commonly occurring (and often cryptic) raw loops¹. Until recently, however, they required too much work to use effectively. Thankfully, with the introduction of type inference and lambda expressions in C++11, STL algorithms are now a lot easier to use.

In this post, I’ll be demonstrating the versatility of STL algorithms by implementing common sorting algorithms (insertion sort, in-place merge sort and randomized quicksort) using STL algorithms as building blocks². Sorting is the kind of problem that requires moving data around, and when implemented using raw loops, they tend to be unreadable. We’ll see how STL algorithms can describe this code more succinctly and possibly more efficiently.

Insertion Sort

This is a C++ adaptation of insertion sort as it appears in CLRS³:

template <typename I>
void insertion_sort(I first, I last) {
  if (first == last) {
    return;
  }

  for (auto&& j = std::next(first); j != last; ++j) {
    auto key = *j;
    auto i = j;
    while (i != first && *std::prev(i) > key) {
      *i = *std::prev(i);
      --i;
    }

    *i = key;
  }
}

It’s compact and generic, but aside from the function name, it’s completely non-obvious what the nested loops are doing at first sight. Let’s see how to make this more readable.

By examining the inner loop, it looks like it searches backwards in the given range for a value less than or equal to key. It starts from the value just before j down to 0. In doing so, it also shifts all the values it scans one position to the right. Once the value is found (or we reach the beginning of the range), the value at j (i.e. key) is then inserted.

Let’s pretend for a moment that the range is being scanned in the forward rather than the backward direction. In this case, all we need is a function that searches a range for an element (call it k) that satisfies a certain predicate. When k is found (or we reach the end of the range), the function has to insert the first element of the range before k and shift all elements between them one position to the left:

template <typename I, typename P>
void insert_first_before(I first, I last, P pred) {
  auto&& second = std::next(first);
  std::rotate(first, second,
    std::find_if(second, last, pred));
}

If you’re surprised to see std::rotate() being used here, you’re not alone. Sean Parent’s GoingNative 2013 talk was an eye-opener for me in this respect. Turns out std::rotate() is one of the handiest algorithms in the Standard Library. It takes three iterators: first, middle and last. It performs a left-rotate operation such that the elements between middle and last are moved to the beginning of the range, and the elements between first and middle are moved to the end of the range. It also returns an iterator to the beginning of the second range after rotation, which often comes in handy⁴. Essentially, every time you find yourself manually shifting elements, there’s almost always a better way to achieve the same effect using std::rotate()⁵.

Going back to our problem, what we originally needed was an algorithm that scans a range backwards not forwards. This can be achieved easily using reverse_iterators:

template <typename I, typename P>
void insert_last_after(I first, I last, P pred) {
  insert_first_before(
    std::make_reverse_iterator(last),
    std::make_reverse_iterator(first), pred);
}

You may notice that the function name isn’t ideal. The word last in the function name refers to the last element in the range, while last the formal parameter, refers to the element one past the end of the range.

In any case, given insert_last_after(), insertion_sort() becomes trivial:

template <typename I>
void insertion_sort(I first, I last) {
  if (first == last) {
    return;
  }

  for (auto&& iter = std::next(first); iter != last; ++iter) {
    insert_last_after(first, std::next(iter),
      [iter] (auto&& value) { return value <= *iter; });
  }
}

Notice that the arguments to insert_last_after() are initialized with iterators one position to the right of where the reverse_iterators are supposed to point. This is because reverse_iterators refer to the position right before the one they’re initialized with.

But do we really need a linear search? If we notice the loop invariant that the range from first to j is always sorted, we can simply perform a binary search instead. This can be achieved directly using std::upper_bound(). Once that element is found, we need to perform a rotate to insert the element at iter before it:

template <typename I>
void insertion_sort(I first, I last) {
  if (first == last) {
    return;
  }

  for (auto&& iter = std::next(first); iter != last; ++iter) {
    std::rotate(std::upper_bound(first, iter, *iter),
                iter, std::next(iter));
  }
}

std::upper_bound() returns the first element in a range that compares greater than a look-up key (or last if no such element exists). It does that in O(log n) time by requiring the range to be sorted or at least partitioned around the look-up key.

In-Place Merge Sort

The canonical implementation of merge sort partitions the given range in half and recursively sorts each partition. It then performs the merge step in a separate range from those of the two partitions. In-place merge sort, on the other hand, performs the merge step in-place. The two partitions are kept adjacent in memory constituting one big range. The two elements at the beginning of each partition are compared and the smallest is moved to the beginning of the combined range shifting elements to the right as necessary.

Following is the top-level mergesort() procedure:

template <typename I>
void mergesort(I first, I last) {
  auto&& size = std::distance(first, last);
  if (size < 2) {
    return;
  }

  auto&& mid = first + size / 2 + size % 2;

  mergesort(first, mid);
  mergesort(mid, last);
  merge(first, mid, last);
}

As described, unless the given range has only one element, mergesort() partitions the given range in half and recursively sorts each partition. It finally calls the in-place merge() procedure to merge the two sorted partitions. The two partitions are adjacent in memory with iterator mid pointing at the beginning of the second partition.

Here’s one way to implement the in-place merge() procedure:

template <typename I>
void merge(I first, I mid, I last) {
  while (first != mid && mid != last) {
    auto iter = mid;
    first = std::upper_bound(first, mid, *mid);
    mid = std::upper_bound(mid, last, *first);
    std::rotate(first, iter, mid);
  }
}

In every iteration, merge() does the following:

It skips over the first few elements in the first partition that are less than the first element of the second partition (i.e. *mid). These elements are essentially in their correct final location, so no need to move them. It does that by calling std::upper_bound() on the first partition and assigning the result back to first.
Next, it does the same thing for the second partition. It skips over the first few elements in the second partition that are greater than the value pointed at by first. This time it does it because these elements are all going to have to be moved before first, so there’s no point in moving them one-by-one if they can be moved as a batch.
Finally, it calls std::rotate() to move those elements before first.

The loop stops when either of the two moving partitions is totally consumed. Neat, isn’t it?!⁶

Randomized Quicksort

Quicksort is by far the most widely used sorting algorithm. Even though its worst case running time is O(n²), its expected running time is O(n log n). It works by partitioning the given range around some pivot chosen from the range (usually the first element). It then recursively partitions the two resulting ranges until the original range is fully sorted. The choice of a pivot from a fixed location every time is known to degrade the algorithm’s performance to its worst case when the input range is originally sorted (forwards or backwards). A common way to mitigate this effect is to choose the pivot randomly among the range elements every time. This last variation is commonly known as randomized quicksort.

Another important aspect about quicksort is that it doesn’t guarantee the stability of the resulting range unless the partitioning algorithm is stable. This means that equivalent elements might not end up sorted in the same order they used to be in originally. That’s okay for primitive types, but it’s not always okay for composite types.

Stable partitioning is typically more expensive and is not always needed, so it would be nice to have two versions of quicksort(): a stable and an unstable version. This suggests that the partitioning algorithm should be a parameter to quicksort(), but that would make the interface very inconvenient to use. Luckily, we can hide these petty details behind good looking interfaces:

template <typename I>
void quicksort(I first, I last) {
  auto&& engine = std::default_random_engine(
    std::chrono::system_clock::now().time_since_epoch().count());

  quicksort_impl(first, last, std::move(engine),
    quicksort_partition<I, decltype(std::distance(first, last))>);
}

template <typename I>
void stable_quicksort(I first, I last) {
  auto&& engine = std::default_random_engine(
    std::chrono::system_clock::now().time_since_epoch().count());

  quicksort_impl(first, last, std::move(engine),
    quicksort_stable_partition<I, decltype(std::distance(first, last))>);
}

Both functions initialize a random number generation engine from a unique seed (the current time) and pass it along with the right partitioning algorithm to quicksort_impl() which does the real work.

It’s tempting to think that std::partition and std::stable_partition could directly be used as partition procedures for quicksort, but that’s not possible.

Both std::partition and std::stable_partition take a range and a unary predicate. They partition the range in-place into two ranges such that every element that satisfies the predicate ends up in the first range and every element that doesn’t satisfy the predicate ends up in the second range. They both return an iterator to the beginning of the second range. Obviously, std::stable_partition does this in a stable manner!

Quicksort partitioning requires that the pivot be chosen from the given range and that it ends up exactly between the two resulting ranges. STL partitioning algorithms do not guarantee that the pivot will end up between the two partitions, even if it exists in the range.

Consider the following range:

[8, 2, 10, 7, 34, 9]

Here’s a perfectly valid partitioning using a “less-than 9” predicate:

[8, 2, 7, 10, 34, 9]

The algorithm would return an iterator to 10. Every element before this iterator satisfies the predicate and every element starting at that iterator does not satisfy the predicate. Even though 9 is in the range, it’s not required to end up between the two partitions.

So we need to implement two quicksort partitioning procedures that take the pivot offset and perform partitioning as required by the quicksort algorithm, possibly making use of STL partitioning algorithms:

template <typename I, typename S>
I quicksort_partition(I first, I last, S offset) {
  assert(offset >= 0 && offset < std::distance(first, last));

  std::iter_swap(first, std::next(first, offset));

  auto&& ret =
    std::prev(std::partition(next(first), last,
      [first] (auto&& val) { return val < *first; }));

  std::iter_swap(first, ret);

  return ret;
}

template <typename I, typename S>
I quicksort_stable_partition(I first, I last, S offset) {
  assert(offset >= 0 && offset < std::distance(first, last));

  auto&& pivot_iter = std::next(first, offset);
  std::rotate(first, pivot_iter, std::next(pivot_iter));

  auto&& ret =
    std::prev(std::stable_partition(std::next(first), last,
           [first] (auto&& val) { return val < *first; }));

  std::rotate(first, std::next(first), std::next(ret));

  return ret;
}

The unstable partitioning function swaps the pivot with first element of the range, performs partitioning over the full range excluding the pivot, swaps the pivot back with the last element in the first range and returns an iterator to that location.

std::iter_swap() swaps range elements by iterators rather than references.

The stable partitioning function however cannot swap elements arbitrarily in the range because this changes their relative order, so it needs to use std::rotate() to bring the pivot to the front and back to its final location.

The remaining piece now is quicksort_impl():

template <typename I, typename E, typename F>
void quicksort_impl(I first, I last, E&& engine, F partitioner) {
  auto size = std::distance(first, last);
  if (size < 2) {
    return;
  }

  std::uniform_int_distribution<decltype(size)>
    distribution(0, size - 1);

  auto mid = partitioner(first, last, distribution(engine));
  quicksort_impl(first, mid, engine, partitioner);
  quicksort_impl(std::next(mid), last, engine, partitioner);
}

It simply takes the random number generation engine and partitioning algorithm from the convenience wrappers, generates a uniformly distributed random index in the given range and passes it as a pivot offset to the right partitioning algorithm. It gets back an iterator to the middle of the partitioned range and recursively sorts the resulting two partitions.

Conclusion

We saw variations of three common sorting algorithms implemented generically and compactly using STL algorithms. In general, STL algorithms are more applicable than they look. The key to utilizing them is to always ask whether the raw loop I’m about to write (or the one I’m reading) can be replaced by a packaged STL algorithm. You’d be surprised how many times this question can be answered affirmatively.

``One thing every sorcerer will tell you is if you have the name of a spirit, you have power over it.’’ —Gerald Jay Sussman↩︎
Obviously I won’t be using std::sort()!↩︎
Introduction to Algorithms, by Cormen et al.↩︎
Before C++11, std::rotate() used to return void.↩︎
std::rotate() can be implemented in O(n) time using three reverses. In fact, this implementation results in a number of swaps almost exactly equals to n.↩︎
I was informed that this merge implementation is simple but not ideal. In-place merging can be done in O(n log n) worst case time but the implementation is more complex.↩︎

Object Inspection in GDB

May 24, 2014

One of the primary needs of a developer, while debugging any code, is to be able to inspect, not only primitive values, but also compound values (like objects). The problem is that objects are usually littered with lots of irrelevant information that obscure the essence of what the developer might be interested in. It becomes even worse when the interesting pieces of an object are in dynamic memory, in which case all you get to see are dumb pointers. Consider the following code:

#include <vector>

int main() {
  std::vector<int> v;
  v.push_back(1);
  v.push_back(2);
  v.push_back(3);
  v.push_back(4);
  v.push_back(5);

  return 0;
}

Here’s the output from GDB when attempting to print v just before returning¹:

(gdb) print v
$1 = {
  <std::__1::__vector_base<int, std::__1::allocator<int> >> = {
    <std::__1::__vector_base_common<true>> = {<No data fields>},
    members of std::__1::__vector_base<int, std::__1::allocator<int> >:
    __begin_ = 0x100103a70,
    __end_ = 0x100103a84,
    __end_cap_ = {
      <std::__1::__libcpp_compressed_pair_imp<int*, std::__1::allocator<int>, 2>> = {
        <std::__1::allocator<int>> = {<No data fields>},
        members of std::__1::__libcpp_compressed_pair_imp<int*, std::__1::allocator<int>, 2>:
        __first_ = 0x100103a90
      }, <No data fields>}
  }, <No data fields>}

It’s obvious that this output is essentially useless. It’s basically a bunch of memory addresses and obscure data types with absolutely no reference to what the developer is looking for, the entries of the vector!

GDB solves this problem (and others) by providing a full-fledged Python API ² for interacting with the inferior³. This API can be used for inspecting all sorts of information about the inferior, including static and dynamic value types, symbols and symbol tables, stack frames and even allows you to evaluate expressions in the inferior language. Moreover, this API has facilities for controlling GDB itself, by defining and executing GDB commands, creating breakpoints and watchpoints, inspecting breakpoint attributes and much more.

What we care about here is GDB’s Pretty Printing API. This API enables the creation of custom pretty printers for values of user-defined types. The goal is to allow the developer to use the plain old GDB print command on values of user-defined types and still get to see only the relevant pieces of the printed value.

GDB’s Python API represents values from the inferior using the gdb.Value class, and represents types using the gdb.Type class. Documentation for these classes can be found here and here. The most important aspect about gdb.Value is that for object values, you can access object members using Python’s dictionary syntax e.g. obj.mem becomes obj["mem"].

To create a pretty printer for a certain type, you need to create a printer class that has at least one method, to_string(), which converts that type to string for printing. If you know that your type represents some sort of an array, list or table, you may also need to add a children() method to your class. This later method must return an object conforming to the Python iterator protocol, where every item returned by that iterator must be a pair of child name and value as you wish to observe them in the output. If the child type also has a printer, GDB will automatically invoke it to print all the children.

Let’s go ahead and write a pretty printer for libc++’s std::vector⁴:

import gdb
import re

class VectorPrinter(object):
  """Prints an std::vector object."""
  class _iterator(object):
    def __init__(self, value):
      self.counter = 0
      self.begin = value["__begin_"]
      self.end = value["__end_"]

    def __iter__(self):
      return self

    def next(self):
      if self.begin == self.end:
        raise StopIteration

      ret = ("[%d]" % self.counter, self.begin.dereference())
      self.begin += 1
      self.counter += 1
      return ret

  def __init__(self, value):
    self.value = value

  def to_string(self):
    return "vector"

  def children(self):
    return self._iterator(self.value)

Note that we can access private members of std::vector (__begin_ and __end_). This is because access rights are inactive in the debugger. You may wonder why we bother accessing data members of std::vector if we can just call v.begin() and v.end(). There are two reasons for this:

In many cases we just can’t, because these member functions are usually small enough that the compiler usually decides to inline them. If you try to print the begin() iterator for instance, you may get an error that looks something like this: Cannot evaluate function -- may be inlined.
If the type you’re trying to print is an incomplete type e.g. if it’s not part of the process binary (it might be dynamically linked for instance), then you won’t even have access to any of the object’s members, in which case, the only way to refer to the object’s data members is by using pointer arithmetic.

Notice also that to_string() doesn’t actually do anything here. It just returns the string "vector". That’s because when you create a children() method, the output of to_string() represents the name of the printed list. It gets prepended (along with a succeeding = sign) to the output list resulting from the call to children().

children() returns an object of the VectorPrinter._iterator class. This class conforms to the Python iterator protocol by implementing a next() method, which returns the next value of the vector (actually a name/value pair) and raises a StopIteration exception when no more entries are found in the vector. The dereference() method of gdb.Value returns the value pointed to by a pointer when the gdb.Value represents a pointer and raises an exception otherwise.

To activate the pretty printer, we need to define a lookup function for our type and a registration function for our printer(s):

def vec_lookup_function(val):
  lookup_tag = val.type.tag
  if lookup_tag == None:
    return None

  regex = re.compile("^.*vector_base<.*,.*>$")
  if regex.match(lookup_tag):
    return VectorPrinter(val)

  return None

def register_libcxx_printers(objfile):
  objfile.pretty_printers.append(vec_lookup_function)

The lookup function is called by GDB for every value it attempts to print. If the return value is not None, GDB uses it as a printer object for that value. In this case the std::vector lookup function inspects the value type tag. That’s the word that comes after a class, struct or union keywords in C++ i.e. it’s the class name. Since in libc++, std::vector unfolds to a type that has the word vector_base in it, it’s possible to recognize that type using a regular expression as shown above.

Finally, we need to make sure GDB loads this file at startup. To do so, we need to make sure the printer module is in the PYTHONPATH and call the printer registration function in ~/.gdbinit:

python
from libcxx.printers import register_libcxx_printers
register_libcxx_printers(gdb)
end

Here’s the output from GDB using this pretty printer:

(gdb) print v
$1 = {
  <std::__1::__vector_base<int, std::__1::allocator<int> >> = vector = {
    [0] = 1,
    [1] = 2,
    [2] = 3,
    [3] = 4,
    [4] = 5
  }, <No data fields>}

At any point, if you need to print the raw version of a vector, you can use the /r switch:

(gdb) print /r v
$2 = {
  <std::__1::__vector_base<int, std::__1::allocator<int> >> = {
    <std::__1::__vector_base_common<true>> = {<No data fields>},
    members of std::__1::__vector_base<int, std::__1::allocator<int> >:
    __begin_ = 0x100103a70,
    __end_ = 0x100103a84,
    __end_cap_ = {
      <std::__1::__libcpp_compressed_pair_imp<int*, std::__1::allocator<int>, 2>> = {
        <std::__1::allocator<int>> = {<No data fields>},
        members of std::__1::__libcpp_compressed_pair_imp<int*, std::__1::allocator<int>, 2>:
        __first_ = 0x100103a90
      }, <No data fields>}
  }, <No data fields>}

There are different variations for how to create printer classes and lookup and registration function, but I wanted to keep things simple in this tutorial. For more information, check out the Pretty Printing API documentation.

I’m using Clang’s libc++ standard library here. If you use GCC’s libstdc++, your mileage may vary.↩︎
GDB has to be configured with --with-python for this to work.↩︎
The process under debugging.↩︎
For libstdc++, you can use any of the tools mentioned here. I should also mention that LLDB already prints libc++’s STL containers pretty decently. Check it out before rolling out your own GDB printers.↩︎

For older posts, head over to the blog page. To browse all posts and post tags, check out the archive.