Object Inspection in GDB

One of a developer’s primary needs when debugging is the ability to inspect not only primitive values but also compound values such as objects. The problem is that objects are usually littered with irrelevant details that obscure what the developer actually cares about. It gets even worse when the interesting parts of an object live in dynamic memory, in which case all you get to see are dumb pointers. Consider the following code:

#include <vector>

int main() {
  std::vector<int> v;
  v.push_back(1);
  v.push_back(2);
  v.push_back(3);
  v.push_back(4);
  v.push_back(5);

  return 0;
}

Here’s the output from GDB when printing v just before returning[1]:

(gdb) print v
$1 = {
  <std::__1::__vector_base<int, std::__1::allocator<int> >> = {
    <std::__1::__vector_base_common<true>> = {<No data fields>},
    members of std::__1::__vector_base<int, std::__1::allocator<int> >:
    __begin_ = 0x100103a70,
    __end_ = 0x100103a84,
    __end_cap_ = {
      <std::__1::__libcpp_compressed_pair_imp<int*, std::__1::allocator<int>, 2>> = {
        <std::__1::allocator<int>> = {<No data fields>},
        members of std::__1::__libcpp_compressed_pair_imp<int*, std::__1::allocator<int>, 2>:
        __first_ = 0x100103a90
      }, <No data fields>}
  }, <No data fields>}

This output is essentially useless. It’s a bunch of memory addresses and obscure internal types, with no sign of what the developer is actually looking for: the entries of the vector!

GDB solves this problem (and others) by providing a full-fledged Python API[2] for interacting with the inferior[3]. This API can be used to inspect all sorts of information about the inferior, including static and dynamic value types, symbols and symbol tables, and stack frames, and it even lets you evaluate expressions in the inferior’s language. Moreover, the API has facilities for controlling GDB itself: defining and executing GDB commands, creating breakpoints and watchpoints, inspecting breakpoint attributes, and much more.

What we care about here is GDB’s Pretty Printing API, which lets you write custom pretty printers for values of user-defined types. The goal is to let the developer use the plain old GDB print command on such values and still see only their relevant parts.

GDB’s Python API represents values from the inferior with the gdb.Value class and types with the gdb.Type class; both are documented in the Python API chapter of the GDB manual (the "Values From Inferior" and "Types In Python" sections). The most important aspect of gdb.Value is that, for object values, you can access members with Python’s subscript syntax, e.g. obj.mem becomes obj["mem"].

To create a pretty printer for a certain type, you need a printer class with at least one method, to_string(), which converts a value of that type to a string for printing. If your type represents some sort of array, list or table, you may also want to add a children() method. This latter method must return an object conforming to the Python iterator protocol, where every item produced by the iterator is a pair of the child’s name and value as you wish to see them in the output. If the child type also has a printer, GDB automatically invokes it when printing the children.
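To see that protocol in isolation, here is a minimal sketch that runs outside GDB. It assumes a hypothetical struct with members first and second, uses a plain dict as a stand-in for gdb.Value (both support obj["member"] lookups), and adds a hypothetical render() helper that only roughly approximates how print combines to_string() and children(); GDB’s real layout also depends on settings such as set print pretty.

```python
# Sketch of the printer protocol, runnable outside GDB.
# Assumption: a plain dict stands in for gdb.Value; inside GDB the printer
# would receive a real gdb.Value and index it the same way.


class PairPrinter(object):
    """Hypothetical printer for a struct with members `first` and `second`."""

    def __init__(self, value):
        self.value = value

    def to_string(self):
        # Used as the name shown in front of the children list.
        return "pair"

    def children(self):
        # Any iterator of (name, value) tuples satisfies the protocol;
        # a generator is the shortest way to write one.
        yield ("first", self.value["first"])
        yield ("second", self.value["second"])


def render(printer):
    """Hypothetical helper: a rough approximation of GDB's one-line output."""
    parts = ["%s = %s" % (name, val) for name, val in printer.children()]
    return "%s = {%s}" % (printer.to_string(), ", ".join(parts))


print(render(PairPrinter({"first": 1, "second": 2})))  # pair = {first = 1, second = 2}
```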

Let’s go ahead and write a pretty printer for libc++’s std::vector[4]:

import gdb
import re

class VectorPrinter(object):
  """Prints a libc++ std::vector object."""

  class _iterator(object):
    """Walks the elements in [__begin_, __end_) as (name, value) pairs."""

    def __init__(self, value):
      self.counter = 0
      self.begin = value["__begin_"]
      self.end = value["__end_"]

    def __iter__(self):
      return self

    def __next__(self):
      if self.begin == self.end:
        raise StopIteration

      ret = ("[%d]" % self.counter, self.begin.dereference())
      self.begin += 1  # pointer arithmetic: advance to the next element
      self.counter += 1
      return ret

    next = __next__  # Python 2 name of the same iterator method

  def __init__(self, value):
    self.value = value

  def to_string(self):
    return "vector"

  def children(self):
    return self._iterator(self.value)

Note that we can access private members of std::vector (__begin_ and __end_) because the debugger does not enforce C++ access control. You may wonder why we bother accessing the data members of std::vector when we could just call v.begin() and v.end(). There are two reasons:

  1. In many cases we simply can’t, because these member functions are small enough that the compiler usually inlines them, leaving no out-of-line copy for GDB to call. If you try to print the begin() iterator, for instance, you may get an error like this: Cannot evaluate function -- may be inlined.

  2. If the type you’re trying to print is incomplete, e.g. its full definition isn’t part of the process binary (it might live in a dynamically linked library, for instance), then you won’t have access to any of the object’s members by name, and the only way to reach its data members is through pointer arithmetic.
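As a worked example of that pointer arithmetic, the raw output at the top of this post is already enough to recover the vector’s size and capacity by hand. This sketch copies the three addresses from that output and assumes sizeof(int) is 4 on the target; inside a printer, subtracting two pointer-typed gdb.Value objects should yield the element count directly, as it does in C++.

```python
# Recover size() and capacity() from the raw pointers printed by GDB.
# Assumption: sizeof(int) == 4 on the debugged target.
SIZEOF_INT = 4

begin = 0x100103a70    # __begin_
end = 0x100103a84      # __end_
end_cap = 0x100103a90  # __end_cap_.__first_

size = (end - begin) // SIZEOF_INT          # 0x14 bytes -> 5 elements
capacity = (end_cap - begin) // SIZEOF_INT  # 0x20 bytes -> 8 elements

print(size, capacity)  # 5 8
```

The size of 5 matches the five push_back calls; the spare capacity of 8 is an allocation-policy detail of the library and may differ elsewhere.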

Notice also that to_string() doesn’t do much here; it just returns the string "vector". That’s because when a printer has a children() method, the output of to_string() serves as the name of the printed list: it is prepended, followed by an = sign, to the list produced by children().

children() returns an instance of VectorPrinter._iterator. This class conforms to the Python iterator protocol by implementing __iter__() and a next() method (named __next__() in Python 3), which returns the next element of the vector (actually a name/value pair) and raises StopIteration when no entries remain. The dereference() method of gdb.Value returns the value a pointer points to when the gdb.Value represents a pointer, and raises an exception otherwise.
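Because GDB only needs children() to return an iterator, a generator function is a more compact alternative to a nested iterator class. The sketch below performs the same [__begin_, __end_) walk as a generator and runs outside GDB; vector_children is a hypothetical name, and FakePointer is a hypothetical stand-in for a pointer-typed gdb.Value that implements only what the walk needs (comparison, + 1 and dereference()).

```python
class FakePointer(object):
    """Hypothetical stand-in for a pointer-typed gdb.Value."""

    def __init__(self, memory, index):
        self.memory = memory  # list standing in for the inferior's heap
        self.index = index

    def __eq__(self, other):
        return self.memory is other.memory and self.index == other.index

    def __ne__(self, other):
        return not self == other

    def __add__(self, n):
        return FakePointer(self.memory, self.index + n)

    def dereference(self):
        return self.memory[self.index]


def vector_children(begin, end):
    """Yield ("[i]", element) pairs for the half-open range [begin, end)."""
    counter = 0
    while begin != end:
        yield ("[%d]" % counter, begin.dereference())
        begin += 1  # falls back to __add__, like pointer arithmetic on gdb.Value
        counter += 1


heap = [1, 2, 3, 4, 5]
print(list(vector_children(FakePointer(heap, 0), FakePointer(heap, 5))))
# [('[0]', 1), ('[1]', 2), ('[2]', 3), ('[3]', 4), ('[4]', 5)]
```

Inside GDB, a printer could then return vector_children(self.value["__begin_"], self.value["__end_"]) from children(), since gdb.Value supports the same comparison, addition and dereference() operations.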

To activate the pretty printer, we need to define a lookup function for our type and a registration function for our printer(s):

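def vec_lookup_function(val):
  lookup_tag = val.type.tag
  if lookup_tag is None:
    # Types without a tag (ints, pointers, ...) can't be vectors.
    return None

  regex = re.compile("^.*vector_base<.*,.*>$")
  if regex.match(lookup_tag):
    return VectorPrinter(val)

  return None

def register_libcxx_printers(objfile):
  # Works with anything that has a pretty_printers list: a gdb.Objfile,
  # a gdb.Progspace, or the gdb module itself (global registration).
  objfile.pretty_printers.append(vec_lookup_function)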

GDB calls the lookup function for every value it attempts to print. If the function returns something other than None, GDB uses the returned object as the printer for that value. Here the lookup function inspects the value’s type tag, which is the name that follows the class, struct or union keyword in C++, i.e. the class name. In libc++, std::vector derives from an internal base class whose name contains vector_base (std::__1::__vector_base in the output above), and that base class holds the __begin_ and __end_ pointers, so the printer recognizes it with the regular expression shown above.
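A quick way to check which type tags the lookup function will claim is to run the same regular expression over a few tag strings outside GDB. The first two tags below are copied from the print output above; the std::vector tag is written in the same style as an illustrative assumption, and is_libcxx_vector_base is a hypothetical helper. Internal names such as __vector_base are libc++ implementation details that may differ between versions.

```python
import re

# Same pattern as vec_lookup_function.
VECTOR_TAG = re.compile("^.*vector_base<.*,.*>$")


def is_libcxx_vector_base(tag):
    """Hypothetical helper: True if vec_lookup_function would claim this tag."""
    # GDB reports tag None for types without one (ints, pointers, ...).
    return tag is not None and VECTOR_TAG.match(tag) is not None


print(is_libcxx_vector_base("std::__1::__vector_base<int, std::__1::allocator<int> >"))  # True
print(is_libcxx_vector_base("std::__1::__vector_base_common<true>"))  # False
print(is_libcxx_vector_base("std::__1::vector<int, std::__1::allocator<int> >"))  # False
print(is_libcxx_vector_base(None))  # False
```

Only the base class matches, which is why the pretty-printed output appears under the <std::__1::__vector_base<...>> entry rather than replacing the whole vector.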

Finally, we need GDB to load this file at startup. To do so, put the printer module on the PYTHONPATH as libcxx/printers.py (matching the import below) and call the registration function in ~/.gdbinit:

python
from libcxx.printers import register_libcxx_printers
register_libcxx_printers(gdb)
end

Here’s the output from GDB using this pretty printer:

(gdb) print v
$1 = {
  <std::__1::__vector_base<int, std::__1::allocator<int> >> = vector = {
    [0] = 1,
    [1] = 2,
    [2] = 3,
    [3] = 4,
    [4] = 5
  }, <No data fields>}

At any point, if you need to print the raw version of a vector, you can use the /r switch:

(gdb) print /r v
$2 = {
  <std::__1::__vector_base<int, std::__1::allocator<int> >> = {
    <std::__1::__vector_base_common<true>> = {<No data fields>},
    members of std::__1::__vector_base<int, std::__1::allocator<int> >:
    __begin_ = 0x100103a70,
    __end_ = 0x100103a84,
    __end_cap_ = {
      <std::__1::__libcpp_compressed_pair_imp<int*, std::__1::allocator<int>, 2>> = {
        <std::__1::allocator<int>> = {<No data fields>},
        members of std::__1::__libcpp_compressed_pair_imp<int*, std::__1::allocator<int>, 2>:
        __first_ = 0x100103a90
      }, <No data fields>}
  }, <No data fields>}

There are other ways to structure printer classes, lookup functions and registration functions (for example, with the helpers in GDB’s gdb.printing module), but I wanted to keep this tutorial simple. For more information, check out the Pretty Printing API documentation.


  1. I’m using Clang’s libc++ standard library here. If you use GCC’s libstdc++, your mileage may vary.

  2. GDB has to be configured with --with-python for this to work.

  3. The process being debugged.

  4. For libstdc++, you can use existing pretty printers instead, such as the Python printers that ship with GCC. I should also mention that LLDB already prints libc++’s STL containers pretty decently. Check it out before rolling your own GDB printers.
