Monday, June 24, 2013

Readability: Document your types

Wow. The previous post, titled "Self-interview after leaving the NetBSD board", has turned out to be, by far, the most popular article on this blog. The feedback so far has been positive and I owe all of you a follow-up post. However, writing such a post will take a while and content must keep flowing, so let's get back to the readability series for now.


In dynamically-typed languages[1], variable and function definitions do not state the types of the values they hold or accept. This is quite convenient when writing code, but results in code that is very hard to read later on. Consider this snippet:

def compute_balances(accounts):
    """Calculates the balances of a set of accounts.

    Args:
        accounts: Accounts for which to compute the balances.

    Returns:
        The balances for every account.
    """
    balances = new_balances()
    for account in accounts:
        balances[account.name] += account.balance
    return balances

Now, this example is certainly trivial and I'm writing it to illustrate the docstring text more than the code itself. That said: what is the type of accounts? What is the return type of the function?

When you write code in one of these dynamically-typed languages, document the types of your function arguments and return values, and do so in the most specific way you can, possibly using the native type syntax of the language you are using. The specific style I use is:

def compute_balances(accounts):
    """Calculates the balances of a set of accounts.

    Args:
        accounts: frozenset(Account).  Accounts for which to
            compute the balances.

    Returns:
        dict(str, Balance).  The balances for every account.
    """
    balances = new_balances()
    for account in accounts:
        balances[account.name] += account.balance
    return balances

While this may seem like unnecessary boilerplate, and while such comments may well get out of sync with the code over time, these details become invaluable when reading code written by others.
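In Python versions that support type annotations, the same information can also be written in the language's own type syntax, where a type checker can help keep it in sync with the code. The following is only a minimal sketch under stated assumptions, not the code from this post: the Account class and its fields are hypothetical stand-ins, balances are modeled as plain floats rather than a Balance type, and a defaultdict replaces the unspecified new_balances() helper.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Account:
    """Hypothetical account record, assumed for illustration only."""
    name: str
    balance: float


def compute_balances(accounts: frozenset[Account]) -> dict[str, float]:
    """Calculates the balances of a set of accounts.

    Args:
        accounts: Accounts for which to compute the balances.

    Returns:
        The summed balance for every account name.
    """
    # defaultdict(float) stands in for the post's new_balances() helper:
    # missing names start at 0.0 so that += works on first use.
    balances = defaultdict(float)
    for account in accounts:
        balances[account.name] += account.balance
    return dict(balances)


if __name__ == "__main__":
    sample = frozenset({
        Account("alice", 10.0),
        Account("alice", 5.0),
        Account("bob", 2.5),
    })
    print(compute_balances(sample))
```

With the types in the signature, the docstring no longer needs to repeat them, although keeping both is harmless if your team prefers the docstring style shown above.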

Of course, I'd suggest you choose a statically-typed language in the first place, but that'd just be calling for a flamewar ;-) A similar piece of code in C++ would look like this:

#include <map>
#include <set>
#include <string>

/// Calculates the balances of a set of accounts.
///
/// \param accounts Accounts for which to compute the balances.
///
/// \return The balances for every account.
std::map< std::string, balance >
compute_balances(const std::set< account >& accounts) {
    std::map< std::string, balance > balances = new_balances();
    // Iterate by const reference to avoid copying every account.
    for (const auto& account : accounts) {
        balances[account.name] += account.balance;
    }
    return balances;
}

The types in this case are required by the compiler, but they also provide useful information to the user of this function. What's more, if we ran this through Doxygen, the generated documentation would automatically detail the types of the inputs and the return value.
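Python annotations offer a comparable benefit: because they are attached to the function object, tools can read them back instead of relying on prose. As a rough illustration only, the sketch below uses typing.get_type_hints in place of a real documentation generator; total_balance and describe are hypothetical names made up for this example.

```python
from typing import Dict, FrozenSet, get_type_hints


def total_balance(amounts: FrozenSet[float]) -> float:
    """Hypothetical helper: sums a set of balance amounts."""
    return sum(amounts)


def describe(func) -> Dict[str, str]:
    """Maps each annotated parameter (and 'return') to its type as text."""
    return {name: str(hint) for name, hint in get_type_hints(func).items()}


if __name__ == "__main__":
    # Prints one line per annotated parameter plus the return type.
    for name, hint in describe(total_balance).items():
        print(name, "->", hint)
```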

To recap: always write docstrings for your functions and annotate them with specific type information for all inputs and outputs.


[1] That's not strictly true. Some statically-typed languages, such as Haskell, also allow variables and functions to be defined without type annotations. The suggestions given here apply to such languages as well, even when the types can be automatically inferred.