From day one, the Kyua source tree has had docstring annotations for all of its symbols. The goal of such docstrings is to document the code for the developers of Kyua: these docstrings were never intended to turn into pre-generated HTML documentation because Kyua does not offer an API once installed.

As you might have noticed, Doxygen is an optional component of the build and it used to run on each make invocation. This changed “recently”. Nowadays, Doxygen is only run asynchronously on Travis CI to report docstring inconsistencies post-submission (see the DO=apidocs matrix entry if you are impatient). Combined with feature branches that are only merged into master when green, this is as good as the previous approach of running Doxygen alongside the build. Scratch that: this is even better because running Doxygen locally on each build took significant resources and penalized edit/build/test cycles.

In this article, I am going to guide you through the specifics of running Doxygen as a separate build in Travis CI. You can extrapolate this example to other “maintenance” tasks that you wish to run on each push—say, building and uploading manpages (which I still have to get to), verifying the style of your source tree, or running IWYU.

Background: docstrings

Since I started writing Python code at Google in 2009, I have become a fan of docstrings.

Having to explicitly document the purpose of each function via a textual description of its arguments, its return values, and any possible exceptions serves to make the code clearer and, more importantly, forces the developer to think about the real purpose of each function. More than once I’ve caught myself unable to concisely explain what a function does, which in turn led to a refactoring of such code.

For this reason, Kyua has had docstrings everywhere since day one. (I have even annotated shell scripts with such docstrings, although these cannot be parsed by Doxygen!) As an example, see the docstring for the randomly selected engine::check_reqs function:

/// Checks if the requirements specified by the test case are met.
///
/// \param md The test metadata.
/// \param cfg The engine configuration.
/// \param test_suite Name of the test suite the test belongs to.
/// \param work_directory Path to where the test case will be run.
///
/// \return A string describing the reason for skipping the test,
/// or empty if the test should be executed.
std::string
engine::check_reqs(const model::metadata& md,
                   const config::tree& cfg,
                   const std::string& test_suite,
                   const fs::path& work_directory)
{ ... }

Docstring linting

Keeping docstrings in sync with the code is very important (out-of-date documentation is harmful!), but validating documentation is never easy. One way to perform some minimum validation is to use Doxygen: when Doxygen runs, it spits out diagnostic messages if, for example, the list of documented parameters does not match the actual parameters of a function, or if the return value is not documented for a function that returns a value.

In the past, a post-build hook in Kyua’s Makefile triggered a run of Doxygen to sanity-check the contents of these docstrings by looking for warning messages in the output and then printing those at the end of the make invocation. The generated HTML files were discarded.

However, running Doxygen on a moderately sized codebase such as Kyua’s, which clocks in at ~50K lines of code, takes a significant amount of time. For years, this had annoyed me to the point where I came up with local shell aliases to rebuild only a subset of the source tree without triggering Doxygen, particularly because on a dual-core system Doxygen easily clogs one of the cores for the majority of the build time.

Recently, though, I figured I could delegate the execution of Doxygen to Travis CI and thus only check that the docstrings are valid at push time. In fact, this approach can be generalized to asynchronously run other maintenance tasks but, for illustration purposes, I am focusing only on Doxygen.

Is Travis enough?

By moving the docstrings sanity-check operation to Travis, I lost the continuous validation that happened every time I typed make. As a result, there is a higher chance for individual commits to break docstrings unintentionally.

But that’s just fine.

Per the Kyua Git workflow policies, changes should not be committed directly into master: it is all too easy to commit something that later fails in Travis and thus requires an embarrassing follow-up check-in of the kind “Fix previous”. It is much better to do all work in a local branch (even for apparent one-commit fixes!), push the new branch to GitHub, let Travis go for a test run, see if there are any failures, use git rebase -i master along with the edit or fixup actions, and make the set of commits sane from the ground up without amendments to fix obvious mistakes. (Think of Darcs if you will.) With a green build for the branch, merging into master becomes trivial and, more importantly, safe.

Therefore, requiring changes to be pushed into master only after getting a green build from Travis ensures that master never gets bogus commits with invalid docstrings. The visible effects are the same as before, so this is good enough.

Are you convinced yet? Let’s dive in.

Dealing with Doxygen false positives

The first problem to deal with before integrating Doxygen into Travis is false positives in docstrings: that is, the cases where Doxygen complains about an incorrect docstring that is actually correct. I had trained myself to ignore the false positives, but relying on a mental filter has one major problem: failing the build when docstrings are invalid is only possible if the output is deterministic and clean. In other words: we need Doxygen to return 0 if all docstrings look good and 1 if any do not. But because of false positives, we cannot trust a return value of 1. (I blame Doxygen’s C++ parser. Parsing C++ is very difficult and the only reasonable way of doing so these days is by using LLVM’s libraries. Anything else is bound to make mistakes.)

The good thing is that the false positives are deterministic, so I wrote a small AWK script (see check-api-docs.awk) that receives the output of Doxygen, strips out any known false positives, and returns success if there are no new errors and failure if any unknown errors are found. Plugging this into the Makefile results in a check-api-docs target that can be properly used in an automated environment (see Makefile.am for the full details):

if WITH_DOXYGEN
check-api-docs: api-docs/api-docs.tag
	@$(AWK) -f $(srcdir)/admin/check-api-docs.awk \
	    api-docs/doxygen.out

api-docs/api-docs.tag: $(builddir)/Doxyfile $(SOURCES)
	@mkdir -p api-docs
	@rm -f api-docs/doxygen.out api-docs/doxygen.out.tmp
	$(AM_V_GEN)$(DOXYGEN) $(builddir)/Doxyfile \
	    >api-docs/doxygen.out.tmp 2>&1 && \
	    mv api-docs/doxygen.out.tmp api-docs/doxygen.out
endif

With this done, we now have a check-api-docs make target that we can depend on as part of the build. This target fails the build only if there are new docstring problems. (Yes, we can manually invoke this target if we so desire.)
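
The filtering idea itself fits in a few lines of shell. The sketch below is not the real check-api-docs.awk: the check_api_docs function, the known-issues file, and the assumption that every diagnostic is a single line containing "warning:" are all hypothetical, picked only to illustrate the approach of dropping known spurious diagnostics and failing on anything else.

```shell
#!/bin/sh
# Hypothetical sketch: fail only on Doxygen diagnostics that are not
# already listed as known spurious ones.
#
# $1: file with the captured Doxygen output (one diagnostic per line).
# $2: file with one known spurious diagnostic per line (fixed substrings).
check_api_docs() {
    doxygen_out="$1"
    known_file="$2"

    # Keep only the warning lines, then drop those matching a known entry.
    # grep exits non-zero when it selects nothing, hence the "|| true".
    new_warnings=$(grep 'warning:' "$doxygen_out" \
        | grep -v -F -f "$known_file" || true)

    if [ -n "$new_warnings" ]; then
        echo "New docstring problems found:" 1>&2
        echo "$new_warnings" 1>&2
        return 1
    fi
    return 0
}
```

Calling check_api_docs api-docs/doxygen.out known-issues.txt (the second file name is made up) returns 0 when every warning is already known and 1 as soon as a new one shows up, which is the property an automated build needs.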

Hooking the run into Travis

The Travis configuration file supports specifying a single command to fetch the required dependencies and another command to execute the build. Builds can be configured both by predefined settings, such as the compilers to use or the operating systems to build on, and by manually specified environment variables.

The most naive approach to running maintenance tasks would be to add their actions to the all target in the Makefile so that a simple make invocation from the build script runs the maintenance task. This is overkill though: the maintenance job, which is not going to yield different results in every job matrix entry, will be executed for all entries and thus will stress an already overloaded worker pool. In particular, installing Doxygen in the builder takes a significant amount of time because of Doxygen’s dependency on TeX, and running Doxygen sucks up precious CPU resources.

What is the alternative then? Easy: have a single entry in the matrix running the maintenance task. Can you do that? Yes. How? With environment variables.

Travis allows you to add arbitrary entries to the job matrix in the configuration file by specifying combinations of environment variables that are passed to the scripts of the build. Using this feature, I introduced a global DO variable that tells the scripts what is being done: apidocs to verify the API documentation and build to execute the actual build of Kyua (see travis-build.sh and travis-install-deps.sh for an example).
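
A build script that dispatches on DO could look roughly like the sketch below. It is not the real travis-build.sh: the function names are hypothetical and the commands are echoed instead of executed. Only the DO values and the check-api-docs target come from this article; the use of make distcheck for the regular build is a guess based on the distcheck value.

```shell
#!/bin/sh
# Hypothetical sketch of a build script that dispatches on DO.
# Commands are echoed rather than run so that the sketch is harmless.

do_apidocs() {
    # Maintenance task: validate the docstrings only.
    echo "make check-api-docs"
}

do_build() {
    # Regular build of the source tree (assumed command).
    echo "make distcheck"
}

run_build() {
    case "${DO:-}" in
        apidocs) do_apidocs ;;
        # The prose names this value "build" while the .travis.yml
        # entries use "distcheck", so the sketch accepts both.
        build|distcheck) do_build ;;
        *)
            echo "Unknown or missing value for DO: '${DO:-}'" 1>&2
            return 1
            ;;
    esac
}
```

Failing on unknown values is a deliberate choice in this sketch: a typo in the matrix then shows up as a red job instead of a silently skipped task.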

With this new DO variable, we can customize the environment entries in the matrix to introduce a new run for the Doxygen invocation (see .travis.yml for full details on this code snippet):

env:
    - DO=apidocs
    - DO=distcheck AS_ROOT=no
    - DO=distcheck AS_ROOT=yes UNPRIVILEGED_USER=no
    - DO=distcheck AS_ROOT=yes UNPRIVILEGED_USER=yes

But this is still suboptimal. Travis builds a matrix of all possible combinations given by the operating systems, compilers, and the environment entries you defined. Running Doxygen on the source tree is independent of all these parameters: it does not matter what operating system you are running on or what compiler is used to build the source tree: Doxygen will yield the same output every single time.

Therefore, adding DO=apidocs as an entry to the matrix makes the number of build combinations explode, which is not acceptable because it is wasteful.

We can do better. We can tell Travis to exclude matrix entries that are unnecessary. To do so, we need to pick an arbitrary combination of settings to serve as the “baseline” for our maintenance tasks and then we have to disable all other matrix entries for this particular environment combination:

matrix:
    exclude:
        - compiler: gcc
          env: DO=apidocs
        - compiler: gcc
          env: DO=distcheck AS_ROOT=yes UNPRIVILEGED_USER=no
        - compiler: gcc
          env: DO=distcheck AS_ROOT=yes UNPRIVILEGED_USER=yes

Having to think of exclusions is not the most pleasant thing to do, but it is easy enough if you have a small set of combinations. (It’d be easier and nicer if one could just list all matrix entries explicitly.)
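
To see what the exclusions do to the size of the matrix, the following sketch enumerates the combinations by hand. It assumes that the matrix has exactly two compilers, clang and gcc (only gcc appears in the exclusion list, so clang is a guess), and it hardcodes the four env entries and the three exclusions shown in this article. The function names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sketch: enumerate compiler x env and drop excluded pairs.
# Assumption: the compilers in the matrix are clang and gcc.

# Prints the env entries of the matrix, one per line.
list_envs() {
    printf '%s\n' \
        "DO=apidocs" \
        "DO=distcheck AS_ROOT=no" \
        "DO=distcheck AS_ROOT=yes UNPRIVILEGED_USER=no" \
        "DO=distcheck AS_ROOT=yes UNPRIVILEGED_USER=yes"
}

# Succeeds if the (compiler, env) pair given in $1 and $2 is excluded.
is_excluded() {
    [ "$1" = gcc ] || return 1
    case "$2" in
        "DO=apidocs") return 0 ;;
        "DO=distcheck AS_ROOT=yes UNPRIVILEGED_USER=no") return 0 ;;
        "DO=distcheck AS_ROOT=yes UNPRIVILEGED_USER=yes") return 0 ;;
    esac
    return 1
}

# Prints one "compiler: env" line per job that remains in the matrix.
list_jobs() {
    for compiler in clang gcc; do
        list_envs | while IFS= read -r entry; do
            is_excluded "$compiler" "$entry" || echo "$compiler: $entry"
        done
    done
}
```

Under these assumptions, list_jobs prints 5 of the 8 raw combinations: all four entries for clang plus the unprivileged distcheck for gcc, so DO=apidocs runs exactly once.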

Anyway: voilà! That gives you a new entry in your build matrix to represent the new maintenance task. See a green build and a red build for a couple of examples of what things look like.