
Jarle Aase

Deploying distcc in a kubernetes cluster to transform it into a big, fast build-machine.

4 min read

What is distcc?

Distcc acts like a simple, distributed C/C++ compiler. It has a client program, distcc, and a server application, distccd. When you compile with distcc, it uses your compiler (typically gcc, g++ or clang) to preprocess your source files, and then sends the output to a server for compilation. Since the sources are preprocessed locally, all header files, macro expansions and compile definitions are resolved correctly - just as when you compile locally. On the server, all that is required is the same compiler you use locally. I use Kubuntu 18.04 for development, so I just install distcc, gcc, g++ and clang in an Ubuntu 18.04 Docker image. That's all that is required to compile the code.
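
At its simplest, you just prefix the compiler command with distcc; preprocessing happens locally and the compilation itself is farmed out to one of the configured servers. A minimal sketch (hello.cpp is just a stand-in source file):

$ distcc g++ -c hello.cpp -o hello.o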

You tell distcc what servers to use, and how many simultaneous jobs each of them can handle, in the DISTCC_HOSTS environment variable on the client side, or in the ~/.distcc/hosts or /etc/distcc/hosts config files. It hands jobs to the first servers in the list first, so it makes sense to start with localhost, followed by the most powerful nodes, sorted by CPU performance.
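
For example, a client-side setup could look like this (the IP addresses and job counts below are made up for illustration; the /N suffix is the number of simultaneous jobs for that host):

export DISTCC_HOSTS="localhost/2 192.168.1.10/16 192.168.1.11/12"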

The only thing to be aware of is that since the local machine must preprocess all the source files, it has less CPU to contribute to the compile pool. On my setup, with 40 cores combined on the k8 nodes, I use only 2 of my 6 (12 with hyper-threading) cores for compilation, and the rest for preprocessing and running the distcc client. When I also use ccache, my client machine needs all its cores to preprocess and cache the object files - so all the compilations are done on the kubernetes cluster.

Local k8 cluster?

Why would somebody have a local multi-node kubernetes cluster? I have one, partially deployed on bare metal servers and partially in virtual machines, because I like to play with technology - and, you know, the cloud is really just somebody else's computer. For me it's cheaper to have my own cluster built with normal PC hardware than to rent 16, 32 or 48 core cloud servers whenever I build a large C++ project or stress-test some kubernetes-ready distributed application.

A local cluster is also required for distributed builds from a local PC or laptop. If I use cloud servers, I need to run the full build from there - and that means that I cannot just build & debug with kdevelop or Qt Creator.

Making the distcc Docker image

For my setup I wanted a simple Docker container that uses all cores on the local machine when building, but virtually no resources when idle. This is simple to achieve if you just start distccd from the command line.

Dockerfile

FROM ubuntu:18.04

LABEL maintainer="jarle.lastviking.eu"

RUN DEBIAN_FRONTEND="noninteractive" apt-get -q update &&\
    DEBIAN_FRONTEND="noninteractive" apt-get -y -q --no-install-recommends upgrade &&\
    DEBIAN_FRONTEND="noninteractive" apt-get install -y -q g++ gcc clang distcc &&\
    DEBIAN_FRONTEND="noninteractive" apt-get -y -q autoremove &&\
    DEBIAN_FRONTEND="noninteractive" apt-get -y -q clean

ENV ALLOW 192.168.0.0/16
RUN useradd distcc
USER distcc
EXPOSE 3632

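# --daemon runs distccd as a stand-alone server (instead of from inetd),
# while --no-detach keeps it in the foreground so Docker can supervise it.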
CMD distccd --jobs $(nproc) --log-stderr --no-detach --daemon --allow ${ALLOW} --log-level info

I have already built this image and pushed it to Docker Hub as jgaafromnorth/distcc.
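
If you want to build and push your own copy, it's just the usual two commands (substitute your own Docker Hub repository for mine):

$ docker build -t jgaafromnorth/distcc .
$ docker push jgaafromnorth/distcc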

Deploying the DaemonSet

My local kubernetes cluster is for software development, so unless I use it for something (like testing some software), it's idle. That makes it perfectly suited as a build machine when I need some extra horsepower.

I deploy the distcc pod as a DaemonSet, which means that kubernetes will deploy it on every node in the cluster. I don't specify any resource requests or limits - so it won't reserve any resources when it's unused, and it can use all the CPU and memory it needs when it's building.

The deployment file distcc.yml

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: distcc
  namespace: devtools
  labels:
    app: distcc-devtool
spec:
  selector:
    matchLabels:
      name: distcc-devtool
  template:
    metadata:
      labels:
        name: distcc-devtool
    spec:
      containers:
        - name: distcc
          image: jgaafromnorth/distcc
          ports:
            - containerPort: 3632
              hostPort: 3632
              protocol: TCP
          env:
            - name: ALLOW
              value: 10.0.0.0/8

As you may notice, I use a hostPort with distcc's default port, so from the outside world it appears that distcc is running as a normal daemon on the server. To send jobs to a node, I just need to know its IP and its number of cores.

To deploy it, just run:

$ kubectl apply -f distcc.yml 
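
If the devtools namespace doesn't exist yet, you need to create it before applying the manifest. Afterwards you can verify that a pod is running on each node (kubectl's -o wide output includes the node and the pod IP):

$ kubectl create namespace devtools
$ kubectl get pods -n devtools -o wide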

Using with cmake and ccache

There are several ways to use distcc with cmake. The simplest is to set -DCMAKE_CXX_COMPILER_LAUNCHER=distcc.
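
For example, when generating the build files (the source path is a placeholder; CMAKE_C_COMPILER_LAUNCHER does the same for plain C sources):

$ cmake -DCMAKE_CXX_COMPILER_LAUNCHER=distcc -DCMAKE_C_COMPILER_LAUNCHER=distcc /path/to/source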

But I want builds, not just compilations, to be fast. ccache is a compiler cache that avoids repeating the same compilation over and over by caching the compiler output. Ccache can be configured to call distcc to perform the actual compilation.

On my system (Ubuntu 18.04) it seems like cmake or make finds ccache automatically, so all I have to do is install ccache and configure it to call distcc when it needs to compile a source file.
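
On Debian/Ubuntu that amounts to installing the package; ccache -s can later be used to check that the cache is actually being hit:

$ sudo apt install ccache
$ ccache -s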

What's the difference?

I'll use arangodb as an example here. It's a medium-sized C++ project using cmake. On my 6-core (12 with hyper-threading) laptop, it takes about 14 minutes to compile without ccache and distcc.

Normal compilation

time make -j 12
...
[100%] Built target arangodbtests

real    14m5,931s
user    113m21,600s
sys     13m29,895s

Ccache and distcc, using CMake's ccache detection. In this case, parts of the compilations for 3rd-party libraries were not sent to ccache/distcc, which slowed down the build significantly and consumed > 12G of memory (running 40 simultaneous compilations on 6 cores).

time make -j 40
...
[100%] Built target arangodbtests

real    10m21,424s
user    59m14,562s
sys     10m23,702s

Ccache and distcc, with ccache called via cc/c++ symlinks pointing to ccache. This is supported out of the box on Debian and Ubuntu if you prefix your PATH variable with /usr/lib/ccache: export PATH=/usr/lib/ccache:$PATH. However, this doesn't work with arangodb without a small tweak.

Instead of calling distcc directly from ccache, we call a little wrapper script that resolves the real compiler in /usr/bin, so that distcc doesn't end up invoking the ccache masquerade symlink again:

distcc-wrap.sh

#!/bin/sh
# Resolve the real compiler in /usr/bin, so that distcc invokes
# it directly instead of the ccache masquerade symlink.
compiler=$(basename "$1")
shift
exec distcc "/usr/bin/$compiler" "$@"
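
The script has to be executable and reachable through PATH, since the ccache configuration further down refers to it by name only. One way to do that:

$ chmod +x distcc-wrap.sh
$ sudo cp distcc-wrap.sh /usr/local/bin/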

With this change in place, a full build with a cleared ccache took about 8 minutes:

time make -j40
...
[100%] Built target arangodbtests

real    8m31,189s
user    22m24,886s
sys     7m45,816s

Without ccache, and using two cores on the client machine for the build, it went down to about 5 minutes. But we don't want to skip ccache. Ccache is your friend.

Second full build with ccache and distcc. Now ccache will detect that no compilations are required, so most of the time is spent linking the cached object files.

make clean
time make -j40
...
[100%] Built target arangodbtests

real    1m38,137s
user    2m24,563s
sys     0m47,227s

So the real performance gain is from ccache. But if you need to clear the cache for some reason, or change some header file or declaration that requires many files to be recompiled, distcc cuts the build time significantly.

Configuration

My ccache Configuration

~/.ccache/ccache.conf

max_size = 25.0G
compression = true
prefix_command = distcc-wrap.sh

My distcc configuration

~/.distcc/hosts

k8vm0/16 k8n0/12 k8n1/12

The hostnames here, k8vm0 ..., are present in /etc/hosts. I could just as well have used the nodes' IP addresses. The number after the slash specifies how many concurrent compilations that node can handle.
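
To watch the jobs being spread over the nodes while a build is running, distcc ships with a simple text-mode monitor that refreshes at the given interval in seconds:

$ distccmon-text 1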