Notes to self, 2018

2018-11-08 - gbp buildpackage / gpg2

If you prefer gpg2 over gpg, building a Debian package with debuild or gbp buildpackage may require some fiddling.

In my case, I'm using gpg2 instead of gpg for signing because unlike version 1, version 2 does PGP key forwarding. That way I can sign on a remote machine, using a local PGP key card.

However, gbp buildpackage, dpkg-buildpackage and debuild are hardwired to call gpg. And, it turns out, a simple /usr/local/bin/gpg symlink to /usr/bin/gpg2 was sufficient to convince dpkg-buildpackage to use the gpg2 binary, but not gbp or debuild.

The cause? gbp calls debuild, which cleans the environment; debuild in turn calls dpkg-buildpackage with a clean(er) environment, one without /usr/local/bin in the PATH.

The fix? Update the builder command with --prepend-path /usr/local/bin:

$ gbp buildpackage --git-builder='debuild --prepend-path /usr/local/bin -i -I' ...
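To see why the symlink works for dpkg-buildpackage but not for the wrappers, consider how PATH order decides which gpg binary is found. A minimal, self-contained Python sketch (the temp directories stand in for /usr/bin and /usr/local/bin; this is an illustration, not part of the build tooling):

```python
import os
import shutil
import tempfile

tmp = tempfile.mkdtemp()
usr_bin = os.path.join(tmp, 'usr/bin')
usr_local_bin = os.path.join(tmp, 'usr/local/bin')
os.makedirs(usr_bin)
os.makedirs(usr_local_bin)

# Create a fake "gpg" in both directories.
for path in (os.path.join(usr_bin, 'gpg'),
             os.path.join(usr_local_bin, 'gpg')):
    with open(path, 'w') as fp:
        fp.write('#!/bin/sh\n')
    os.chmod(path, 0o755)

# A cleaned environment with only /usr/bin finds the "real" gpg:
clean_path = usr_bin
print(shutil.which('gpg', path=clean_path))

# After --prepend-path /usr/local/bin, the symlinked gpg wins:
prepended = os.pathsep.join([usr_local_bin, usr_bin])
print(shutil.which('gpg', path=prepended))
```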

2018-09-20 - kubectl / broken terminal / ipython

Just now I ran into a misbehaving IPython interpreter inside a Docker container inside Kubernetes:

After starting ipython inside a kubectl exec session for a second time, IPython wouldn't show the input prompt; it only showed the output prompt.

Turns out it was due to the terminal settings. For some reason, after logging out of a kubectl exec session, the next exec would get 0 rows and 0 columns, as if someone had run stty rows 0 on the terminal. And of course this wasn't reliably reproducible, but it happened often enough.

This already mangled the Bash shell somewhat, but it was still usable, and the problem was easily mitigated with a bit of extra environment:

$ kubectl exec -it my-pod-xxx env LINES=$LINES COLUMNS=$COLUMNS bash

But that only fixed the shell, not the IPython interpreter. Doing a reset on the original shell fixes things, but for a softer approach that actually works, I resorted to this:

$ kubectl exec -it my-pod-xxx -- sh -c "stty rows $LINES cols $COLUMNS && exec bash"

A few more characters to type, but it does the trick. See stty -a for the current values.
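As an aside, Python's own shutil.get_terminal_size() shows why exporting LINES/COLUMNS helps: the environment variables take precedence over whatever the (broken) tty reports. A quick sketch:

```python
import os
import shutil

# Simulate the mitigation: export sane values into the environment.
os.environ['COLUMNS'] = '120'
os.environ['LINES'] = '40'

# get_terminal_size() prefers COLUMNS/LINES from the environment over
# querying the tty, so even a 0x0 terminal gets usable values.
size = shutil.get_terminal_size(fallback=(80, 24))
print(size.columns, size.lines)  # 120 40
```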

2018-09-19 - vimrc / debian stretch

In Debian/Stretch, the default Vim settings have been changed, for the worse in my opinion.

However, undoing the bad settings is not just a matter of overriding them in your ~/.vimrc, because when that file is detected, no defaults are set at all.

The quick fix is to create a custom /etc/vim/vimrc.local file with the following settings:

" Instead of auto-sourcing this afterwards, source it now.
source $VIMRUNTIME/defaults.vim
let g:skip_defaults_vim = 1

" Now we undo the "wrong" settings.
set mouse=
set noincsearch
set nosi

(Adjust "wrong" settings as needed.)

This way you still get the Debian defaults, but can change those that aren't to your liking.

2018-09-18 - core file / docker image / auplink

For a while, I'd been looking at a stray /core file in some of our daily Xenial Docker images. Time to find out where it came from.

Tracing with a few well-placed RUN ls -l /core || true statements tells us that the dump appeared after a large RUN statement, not during one.

Running gdb on the core revealed that it was a dump of auplink, a part of Docker. Opening the core on a Xenial machine with Docker installed showed the following backtrace:

Core was generated by `auplink /var/lib/docker/aufs/mnt/21c482c11476d6fb9842fa91c0d9e2c49cfb51c3d04dd5'.
Program terminated with signal SIGSEGV, Segmentation fault.

(gdb) bt
#0  ftw_startup (
    dir=0x1d66010 "/var/lib/docker/aufs/mnt/21c482c11476d6fb9842fa91c0d9e2c49cfb51c3d04dd5d0dee424d4080d0a4f",
    is_nftw=1, func=0x40149c, descriptors=1048566, flags=19) at ../sysdeps/wordsize-64/../../io/ftw.c:654
#1  0x0000000000401d52 in ?? ()
#2  0x00000000004013ec in ?? ()
#3  0x00007f4331728830 in __libc_start_main (main=0x401266, argc=3, argv=0x7ffc32267318, init=<optimized out>,
    fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffc32267308) at ../csu/libc-start.c:291
#4  0x0000000000401199 in ?? ()

And, for completeness' sake, the innermost frame from bt full:

(gdb) bt full
#0  ftw_startup (
    dir=0x1d66010 "/var/lib/docker/aufs/mnt/21c482c11476d6fb9842fa91c0d9e2c49cfb51c3d04dd5d0dee424d4080d0a4f",
    is_nftw=1, func=0x40149c, descriptors=1048566, flags=19) at ../sysdeps/wordsize-64/../../io/ftw.c:654
        data = {dirstreams = 0x7ffc31a67030, actdir = 0, maxdir = 1048566, dirbuf = 0x0,
          dirbufsize = 4412750543122677053, ftw = {base = 1027423549, level = 1027423549}, flags = 0,
          cvt_arr = 0xff0000, func = 0xff, dev = 18446744073709486080, known_objects = 0xffff000000000000}
        st = {st_dev = 0, st_ino = 0, st_nlink = 0, st_mode = 0, st_uid = 0, st_gid = 0, __pad0 = 0, st_rdev = 0,
          st_size = 30828704, st_blksize = 4, st_blocks = 30826928, st_atim = {tv_sec = 122, tv_nsec = 30826928},
          st_mtim = {tv_sec = 50, tv_nsec = 30826624}, st_ctim = {tv_sec = 139926575187608, tv_nsec = 1048576},
          __glibc_reserved = {19, 1048566, 4199580}}
        result = 0
        cwdfd = -1
        cwd = 0x0
        cp = <optimized out>
...

From this, a quick Google search returned various pages blaming the aufs filesystem.

Without going into detail this time, the fix was to change the Docker storage driver on the daily docker runners. Apparently we were still using the (poor) default aufs driver. Switching to overlay2 did the trick.

$ cat /etc/docker/daemon.json
{
  "storage-driver": "overlay2"
}

2018-04-11 - ubuntu bionic / crashing gdm / eglgetdisplay

After upgrading from Ubuntu 17.10 to Ubuntu 18.04 and rebooting, the GNOME Display Manager (gdm) went into a restart loop. No promised speed gains; instead, I got an unusable desktop.

Being quick with CTRL+ALT+F3, I could enter my username and password at the text console after a couple of attempts (the gdm restart kept stealing console/tty focus), after which a sudo systemctl stop gdm was possible. This left me with a shell and plenty of time to examine the situation.

The Xorg.0.log (and syslog) went a little like this:

/usr/lib/gdm3/gdm-x-session[1849]: X.Org X Server 1.19.6
/usr/lib/gdm3/gdm-x-session[1849]: Release Date: 2017-12-20
/usr/lib/gdm3/gdm-x-session[1849]: X Protocol Version 11, Revision 0
...
/usr/lib/gdm3/gdm-x-session[1849]: (II) Loading sub module "glamoregl"
/usr/lib/gdm3/gdm-x-session[1849]: (II) LoadModule: "glamoregl"
/usr/lib/gdm3/gdm-x-session[1849]: (II) Loading /usr/lib/xorg/modules/libglamoregl.so
/usr/lib/gdm3/gdm-x-session[1849]: (II) Module glamoregl: vendor="X.Org Foundation"
/usr/lib/gdm3/gdm-x-session[1849]: #011compiled for 1.19.6, module version = 1.0.0
/usr/lib/gdm3/gdm-x-session[1849]: #011ABI class: X.Org ANSI C Emulation, version 0.4
/usr/lib/gdm3/gdm-x-session[1849]: (II) glamor: OpenGL accelerated X.org driver based.
/usr/lib/gdm3/gdm-x-session[1849]: (EE) modeset(0): eglGetDisplay() failed
/usr/lib/gdm3/gdm-x-session[1849]: (EE) modeset(0): glamor initialization failed
...
gnome-session[1923]: X Error of failed request:  BadValue (integer parameter out of range for operation)
gnome-session[1923]:   Major opcode of failed request:  154 (GLX)
gnome-session[1923]:   Minor opcode of failed request:  3 (X_GLXCreateContext)
gnome-session[1923]:   Value in failed request:  0x0
gnome-session[1923]:   Serial number of failed request:  19
gnome-session[1923]:   Current serial number in output stream:  20
gnome-session[1923]: gnome-session-check-accelerated: GL Helper exited with code 256
gnome-session-c[1936]: eglGetDisplay() failed
gnome-session[1923]: gnome-session-check-accelerated: GLES Helper exited with code 256
...
gnome-session-c[1992]: eglGetDisplay() failed
gnome-session[1923]: gnome-session-check-accelerated: GLES Helper exited with code 256
gnome-session[1923]: gnome-session-binary[1923]: WARNING: software acceleration check failed: Child process exited with code 1
gnome-session[1923]: gnome-session-binary[1923]: CRITICAL: We failed, but the fail whale is dead. Sorry....
gnome-session-binary[1923]: WARNING: software acceleration check failed: Child process exited with code 1
gnome-session-binary[1923]: CRITICAL: We failed, but the fail whale is dead. Sorry....
at-spi-bus-launcher[1925]: XIO:  fatal IO error 11 (Resource temporarily unavailable) on X server ":0"
...
/usr/lib/gdm3/gdm-x-session[1849]: (II) Server terminated successfully (0). Closing log file.
gdm3: Child process -1849 was already dead.
gdm3: Child process 1833 was already dead.

Luckily I had another system with a browser, and quickly found "libglvnd0/libegl installed in Ubuntu 18.04 breaks graphics drivers and forces LLVMpipe driver on i915 systems" (Launchpad bug 1733136).

But I was on a system without any NVIDIA graphics card, so that couldn't be my problem, right?

$ sudo lshw -c video
  *-display
       description: VGA compatible controller
       product: Xeon E3-1200 v3 Processor Integrated Graphics Controller
       vendor: Intel Corporation
...
       configuration: driver=i915 latency=0

So I tried to figure out what this eglGetDisplay() was and why it would fail. I tried this adapted code I found on the internet:

$ cat >eglinfo.c <<EOF
#include <X11/Xlib.h>
#include <stdio.h>
#include <EGL/egl.h>
int main(int argc, char *argv[])
{
        Display *xlib_dpy = XOpenDisplay(NULL);
        // there is no DISPLAY= so, XOpenDisplay will fail anyway
        //if (!xlib_dpy) {
        //      return 1;
        //}
        int maj, min;
        EGLDisplay d = eglGetDisplay(xlib_dpy);
        if (!eglInitialize(d, &maj, &min)) {
                printf("eglinfo: eglInitialize failed\n");
                return 2;
        }
        return 0;
}
EOF

$ gcc -o eglinfo eglinfo.c -lX11 -lEGL
$ ./eglinfo
modprobe: ERROR: could not insert 'nvidia': No such device
eglinfo: eglInitialize failed

That eglInitialize failed was to be expected, since we passed a NULL display, but the modprobe was unexpected. Why on earth would it try to load (only) NVIDIA modules on this machine with Intel graphics?

I went back to the bugreport above, and there it was: “It's true that the first time you install the nvidia driver (and I mean via the PPA deb file), it tends to make /usr/lib/xorg/modules/extensions/libglx.so and /usr/lib/x86_64-linux-gnu/libGL.so point at the nvidia drivers, but this isn't the case in my setup, which works fine in artful, just not in bionic.”

A-ha. So, the NVIDIA packages were possibly symlinking over other stuff. Since the NVIDIA card is gone, a purge of all things nvidia should be a good cleanup. And indeed, removing those packages showed things like:

$ dpkg -l | grep nvidia | awk '/^ii/{print$2}' | xargs sudo apt-get remove --purge
...
Removing 'diversion of /usr/lib/x86_64-linux-gnu/libGL.so.1 to /usr/lib/x86_64-linux-gnu/libGL.so.1.distrib by nvidia-340'
...

And now gdm started successfully and everything ran smoothly.

Except for one snag: compiling the sample code above no longer worked. How could that be?

$ gcc -o eglinfo eglinfo.c -lX11 -lEGL
/usr/bin/ld: cannot find -lEGL
collect2: error: ld returned 1 exit status

$ grep ^Libs: /usr/lib/x86_64-linux-gnu/pkgconfig/egl.pc
Libs: -L${libdir} -lEGL

$ ls -l /usr/lib/x86_64-linux-gnu/libEGL.so
ls: cannot access '/usr/lib/x86_64-linux-gnu/libEGL.so': No such file or directory

You'd expect the libEGL.so symlink to be in the same package as that egl.pc, but it wasn't in libegl1-mesa-dev. Instead, it was created by libglvnd-dev, but had since been clobbered by the nvidia packages.

$ sudo apt-get install libglvnd-dev --reinstall
...

$ ls -l /usr/lib/x86_64-linux-gnu/libEGL.so
lrwxrwxrwx 1 root root 15 mrt  5 10:45 /usr/lib/x86_64-linux-gnu/libEGL.so -> libEGL.so.1.0.0

Good, now we're back in business.

2018-03-26 - checking client ssl certificate / from python

A quick howto on checking SSL/TLS client certificates from Django/Python.

Generally, when you want to use client certificates, you'll let the HTTPS server (e.g. NGINX) do the certificate validation.

For NGINX you'd add this config, and be done with it.

# TLS server certificate config:
...
# TLS client certificate config:
ssl_verify_client on;  # or 'optional'
ssl_client_certificate /PATH/TO/my-ca.crt;
...
location ... {
    ...
    # $ssl_client_s_dn contains: "/C=.../O=.../CN=...", where you're
    # generally interested in the CN-part (commonName) to identify the
    # "who".
    #
    # You'll want one of these, depending on your backend:
    proxy_set_header X-Client-Cert-Dn $ssl_client_s_dn;    # for HTTP(S)-proxy
    fastcgi_param HTTP_X_CLIENT_CERT_DN $ssl_client_s_dn;  # for fastcgi
    uwsgi_param HTTP_X_CLIENT_CERT_DN $ssl_client_s_dn;    # for uwsgi
}

The above config instructs the browser that a TLS client certificate is needed. The supplied certificate is checked against my-ca.crt to validate that my-ca.key was used to create it. The subject line (DN) is passed along to the target (another HTTP(S) consumer, fastcgi, or uwsgi), where you can inspect the commonName to see who is using the system.

This is the logical place to do things: it places the burden of all TLS handling on the world-facing proxy.
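If the backend needs to pull the commonName out of that X-Client-Cert-Dn header, a small helper suffices. This is a hypothetical sketch that assumes the slash-separated DN format shown above ("/C=.../O=.../CN=..."); newer NGINX versions may emit a comma-separated RFC 2253 format instead:

```python
def cn_from_dn(dn):
    """Extract the CN component from a '/C=../O=../CN=..' subject DN."""
    for part in dn.lstrip('/').split('/'):
        key, _, value = part.partition('=')
        if key == 'CN':
            return value
    return None

print(cn_from_dn('/C=NL/O=Example/CN=alice'))  # alice
```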

However, sometimes you want to delegate the checking to the backend. For example when the application developer is allowed to update the certificates, but not allowed to touch the ingress controller (the world-facing https proxy).

For that scenario, NGINX provides an optional_no_ca value to the ssl_verify_client setting. That option instructs NGINX to allow client certificates, and use them for TLS, but it will not check the validity of the client certificate. In this case NGINX checks that the client has the private key for the certificate, but it does not check that the certificate itself is generated by the correct certificate authority (CA) — in our case my-ca.crt.

We change the NGINX config as follows:

# TLS client certificate config:
ssl_verify_client optional_no_ca; # there is no 'mandatory_no_ca' value?
#ssl_client_certificate # UNUSED: checked by backend
...
    proxy_set_header X-Client-Cert $ssl_client_cert;    # for HTTP(S)-proxy
    fastcgi_param HTTP_X_CLIENT_CERT $ssl_client_cert;  # for fastcgi
    uwsgi_param HTTP_X_CLIENT_CERT $ssl_client_cert;    # for uwsgi

From now on, you must validate in the backend, with something like this:

pem = environ.get('HTTP_X_CLIENT_CERT') or '<BAD_CERT>'
# or request.META['HTTP_X_CLIENT_CERT'] for Django setups

try:
    cert = BaseCert.from_pem(pem)
except ValueError:
    raise Http400()  # cert required

try:
    verify(cert)
except VerificationError:
    raise Http403()  # invalid client cert

Of course you'll need an implementation of BaseCert and verify.

The following tlshelper3.py uses python3-openssl to implement BaseCert:

from OpenSSL.crypto import (
    FILETYPE_ASN1, X509Store, X509StoreContext, X509StoreContextError,
    load_certificate)
from base64 import b64decode


class VerificationError(ValueError):
    pass


class BaseCert:
    @classmethod
    def from_pem(cls, pem_data):
        try:
            assert isinstance(pem_data, str), pem_data
            pem_lines = [l.strip() for l in pem_data.strip().split('\n')]
            assert pem_lines, 'Empty data'
            assert pem_lines[0] == '-----BEGIN CERTIFICATE-----', 'Bad begin'
            assert pem_lines[-1] == '-----END CERTIFICATE-----', 'Bad end'
        except AssertionError as e:
            raise ValueError('{} in {!r}'.format(e.args[0], pem_data)) from e

        try:
            der_data = b64decode(''.join(pem_lines[1:-1]))
        except ValueError as e:
            raise ValueError('Illegal base64 in {!r}'.format(pem_data)) from e

        return cls.from_der(der_data)

    @classmethod
    def from_der(cls, der_data):
        assert isinstance(der_data, bytes)
        cert = load_certificate(FILETYPE_ASN1, der_data)
        return cls(cert)

    def __init__(self, x509):
        self._x509 = x509
        self._revoked_fingerprints = set()

    def __str__(self):
        try:
            cn = self.get_common_name()
        except Exception:
            cn = '<could_not_get_common_name>'
        try:
            issuer = self.get_issuer_common_name()
        except Exception:
            issuer = '<could_not_get_issuer>'

        return '{} issued by {}'.format(cn, issuer)

    def get_common_name(self):
        return self._get_common_name_from_components(self._x509.get_subject())

    def get_fingerprints(self):
        ret = {
            'SHA-1': self._x509.digest('sha1').decode('ascii'),
            'SHA-256': self._x509.digest('sha256').decode('ascii'),
        }
        assert len(ret['SHA-1']) == 59, ret
        assert all(i in '0123456789ABCDEF:' for i in ret['SHA-1']), ret
        assert len(ret['SHA-256']) == 95, ret
        assert all(i in '0123456789ABCDEF:' for i in ret['SHA-256']), ret
        return ret

    def get_issuer_common_name(self):
        return self._get_common_name_from_components(self._x509.get_issuer())

    def _get_common_name_from_components(self, obj):
        return (
            # May contain other components as well, 'C', 'O', etc..
            dict(obj.get_components())[b'CN'].decode('utf-8'))

    def set_trusted_ca(self, cert):
        self._trusted_ca = cert

    def add_revoked_fingerprint(self, fingerprint_type, fingerprint):
        if fingerprint_type not in ('SHA-1', 'SHA-256'):
            raise ValueError('fingerprint_type should be SHA-1 or SHA-256')

        fingerprint = fingerprint.upper()
        assert all(i in '0123456789ABCDEF:' for i in fingerprint), fingerprint
        self._revoked_fingerprints.add((fingerprint_type, fingerprint))

    def verify(self):
        self.verify_expiry()
        self.verify_against_revoked()
        self.verify_against_ca()

    def verify_expiry(self):
        if self._x509.has_expired():
            raise VerificationError(str(self), 'is expired')

    def verify_against_revoked(self):
        fingerprints = self.get_fingerprints()
        for fingerprint_type, fingerprint in self._revoked_fingerprints:
            if fingerprints.get(fingerprint_type) == fingerprint:
                raise VerificationError(
                    str(self), 'matches revoked fingerprint', fingerprint)

    def verify_against_ca(self):
        if not hasattr(self, '_trusted_ca'):
            raise VerificationError(str(self), 'did not load trusted CA')

        store = X509Store()
        store.add_cert(self._trusted_ca._x509)
        store_ctx = X509StoreContext(store, self._x509)
        try:
            store_ctx.verify_certificate()
        except X509StoreContextError as e:
            # [20, 0, 'unable to get local issuer certificate']
            raise VerificationError(str(self), *e.args)

Some examples and tests:

if __name__ == '__main__':
    def example():
        # "Creating a CA with openssl"
        # openssl genrsa -out ca.key 4096
        # openssl req -new -x509 -days 365 -key ca.key -out ca.crt \
        #   -subj '/C=NL/CN=MY-CA'
        cacert = '''-----BEGIN CERTIFICATE-----
            MIIE8zCCAtugAwIBAgIJAN6Zb03+GwJUMA0GCSqGSIb3DQEBCwUAMBAxDjAMBgNV
            BAMMBU1ZLUNBMB4XDTE4MDMyMzA5MTg1NFoXDTE5MDMyMzA5MTg1NFowEDEOMAwG
            A1UEAwwFTVktQ0EwggIiMA0GCSqGSIb3DQEBAQUAA4ICDwAwggIKAoICAQCTKOv/
            /rLvSh4Emdjhlsp7/1SFMlRbPJCZFHTtr0iFAENYdvMXShL/5EQVnt92e0zFD5kj
            m3dx5WrKhc60CgF2fwJ9g0X64s8UQ0160BidboyLWgPQxUtYuJZfCa1Jp2at35Rb
            KTTcgcvGHHM9Bl3tRvE6r3MeBtHvAgZHhjqd59g73svILVVyM0n/SHNbQiv+yOfU
            87nPgbIq0hgs5v5atycFUzvzNimUH8vKmiCkYWuwM+UuHUUBDN/FESyANUJm2Eoi
            hJcPnQX+JBfhGcgRUrvLiA59fMJEVU2s16vix55evnoZbe2hN2QQ9FH9LbZp6evR
            qoNa9BoJVEFGHR6DCUfPDHT9EhPYe70w3Wlv3wO8vFsmKiCJivFQQCx21M8tXQug
            b47x0vhbpR0gi8Cz+UsOWZvrAOKqoBGwtxEjmuc+eFKiU3h4/Mv1v3yb5W41S+eM
            IGaCnXDW32X+ypHW0RirhRuRoGu67hAGVAP3KWKWuBtwaMoYErGPCSeoAy3fD0Dw
            0l762mnqn5BIJmvMwjeM+CBRylXfRj/xsBs/+G6Com1zRgzkkbU+G2yYOF+2MgxK
            mak/RLCx13u/VMUJDQzP3thUABCn+ZTCu+yCsFhPlj/zJU1QFu0uiGqTiqAHWYSQ
            spvY6NXel2JPk/nFE1HWpyXBVyF8Ksm1XkGF8wIDAQABo1AwTjAdBgNVHQ4EFgQU
            Ptqs7zPsJS7oEi76bZNHayUhzi0wHwYDVR0jBBgwFoAUPtqs7zPsJS7oEi76bZNH
            ayUhzi0wDAYDVR0TBAUwAwEB/zANBgkqhkiG9w0BAQsFAAOCAgEAMBzjxsBLbXBI
            TWHG4bPmHu/3Pv7p1gkiNNPh7GNA3Q9zMiN4NrstsuAFqDGBHWB5G8mfJ5v9F5qS
            fX0MUQWCOqhJCopt+03l/mo6O068POZ6aGrNf9staGA1x0hJaDlAM5HusEZ6WVA4
            EJySDSCmRlonzkAqOmN8mT1jzYzjCK1Q53O8/41Dv6I9RcDeU5gBs4MvFTOCmzrD
            AsXX9UyOkcRMNJUBq1t9oQipciu0y2bAZSOHA0JxSiGEijRtEbnBJ1Z74orgBvYk
            rPt9oEgEKkkYzT5jLL9aShSMm3UiHIhaDtCiky3qmH4GcXYZMCc3f3TF+L9Fl1YT
            ExDQJvFkx1h8nWdpMFroWLX3gIawW3mWMbpokt6quW1ndnH/6i0cva7nr+5CYBJq
            +RKnuF2M1z8NNDXzSLypX4MFa/LL+oj/q4r7dcELjYTClHzQ5i2ztGuyltAQSged
            ECkO8b9BqXGxGbWQv4L7OXy/fjrzMw3a3ErgDcTtRdL4IUF3pTsJuhkosPSM+REs
            OevV+s0sXRGRl/IlWo8mLXJp9ZKWXi+aTShitxu/FNp6LR/9/0TmVblMx0mjubfS
            06lMltPa7mep4m9rfhowgf1ElSXquWTjj3bMzfvOsHrreq50NMxWCJjCeYHM2oNI
            JzIhDr6afzQ62acSEV3/w7SAtkDsfFw=
            -----END CERTIFICATE-----'''

        # "Creating a CA-signed client cert with openssl"
        # openssl genrsa -out client.key 1024
        # openssl req -new -key client.key -out client.csr \
        #   -subj '/C=NL/CN=MY-CLIENT'
        # openssl x509 -req -days 365 -in client.csr -CA ca.crt -CAkey ca.key \
        #    -set_serial 01 -out client.crt
        clientcert = '''-----BEGIN CERTIFICATE-----
            MIIDEzCB/AIBATANBgkqhkiG9w0BAQsFADAQMQ4wDAYDVQQDDAVNWS1DQTAeFw0x
            ODAzMjMwOTIwNTNaFw0xOTAzMjMwOTIwNTNaMBQxEjAQBgNVBAMMCU1ZLUNMSUVO
            VDCBnzANBgkqhkiG9w0BAQEFAAOBjQAwgYkCgYEAueUyGPY5JrZcWT9MdjsxmZB/
            XexDT+cKif1dxq+rxLZO7qt5jMVPZLnxCX3cypTZ1u3cvnwGkqfkYT1hRDTfs6WU
            b9qwEYKz9W/9WEbh1hvVmaxRK3k+UspN1WdwOFer5k1zORzYCVZATHBj05QRztF1
            +Wx9m9avXMxqLnRsRuUCAwEAATANBgkqhkiG9w0BAQsFAAOCAgEAAJ922lE2qm8k
            OSCc/+BlyWJN78gxjE46S6/EnFEUgFBJhzqhIDIAApf5FDuA+5xeXt2RzrtJO/+0
            vFwVuyXssbZB6R6433VN8KsyEwEp+dxaP3u4tzZ+82J6VlCDnGt1t5smXUPUzEzh
            NdSeGe/11OvxKVV8b9gyy+007+l4u30vvatrpMaXRM2LpcKtmTu1B+FAPiP93G0U
            vMCw6+PbMGoQitwAIHW+86aycfUzYq5mivjVaaf4wgwo3rbAwcKK8aFmCarDbtwy
            cuzzvcsTdT/OxaPvGO3mOQpbcZpOFTjwNBc5LAOBRGDvbg3VOoPwOnS0lFJD5uc+
            MZOKcYOmHUeKqWOyCW6svGqlvZnuDDd808tqzVnBqTYo6UoV+dj4wEL2iRE+6zFg
            GuUKfbi2wV6exRisr6dBDLxIX068wbWVOHxAJrW/Ww0hKB78IqtSUXuBNuPUQg2m
            8JOFkMRrNtMZCyjF+ijEEFvfvqakLk+IzXuXXDS8h0A8O7jG4ehAxe1pkbZ/g3E9
            OUiJfKws5LVBLxh3HfpQe8JGfVI/5/naaqrB77gqf8Ub7YePczAEdJMiSgWBL5/l
            SIW14UwkbyH6fAbbVQC5O1Px0GhpiRV0hfBLx4ZaQ5wuDU3O866endNp48Ho6mM4
            /hnbcHOCf6zlThuDSGPkb76D54HdO1s=
            -----END CERTIFICATE-----'''

        ca = BaseCert.from_pem(cacert)
        cert = BaseCert.from_pem(clientcert)
        cert.set_trusted_ca(ca)

        print('Certificate:', cert)
        print('Fingerprints:', cert.get_fingerprints())
        # cert.add_revoked_fingerprint('SHA-1',
        #     cert.get_fingerprints()['SHA-1'])
        # cert.add_revoked_fingerprint(
        #     'SHA-1',
        #     '05:62:27:A5:6E:A1:52:F3:E7:E7:44:16:D6:F4:BD:27:B4:D8:1B:E5')

        cert.verify()
        print('Verification: OK')

    example()

An example verify implementation can be found in the following sample WSGI code — we'll call it client-cert-wsgi.py — runnable by e.g. uWSGI:

from tlshelper3 import BaseCert, VerificationError

with open('/PATH/TO/my-ca.crt') as fp:
    CA_CERT = BaseCert.from_pem(fp.read())

REVOKED_CERTS = (
    # Example X
    ('SHA-256', (
        'F8:7F:30:7B:12:15:15:47:07:93:D4:99:8F:7B:2E:DF:'
        '12:5A:2C:0F:C4:BD:5E:56:B8:5C:93:A3:65:CB:63:9B')),
    # Example Y
    ('SHA-256', (
        '00:11:22:33:44:55:66:77:88:99:AA:BB:CC:DD:EE:FF:'
        '12:5A:2C:0F:C4:BD:5E:56:B8:5C:93:A3:65:CB:63:9B')),
    # Example Z
    # ('SHA-256',
    #     '36:9F:36:7F:0C:90:26:A1:AD:A3:79:E9:A9:8B:F5:74:'
    #     '21:B1:29:4B:67:73:78:B4:DE:CF:FA:C5:A6:42:BA:03'),
)


def verify(cert):
    cert.set_trusted_ca(CA_CERT)
    for revoked_cert in REVOKED_CERTS:
        cert.add_revoked_fingerprint(*revoked_cert)
    cert.verify()  # raises VerificationError


def application(environ, start_response):
    # Call this with: curl -E client_key_and_crt.pem URL
    pem = environ.get('HTTP_X_CLIENT_CERT') or '<BAD_CERT>'

    try:
        cert = BaseCert.from_pem(pem)
    except ValueError:
        cert = None

    if not cert:
        return handle400(start_response)

    try:
        verify(cert)
    except VerificationError:
        return handle403(start_response)

    status = '200 OK'
    output = (
        'Hello World!\n\nGot valid CERT {} with fingerprints:\n\n{!r}\n'
        .format(cert, cert.get_fingerprints()).encode('utf-8'))
    response_headers = [
        ('Content-type', 'text/plain'),
        ('Content-Length', str(len(output)))]

    start_response(status, response_headers)
    return [output]

I added a list of revoked certificate fingerprints in there — for the application developer to maintain — as an easy alternative to the more troublesome certificate revocation lists (CRLs). You can find the certificate fingerprints using cert.get_fingerprints().

If you want to check the commonName (CN) to identify the who, there is cert.get_common_name().

The rest of the WSGI code, for completeness' sake:

def handle400(start_response):
    # This does NOT cause the browser to request a client cert. We'd need
    # access to the TLS layer for that, and we don't have that. The
    # nginx option 'ssl_verify_client optional_no_ca' will not force a
    # certificate.
    #
    # If you want a client.pem, you'll just concat the client.key and
    # client.crt.
    #
    # If you want it in the browser, you'll use:
    # openssl pkcs12 -export -in client.crt -inkey client.key -out client.p12
    # But that's more troublesome, because -- like stated above -- the browser
    # won't prompt you for a certificate.
    #
    status = '400 Bad Request'
    output = b'400 No required SSL certificate was sent\n'
    response_headers = [
        ('Content-type', 'text/plain'),
        ('Content-Length', str(len(output)))]

    start_response(status, response_headers)
    return [output]


def handle403(start_response):
    status = '403 Access Denied'
    output = b'403 Access denied to the requested resource\n'
    response_headers = [
        ('Content-type', 'text/plain'),
        ('Content-Length', str(len(output)))]

    start_response(status, response_headers)
    return [output]

Lastly, some example uWSGI config to go with that:

[uwsgi]
plugins = python3
wsgi-file = /PATH/TO/client-cert-wsgi.py
chdir = /PATH/TO

I'll leave creating a pretty Django middleware class as an exercise for the reader.

A word of caution: you'll want to ensure that external/untrusted clients cannot set X-Client-Cert themselves. If it's set, it must be set by the HTTPS server handling the TLS. Otherwise a stolen/sniffed certificate without its key could be used to sneak past the authentication.
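One way to enforce that, sketched as a tiny WSGI wrapper. The REMOTE_ADDR whitelist is an assumption for illustration; in practice, the cleanest fix is making the proxy itself always set (or clear) the header:

```python
# Hypothetical sketch: drop any client-supplied X-Client-Cert header
# unless the request came in via a trusted proxy address.
TRUSTED_PROXIES = {'127.0.0.1'}

def strip_untrusted_cert_header(app):
    def wrapped(environ, start_response):
        if environ.get('REMOTE_ADDR') not in TRUSTED_PROXIES:
            environ.pop('HTTP_X_CLIENT_CERT', None)
        return app(environ, start_response)
    return wrapped

# Demo with a stub application that records whether the header survived.
seen = []
def stub_app(environ, start_response):
    seen.append('HTTP_X_CLIENT_CERT' in environ)
    return []

wrapped = strip_untrusted_cert_header(stub_app)
wrapped({'REMOTE_ADDR': '192.0.2.1', 'HTTP_X_CLIENT_CERT': 'fake'}, None)
wrapped({'REMOTE_ADDR': '127.0.0.1', 'HTTP_X_CLIENT_CERT': 'real'}, None)
print(seen)  # [False, True]
```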

2018-02-14 - docker application placement / paths

Where do you place the application inside the Docker image?

In the past, when deploying Python/Django projects, we'd put them in /srv/django-projects/APPNAME on a (possibly shared) machine. The accompanying Python virtualenv went into /srv/virtualenvs/APPNAME.

Now that we're dockerizing many projects, we don't need the virtualenv (there is only one environment) and we don't need the APPNAME either (there is only one application). So, where should we place the project?

The usual suspects would be one of: /srv, /opt, /usr/src, /usr/local — optionally suffixed by APPNAME. Other common paths are: /app and /code.

  • /app - is not in the Filesystem Hierarchy Standard (FHS). The advantage of this over other paths is that it is just one level deep and short.
  • /code - is not in the FHS either, but it is short. However, it feels like it should contain source code, not runnable code.
  • /usr/src/APPNAME - anything here feels to me like it still needs compilation and copying to a real location.
  • /usr/local - /usr/local/APPNAME is inconveniently far away, and /usr/local/src/APPNAME suffers from the same problem as /code and /usr/src.
  • /opt - /opt or better /opt/APPNAME has always been my preferred place to put pre-built binary packages (for example Java applications, or other closed-source binaries) that did not arrive through the package manager. Placing my own code here feels wrong.
  • /srv - would probably be my preferred location, because it's in the FHS, short enough, and empty. On the other hand, it's not necessarily immediately clear that the application code resides here.

A quick scan of Dockerfile examples yields this:

golang: /go/src/app  [https://hub.docker.com/_/golang/]
hylang: /usr/src/app [https://hub.docker.com/_/hylang/]
perl:   /usr/src/app [https://hub.docker.com/_/perl/]
python: /app         [https://docs.docker.com/get-started/part2/#your-new-development-environment]
ruby:   /app         [https://github.com/buildkite/ruby-docker-example/blob/master/Dockerfile]

A quick Google search for the phrase COPY . /PATH yields these counts:

26k: /app
25k: /usr/src/app
 3k: /usr/src/APPNAME
 2k: /usr/local/...
 2k: /opt/APPNAME
 1k: /code + /code/APPNAME
 1k: /srv + /srv/APPNAME

Okay, so COPY . /app wins hands down. If we turn a blind eye to the FHS violation, this is the most common and best option. For golang applications we may make an exception, since go is rather picky about its paths.
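Following that conclusion, a minimal Dockerfile using the /app convention might look like this (the base image and commands are illustrative, not from any of the projects above):

```dockerfile
FROM python:3
WORKDIR /app
# Install dependencies first, so this layer caches across code changes.
COPY requirements.txt /app/
RUN pip install -r requirements.txt
# Then the application itself.
COPY . /app
CMD ["python", "manage.py", "runserver", "0.0.0.0:8000"]
```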

2018-02-01 - ubuntu / goodbye unity / welcome gnome-shell

After having gotten used to Unity on the Ubuntu desktop, with Ubuntu Artful it is time to say goodbye. When Ubuntu first replaced the more traditional GNOME desktop, with its Windows 95 style interface, by the Unity shell with just a sidebar of big buttons, many were skeptical, me included. But removing the clutter was good, and I've happily worked with it for years. And you really don't want to waste time tweaking your desktop away from the OS-provided defaults.

Ubuntu has moved on, and now I'm faced with a new shell: the (new) GNOME Shell. Here are some tweaks and tips to make things bearable/usable for those moving from Unity.

CTRL-ALT is now SUPER

Moving windows left/right and locking the screen: no more CTRL+ALT+left/right/L; it's now SUPER+left/right/L. This one requires updates to the muscle memory.

The close cross has moved back to the right

Unity moved the window buttons to the left (to be more like OS X, I guess?), and now they're back on the right again. I'm hesitant to switch them back to the left (change as little as possible), but it does feel like I now have to move my mouse to the right side of the screen more than usual.

Switching back and forth is a matter of:

$ gsettings set org.gnome.desktop.wm.preferences \
    button-layout 'close,minimize,maximize:'  # buttons left

$ gsettings set org.gnome.desktop.wm.preferences \
    button-layout ':minimize,maximize,close'  # buttons right

Fixing the ALT-Tab application switcher

Both in Unity and in GNOME Shell, the default Alt-Tab behaviour does some crazy application switching with window sub-switching. That has never been usable and needs fixing ASAP.

In Unity, you would use ccsm to disable the Ubuntu Unity Switcher and enable the Classic Switcher instead.

In GNOME Shell, it is even easier. It can be adjusted using the "Switch windows" and "Switch applications" shortcuts, either through Settings -> Devices -> Keyboard -> Keyboard shortcuts, or through the CLI:

$ dconf dump /org/gnome/desktop/wm/keybindings/ |
    grep -E '^switch-(application|window)s='
switch-windows=['<Alt>Tab']
switch-applications=@as []

$ gsettings set org.gnome.desktop.wm.keybindings switch-applications "[]" &&
    gsettings set org.gnome.desktop.wm.keybindings switch-applications-backward "[]" &&
    gsettings set org.gnome.desktop.wm.keybindings switch-windows "['<Alt>Tab']" &&
    gsettings set org.gnome.desktop.wm.keybindings switch-windows-backward "['<Shift><Alt>Tab']"

Adding seconds to the clock

Another gsettings tweak, this time to add seconds to the clock.

$ gsettings set org.gnome.desktop.interface clock-show-seconds true

Keyboard shortcut based window positioning

With Unity (actually Compiz), I could position three windows side by side (widths 33%, 34%, 33%) on my very wide screen using CTRL-ALT-num4, CTRL-ALT-num5 and CTRL-ALT-num6, after enabling "Cycle Through Multiple Sizes" on the Grid plugin's Resize Actions page in the CompizConfig Settings Manager (ccsm).

For GNOME Shell, there is the Put Windows extension that works for the left and the right positioning. (Again, use SUPER instead of CTRL+ALT now.) However, as of writing this, it needs tweaks to correctly position the center window. (See below.)

Setting up extensions is oddly enough done through your browser. You can open up the Looking Glass "extensions inspector" with ALT-F2 "lg", but that's only used for debugging. You'll need the browser plugin too. For the Chromium browser, you'll apt install chrome-gnome-shell.

Now you can install the Put Windows extension, and go to https://extensions.gnome.org/local/ to see the locally installed extensions — and edit their properties.

THESE CHANGES HAVE BEEN MERGED ON 2018-02-08. You may get away with skipping this step:
After installing Put Windows, go to ~/.local/share/gnome-shell/extensions/putWindow@clemens.lab21.org and replace it with the ossobv-changes branch of the Put Windows extension (OSSO B.V.). Then log out of GNOME Shell, and log back in.

Browse to the local extensions settings page again, and bring up the Configure window. On the "Main" pane, you'll set the "Center Width & Height" heights all to 100, and set the widths to 100, 50 and 34.

Now CTRL+ALT+num5 will cycle the window width through 100%, 50% and 34%.

You'll probably want to enable "Keep width when moving north/south" as well; it makes sense.

Switching between X and console

Technically, switching between graphical shell and console. It's now done through CTRL+ALT+F1 (graphical login), CTRL+ALT+F2 (console), CTRL+ALT+F3 (graphical desktop). ALT+F7 is not the place to return to anymore.

Fixing Wayland / GNOME Shell crashes

On my first day of using GNOME Shell on my desktop, it crashed as soon as I locked the screen or turned the monitor off. It appears I ran into Launchpad bugs #1724557 and #1726352, as I got the following messages in syslog:

gnome-shell[2281]: segfault at 38 ip 00007fb4cef46cf0 sp 00007ffd8abb78f8 error 4
  in libmutter-1.so.0.0.0[7fb4ceef4000+142000]

gnome-shell[7103]: segfault at 18 ip 00007f483ef261bc sp 00007ffd23760320 error 4
  in libmutter-1.so.0.0.0[7f483ee6c000+142000]
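As an aside, those syslog lines contain enough to locate the crash: the ip value minus the library's mapping base is the offset of the faulting instruction inside libmutter, which addr2line can resolve once debug symbols are installed. A sketch using the numbers from the first segfault (the library path below is an assumption):

```shell
# ip 0x7fb4cef46cf0 with libmutter mapped at 0x7fb4ceef4000:
# the difference is the file offset of the faulting instruction.
printf '%x\n' $((0x7fb4cef46cf0 - 0x7fb4ceef4000))
# prints: 52cf0
# Resolve it to a function name (assumed path, needs debug symbols):
#   addr2line -Cfe /usr/lib/libmutter-1.so.0.0.0 52cf0
```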

Running ubuntu-bug on the files in /var/crash revealed useful stack traces.

A possible fix, which appears to work: an updated libmutter_3.26.2-0ubuntu0.1osso1. Don't forget to log out and restart gdm after installing.

Further

After the update, you may stumble upon screen(1) wiping the copy-buffer again.

And unfortunately, with Wayland, my xpaste CLI tool to copy-paste to the Java IPMIView iKVM viewer doesn't work without workarounds (because Wayland doesn't support poking into / monitoring other windows).

And you may notice that CTRL+s does a scroll-lock (XOFF) again. I don't know where this used to work and where it didn't, but it appears to freeze terminal output almost everywhere now. Use CTRL+q to resume output.

Update 2018-09-19

I noticed that clean installs may not set the Compose Key the way the Lord intended. And I've even had GNOME Shell forget the proper config.

Here's the oneliner to get your keys back to normal again:

$ dconf write /org/gnome/desktop/input-sources/xkb-options "['compose:ralt']"
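And to verify what is currently set (or to see what a forgetful session reset it to):

```shell
$ dconf read /org/gnome/desktop/input-sources/xkb-options
```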

Now you can type:
R-Alt x x for ×,
R-Alt " a for ä,
R-Alt * a for å and
R-Alt = e for €.

2018-01-10 - screen / wipes copy buffer

A mishmash of bugs and workarounds causes the copy buffer (X selection) to get wiped some of the time in my recent desktop environment. And that in a seemingly unpredictable manner.

The following bug is mostly in play: GNOME VTE soft reset wipes selection

That bug causes:

  1. reset(1) to wipe the middle-mouse (primary) buffer (although this differs per system — could not put my finger on this);
  2. reset(1) to wipe the clipboard buffer, but only if the reset was called from the window that originated the current clipboard buffer contents;
  3. GNU screen(1) initialization to misbehave as reset does, as described above — even through an ssh session — by wiping the buffer, if TERM=xterm-256color.

To add to the confusion, the fix from Debian bug 134198 has masked the problem for screen, for as long as the TERM environment variable was set to 'xterm'.

You can probably reproduce by simply echoing (part of) the init_string2:

$ TERM=xterm infocmp -1 -C | grep :is=
        :is=\E[!p\E[?3;4l\E[4l\E>:\
$ printf '\x1b[!p'

At this point one or more of your copy buffers may be wiped, although the effects appear to be worse on my Ubuntu/Artful than on Zesty.
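That \x1b[!p sequence is DECSTR, the VT soft terminal reset. To inspect the bytes without sending them to your terminal (and risking your buffers), pipe them through od instead:

```shell
# Dump the DECSTR soft-reset escape sequence byte by byte,
# instead of sending it to the terminal.
printf '\033[!p' | od -An -c
```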

Madness. Well at least now you know where to look when your copy buffer is wiped.

2018-01-08 - dovecot / roundcube / mail read error

Today we ran into a dovecot/imap crash on a Xenial box. The Dovecot in question was the patched dovecot-2.2.22.

For an as yet unexplained reason, reading mail through the Thunderbird mail client worked fine, but when opening a message with Roundcube (webmail), most messages would give an odd error about a "message that could not be opened".

An IMAP trace of Roundcube revealed that the IMAP server stopped responding after the client A0004 UID FETCH command. dmesg revealed that this was due to a segfault of the dovecot/imap binary.

# dmesg | tail -n1
[2405902.912457] imap[5745]: segfault at 0 ip 00007f7a56b251f1 sp 00007fff7f6e9fd0
  error 4 in libdovecot.so.0.0.0[7f7a56aa0000+f4000]
# addr2line -Cfe /usr/lib/dovecot/libdovecot.so \
    $(printf %x $((0x7f7a56b251f1 - 0x7f7a56aa0000)))
i_stream_seek
??:?

On our box, the imap server runs chrooted inside the user's homedir. After adding LimitCORE=infinity to the dovecot.service file, the segfault produced a nice core dump inside the virtual mail dir, in our case located in /var/mail/DOMAIN/USER.
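That limit can be added without editing the shipped unit file, via a standard systemd drop-in directory (a sketch; the limitcore.conf file name is made up):

```shell
# mkdir -p /etc/systemd/system/dovecot.service.d
# printf '[Service]\nLimitCORE=infinity\n' > /etc/systemd/system/dovecot.service.d/limitcore.conf
# systemctl daemon-reload
# systemctl restart dovecot
```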

# gdb /usr/lib/dovecot/imap /var/mail/DOMAIN/USER/core
...
Core was generated by `dovecot/imap'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  i_stream_seek (stream=0x0, v_offset=0) at istream.c:286
...

(gdb) bt
#0  i_stream_seek (stream=0x0, v_offset=0) at istream.c:286

  282 void i_stream_seek(struct istream *stream, uoff_t v_offset)
  ...
  286   if (v_offset >= stream->v_offset &&  // &stream->v_offset == 0x0
  287           i_stream_can_optimize_seek(_stream))

#1  0x00007f7a56b2a4c2 in i_stream_limit_read (stream=0x5632721f8270) at istream-limit.c:34

  27 static ssize_t i_stream_limit_read(struct istream_private *stream)
  ...
  34   i_stream_seek(stream->parent, lstream->istream.parent_start_offset +
  35                 stream->istream.v_offset);

#2  0x00007f7a56b252f3 in i_stream_read (stream=stream@entry=0x5632721f82e0) at istream.c:162
#3  0x00007f7a56b25d2d in i_stream_read_data (stream=0x5632721f82e0, data_r=data_r@entry=0x7fff7f6ea098, size_r=size_r@entry=0x7fff7f6ea0a0, threshold=threshold@entry=1)
    at istream.c:551
#4  0x00007f7a56b09920 in message_parse_header_next (ctx=0x5632721f8420, hdr_r=hdr_r@entry=0x7fff7f6ea0f0) at message-header-parser.c:82
#5  0x00007f7a56b0b37a in preparsed_parse_next_header (ctx=ctx@entry=0x5632721f7ea0, block_r=block_r@entry=0x7fff7f6ea1a0) at message-parser.c:938
#6  0x00007f7a56b0b518 in preparsed_parse_next_header_init (ctx=0x5632721f7ea0, block_r=0x7fff7f6ea1a0) at message-parser.c:987
...

(gdb) print stream.read
$1 = (ssize_t (*)(struct istream_private *)) 0x7f7a56b2a490 <i_stream_limit_read>
(gdb) print stream.seek
$2 = (void (*)(struct istream_private *, uoff_t, bool)) 0x7f7a56b24c70 <i_stream_default_seek_seekable>
(gdb) print stream.stat
$3 = (int (*)(struct istream_private *, bool)) 0x7f7a56b2a3f0 <i_stream_limit_stat>
(gdb) print stream.get_size
$4 = (int (*)(struct istream_private *, bool, uoff_t *)) 0x7f7a56b2a6d0 <i_stream_limit_get_size>

(gdb) print *stream
$5 = {iostream = {refcount = 1914672576,
    name = 0x7f7a56a9ab78 <main_arena+88> "\360\326*r2V", error = 0x0,
    close = 0x7f7a56b25050 <i_stream_default_close>,
    destroy = 0x7f7a56b2a690 <i_stream_limit_destroy>,
    set_max_buffer_size = 0x7f7a56b250a0
      <i_stream_default_set_max_buffer_size>,
    destroy_callbacks = {arr = {buffer = 0x0, element_size = 0}, v = 0x0,
      v_modifiable = 0x0}}, read = 0x7f7a56b2a490 <i_stream_limit_read>,
  seek = 0x7f7a56b24c70 <i_stream_default_seek_seekable>, sync = 0x0,
  stat = 0x7f7a56b2a3f0 <i_stream_limit_stat>,
  get_size = 0x7f7a56b2a6d0 <i_stream_limit_get_size>, switch_ioloop = 0x0,
  istream = {v_offset = 0, stream_errno = 0, mmaped = 0, blocking = 1,
    closed = 0, readable_fd = 1, seekable = 1, eof = 0,
    real_stream = 0x5632721f8270}, fd = 15, abs_start_offset = 0,
  statbuf = {st_dev = 0, st_ino = 0, st_nlink = 0, st_mode = 0, st_uid = 0,
    st_gid = 0, __pad0 = 0, st_rdev = 0, st_size = -1, st_blksize = 0,
    st_blocks = 0, st_atim = {tv_sec = 1515411215, tv_nsec = 0}, st_mtim = {
      tv_sec = 1515411215, tv_nsec = 0}, st_ctim = {tv_sec = 1515411215,
      tv_nsec = 0}, __glibc_reserved = {0, 0, 0}}, io = 0x0, buffer = 0x0,
  w_buffer = 0x0, buffer_size = 0, max_buffer_size = 8192,
  init_buffer_size = 8192, skip = 0, pos = 0, try_alloc_limit = 0,
  parent = 0x0, parent_start_offset = 0, parent_expected_offset = 0,
  access_counter = 0, line_str = 0x0, line_crlf = 0, return_nolf_line = 0,
  stream_size_passthrough = 0, nonpersistent_buffers = 0}

(gdb) print stream->parent
$6 = (struct istream *) 0x0

The above shows what is wrong: it crashes because stream->parent is NULL. But the parent should never be NULL when the stream is of istream-limit type (the read virtual method being i_stream_limit_read).

According to the i_stream_create_limit() constructor the parent should've been set to something non-NULL (in i_stream_create()). So — unless it's legal to blank out the parent once it exists — it looks like we're looking at some kind of memory corruption.

A quick round of googling the Dovecot mailing list turned up nothing useful. The Dovecot 2.3 changelog did show this:

v2.3.0 2017-12-22  Timo Sirainen <tss iki.fi>
...
  - Input streams are more reliable now when there are errors or when
    the maximum buffer size is reached. Previously in some situations
    this could have caused Dovecot to try to read already freed memory.

Perhaps that would fix things, but upgrading the entire Dovecot package was beyond the scope at this point.

A last resort was cleaning up the per-user cache/log dovecot files:

# ls -1 /var/mail/DOMAIN/USER/dovecot*
/var/mail/DOMAIN/USER/dovecot.index
/var/mail/DOMAIN/USER/dovecot.index.cache
/var/mail/DOMAIN/USER/dovecot.index.log
/var/mail/DOMAIN/USER/dovecot.index.log.2
/var/mail/DOMAIN/USER/dovecot-keywords
/var/mail/DOMAIN/USER/dovecot-uidlist
/var/mail/DOMAIN/USER/dovecot-uidvalidity
/var/mail/DOMAIN/USER/dovecot-uidvalidity.5a535834

# rm /var/mail/DOMAIN/USER/dovecot*
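In hindsight, moving the files aside is the safer variant, because deleting them also clears IMAP-client set labels (see the 2018-01-26 update); this way they can be put back. A sketch, with an arbitrary backup location:

```shell
# mkdir /root/dovecot-index-bak
# mv /var/mail/DOMAIN/USER/dovecot* /root/dovecot-index-bak/
```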

And at last, this fixed the issue. (Except not, see below!)

This also changed the IMAP login handshake from:

S: A0002 OK [CAPABILITY IMAP4rev1 LITERAL+ SASL-IR LOGIN-REFERRALS ID ENABLE
   IDLE SORT SORT=DISPLAY THREAD=REFERENCES THREAD=REFS
   THREAD=ONSELECT
   CHILDREN NAMESPACE UIDPLUS LIST-EXTENDED I18NLEVEL=1 CONDSTORE QRESYNC
   ESEARCH ESORT SEARCHRES WITHIN CONTEXT=SEARCH LIST-STATUS BINARY MOVE
   SPECIAL-USE] Logged in
...
S: * FLAGS (\Answered \Flagged \Deleted \Seen \Draft $Forwarded Junk NonJunk $label2 $label1 $ATTACHMENT)
S: * OK [PERMANENTFLAGS (\Answered \Flagged \Deleted \Seen \Draft $Forwarded Junk NonJunk $label2 $label1 $ATTACHMENT \*)] Flags permitted

To this:

S: A0002 OK [CAPABILITY IMAP4rev1 LITERAL+ SASL-IR LOGIN-REFERRALS ID ENABLE
   IDLE SORT SORT=DISPLAY THREAD=REFERENCES THREAD=REFS
   THREAD=ORDEREDSUBJECT MULTIAPPEND URL-PARTIAL CATENATE UNSELECT
   CHILDREN NAMESPACE UIDPLUS LIST-EXTENDED I18NLEVEL=1 CONDSTORE QRESYNC
   ESEARCH ESORT SEARCHRES WITHIN CONTEXT=SEARCH LIST-STATUS BINARY MOVE
   SPECIAL-USE] Logged in
...
S: * FLAGS (\Answered \Flagged \Deleted \Seen \Draft unknown-0 unknown-2 unknown-1)
S: * OK [PERMANENTFLAGS (\Answered \Flagged \Deleted \Seen \Draft unknown-0 unknown-2 unknown-1 \*)] Flags permitted.

That might help someone else debugging this further.

Update 2018-01-26

Turns out things weren't fixed at all. Large directories would still cause dovecot to dump core.

Also, removing all the dovecot* files has the drawback of clearing/hiding IMAP-client set labels (like "Work" and "Important"). Don't do it if you don't need to.

Instead, updating Dovecot to version 2.3.0 did the trick. There are sources, and some prebuilt packages on their website. Or you can build your own Ubuntu/Xenial package.