Notes to self, 2023
2023-10-15 - mplayer / screen saver / wayland
I don't usually watch movies from my laptop, but when I do, I don't want the screen saver to kick in.
Occasionally, I notice that my trusty old movie player mplayer does not inhibit the screen saver or screen blanking. That means that 5 minutes into a movie, the screen turns black.
I don't think this used to be the case. Maybe it's because I'm running the Wayland display server in GNOME now.
In any case, inhibiting the screen blanking can be done using the gnome-session-inhibit program:
gnome-session-inhibit --inhibit idle PROGRAM
For example:
gnome-session-inhibit --inhibit idle mplayer /path/to/wholesome-documentary.mkv
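If you do this often, a tiny shell wrapper saves some typing (the play function name is just a suggestion, not something mplayer or GNOME provides):

play() { gnome-session-inhibit --inhibit idle mplayer "$@"; }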
2023-09-20 - bash / postfix health check / dev tcp
Using /dev/tcp in bash for a health check? Here's an example.
I had a script that used netcat to connect to a Postfix email daemon to check its health status. To avoid pipelining errors I had it sleep between each write. The core looked somewhat like this:
messages=$(for x in \
    'EHLO localhost' \
    'MAIL FROM:<healthz@localhost>' \
    'RCPT TO:<postmaster@example.com>' \
    RSET \
    QUIT
do
    sleep 0.1; echo "$x"
done | nc -v $HOST 25 2>&1 | tr -d '\r')
This works, but the sleeps make it slower than necessary, and more brittle. If the daemon is temporarily slow, we can trigger a Postfix SMTP command pipelining error anyway.
Ideally, we want to read the responses, and act on them immediately instead.
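The trick that makes this possible without netcat is bash's /dev/tcp pseudo-device. A minimal sketch (host and timeout values here are purely illustrative):

exec 3<>/dev/tcp/"$HOST"/25   # bash-only: open a bidirectional TCP fd
read -t 5 -r greeting <&3     # SMTP server speaks first (220 banner)
printf 'QUIT\r\n' >&3
read -t 5 -r reply <&3        # expect a 221 goodbye
exec 3>&-                     # close the fd again
echo "$greeting"; echo "$reply"

The full health check below builds on exactly this pattern.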
Here's a script that uses bash instead of POSIX sh, because bash has /dev/tcp support, which makes doing networking I/O easier.
Starting bash might be slightly costlier than starting a smaller POSIX sh like dash. But we avoid calling netcat and some other tools, so we win out not only in speed but also in resource usage.
#!/bin/bash
# Using bash (not POSIX sh) for /dev/tcp I/O.
: ${BASH_VERSION:?Use bash, not POSIX-sh}
set -u
DEBUG=0
READ_TIMEOUT=10
CHAR_CR=$'\r'
#HOST=mail.example.com
#HOST=1.2.3.4
HOST=127.0.0.1
PROXY_PROTOCOL=
if [ "$HOST" = 127.0.0.1 ]; then
    # This should succeed in mere milliseconds. But sometimes postfix
    # decides to take just over a second for the RCPT TO check.
    READ_TIMEOUT=3
    PROXY_PROTOCOL='PROXY TCP4 127.0.0.1 127.0.0.1 12345 25'  # (or empty)
fi

getresp() {
    local line status
    while :; do
        read -t $READ_TIMEOUT -r line
        [ $DEBUG -ne 0 ] && printf '%s\n' "<<< $line" >&2
        printf '%s\n' "${line%$CHAR_CR}"
        test -z "$line" && exit 65
        status=${line:0:3}
        if [ $status -lt 200 -o $status -ge 300 ]; then
            exit 66
        elif [ "${line:3:1}" = ' ' ]; then
            break  # "250 blah"
        elif [ "${line:3:1}" = '-' ]; then
            true   # "250-blah" (continue)
        else
            exit 67
        fi
    done
}

if ! exec 3<>/dev/tcp/$HOST/25; then  # open fd
    # Takes a looooot of time. 2m10 in the test case. You will want to wrap
    # this script in a timeout(1) call.
    # $0: connect: Connection timed out
    # $0: line 40: /dev/tcp/1.2.3.4/25: Connection timed out
    exit 1
fi

messages=$(for x in \
        "$PROXY_PROTOCOL" \
        'EHLO localhost' \
        'MAIL FROM:<healthz@localhost>' \
        'RCPT TO:<postmaster@example.com>' \
        RSET \
        QUIT
do \
    [ -n "$x" -a $DEBUG -ne 0 ] && printf '>>> %s\n' "$x" >&2
    [ -n "$x" ] && printf '%s\r\n' "$x" >&3
    getresp <&3 || exit $?
done)
ret=$?
exec 3>&-  # close fd

ok=$(echo "$messages" | grep -xF '250 2.0.0 Ok')
fail=$(echo "$messages" | sed -e1d | grep -v ^2 | grep '')
if [ $ret -ne 0 -o -z "$ok" ]; then
    echo "Missing OK line (ret $ret). Got:" $messages
    false
elif [ -n "$fail" ]; then
    echo "$fail"
    false
else
    true
fi
One can invoke this from something like a Haproxy check script — this is also the part where you add a timeout call.
#!/bin/sh
CHECK_SCRIPT=/usr/local/bin/postfix-is-healthy
if test -f /srv/in_maintenance; then
    echo 'drain stopped 0%'
elif error=$(timeout -k3 -s1 2 $CHECK_SCRIPT 2>&1); then
    echo 'ready up 100%'
else
    echo 'fail #' $error
fi
And that script could be invoked from something like xinetd:
service postfix-haproxy
{
    flags           = REUSE
    socket_type     = stream
    type            = unlisted
    port            = 1025
    wait            = no
    user            = nobody
    server          = /usr/local/bin/postfix-haproxy-agent-check
    log_on_failure  += USERID
    disable         = no
    only_from       = 0.0.0.0/0 ::1
    per_source      = UNLIMITED
}
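To smoke-test the whole chain, one can connect to the agent port and see what Haproxy would see (a sketch; 1025 is the port configured above, and the exact response of course depends on the current Postfix state):

$ nc localhost 1025
ready up 100%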
2023-09-18 - gpgv / can't allocate lock for
gpgv prints out a warning that it cannot allocate a lock. This looks like something that should be fixable, but it isn't.
Observed with gpg version 2.2.27-3ubuntu2.1:
$ gpgv </dev/null
gpgv: can't allocate lock for '/home/walter/.gnupg/trustedkeys.gpg'
gpgv: verify signatures failed: Unknown system error
The issue at hand here is the “gpgv: can't allocate lock for '/home/walter/.gnupg/trustedkeys.gpg'”. Can we fix something to suppress that?
TL;DR: no
Investigation
Fire up the debugger gdb and break at keybox_lock.constprop.0.isra.0.
(gdb) break keybox_lock.constprop.0.isra.0
Breakpoint 1 at 0x2f850: file ../kbx/../../kbx/keybox-init.c, line 271.
(gdb) run
Starting program: /usr/bin/gpgv
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Breakpoint 1, keybox_lock.constprop.0.isra.0 (yes=yes@entry=1, timeout=0,
    hd=<optimized out>, hd=<optimized out>) at ../kbx/../../kbx/keybox-init.c:271
271     keybox_lock (KEYBOX_HANDLE hd, int yes, long timeout)
(gdb) list
266      * Lock the keybox at handle HD, or unlock if YES is false. TIMEOUT
267      * is the value used for dotlock_take. In general -1 should be used
268      * when taking a lock; use 0 when releasing a lock.
269      */
270     gpg_error_t
271     keybox_lock (KEYBOX_HANDLE hd, int yes, long timeout)
272     {
273       gpg_error_t err = 0;
274       KB_NAME kb = hd->kb;
275
(gdb) list
276       if (!keybox_is_writable (kb))
277         return 0;
278
279       /* Make sure the lock handle has been created. */
280       if (!kb->lockhd)
281         {
282           kb->lockhd = dotlock_create (kb->fname, 0);
283           if (!kb->lockhd)
284             {
285               err = gpg_error_from_syserror ();
(gdb) next
276       if (!keybox_is_writable (kb))
(gdb)
280       if (!kb->lockhd)
(gdb)
285       err = gpg_error_from_syserror ();
Okay, there was a dotlock_create that supposedly failed. But it isn't even called! We can confirm this in the machine code:
$ objdump -d `which gpgv` --disassemble=keybox_lock.constprop.0.isra.0
...
000000000002f850 <keybox_lock.constprop.0.isra.0>:
   2f850:  f3 0f 1e fa             endbr64
   2f854:  41 55                   push   %r13
   2f856:  41 54                   push   %r12
   2f858:  55                      push   %rbp
   2f859:  53                      push   %rbx
   2f85a:  48 83 ec 08             sub    $0x8,%rsp
   2f85e:  48 85 ff                test   %rdi,%rdi
   2f861:  74 3d                   je     2f8a0 <keybox_lock.constprop.0.isra.0+0x50>
   2f863:  4c 8d 6f 30             lea    0x30(%rdi),%r13
   2f867:  48 89 fb                mov    %rdi,%rbx
   2f86a:  89 f5                   mov    %esi,%ebp
   2f86c:  be 02 00 00 00          mov    $0x2,%esi
   2f871:  4c 89 ef                mov    %r13,%rdi
   2f874:  e8 97 6f fd ff          call   6810 <gpgrt_access@plt>
This call was the keybox_is_writable check.
   2f879:  41 89 c4                mov    %eax,%r12d
   2f87c:  85 c0                   test   %eax,%eax
   2f87e:  75 20                   jne    2f8a0 <keybox_lock.constprop.0.isra.0+0x50>
   2f880:  48 83 7b 20 00          cmpq   $0x0,0x20(%rbx)
Here is the first !kb->lockhd check.
2f885: 74 41 je 2f8c8 <keybox_lock.constprop.0.isra.0+0x78>
It was zero, so we jump to 0x2f8c8.
   2f887:  8b 43 28                mov    0x28(%rbx),%eax
   2f88a:  85 ed                   test   %ebp,%ebp
   2f88c:  74 2a                   je     2f8b8 <keybox_lock.constprop.0.isra.0+0x68>
   2f88e:  85 c0                   test   %eax,%eax
...
0x2f8c8 is here: immediate error handling. No attempt to call dotlock_create at all.
   2f8c8:  e8 d3 6a fd ff          call   63a0 <gpg_err_code_from_syserror@plt>
   2f8cd:  4c 89 ee                mov    %r13,%rsi
   2f8d0:  48 8d 3d db c6 00 00    lea    0xc6db(%rip),%rdi   # 3bfb2 <__FUNCTION__.0.lto_priv.14+0x192>
Here the "can't allocate lock for '%s'\n"
string is loaded.
   2f8d7:  41 89 c4                mov    %eax,%r12d
   2f8da:  0f b7 c0                movzwl %ax,%eax
   2f8dd:  0d 00 00 00 08          or     $0x8000000,%eax
   2f8e2:  45 85 e4                test   %r12d,%r12d
   2f8e5:  44 0f 45 e0             cmovne %eax,%r12d
   2f8e9:  31 c0                   xor    %eax,%eax
   2f8eb:  e8 70 5b fe ff          call   15460 <log_info>
   2f8f0:  eb b1                   jmp    2f8a3 <keybox_lock.constprop.0.isra.0+0x53>
So, we always get a “can't allocate” warning.
If we build gpgv ourselves, we can take a look at the gpgv.o object file, which has a no-op dotlock_create:
0000000000000400 <dotlock_create>:
 400:   f3 0f 1e fa             endbr64
 404:   31 c0                   xor    %eax,%eax
 406:   c3                      ret
 407:   66 0f 1f 84 00 00 00    nopw   0x0(%rax,%rax,1)
 40e:   00 00
This function does nothing and returns NULL. When linking, it is elided, so the call disappears, along with the double !kb->lockhd check.
(That nopw is an extended nop (no-op). It is there for alignment purposes only.)
Conclusion
This leaves two questions:
- Why is dotlock_create a no-op? Apparently gpgv.c contains this:

  // ...
  /* We do not do any locking, so use these stubs here */
  void
  dotlock_disable (void)
  {
  }

  dotlock_t
  dotlock_create (const char *file_to_lock, unsigned int flags)
  {
    (void)file_to_lock;
    (void)flags;
    return NULL;
  }
  // ...
- Why are we forced to look at a warning message for something that will never succeed anyway? I guess no one saw this as an issue.
2023-09-15 - etckeeper / git / pack-objects died of signal 9
Is your etckeeper dying on the git gc?
On several machines I have now seen git run out of memory when trying to do repo optimization and garbage collection. Usually this happened in /etc where we like to have etckeeper. It might look like this:
...
warning: The last gc run reported the following. Please correct the root cause
and remove .git/gc.log
Automatic cleanup will not be performed until the file is removed.

error: pack-objects died of signal 9
fatal: failed to run repack
A manual git gc did not help:
# git gc --prune=now
Enumerating objects: 11391, done.
Counting objects: 100% (11391/11391), done.
Delta compression using up to 2 threads
error: pack-objects died of signal 9)
fatal: failed to run repack
But combining it with a memory limit did:
# git -c pack.windowMemory=1024m gc --prune=now
Enumerating objects: 11391, done.
Counting objects: 100% (11391/11391), done.
Delta compression using up to 2 threads
Compressing objects: 100% (5664/5664), done.
Writing objects: 100% (11391/11391), done.
Total 11391 (delta 7052), reused 8042 (delta 5041), pack-reused 0
That was quite slow. But now everything is okay again.
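If this keeps recurring on a memory-constrained box, the limit can also be persisted in the repository's local configuration, so that the automatic gc run (as triggered by etckeeper) respects it too. A sketch, using the same value as above:

# cd /etc && git config pack.windowMemory 1024m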
2023-09-14 - segfault in library / addr2line / objdump
Yesterday, we spotted some SEGFAULTs on an Ubuntu/Focal server. We did not have core dumps, but the kernel message in dmesg was sufficient to find a culprit.
The observed messages were these:
nginx[854]: segfault at 6d702e746379 ip 00007ff40dc2f5a3 sp 00007fffd51c8420 error 4 in libperl.so.5.30.0[7ff40dbc7000+166000]
Code: 48 89 43 10 48 83 c4 18 5b 5d 41 5c 41 5d 41 5e 41 5f c3 0f 1f 40 00 0f b6 7f 30 48 c1 e8 03 48 29 f8 48 89 c3 74 89 48 8b 02 <4c> 8b 68 10 4d 85 ed 0f 84 28 01 00 00 0f b6 40 30 49 c1 ed 03 49
nginx[951947]: segfault at 10 ip 00007fba4a1645a3 sp 00007ffe57b0f8a0 error 4 in libperl.so.5.30.0 (deleted)[7fba4a0fc000+166000]
Code: 48 89 43 10 48 83 c4 18 5b 5d 41 5c 41 5d 41 5e 41 5f c3 0f 1f 40 00 0f b6 7f 30 48 c1 e8 03 48 29 f8 48 89 c3 74 89 48 8b 02 <4c> 8b 68 10 4d 85 ed 0f 84 28 01 00 00 0f b6 40 30 49 c1 ed 03 49
And after upgrading libperl5.30 from 5.30.0-9ubuntu0.3 to 5.30.0-9ubuntu0.4, we got these similar ones:
traps: nginx[955774] general protection fault ip:7f6af33345a3 sp:7ffe74310100 error:0 in libperl.so.5.30.0[7f6af32cc000+166000]
nginx[1049280]: segfault at 205bd ip 00007f5e60d265d9 sp 00007ffe7b5f08c0 error 4 in libperl.so.5.30.0[7f5e60cbe000+166000]
Code: 00 0f b6 40 30 49 c1 ed 03 49 29 c5 0f 84 17 01 00 00 48 8b 76 10 48 8b 52 10 4c 8d 3c fe 4c 8d 0c c2 84 c9 0f 84 c7 02 00 00 <49> 83 39 00 0f 85 ad 03 00 00 49 83 c1 08 49 83 ed 01 49 8d 74 1d
Apparently they were triggered by an nginx reload.
If we had a proper core dump, we could extract lots of useful info from it: where the crash occurred, which registers and variables were set, and the call chain (backtrace). With the info from above, we can at most get where the crash happened, and maybe which register had a bad value. But it is definitely better than nothing.
Feeding calculated offset to addr2line
For the most basic attempt, I found a box which still had libperl version 5.30.0-9ubuntu0.3. I installed the perl-debug apt package — perl-debug_5.30.0-9ubuntu0.3_amd64.deb from https://launchpadlibrarian.net/ — there. From the kernel message “nginx[854]: segfault at 6d702e746379 ip 00007ff40dc2f5a3 sp 00007fffd51c8420 error 4 in libperl.so.5.30.0[7ff40dbc7000+166000]” we take the instruction pointer 00007ff40dc2f5a3 and subtract the library starting position 7ff40dbc7000:
0x7ff40dc2f5a3 - 0x7ff40dbc7000 = 0x685a3
Feed that to addr2line and get the location of the crash... right?
$ addr2line -Cfe /usr/lib/x86_64-linux-gnu/libperl.so.5.30.0 685a3
Perl_vload_module
op.c:7750
At first glance that appears okay. But when we check what happens in the machine instructions there, it is not:
$ objdump -d /usr/lib/x86_64-linux-gnu/libperl.so.5.30.0 --disassemble=Perl_vload_module
...
00000000000685a3 <Perl_vload_module@@Base+0xd3>:
...
   6859b:  83 c0 08                add    $0x8,%eax
   6859e:  49 03 57 10             add    0x10(%r15),%rdx
   685a2:  41 89 07                mov    %eax,(%r15)
   685a5:  48 8b 0a                mov    (%rdx),%rcx
   685a8:  45 31 e4                xor    %r12d,%r12d
   685ab:  48 85 c9                test   %rcx,%rcx
...
There is no instruction start at 0x685a3!
Searching for machine code inside a binary
What if we simply look for the instructions as shown in the Code: message?
To this end, I hacked together a script that does the following:
- Spawn a copy of objdump to disassemble the binary;
- look for the instructions as passed on the command line;
- display where the instructions are found.
The objdump-find-instructions.py script is collapsed here (see “details”):
details of objdump-find-instructions.py
#!/usr/bin/env python3
import re
import subprocess
import sys

# Look for these:
# >   19640c:  48 89 44 24 28          mov    %rax,0x28(%rsp)
# >   196411:  31 c0                   xor    %eax,%eax
# >   196413:  48 85 db                test   %rbx,%rbx
code_re = re.compile(
    br'^\s+(?P<addr>[0-9a-f]+):(?P<code>(\s[0-9a-f]{2})+)\s+'
    br'(?P<decoded>.*)')
code_without_decoded_re = re.compile(
    br'^\s+(?P<addr>[0-9a-f]+):(?P<code>(\s[0-9a-f]{2})+)\s*$')

# Look for these:
# > 000000000004ea40 <Perl_ck_concat@@Base>:
func_re = re.compile(br'^(?P<addr>[0-9a-f]+) <(?P<name>[^<>]*)>:')

# Look for blanks:
blank_re = re.compile(br'^\s*$')

# Lines to ignore:
ignore_re = re.compile(
    br'^/.*:\s+file format |^Disassembly of section ')


def to_bin(binstr_array):
    return bytes([int(i, 16) for i in binstr_array])


def to_hex(binarray):
    return ' '.join('{:02x}'.format(i) for i in binarray)


# Get executable/binary from argv
executable = sys.argv[1]  # /usr/lib/x86_64-linux-gnu/libperl.so.5.30

# Get needle from argv
needle = [i.encode() for i in sys.argv[2:]]  # ['48', '89', '44', '24', '28']
needle_len = len(needle)
assert needle_len >= 2, 'must specify XX XX XX bytes to search for'
needle_bin = to_bin(needle)
MAX_BUF = needle_len + 30


class Matcher:
    def search(self, haystack, regex):
        self.match = regex.search(haystack)
        if self.match:
            self.dict = self.match.groupdict()
        return self.match

    def get(self, key):
        return self.dict[key]


# Execute
proc = subprocess.Popen(
    ['/usr/bin/objdump', '-d', executable],
    stdout=subprocess.PIPE)

# Parse
code_bin = bytearray()
last_func = None
last_addr = None
matcher = Matcher()

for line in proc.stdout:
    line = line.rstrip()
    if matcher.search(line, blank_re):
        last_func = None
        last_addr = None
    elif matcher.search(line, func_re):
        last_func = matcher.get('name')
        last_addr = matcher.get('addr')
    elif (matcher.search(line, code_re) or
            matcher.search(line, code_without_decoded_re)):
        new_code_bin = to_bin(matcher.get('code').lstrip().split())
        code_bin.extend(new_code_bin)
        code_bin = code_bin[-MAX_BUF:]  # truncate early
        # This contains search on binary is pretty fast, compared to doing
        # sub-array comparisons.
        # real  0m9.873s  --> 0m4.000s
        # user  0m12.637s --> 0m6.624s
        if needle_bin in code_bin:
            print(
                last_addr.decode(), last_func.decode(),
                matcher.get('addr').decode(),
                matcher.get('decoded').decode(),
                to_hex(new_code_bin))
            # print('//', to_hex(code_bin))
            assert needle_len > 1
            code_bin = code_bin[(-needle_len + 1):]  # skip same results
    elif matcher.search(line, ignore_re):
        pass
    else:
        print('discarding', line)
        exit(2)
The script is invoked like this:
$ python3 objdump-find-instructions.py PATH_TO_BINARY INSTRUCTIONS...
We include all instructions up to and including the <4c> and invoke it like this:
$ python3 objdump-find-instructions.py \
    /usr/lib/x86_64-linux-gnu/libperl.so.5.30.0 \
    48 89 43 10 48 83 c4 18 5b 5d 41 5c 41 5d 41 \
    5e 41 5f c3 0f 1f 40 00 0f b6 7f 30 48 c1 e8 \
    03 48 29 f8 48 89 c3 74 89 48 8b 02 4c
It spews out this one line:
00000000000b0500 Perl__invlist_intersection_maybe_complement_2nd@@Base b05a3 mov 0x10(%rax),%r13 4c 8b 68 10
That contains the following info:
- The function Perl__invlist_intersection_maybe_complement_2nd@@Base starts at 00000000000b0500.
- At 0xb05a3 there is a mov 0x10(%rax),%r13 instruction.
- That instruction is 4c 8b 68 10 in machine code.
That instruction corresponds with the position in the Code: log line.
Code: [... 8b 02] <4c> 8b 68 10 [4d 85 ...]
This looks like a much better candidate than the Perl_vload_module we got from addr2line.
The reading of 0x10(%rax) matches the second crash perfectly: if the %rax register is 0 — a common value — then this would produce a segfault at 10.
Getting the surrounding code from objdump:
$ objdump -d /usr/lib/x86_64-linux-gnu/libperl.so.5.30.0 --start-address=0xb0500
...
00000000000b0500 <Perl__invlist_intersection_maybe_complement_2nd@@Base>:
...
   b059e:  74 89                   je     b0529 <Perl__invlist_intersection_maybe_complement_2nd@@Base+0x29>
   b05a0:  48 8b 02                mov    (%rdx),%rax
   b05a3:  4c 8b 68 10             mov    0x10(%rax),%r13
   b05a7:  4d 85 ed                test   %r13,%r13
   b05aa:  0f 84 28 01 00 00       je     b06d8 <Perl__invlist_intersection_maybe_complement_2nd@@Base+0x1d8>
...
Offset 0x48000
I was confident that this was the right crash location. And because Perl did have a problem with the code in this vicinity, it was easy to file a lp2035339 bug report.
But I could not explain yet why the calculated offset of 0x685a3 is off. The difference between 0x685a3 and 0xb05a3 is 0x48000.
A bit of poking around the binary did turn up this:
$ objdump -p /usr/lib/x86_64-linux-gnu/libperl.so.5.30.0
...
Dynamic Section:
  NEEDED               libdl.so.2
  NEEDED               libm.so.6
  NEEDED               libpthread.so.0
  NEEDED               libc.so.6
  NEEDED               libcrypt.so.1
  SONAME               libperl.so.5.30
  INIT                 0x0000000000048000
  FINI                 0x00000000001ad6b4
...
The machine instructions reside between 0x48000 and 0x1ad6b4. That's where we got the extra 0x48000 we need.
So, next time we do an addr2line lookup of a library, we should check the INIT offset, and add that to the calculated instruction pointer position.
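For this first crash, doing that arithmetic in the shell gives exactly the position we found by searching:

$ printf '0x%x\n' $(( 0x7ff40dc2f5a3 - 0x7ff40dbc7000 + 0x48000 ))
0xb05a3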
Check with newer version
After upgrading both libperl and perl-debug on the test box, we could confirm that the latest crashes were caused by the same problem.
From “traps: nginx[955774] general protection fault ip:7f6af33345a3 sp:7ffe74310100 error:0 in libperl.so.5.30.0[7f6af32cc000+166000]” and the INIT offset of 0x48000 we get 0xb05a3, and from “nginx[1049280]: segfault at 205bd ip 00007f5e60d265d9 sp 00007ffe7b5f08c0 error 4 in libperl.so.5.30.0[7f5e60cbe000+166000]” we get 0xb05d9.
addr2line gives us:
$ addr2line -Cfe /usr/lib/x86_64-linux-gnu/libperl.so.5.30.0 b05a3 b05d9
Perl__invlist_intersection_maybe_complement_2nd
invlist_inline.h:51
Perl__invlist_intersection_maybe_complement_2nd
regcomp.c:9841
Both in Perl__invlist_intersection_maybe_complement_2nd. Same problem.
general protection fault vs. segfault
Lastly, why did we get a “traps: ... general protection fault ... error:0” for one crash and “segfault at ... ip ... error 4” for the others?
I'm not entirely sure. As far as I can gather, this could be the difference between the segmentation violation happening while running in kernel mode versus running in user mode. The error code of 0 vs. 4 does indicate as much.
(See “details” for a snippet from arch/x86/include/asm/trap_pf.h.)
details of error_code
/*
 * Page fault error code bits:
 *
 *   bit 0 ==    0: no page found       1: protection fault
 *   bit 1 ==    0: read access         1: write access
 *   bit 2 ==    0: kernel-mode access  1: user-mode access
 *   bit 3 ==                           1: use of reserved bit detected
 *   bit 4 ==                           1: fault was an instruction fetch
 *   bit 5 ==                           1: protection keys block access
 *   bit 15 ==                          1: SGX MMU page-fault
 */
enum x86_pf_error_code {
        X86_PF_PROT     =       1 << 0,
        X86_PF_WRITE    =       1 << 1,
        X86_PF_USER     =       1 << 2,
        X86_PF_RSVD     =       1 << 3,
        X86_PF_INSTR    =       1 << 4,
        X86_PF_PK       =       1 << 5,
        X86_PF_SGX      =       1 << 15,
};
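Masking those bits for the two observed error codes in the shell confirms the kernel-mode vs. user-mode reading (bit 2 is the user-mode bit):

$ for err in 0 4; do echo "err=$err prot=$((err & 1)) write=$(((err >> 1) & 1)) user=$(((err >> 2) & 1))"; done
err=0 prot=0 write=0 user=0
err=4 prot=0 write=0 user=1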
But maybe it has a different reason, like the specific memory location that was tried (we don't see it in this message). Let me know if you know!
2023-09-12 - qpress / qz1 extension
This is a quick note to my future self. As we're using qpress less in favor of lz4, which has been available on Ubuntu Focal and above, we're inclined to forget what the .qz1 extension means.
Wondering what files with the .qz1 extension are for?
This concerns single stream qpress compressed files.
Historically qpress uses the .qp extension, but that concerns multifile archives. The qpress binary can write compressed streams to stdout, but it will not decompress them to stdout.
When using streams of data, we often prefer not touching the disk in between. For that reason there are also qlzip1 and qlzcat1.
What does the 1 in .qz1 mean?
The qpress source also has other levels (2 and 3). They need to be set at compile time, so switching at will is not possible.
More importantly, qlzip1 does not write a header that specifies which level we're dealing with. Trying to decompress a level 3 stream with a level 1 binary would fail.
Where are qpress, qlzip1 and qlzcat1?
Originally, the source code and a summary could be found at https://www.quicklz.com/. Today however, this page is suspended.
The applications one might use can be found at GitHub/ossobv/qpress-deb. See the Releases page for sources and precompiled binaries.
2023-09-01 - dirmngr / keyserver / disable-ipv6 / bionic
This morning a build pipeline failed. dirmngr called by apt-key tried to use IPv6, even though it was disabled.
The build logs had this to say:
21.40 + apt-key adv --recv-keys --keyserver hkp://keyserver.ubuntu.com:80 0xF1656F24C74CD1D8
21.50 Warning: apt-key output should not be parsed (stdout is not a terminal)
21.57 Executing: /tmp/apt-key-gpghome.KzTTOZjgZP/gpg.1.sh --recv-keys --keyserver hkp://keyserver.ubuntu.com:80 0xF1656F24C74CD1D8
21.65 gpg: keyserver receive failed: Cannot assign requested address
This was strange for a number of reasons:
- The same Docker build scripts succeeded for other Ubuntu releases. The problem only seemed to affect the bionic-based build: on focal it was not broken.
- I could reproduce it in a similar bionic Docker environment. But behaviour appeared to be random — maybe based on the order of the DNS responses:
# apt-key adv --recv-keys --keyserver hkp://keyserver.ubuntu.com:80 0xF1656F24C74CD1D8
Executing: /tmp/apt-key-gpghome.2XtcT1xWZb/gpg.1.sh --recv-keys --keyserver hkp://keyserver.ubuntu.com:80 0xF1656F24C74CD1D8
gpg: keyserver receive failed: Cannot assign requested address
# apt-key adv --recv-keys --keyserver hkp://keyserver.ubuntu.com:80 0xF1656F24C74CD1D8
Executing: /tmp/apt-key-gpghome.TyIbtADNXD/gpg.1.sh --recv-keys --keyserver hkp://keyserver.ubuntu.com:80 0xF1656F24C74CD1D8
gpg: key F1656F24C74CD1D8: "MariaDB Signing Key <signing-key@mariadb.org>" not changed
gpg: Total number processed: 1
gpg:              unchanged: 1
- Calling it with strace showed that it tried to connect to an IPv6 address:
# strace -e 'signal=!all' -e trace=clone,connect,execve -s 32 -f \
    apt-key adv --recv-keys --keyserver hkp://keyserver.ubuntu.com:80 \
    0xF1656F24C74CD1D8 2>&1 | grep -vE 'attached$|exited with|resumed>'
...
[pid 9749] execve("/usr/bin/dirmngr", ["dirmngr", "--daemon", "--homedir", "/tmp/apt-key-gpghome.0zrNQAlNLG"], 0x7fff73d35cb8 /* 9 vars */ <unfinished ...>
...
[pid 9749] clone(child_stack=NULL, flags=..., child_tidptr=...) = 9750
...
[pid 9750] clone(child_stack=..., flags=..., parent_tidptr=..., tls=..., child_tidptr=...) = 9751
...
[pid 9751] connect(6, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("1.1.1.1")}, 16) = 0
[pid 9751] connect(6, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("1.1.1.1")}, 16) = 0
[pid 9751] connect(7, {sa_family=AF_INET6, sin6_port=htons(80), inet_pton(AF_INET6, "2620:2d:4000:1007::70c", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, 28) = -1 EADDRNOTAVAIL (Cannot assign requested address)
gpg: keyserver receive failed: Cannot assign requested address
...
That log reads as: apt-key spawns (something that spawns) dirmngr, which forks twice and then does two DNS lookups (via 1.1.1.1) and finally tries to connect to 2620:2d:4000:1007::70c.
That is unexpected because...
- In this Docker environment there wasn't even any IPv6 at all:
# ip -br a
lo               UNKNOWN        127.0.0.1/8
eth0@if153       UP             172.17.0.2/16
No IPv6 address assigned anywhere.
# sysctl -a 2>/dev/null | grep disable_ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.eth0.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
And in fact IPv6 is disabled on all interfaces through sysctl.
- If getaddrinfo() with AF_UNSPEC is used for the DNS lookup, it shouldn't even try to get an IPv6 address, because dns4only is enabled by default in this environment. With dns4only enabled, requesting an unspecified address family (IPv4 or IPv6) only results in IPv4 responses.
  Without dns4only, you'd see this Python transcript:

  >>> from socket import *
  >>> set([i[4][0] for i in getaddrinfo('keyserver.ubuntu.com', 80)])
  {'2620:2d:4000:1007::d43', '185.125.188.27', '185.125.188.26', '2620:2d:4000:1007::70c'}

  But here, Python correctly yields only these:

  >>> from socket import *
  >>> set([i[4][0] for i in getaddrinfo('keyserver.ubuntu.com', 80)])
  {'185.125.188.27', '185.125.188.26'}

  This meant that the gnupg dirmngr was doing something custom with DNS resolving, like manually looking up both IPv6 and IPv4.
was doing something custom with DNS resolving, like manually looking up both IPv6 and IPv4. - The offending
dirmngr
has version2.2.4-1ubuntu1.6
and according to the changelog (1.3..1.5 diff) there was a fix for IPv6 in2.2.4-1ubuntu1.4
; but we were already using that. - Interestingly, that bugfix from LP#1910432
does point to the means of checking connectivity. It checks
whether
socket(AF_INET6)
succeeds instead of whetherconnect(ipv6_address)
succeeds:$ cat debian/patches/dirmngr-handle-EAFNOSUPPORT-at-connect_server.patch ... @@ -2940,6 +2942,15 @@ connect_server (const char *server, unsi sock = my_sock_new_for_addr (ai->addr, ai->socktype, ai->protocol); if (sock == ASSUAN_INVALID_FD) { + if (errno == EAFNOSUPPORT) + { + if (ai->family == AF_INET) + ignore_v4 = 1; + if (ai->family == AF_INET6) + ignore_v6 = 1; + continue; ...
  But the exposed Linux kernel interface (version 5.x) has no problem with creating an AF_INET6 socket:

  >>> from socket import *
  >>> s = socket(AF_INET6)  # no failure on this line
  >>> s.connect(('keyserver.ubuntu.com', 80))
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
  OSError: [Errno 99] Cannot assign requested address

  It is the connect() that fails first — with an EADDRNOTAVAIL: Cannot assign requested address.
that fails — with a EADDRNOTAVAIL: Cannot assign requested address. - However, the
dirmngr
code in the Ubuntu focal version doesn't seem that different. That one also does two DNS lookups too, but that one correctly selects AF_INET when connecting.
The lookups as seen withstrace -e trace=sendto
:>>> print(':'.join('%02x' % (i,) for i in ( ... b"\317/\1\0\0\1\0\0\0\0\0\0\tkeyserver\6ubuntu\3com\0\0\1\0\1"))) cf:2f:01:00:00:01:00:00:00:00:00:00:09:6b:65:79:73:65:72:76:65:72:06:75:62:75:6e:74:75:03:63:6f:6d:00:00:01:00:01 >>> print(':'.join('%02x' % (i,) for i in ( ... b"\307\275\1\0\0\1\0\0\0\0\0\0\tkeyserver\6ubuntu\3com\0\0\34\0\1"))) c7:bd:01:00:00:01:00:00:00:00:00:00:09:6b:65:79:73:65:72:76:65:72:06:75:62:75:6e:74:75:03:63:6f:6d:00:00:1c:00:01
  (Here the lookup with 00:1c is the AAAA query.)
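A quick check of those two qtype values (the two bytes just before the trailing 00:01 class field) confirms it: 1 is an A query, 28 is AAAA.

$ printf '%d %d\n' 0x0001 0x001c
1 28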
Really unexpected. And another example of old software sinking development time. I hope this writeup saves someone else a little time.
Workarounds?
For a while, I thought the dirmngr code might respect /etc/gai.conf and tried to add/enable precedence ::ffff:0:0/96 100 there. But that only gave intermittent (random) success.
A shittier, but working, workaround is hacking the /etc/hosts file with this oneliner in the Dockerfile:
{ test "$UBUNTU_RELEASE" = "bionic" && \ getent ahosts keyserver.ubuntu.com | \ sed -e '/STREAM/!d;s/[[:blank:]].*/ keyserver.ubuntu.com/' | \ shuf >> /etc/hosts || true; }
Not pretty, but it works. And it only affects the (soon to be obsolete) bionic build.
An alternative solution is provided at usbarmory-debian-base_image#9:
{ mkdir -p ~/.gnupg && \
    echo "disable-ipv6" >> ~/.gnupg/dirmngr.conf && \
    apt-key adv --homedir ~/.gnupg --keyserver ...; }
2023-08-31 - mariadb / gdb / debugging shutdown deadlock / part 2
We were looking at a core dump of a MariaDB instance that was in a deadlocked state. It was killed with SIGABRT. We would now like to get MySQL thread info from it. Python support in gdb is here to help.
To recap: we were dissecting the core dump with gdb (gdb /usr/sbin/mariadbd core.208161) and found out that it was waiting on one of the threads to stop before stopping completely. We now want to know what exactly it was waiting for.
It's too late to get a SHOW PROCESSLIST now that the process is dead. But we might be able to fetch the data from the server_threads structure:
(gdb) print ((THD*)server_threads->threads->first)->proc_info
$7 = 0x559aca9152dc "Freeing items"
(gdb) print ((THD*)server_threads->threads->first->next)->proc_info
$8 = 0x559aca922c02 "committing"
(gdb) print ((THD*)server_threads->threads->first->next->next)->proc_info
$9 = 0x559acaac848a "wsrep applier committed"
Threads
There aren't terribly many threads:
(gdb) info threads
  Id   Target Id                            Frame
* 1    Thread 0x7f70061c3840 (LWP 208161)   0x00007f700639c3db in kill
  2    Thread 0x7f635435b700 (LWP 208827)   futex_wait_cancelable
  3    Thread 0x7f5109852700 (LWP 208868)   0x00007f700643623f in __GI___clock_nanosleep
  4    Thread 0x7f4fe83de700 (LWP 279027)   futex_abstimed_wait_cancelable
  5    Thread 0x7f6ffe06c700 (LWP 208425)   futex_abstimed_wait_cancelable
  6    Thread 0x7f63543f1700 (LWP 208799)   0x00007f700639d1a2 in __GI___sigtimedwait
  7    Thread 0x7f63553fd700 (LWP 208750)   futex_wait_cancelable
  8    Thread 0x7f6357fff700 (LWP 208749)   futex_abstimed_wait_cancelable
  9    Thread 0x7f6ffdc17700 (LWP 208426)   futex_wait_cancelable
  10   Thread 0x7f6ffc492700 (LWP 208430)   futex_wait_cancelable
  11   Thread 0x7f6ffc09f700 (LWP 208790)   futex_wait_cancelable
  12   Thread 0x7f63543a6700 (LWP 208826)   futex_wait_cancelable
  13   Thread 0x7f63540b8700 (LWP 208836)   futex_wait_cancelable
  14   Thread 0x7f6ffc054700 (LWP 208791)   futex_wait_cancelable
  15   Thread 0x7f510922b700 (LWP 208889)   futex_wait_cancelable
  16   Thread 0x7f6340436700 (LWP 208841)   futex_wait_cancelable
  17   Thread 0x7f4f7d31f700 (LWP 1633036)  futex_abstimed_wait_cancelable
  18   Thread 0x7f6ffcc93700 (LWP 1642065)  futex_abstimed_wait_cancelable
But this list does not show what each thread is doing. A thread apply all bt will have info, but is very verbose and not necessarily helpful.
Let's first pursue the server_threads. But not by manually adding ->next for each thread. There must be a better way.
Enter Python
gdb comes with Python integration, and it's straightforward enough to call: python, then the code, then end with end or EOF (press ^D).
(gdb) python
>print(gdb.parse_and_eval('server_threads->threads->first'))
>end
0x7f5080046088
Use the following Python script:
thd_type = gdb.lookup_type('THD').pointer()
it = gdb.parse_and_eval('server_threads->threads->first')
last = gdb.parse_and_eval('server_threads->threads->last')
while it != last.address:
    thdp = it.cast(thd_type)
    print(it, thdp['proc_info'], thdp['query_string']['string']['str'])
    it = it['next']
Invoking that looks like this:
(gdb) python
>thd_type = gdb.lookup_type('THD').pointer()
>it = gdb.parse_and_eval('server_threads->threads->first')
>last = gdb.parse_and_eval('server_threads->threads->last')
>while it != last.address:
>    thdp = it.cast(thd_type)
>    print(it, thdp['m_thread'], thdp['proc_info'], thdp['query_string']['string']['str'])
>    it = it['next']
>end
0x7f5080046088 0x559aca9152dc "Freeing items" 0x7f5080055c10 "COMMIT"
0x7f4fc4000c58 0x559aca922c02 "committing" 0x0
0x7f5030000c58 0x559acaac848a "wsrep applier committed" 0x0
0x7f635c002228 0x559acaac848a "wsrep applier committed" 0x0
0x7f5098000c58 0x559aca922c02 "committing" 0x0
0x7f50b8000c58 0x559aca922c02 "committing" 0x0
0x7f50bc000c58 0x559aca922c02 "committing" 0x0
0x7f6360000c58 0x559acaacd584 "wsrep aborter idle" 0x0
No mapping to the info threads yet.
But after some digging in various (sub)structures, I found thread in the lock_info struct:
(gdb) x/((THD*)server_threads->threads->first)->lock_info.thread
0x7f4fe83de700: -398596352
The improved code:
thd_type = gdb.lookup_type('THD').pointer()
it = gdb.parse_and_eval('server_threads->threads->first')
last = gdb.parse_and_eval('server_threads->threads->last')
while it != last.address:
    thdp = it.cast(thd_type)
    print(
        it, 'thread', hex(thdp['lock_info']['thread']), thdp['killed'],
        thdp['proc_info'], thdp['query_string']['string']['str'])
    it = it['next']
0x7f5080046088 thread 0x7f4fe83de700 0x559aca9152dc "Freeing items" 0x7f5080055c10 "COMMIT"
0x7f4fc4000c58 thread 0x7f510922b700 0x559aca922c02 "committing" 0x0
0x7f5030000c58 thread 0x7f5109852700 0x559acaac848a "wsrep applier committed" 0x0
0x7f635c002228 thread 0x7f6340436700 0x559acaac848a "wsrep applier committed" 0x0
0x7f5098000c58 thread 0x7f63540b8700 0x559aca922c02 "committing" 0x0
0x7f50b8000c58 thread 0x7f635435b700 0x559aca922c02 "committing" 0x0
0x7f50bc000c58 thread 0x7f63543a6700 0x559aca922c02 "committing" 0x0
0x7f6360000c58 thread 0x7f6ffc492700 0x559acaacd584 "wsrep aborter idle" 0x0
Yes. These thread_id values can be mapped to the list above (manually annotated and sorted):
0x7f50b8000c58 thread [ 2] 0x559aca922c02 "committing" 0x0
0x7f5030000c58 thread [ 3] 0x559acaac848a "wsrep applier committed" 0x0
0x7f5080046088 thread [ 4] 0x559aca9152dc "Freeing items" 0x7f5080055c10 "COMMIT"
0x7f6360000c58 thread [10] 0x559acaacd584 "wsrep aborter idle" 0x0
0x7f50bc000c58 thread [12] 0x559aca922c02 "committing" 0x0
0x7f5098000c58 thread [13] 0x559aca922c02 "committing" 0x0
0x7f4fc4000c58 thread [15] 0x559aca922c02 "committing" 0x0
0x7f635c002228 thread [16] 0x559acaac848a "wsrep applier committed" 0x0
Deadlock?
Back to the task at hand. Now that we found something akin to SHOW PROCESSLIST, it looks like Thread 4 could use some closer inspection.
(gdb) thread 4
[Switching to thread 4 (Thread 0x7f4fe83de700 (LWP 279027))]
#0  futex_abstimed_wait_cancelable (private=<optimized out>, abstime=0x7f4fe83dcd80, clockid=<optimized out>,
    expected=0, futex_word=0x7f50080b05c8) at ../sysdeps/nptl/futex-internal.h:320
320     ../sysdeps/nptl/futex-internal.h: No such file or directory.
(gdb) bt
#0  futex_abstimed_wait_cancelable (private=<optimized out>, abstime=0x7f4fe83dcd80, clockid=<optimized out>, expected=0, futex_word=0x7f50080b05c8) at ../sysdeps/nptl/futex-internal.h:320
#1  __pthread_cond_wait_common (abstime=0x7f4fe83dcd80, clockid=<optimized out>, mutex=0x559acbbc4dd8, cond=0x7f50080b05a0) at pthread_cond_wait.c:520
#2  __pthread_cond_timedwait (cond=cond@entry=0x7f50080b05a0, mutex=mutex@entry=0x559acbbc4dd8, abstime=0x7f4fe83dcd80) at pthread_cond_wait.c:665
#3  0x00007f6ffde2e673 in gu_cond_timedwait_SYS (ts=0x7f4fe83dcd80, mutex=0x559acbbc4dd8, cond=<optimized out>) at ./galerautils/src/gu_threads.h:264
#4  gu::Lock::wait (this=<synthetic pointer>, date=<synthetic pointer>..., cond=...) at ./galerautils/src/gu_lock.hpp:64
#5  galera::Monitor<galera::ReplicatorSMM::ApplyOrder>::wait (wait_until=<synthetic pointer>..., gtid=..., this=0x559acbbc4dd8) at ./galera/src/monitor.hpp:399
#6  galera::ReplicatorSMM::sync_wait (this=0x559acbbc3b30, upto=<optimized out>, tout=<optimized out>, gtid=0x0) at ./galera/src/replicator_smm.cpp:1605
#7  0x00007f6ffde0b514 in galera_sync_wait (wsrep=<optimized out>, upto=<optimized out>, tout=<optimized out>, gtid=<optimized out>) at ./galera/src/wsrep_provider.cpp:503
#8  0x0000559aca90406e in wsrep::wsrep_provider_v26::wait_for_gtid (this=<optimized out>, gtid=..., timeout=<optimized out>) at ./wsrep-lib/src/wsrep_provider_v26.cpp:1056
#9  0x0000559aca6049e6 in Wsrep_server_service::set_position (this=<optimized out>, c=..., gtid=...) at ./wsrep-lib/include/wsrep/server_state.hpp:320
#10 0x0000559aca8f9753 in wsrep::transaction::release_commit_order (this=0x7f508004c5c8, lock=...) at ./wsrep-lib/include/wsrep/provider.hpp:142
#11 0x0000559aca901c6f in wsrep::transaction::after_statement (this=this@entry=0x7f508004c5c8) at ./wsrep-lib/src/transaction.cpp:895
#12 0x0000559aca8e7cb2 in wsrep::client_state::after_statement (this=this@entry=0x7f508004c560) at ./wsrep-lib/src/client_state.cpp:287
#13 0x0000559aca0d9a08 in wsrep_after_statement (thd=0x7f5080046088) at ./sql/sql_class.h:5454
#14 wsrep_after_statement (thd=0x7f5080046088) at ./sql/wsrep_trans_observer.h:443
#15 wsrep_mysql_parse (thd=0x7f5080046088, rawbuf=0x7f5080055c10 "COMMIT", length=6, parser_state=0x7f4fe83dd3e0) at ./sql/sql_parse.cc:7878
#16 0x0000559aca0e7fee in dispatch_command (command=COM_QUERY, thd=0x7f5080046088, packet=<optimized out>, packet_length=<optimized out>, blocking=<optimized out>) at ./sql/sql_class.h:1388
#17 0x0000559aca0e899e in do_command (thd=0x7f5080046088, blocking=blocking@entry=true) at ./sql/sql_parse.cc:1409
#18 0x0000559aca1fd617 in do_handle_one_connection (connect=<optimized out>, put_in_cache=true) at ./sql/sql_connect.cc:1416
...
Unfortunately Thread 4 is not waiting for a lock that we can trace somewhere:
(gdb) ptype pthread_mutex_t
type = union {
    struct __pthread_mutex_s __data;
    char __size[40];
    long __align;
}
(gdb) print *(pthread_mutex_t*)0x559acbbc4dd8
$10 = pthread_mutex_t = {Type = Normal, Status = Not acquired, Robust = No, Shared = No, Protocol = None}
(gdb) print *(__pthread_mutex_s*)0x559acbbc4dd8
$11 = {__lock = 0, __count = 0, __owner = 0, __nusers = 1, __kind = 0, __spins = 0, __elision = 0,
  __list = {__prev = 0x0, __next = 0x0}}
No owner. And the pthread_cond_t argument doesn't help us either. Apparently this thread is waiting for someone to notify cond=0x7f50080b05a0.
A thread apply all bt full does not reveal 0x7f50080b05a0 being in stack scope in the vicinity. It appears that it is waiting in this bit:
wsrep_status_t galera::ReplicatorSMM::sync_wait(wsrep_gtid_t* upto, int tout, wsrep_gtid_t* gtid)
{
    gu::GTID wait_gtid;
    // ...
    // Note: Since wsrep API 26 application may request release of
    // commit monitor before the commit actually happens (commit
    // may have been ordered/queued on application side for later
    // processing). Therefore we now rely on apply_monitor on sync
    // wait. This is sufficient since apply_monitor is always released
    // only after the whole transaction is over.
    apply_monitor_.wait(wait_gtid, wait_until);  // <-- HERE
    void wait(gu::GTID& gtid, const gu::datetime::Date& wait_until)
    {
        gu::Lock lock(mutex_);
        if (gtid.uuid() != uuid_)
        {
            throw gu::NotFound();
        }
        while (last_left_ < gtid.seqno())
        {
            size_t idx(indexof(gtid.seqno()));
            lock.wait(process_[idx].wait_cond_, wait_until);
        }
    }
Debug info here is scarce:
(gdb) up 5
#5  galera::Monitor<galera::ReplicatorSMM::ApplyOrder>::wait (wait_until=<synthetic pointer>..., gtid=...,
    this=0x559acbbc4dd8) at ./galera/src/monitor.hpp:399
(gdb) info args
wait_until = <synthetic pointer>: <optimized out>
gtid = @0x7f4fe83dcd90: {static SEQNO_UNDEFINED = -1, uuid_ = {<gu::UUID_base> = {uuid_ = {
        data = "s|\344\\\352\370\021\354\217G\a\177S0\226r", alignment = 17010650953019587699}}, <No data fields>},
  seqno_ = 223595135}
this = 0x559acbbc4dd8
(gdb) info locals
idx = <optimized out>
cond = <optimized out>
lock = {_vptr.Lock = <optimized out>, mtx_ = @0x559acbbc4dd8}
lock = <optimized out>
idx = <optimized out>
cond = <optimized out>
idx(indexof(gtid.seqno())) can be rewritten as idx = (gtid.seqno_ & 65535), so the lock/wait is happening on the process_ entry with index 51839.
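A quick check of that arithmetic (the seqno_ value 223595135 comes from the info args output above):

$ echo $(( 223595135 & 65535 ))
51839

And that entry indeed holds the condition variable we are waiting on: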
(gdb) print this->process_[51839]
$12 = {obj_ = 0x7f6354359f70, cond_ = 0x0,
  wait_cond_ = std::shared_ptr<gu::Cond> (use count 2, weak count 0) = {get() = 0x7f50080b05a0},
  state_ = galera::Monitor<galera::ReplicatorSMM::ApplyOrder>::Process::S_APPLYING}
Scanning over all 65536 process_ entries — most are empty, like $13 — yields:
(gdb) print this->process_[51836]
$13 = {obj_ = 0x0, cond_ = 0x0, wait_cond_ = std::shared_ptr<gu::Cond> (empty) = {get() = 0x0},
  state_ = galera::Monitor<galera::ReplicatorSMM::ApplyOrder>::Process::S_IDLE}
(gdb) print this->process_[51837]
$14 = {obj_ = 0x7f5109229f70, cond_ = 0x0, wait_cond_ = std::shared_ptr<gu::Cond> (empty) = {get() = 0x0},
  state_ = galera::Monitor<galera::ReplicatorSMM::ApplyOrder>::Process::S_APPLYING}
(gdb) print this->process_[51838]
$15 = {obj_ = 0x7f63540b6f70, cond_ = 0x0, wait_cond_ = std::shared_ptr<gu::Cond> (empty) = {get() = 0x0},
  state_ = galera::Monitor<galera::ReplicatorSMM::ApplyOrder>::Process::S_APPLYING}
(gdb) print this->process_[51839]
$16 = {obj_ = 0x7f6354359f70, cond_ = 0x0,
  wait_cond_ = std::shared_ptr<gu::Cond> (use count 2, weak count 0) = {get() = 0x7f50080b05a0},
  state_ = galera::Monitor<galera::ReplicatorSMM::ApplyOrder>::Process::S_APPLYING}
(gdb) print this->process_[51840]
$17 = {obj_ = 0x7f4fe83dce90, cond_ = 0x0, wait_cond_ = std::shared_ptr<gu::Cond> (empty) = {get() = 0x0},
  state_ = galera::Monitor<galera::ReplicatorSMM::ApplyOrder>::Process::S_APPLYING}
(gdb) print this->process_[51841]
$18 = {obj_ = 0x7f63543a4f70, cond_ = 0x0, wait_cond_ = std::shared_ptr<gu::Cond> (empty) = {get() = 0x0},
  state_ = galera::Monitor<galera::ReplicatorSMM::ApplyOrder>::Process::S_APPLYING}
(gdb) print this->process_[51842]
$19 = {obj_ = 0x0, cond_ = 0x0, wait_cond_ = std::shared_ptr<gu::Cond> (empty) = {get() = 0x0},
  state_ = galera::Monitor<galera::ReplicatorSMM::ApplyOrder>::Process::S_FINISHED}
(gdb) print this->process_[51843]
$20 = {obj_ = 0x0, cond_ = 0x0, wait_cond_ = std::shared_ptr<gu::Cond> (empty) = {get() = 0x0},
  state_ = galera::Monitor<galera::ReplicatorSMM::ApplyOrder>::Process::S_FINISHED}
(gdb) print this->process_[51844]
$21 = {obj_ = 0x0, cond_ = 0x0,
  wait_cond_ = std::shared_ptr<gu::Cond> (use count 1, weak count 0) = {get() = 0x7f4fb008f520},
  state_ = galera::Monitor<galera::ReplicatorSMM::ApplyOrder>::Process::S_FINISHED}
(gdb) print this->process_[51845]
$22 = {obj_ = 0x0, cond_ = 0x0, wait_cond_ = std::shared_ptr<gu::Cond> (empty) = {get() = 0x0},
  state_ = galera::Monitor<galera::ReplicatorSMM::ApplyOrder>::Process::S_IDLE}
That 0x7f50080b05a0 is indeed the pthread_cond_t that we're waiting for.
Conclusion
At this point I'm stuck. That last piece of info only confirms what we already know. No obvious reason stands out why it is waiting for something that is not happening.
Searching through the MariaDB JIRA bug tracker did not yield any candidate issues. Luckily this deadlock happened two months ago, and we haven't seen it since. But if we do see it again, we'll at least have this post as a starting point.
2023-08-30 - mariadb / gdb / debugging shutdown deadlock / part 1
I was asked to look into a MariaDB deadlock during shutdown. We had a core dump (luckily). Now it's time to dissect it.
Ensure you can get core dumps
For starters, you want a nice core dump whenever you hit an issue. We do. Here are some tips to ensure you do too.
You need to have a couple of parameters set correctly: the equivalent of the systemd LimitCORE=infinity and the assurance that the working directory is writable (maybe WorkingDirectory=/var/lib/mysql). And a non-empty kernel.core_pattern (e.g. core.%p).
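A minimal sketch of what that could look like for a systemd-managed MariaDB (the unit name, override path and pattern here are illustrative assumptions, not prescriptive):

mkdir -p /etc/systemd/system/mariadb.service.d
printf '[Service]\nLimitCORE=infinity\nWorkingDirectory=/var/lib/mysql\n' \
    >/etc/systemd/system/mariadb.service.d/coredump.conf
sysctl -w kernel.core_pattern=core.%p
systemctl daemon-reload && systemctl restart mariadb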
If MariaDB crashes, you get a core dump immediately. If it deadlocks, you can kill it with SIGABRT to create one.
(Once you get a dump, it's nice to salvage it by moving it to less volatile storage. This is especially important in a containerized environment where the /var/lib/mysql can get flushed by a container restart.)
Getting the symbols
Apart from gdb, the debugger, you'll also need the debug symbols. For a Debian/Ubuntu install, you'll want to add these to your sources.list:
deb http://ddebs.ubuntu.com jammy main restricted universe multiverse
deb http://ddebs.ubuntu.com jammy-updates main restricted universe multiverse
deb http://ddebs.ubuntu.com jammy-proposed main restricted universe multiverse
Or, if you're running packages from MariaDB directly:
deb [arch=amd64,arm64,ppc64el] http://archive.mariadb.org/mariadb-10.6.13/repo/ubuntu/ focal main
deb [arch=amd64,arm64,ppc64el] http://archive.mariadb.org/mariadb-10.6.13/repo/ubuntu/ focal main/debug
The main/debug component holds the dbgsym packages.
This is also the part where you ensure that you have the same MariaDB version (and distro) as where the crash occurred. If you have different versions, the core dump will make no sense at all.
Fetch at least these:
mariadb-server-10.6-dbgsym
mariadb-server-core-10.6-dbgsym
And confirm that they belong to the same binary version. It's also nice to have libc6-dbg.
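A quick way to eyeball that (a sketch; the exact package names depend on your install):

$ dpkg-query -W mariadb-server-core-10.6 mariadb-server-core-10.6-dbgsym libc6 libc6-dbg

The version column should match for each package/dbgsym pair.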
Firing up gdb
We have gdb. We have symbols. We have a core dump. Fire it up:
$ gdb /usr/sbin/mariadbd core.208161
...
Core was generated by `/usr/sbin/mysqld --defaults-file=/etc/mysql/my.cnf --wsrep_start_position=73'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f700639c3db in kill () at ../sysdeps/unix/syscall-template.S:78
78      ../sysdeps/unix/syscall-template.S: No such file or directory.
[Current thread is 1 (Thread 0x7f70061c3840 (LWP 208161))]
(gdb)
At this point, we can do the usual stuff: bt, bt full, info threads, thread apply all bt full, info registers and so on.
In this case, we were stuck in shutdown. This is confirmed by the first thread we look at:
(gdb) bt
#0  0x00007f700639c3db in kill () at ../sysdeps/unix/syscall-template.S:78
#1  0x0000559aca322bae in handle_fatal_signal (sig=11) at ./sql/signal_handler.cc:372
#2  <signal handler called>
#3  0x00007f700643623f in __GI___clock_nanosleep () at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:78
#4  0x00007f700643bec7 in __GI___nanosleep () at nanosleep.c:27
#5  0x00007f700643bdfe in __sleep (seconds=0, seconds@entry=1) at ../sysdeps/posix/sleep.c:55
#6  0x0000559aca5ea13a in wsrep_close_client_connections (wait_to_end=<optimized out>, except_caller_thd=0x0) at ./sql/wsrep_mysqld.cc:3253
#7  0x0000559aca5ea83a in wsrep_shutdown_replication () at ./sql/wsrep_mysqld.cc:1150
#8  0x0000559aca00f7dd in mysqld_main (argc=<optimized out>, argv=<optimized out>) at ./sql/mysqld.cc:5899
#9  0x00007f700637d083 in __libc_start_main (main=0x559ac9fbecb0 <main(int, char**)>, argc=3, argv=0x7ffdd3be23c8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffdd3be23b8) at ../csu/libc-start.c:308
#10 0x0000559aca00300e in _start () at ./include/mysql/plugin.h:215
The code appeared to be busy in a sleep(0).
It's best to have the MariaDB 10.6.13 source code checked out somewhere. Git clone the entire repository, check out the correct tag, and get those git submodules. There you can now find ./sql/wsrep_mysqld.cc:3253, which lists:
void wsrep_close_client_connections(my_bool wait_to_end, THD* except_caller_thd)
{
  // ...
  while (wait_to_end && server_threads.iterate(have_client_connections))
  {
    sleep(1);
    DBUG_PRINT("quit",("One thread died (count=%u)", THD_count::value()));
  }
Yes, it is in fact sleep(1) not 0 — and now that you mention it, it did say seconds@entry=1:
(gdb) up 6
#6  0x0000559aca5ea13a in wsrep_close_client_connections () at ./sql/wsrep_mysqld.cc:3253
3253    ./sql/wsrep_mysqld.cc: No such file or directory.
(gdb) disass
Dump of assembler code for function wsrep_close_client_connections(char, THD*):
Address range 0x559aca5e9dc0 to 0x559aca5ea3fb:
...
   0x0000559aca5ea12b <+875>:   callq  0x559ac9f54150 <pthread_rwlock_unlock@plt>
   0x0000559aca5ea130 <+880>:   mov    $0x1,%edi
   0x0000559aca5ea135 <+885>:   callq  0x559ac9f54080 <sleep@plt>
=> 0x0000559aca5ea13a <+890>:   jmpq   0x559aca5ea080 <wsrep_close_client_connections(char, THD*)+704>
   0x0000559aca5ea13f <+895>:   nop
   0x0000559aca5ea140 <+896>:   cmpq   $0x0,(%r14)
   0x0000559aca5ea144 <+900>:   jne    0x559aca5ea240 <wsrep_close_client_connections(char, THD*)+1152>
(That mov $0x1,%edi puts 1 into the first argument — RDI, RSI, RDX, RCX, R8, R9 — and then sleep is called. See the System V amd64 calling convention.)
Inspecting server_threads.iterate
So, wsrep_close_client_connections is looping until server_threads.iterate(have_client_connections) returns zero.
static my_bool have_client_connections(THD *thd, void*)
{
  DBUG_PRINT("quit",("Informing thread %lld that it's time to die",
                     (longlong) thd->thread_id));
  if (is_client_connection(thd) &&
      (thd->killed == KILL_CONNECTION ||
       thd->killed == KILL_CONNECTION_HARD))
  {
    (void)abort_replicated(thd);
    return 1;
  }
  return 0;
}
Assuming that server_threads.iterate iterates over the server threads, we can conclude that one or more threads are (a) a client connection and (b) in some kind of "KILL" state and (c) receiving an abort_replicated(thd) call.
(gdb) ptype server_threads
type = class THD_list : public THD_list_iterator {
  public:
    void init(void);
    void destroy(void);
    void insert(THD *);
    void erase(THD *);
}
server_threads is a global. It inherits from the THD_list_iterator. If we look there, we see this:
/** THD registry */
class THD_list_iterator
{
  // ...
  template <typename T> int iterate(my_bool (*action)(THD *thd, T *arg), T *arg= 0)
  {
    int res= 0;
    mysql_rwlock_rdlock(&lock);
    I_List_iterator<THD> it(threads);
    while (auto tmp= it++)
      if ((res= action(tmp, arg)))
        break;
    mysql_rwlock_unlock(&lock);
    return res;
  }
So, server_threads.iterate iterates over all threads and stops as soon as it finds one with the previously mentioned characteristics, returning the (boolean) value of the callback.
Figuring out which threads were doing what
We want to know what those unstopped threads were doing. If we could get a SHOW PROCESSLIST from the dead process, that'd be perfect. Maybe we can iterate over the server_threads and get the info.
(gdb) print server_threads->threads
$1 = {
  <base_ilist> = {
    first = 0x7f5080046088,
    last = {_vptr.ilink = 0x559acaf1b670 <vtable for ilink+16>, prev = 0x7f6360000c68, next = 0x0}},
  <No data fields>
}
(gdb) print server_threads.threads.first
$2 = (ilink *) 0x7f5080046088
(gdb) print *server_threads.threads.first
$3 = {_vptr.ilink = 0x559acaf21550 <vtable for THD+16>, prev = 0x559acb121b00 <server_threads>, next = 0x7f4fc4000c58}
This is a linked list structure, wrapped in a bit of C++ sauce.
We can in fact get the elements out of it fairly easily:
(gdb) print *(THD*)server_threads.threads.first
$5 = {
  <THD_count> = {
    static count = {
      m_counter = {
        ...
(gdb) print ((THD*)server_threads.threads.first)->proc_info
$5 = 0x559aca9152dc "Freeing items"
And we can continue with the next items:
(gdb) print ((THD*)server_threads.threads.first->next)->proc_info
$6 = 0x559aca922c02 "committing"
That's nice. But tedious. We don't want to do this for every thread — there could be many.
Luckily gdb has builtin Python support. And that's what we'll be looking at in the next post...
2023-08-24 - mch2022 badge / arduino ide / part 3 / leds and more
The colored LCD display on my living room HUD has reached its end-of-life. Time to replace it with an MCH2022 badge?
In the first two parts, we looked at interfacing with the serial output, reading the input button state and using the TFT display on the MCH2022 badge. In this post, we'll look at interfacing with the RP2040 where other components are hooked up to.
Interfacing with the RP2040
As seen with the buttons already, not everything on the badge is controlled by the ESP32 directly. The display is connected to an SPI bus. A Raspberry Pi RP2040 is connected to an I2C bus. On that RP2040 various functions are implemented.
When we look at esp32-component-mch2022-rp2040/include/rp2040.h (or the implementation at mch2022-firmware-rp2040/i2c_peripheral.h) we can see that there are lots of other registers we can read from and write to.
For instance, there is RP2040_REG_LCD_BACKLIGHT (the 0x04 element in the enum).
Step one is configuring the I2C bus:
#include <Wire.h>

#define GPIO_I2C_SDA 22
#define GPIO_I2C_SCL 21

Wire.begin(GPIO_I2C_SDA, GPIO_I2C_SCL);
And then, we can...
Set LCD backlight brightness
We can set the LCD backlight brightness:
#define RP2040_REG_LCD_BACKLIGHT 0x04

/**
 * Backlight brightness can be set from 0 (off) to 255.
 */
void set_backlight(uint8_t brightness) {
  Wire.beginTransmission(RP2040_ADDR);
  Wire.write(RP2040_REG_LCD_BACKLIGHT);
  Wire.write(brightness);
  Wire.endTransmission();
}
Read button presses
Check the button state — we did this already:
#define RP2040_REG_INPUT1 0x06

uint32_t read_inputs() {
  Wire.beginTransmission(RP2040_ADDR);
  Wire.write(RP2040_REG_INPUT1);
  Wire.endTransmission();
  Wire.requestFrom(RP2040_ADDR, 4);
  uint8_t values[4];
  values[0] = Wire.read();  // RP2040_REG_INPUT1 / I2C_REGISTER_INPUT1
  values[1] = Wire.read();  // RP2040_REG_INPUT2 / I2C_REGISTER_INPUT2
  values[2] = Wire.read();  // RP2040_REG_INTERRUPT1 / I2C_REGISTER_INTERRUPT1
  values[3] = Wire.read();  // RP2040_REG_INTERRUPT2 / I2C_REGISTER_INTERRUPT2
  // ...
}
Send infrared (transmitter on the back)
On the back of the badge there is an IR-transmitter. We can have the RP2040 send remote control signals and use the badge as a remote for the TV.
#define RP2040_REG_IR_ADDRESS_LO 0x60

/**
 * Supports extended NEC (NECext/NECX) remote control codes. See:
 * https://en.wikipedia.org/wiki/Consumer_IR
 * https://github.com/flipperdevices/flipperzero-firmware/blob/
 *   52b59662627254ae5ccde6201b5c4921573fc769/documentation/file_formats/InfraredFileFormats.md
 */
void send_ir_nec(uint16_t address, uint16_t command) {
  Wire.beginTransmission(RP2040_ADDR);
  Wire.write(RP2040_REG_IR_ADDRESS_LO);
  Wire.write(address & 0xFF);
  Wire.write(address >> 8);
  Wire.write(command);
  Wire.write(0x01);  /* trigger */
  Wire.endTransmission();
}
We can map the joystick keys to useful functions:
void loop() {
  if (!digitalRead(GPIO_INT_RP2040)) {
    uint32_t input = read_inputs();
    if (BUTTON_RELEASED(input, BTN_HOME)) {
      return_to_launcher();
      /**
       * See these LG codes:
       * https://github.com/Lucaslhm/Flipper-IRDB/blob/48f26b13715547d579a4f17a3fc0983a11d36776/TVs/LG/LG.ir
       * > name: Power
       * > type: parsed
       * > protocol: NEC
       * > address: 04 00 00 00
       * > command: 08 00 00 00
       */
    } else if (BUTTON_PRESSED(input, BTN_START)) {
      send_ir_nec(0x04, 0x08);  // LG and others: power (on/off)
    } else if (BUTTON_PRESSED(input, BTN_LEFT)) {
      send_ir_nec(0x04, 0x01);  // LG and others: chan-down
    } else if (BUTTON_PRESSED(input, BTN_RIGHT)) {
      send_ir_nec(0x04, 0x00);  // LG and others: chan-up
    } else if (BUTTON_PRESSED(input, BTN_UP)) {
      send_ir_nec(0x04, 0x02);  // LG and others: vol-up
    } else if (BUTTON_PRESSED(input, BTN_DOWN)) {
      send_ir_nec(0x04, 0x03);  // LG and others: vol-down
    } else if (BUTTON_PRESSED(input, BTN_JOY)) {
      send_ir_nec(0x04, 0x09);  // LG and others: mute (on/off)
    } else {
      Serial.printf("input: 0x%08x // 0x<button><state>\r\n", input);
    }
  }
  // ...
}
Unfortunately it is currently limited to sending (extended) NEC codes. If we want to send different protocols, like the Philips RC-5 protocol, we'll need to add such functionality to the RP2040 firmware.
Get battery state
#define RP2040_REG_ADC_VALUE_VBAT_LO 0x0d
#define RP2040_REG_CHARGING_STATE 0x15

/**
 * Get charging state.
 */
void get_battery_state(bool& charging, float& voltage) {
  const float conversion_factor = 3.3f / (1 << 12);  /* 12-bit ADC with 3.3v vref */
  uint16_t raw_voltage;

  Wire.beginTransmission(RP2040_ADDR);
  Wire.write(RP2040_REG_ADC_VALUE_VBAT_LO);
  Wire.endTransmission();
  Wire.requestFrom(RP2040_ADDR, 2);
  raw_voltage = (Wire.read() | (Wire.read() << 8));
  voltage = raw_voltage * conversion_factor * 2;  /* Connected through 100k/100k divider */

  Wire.beginTransmission(RP2040_ADDR);
  Wire.write(RP2040_REG_CHARGING_STATE);
  Wire.endTransmission();
  Wire.requestFrom(RP2040_ADDR, 2);
  charging = (bool)Wire.read();
  raw_voltage = (Wire.read() | (Wire.read() << 8));
}
Showing this on the screen:
void update_display() {
  // ...
  canvas->setCursor(0, 0);
  canvas->setTextColor(ILI9341_WHITE);
  canvas->setTextSize(1);
  canvas->print(voltage);
  canvas->print(" (V)");
  if (charging) {
    canvas->print(" charging");
  }
  // ...
}
Control the kite LEDs
There are 5 LEDs in a kite layout on the top left of the badge. The Badge.team documentation luckily already mentioned that it has to be activated after initializing the display — hence init_leds_after_display(). But we'll need to actually get them to update their state. This is done by calling FastLED.show().
Here are the parts you need:
#include <FastLED.h>  /* library to interface with SK9822 (and other) leds */

#define GPIO_LED_DATA 5
#define GPIO_SD_PWR 19
#define GPIO_LED_ENABLE GPIO_SD_PWR  /* enables power to both SD and LEDs */
#define NUM_LEDS 5

/* LED configuration of the kite on the top left:
 *    _2_                       _0_
 *   / | \                     / | \
 *  0--1--3  == reordered =>  1--2--3
 *   \ | /                     \ | /
 *    \|/                       \|/
 *     4                         4
 */
CHSV led_color[NUM_LEDS];
CRGB leds[NUM_LEDS];
const int led_reorder[NUM_LEDS] = {2, 0, 1, 3, 4};  /* left-to-right */
void setup() { // ... init_leds_after_display(); /* Initialize the 5 LED colors at a slight offset, so we get a wave. */ for (int i = 0; i < NUM_LEDS; ++i) { led_color[led_reorder[NUM_LEDS - i - 1]].hue = ((i * 8) % 256); led_color[i].saturation = 255; led_color[i].value = 255; } } void loop() { // ... change_leds(); }
void init_leds_after_display() { FastLED.addLeds<SK6812, GPIO_LED_DATA, GRB>(leds, NUM_LEDS).setCorrection(TypicalLEDStrip); FastLED.setBrightness(96); /* This has to be placed after SPI (LCD) has been initialized. The * Arduino library wants to use this pin as SPI MISO... */ pinMode(GPIO_LED_ENABLE, OUTPUT); digitalWrite(GPIO_LED_ENABLE, HIGH); }
/** * Update the kite LEDs. Loop over the colors of the rainbow. */ void change_leds() { /* This would set the first LED to white. */ //leds[0] = CRGB::White; /* This merely sets all of the LEDs local storage (leds[]) to the new value. */ for (int i = 0; i < NUM_LEDS; ++i) { led_color[led_reorder[i]].hue = ((led_color[led_reorder[i]].hue + 1) % 256); hsv2rgb_rainbow(led_color[i], leds[i]); } /* This uses the leds[] values and notifies the SK6812 of the new colors. */ FastLED.show(); }
Compiling with FastLED.h
will show a warning:
No hardware SPI pins defined. All SPI access will default to bitbanged output
That is accurate, but not a problem. It just means that updating the LED colors takes more than a few microseconds (though not milliseconds). Luckily there are only a few of them.
To the right of the display, there is actually also a sixth
LED: a 5050 RGB type.
It's not controlled by the
ESP32 or the RP2040, but it's connected to the
ICE40 FPGA through separate R, G and B pins.
I don't have the know-how to explain what an FPGA even is, nor do I know how to interface with it yet. Getting that sixth LED to work will be for a future post, if ever.
Putting it all together
Here's a downloadable version of the hello-display-plus.ino sketch that contains all of the above code. If you've read parts 1 and 2 of this blog series, you should now know how to get it to work.
2023-08-23 - mch2022 badge / arduino ide / part 2 / display
The colored LCD display on my living room HUD has reached its end-of-life. Time to replace it with an MCH2022 badge?
In the previous blog post, we explored how to upload and start a simple "hello world" sketch programmed in the Arduino IDE to the MCH2022 badge. In this one, we look at writing to the 320x240 display.
As mentioned in the previous post, the documentation for building
MCH2022 badge apps with the Arduino IDE is incomplete.
Notably the mentioned LCD_FREQ
is undefined, and more
importantly the steps to select video output through
GPIO_LCD_MODE
are missing.
Here are the changes in relation to the hello-serial
sketch from the previous post. I also added a screen buffer (canvas),
which you'll likely want to employ.
--- hello-serial/hello-serial.ino +++ hello-display/hello-display.ino @@ -6,6 +6,9 @@ /* Needed for *_inputs() */ #include <Wire.h> +/* Needed for *_display() */ +#include <Adafruit_ILI9341.h> /* Adafruit ILI9341 library */ + /* Workaround to skip esp_ota_get_running_partition() */ extern "C" bool verifyRollbackLater() { return true; @@ -18,9 +21,14 @@ extern "C" bool verifyRollbackLater() { #define GPIO_I2C_SCL 21 #define GPIO_INT_RP2040 34 /* active LOW */ +#define GPIO_SPI_CS_LCD 32 +#define GPIO_SPI_DC_LCD 33 +#define GPIO_LCD_RESET 25 +#define GPIO_LCD_MODE 26 + /* I2C addresses */ #define RP2040_ADDR 0x17 /* RP2040 co-processor */ /* See RP register addresses here: * https://github.com/badgeteam/esp32-component-mch2022-rp2040/ * blob/45fce00c790a2918bf506dd8af0c0b68db05d031/include/rp2040.h @@ -53,13 +61,19 @@ extern "C" bool verifyRollbackLater() { void init_inputs(); uint32_t read_inputs(); +void init_display(); +void update_display(); void return_to_launcher(); +Adafruit_ILI9341 tft = Adafruit_ILI9341(GPIO_SPI_CS_LCD, GPIO_SPI_DC_LCD, GPIO_LCD_RESET); +GFXcanvas16* canvas; /* uses malloc(); cannot be initialized statically */ + void setup() { Serial.begin(115200); Serial.println("hello MCH2022"); init_inputs(); + init_display(); } void loop() { @@ -71,6 +85,10 @@ void loop() { Serial.printf("input: 0x%08x // 0x<button><state>\r\n", input); } } + + //unsigned long t0 = millis(); + update_display(); // takes about 60ms (= 17Hz) + //Serial.printf("updating display took %lu ms\r\n", millis() - t0); } void init_inputs() {
Aside from that diff
, the definitions of
init_display
and update_display
:
void init_display() { pinMode(GPIO_LCD_MODE, OUTPUT); digitalWrite(GPIO_LCD_MODE, LOW); /* set "ESP32" output instead of "FPGA" */ tft.begin(); tft.fillScreen(ILI9341_BLACK); tft.setRotation(1); /* Using a buffer is slightly slower than writing directly. But it's * significantly easier to avoid flicker. */ canvas = new GFXcanvas16(ILI9341_TFTHEIGHT, ILI9341_TFTWIDTH); /* 16-bit, 320x240 */ canvas->setRotation(0); } void update_display() { static int x = 0; static int y = 0; canvas->fillScreen(ILI9341_PURPLE); canvas->setCursor(x, y); canvas->setTextColor(ILI9341_YELLOW); canvas->setTextSize(3); canvas->print("MCH2022"); /* Move the text for the next round */ x += 16; if (x >= ILI9341_TFTHEIGHT) { y += 24; x = 0; if (y >= ILI9341_TFTWIDTH) { y = 0; } } tft.drawRGBBitmap(0, 0, canvas->getBuffer(), canvas->width(), canvas->height()); }
For a long while, I only got a white-ish blank screen.
Getting that pinMode(GPIO_LCD_MODE, OUTPUT);
and
digitalWrite(GPIO_LCD_MODE, LOW);
in there was critical
to get the display to behave.
A nice Makefile helps too, for quick building/uploading:
PROJECT := $(notdir $(CURDIR)) FQBN := esp32.esp32.esp32wrover BINARY := build/$(FQBN)/$(PROJECT).ino.bin .PHONY: upload upload: build ~/Arduino/projmch2022/mch2022-tools/app_push.py \ --run $(BINARY) 'my_cool_app' main.bin 1 .PHONY: build build: $(BINARY) $(BINARY): $(PROJECT).ino arduino-cli compile -eb esp32:esp32:esp32wrover $(CURDIR)
(Don't forget to convert 8 spaces to tabs in any Makefile.)
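If your editor insists on writing spaces, something like this converts a leading run of 8 spaces back into a tab (GNU sed; check the result before running make):
sed -i 's/^        /\t/' Makefile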
2023-08-22 - mch2022 badge / arduino ide / part 1 / serial
The colored LCD display on my living room HUD has reached its end-of-life. Time to replace it with an MCH2022 badge?
The HUD I have in my living room is a breadboard with an ESP8266 development board, some sensors and a Grove 16x2 LCD I2C display. The background color of the display nicely visualizes current power production/usage, but the bottom row of text has now become defunct.
I could attempt to find a new LCD component at the closest electronics shop. But I also happen to have a bunch of MCH2022 badges laying around, after the company paid trip there last year. Maybe I could use one of those instead. It has a 320x240 full color display and various free pins. And an on-board ESP32. The existing ESP8266 code should run more than fine on that.
The Badge development documentation lists Arduino as a possible route. This would allow me to reuse most of my existing HUD code. Unfortunately the page says: “PLEASE BE AWARE THAT THE ARDUINO SDK IS NOT FULLY SUPPORTED!! YOU MAY RUN INTO SOME ISSUES”
They were not wrong...
Steps
The steps listed are:
- Install the Arduino IDE if you haven’t already — do use version 2.x and not 1.x (because it's old), unless you want to do a lot of serial monitoring, in which case you probably want sermon anyway (see below).
- Install ESP32 support using these instructions — from "esp32" select the "ESP32 Wrover Module" board.
- Download mch2022-tools
— you'll be needing
app_push.py
and its prerequisites: the files next to it, and python3-usb
.
You can write an Arduino sketch, and then, instead of
Upload, do an Export Compiled Binary
(ctrl+alt+S). You upload the .bin
to the MCH2022
appfs like this:
$ .../app_push.py --run <binfile> <name> <title> <version>
Hello world
We start off with this hello-serial
project:
void setup() { Serial.begin(115200); Serial.println("hello MCH2022"); } void loop() { Serial.println(millis()); delay(1000); }
Building and exporting with ctrl+alt+S gives us this output:
... Compiling sketch... ~/.arduino15/packages/esp32/tools/xtensa-esp32-elf-gcc/esp-2021r2-patch5-8.4.0/bin/xtensa-esp32-elf-g++ ... -o /tmp/arduino/sketches/2EA640295D81B35604C751A774B1A1FE/sketch/hello-serial.ino.cpp.o Compiling libraries... Compiling core... bash -c "echo -DARDUINO_CORE_BUILD >; /tmp/arduino/sketches/2EA640295D81B35604C751A774B1A1FE/file_opts" Using precompiled core: /tmp/arduino/cores/cb14c27b7a099bfbee683e2ff88af527/core.a bash -c ": > /tmp/arduino/sketches/2EA640295D81B35604C751A774B1A1FE/file_opts" Linking everything together... ~/.arduino15/packages/esp32/tools/xtensa-esp32-elf-gcc/esp-2021r2-patch5-8.4.0/bin/xtensa-esp32-elf-g++ -Wl,--Map=/tmp/arduino/sketches/... python3 ~/.arduino15/packages/esp32/tools/esptool_py/4.5.1/esptool.py --chip esp32 elf2image --flash_mode dio --flash_freq 80m --flash_size 4MB --elf-sha256-offset 0xb0 -o /tmp/arduino/sketches/2EA640295D81B35604C751A774B1A1FE/hello-serial.ino.bin /tmp/arduino/sketches/2EA640295D81B35604C751A774B1A1FE/hello-serial.ino.elf esptool.py v4.5.1 Creating esp32 image... Merged 2 ELF sections Successfully created esp32 image. python3 ~/.arduino15/packages/esp32/hardware/esp32/2.0.11/tools/gen_esp32part.py -q /tmp/arduino/sketches/2EA640295D81B35604C751A774B1A1FE/partitions.csv /tmp/arduino/sketches/2EA640295D81B35604C751A774B1A1FE/hello-serial.ino.partitions.bin bash -c "[ ! -d \"/tmp/arduino/sketches/2EA640295D81B35604C751A774B1A1FE\"/libraries/Insights ] || python3 \"~/.arduino15/packages/esp32/hardware/esp32/2.0.11\"/tools/gen_insights_package.py /tmp/arduino/sketches/2EA640295D81B35604C751A774B1A1FE hello-serial.ino \"~/Arduino/projmch2022/hello-serial\"" bash -c "[ 0 -eq 0 ] || cp -f \"~/.arduino15/packages/esp32/tools/openocd-esp32/v0.11.0-esp32-20221026/share/openocd/scripts/\"board/esp32-wrover-kit-3.3v.cfg \"~/Arduino/projmch2022/hello-serial\"/debug.cfg" bash -c "[ 0 -eq 0 ] || cp -f \"~/.arduino15/packages/esp32/hardware/esp32/2.0.11\"/tools/ide-debug/esp32.json \"~/Arduino/projmch2022/hello-serial\"/debug_custom.json" bash -c "[ 0 -eq 0 ] || cp -f \"~/.arduino15/packages/esp32/hardware/esp32/2.0.11\"/tools/ide-debug/svd/esp32.svd \"~/Arduino/projmch2022/hello-serial\"/debug.svd" ~/.arduino15/packages/esp32/tools/xtensa-esp32-elf-gcc/esp-2021r2-patch5-8.4.0/bin/xtensa-esp32-elf-size -A /tmp/arduino/sketches/2EA640295D81B35604C751A774B1A1FE/hello-serial.ino.elf Sketch uses 281621 bytes (21%) of program storage space. Maximum is 1310720 bytes. Global variables use 21480 bytes (6%) of dynamic memory, leaving 306200 bytes for local variables. Maximum is 327680 bytes.
We now have these files:
$ ls -og hello-serial/build/esp32.esp32.esp32wrover/ total 4589 -rw-rw-r-- 1 281984 aug 21 23:44 hello-serial.ino.bin -rw-rw-r-- 1 18992 aug 21 23:44 hello-serial.ino.bootloader.bin -rwxrwxr-x 1 6050184 aug 21 23:44 hello-serial.ino.elf -rw-rw-r-- 1 9089963 aug 21 23:44 hello-serial.ino.map -rw-rw-r-- 1 3072 aug 21 23:44 hello-serial.ino.partitions.bin
Uploading:
$ .../app_push.py --run \ hello-serial/build/esp32.esp32.esp32wrover/hello-serial.ino.bin \ my_cool_app main.bin 1
The results:
Failed to start app. Check console for more details. App: my_cool_app
So, the app does get uploaded. But it fails to start for some reason or the other.
A crashing app
Luckily the badge is robust: it handles the app's failure,
reports it and gets back into a working state when you press A.
Our "my_cool_app" app can be found under "Apps" where we can retry starting it, but
not before hooking up a serial console (see sermon
below):
$ sermon -t /dev/ttyACM0 115200 sermon [-t] DEV BAUD; now listening on /dev/ttyACM0 with 115200 baud 00:08:08.085638: E (127) esp_core_dump_flash: No core dump partition found! 00:08:08.219348: E (128) esp_core_dump_flash: No core dump partition found! 00:08:08.224508: 00:08:08.224691: abort() was called at PC 0x400d6309 on core 0 00:08:08.228778: 00:08:08.228933: 00:08:08.229113: Backtrace: 0x40083495:0x3ffbc160 0x400893d5:0x3ffbc180 0x4008e1b1:0x3ffbc1a0 0x400d6309:0x3ffbc220 0x400d2088:0x3ffbc240 0x400d2d1e:0x3ffbc270 0x400f2fd4:0x3ffbc2a0 00:08:08.243526: 00:08:08.243700: 00:08:08.243879: 00:08:08.244054: 00:08:08.244211: ELF file SHA256: e104f0bd5a9e7c1a 00:08:08.247249: 00:08:08.247409: E (150) esp_core_dump_flash: Core dump flash config is corrupted! CRC=0x7bd5c66f instead of 0x0 00:08:08.255852: Rebooting... 00:08:08.731141: I (473) psram: This chip is ESP32-D0WD ...
Starting the app again yielded the above messages in the serial monitor. It looks like
abort()
is called for some reason, after
which the device is rebooted. The I
line marks the
first log of the Launcher (Home) application.
That backtrace will give us sufficient clues to go on, but
we'll need some means to figure out where those addresses are. If we
had installed the Espressif tools, we would have a bunch of
binaries in ~/.espressif/tools/xtensa-esp32-elf/.../bin/
,
like xtensa-esp32-elf-addr2line
,
xtensa-esp32-elf-gdb
and
xtensa-esp32-elf-objdump
. Now I could only find
objdump
as
~/.arduino15/packages/esp32/tools/.../bin/objdump
.
objdump
is not addr2line
, but we can use
its output to find function locations with a helper script
objdump-syms-addr2line
:
#!/usr/bin/perl # Replacement for addr2line that uses "objdump --syms" output # # Why: because you might have objdump but not addr2line # # Example: # $ objdump --syms bin.elf | objdump-syms-addr2line 0x400d2d1e 0x400f2fd4 # 0x400d2d1e app_main [0x400d2d0c+50] # 0x400f2fd4 main_task [0x400f2f60+128] # # Alternative (if you have it): # $ addr2line -Cfe bin.elf 0x400d2d1e 0x400f2fd4 | sed -e '0~2s/^/ /' # app_main # ~/.arduino15/packages/esp32/hardware/esp32/2.0.11/cores/esp32/main.cpp:70 # main_task # /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/freertos... # use warnings; my %l = (); my %f = (); for (<STDIN>) { # 400d63f4 l F .flash.text 000000c2 esp_ota_current_ota_is_workable if (/^([0-9a-f]+)\s+(\S+\s+){3}([0-9a-f]+)\s+(\S+)/ and hex($1)) { $l{hex($1)} = hex($3); $f{hex($1)} = $4; } } foreach (@ARGV) { my ($i, $x) = (hex, undef); foreach my $s (keys %l) { if ($s <= $i and $i < ($s + $l{$s})) { $x = $s; last; } } if ($x) { printf("0x%08x %s [0x%x+%d]\n", $i, $f{$x}, $x, $l{$x}); } else { print "0x".hex($i)." <unknown>\n"; } }
(Next time I should probably check all of find
~/.arduino15/packages/esp32/tools/ -perm /0700 -type f -path
'*/bin/*'
for more tools.)
Using this script on the addresses from the backtrace, we get this:
$ ~/.arduino15/packages/esp32/tools/xtensa-esp32-elf-gcc/esp-2021r2-patch5-8.4.0/xtensa-esp32-elf/bin/objdump \ --syms hello-serial/build/esp32.esp32.esp32wrover/hello-serial.ino.elf | \ objdump-syms-addr2line \ 0x40083495 0x400893d5 0x4008e1b1 0x400d6309 0x400d2088 0x400d2d1e 0x400f2fd4 0x40083495 panic_abort [0x40083480+29] 0x400893d5 esp_system_abort [0x400893d0+8] 0x4008e1b1 abort [0x4008e128+140] 0x400d6309 esp_ota_get_running_partition [0x400d62a0+108] 0x400d2088 initArduino [0x400d2080+348] 0x400d2d1e app_main [0x400d2d0c+50] 0x400f2fd4 main_task [0x400f2f60+128]
Good. Finally some useful clues.
esp_ota_get_running_partition
I cannot find the esp_ota_get_running_partition
definition in the .arduino15
dir, but I can find
initArduino
where that function is invoked —
in esp32-hal-misc.c
:
void initArduino() { #ifdef CONFIG_APP_ROLLBACK_ENABLE if(!verifyRollbackLater()){ const esp_partition_t *running = esp_ota_get_running_partition(); esp_ota_img_states_t ota_state; if (esp_ota_get_state_partition(running, &ota_state) == ESP_OK) { if (ota_state == ESP_OTA_IMG_PENDING_VERIFY) { if (verifyOta()) { esp_ota_mark_app_valid_cancel_rollback(); } else { log_e("OTA verification failed! Start rollback to the previous version ..."); esp_ota_mark_app_invalid_rollback_and_reboot(); } } } } #endif //init proper ref tick value for PLL (uncomment if REF_TICK is different than 1MHz) //ESP_REG(APB_CTRL_PLL_TICK_CONF_REG) = APB_CLK_FREQ / REF_CLK_FREQ - 1; #ifdef F_CPU setCpuFrequencyMhz(F_CPU/1000000); ...
So, it looks like it bails out right after starting:
initArduino()
is invoked,
verifyRollbackLater()
returns false and then
esp_ota_get_running_partition()
calls abort because of an
unexpected situation.
For normal ESP32 applications, this panic is not triggered. But the badge does something special when chainloading the second app — the Launcher (Home) app being the first one.
I figure we don't need any rollback. The badge handles bad apps just fine — as we noticed already — so we'll need to figure out a way to skip this code.
Fixing through sdkconfig.h
Avoiding the call to esp_ota_get_running_partition()
can
be done by unsetting/removing the CONFIG_APP_ROLLBACK_ENABLE
define.
In
~/.arduino15/packages/esp32/.. ../tools/sdk/esp32/sdkconfig
the config is seen, but this is not the active config:
CONFIG_BOOTLOADER_APP_ROLLBACK_ENABLE=y CONFIG_APP_ROLLBACK_ENABLE=y
We cannot undefine it from our compilation unit. We have to alter
sdkconfig.h
. There are a bunch of those. We're using
~/.arduino15/packages/esp32/..
../tools/sdk/esp32/qio_qspi/include/sdkconfig.h
at the moment (see Tools > Flash Mode menu option):
// ... #define CONFIG_BOOTLOADER_APP_ROLLBACK_ENABLE 1 // ... #define CONFIG_APP_ROLLBACK_ENABLE CONFIG_BOOTLOADER_APP_ROLLBACK_ENABLE // ...
By removing/commenting both we can build a new binary without the
code that calls esp_ota_get_running_partition()
:
// ... //#define CONFIG_BOOTLOADER_APP_ROLLBACK_ENABLE 1 // ... //#define CONFIG_APP_ROLLBACK_ENABLE CONFIG_BOOTLOADER_APP_ROLLBACK_ENABLE // ...
Build, Upload, Run. Don't worry about the screen not updating (showing "Connected to PC"). Check with sermon:
$ sermon -t /dev/ttyACM0 115200 sermon [-t] DEV BAUD; now listening on /dev/ttyACM0 with 115200 baud [ 455][I][esp32-hal-psram.c:96] psramInit(): PSRAM enabled 10:03:41.117540: hello MCH2022 10:03:41.118636: 480 10:03:42.117090: 1479 10:03:43.117141: 2479 ...
Yes, victory! The app starts and appears to run fine. (Don't worry about any garbage showing up in the serial output during upload.)
Fixing from the project
Editing that sdkconfig.h
is cumbersome. There must be a
better way. And luckily there is. (Start by undoing your edits to
sdkconfig.h
if you haven't already.)
We could try to override the
esp_ota_get_running_partition
function itself, and have it
return NULL
instead of aborting:
extern "C" const esp_partition_t* esp_ota_get_running_partition() { return NULL; }
Alas, the linking step fails:
Linking everything together... ~/.arduino15/packages/esp32/tools/xtensa-esp32-elf-gcc/esp-2021r2-patch5-8.4.0/bin/xtensa-esp32-elf-g++ ... -o /tmp/arduino/sketches/2EA640295D81B35604C751A774B1A1FE/hello-serial.ino.elf ~/.arduino15/packages/esp32/tools/xtensa-esp32-elf-gcc/esp-2021r2-patch5-8.4.0/bin/../lib/gcc/xtensa-esp32-elf/8.4.0/../../../../xtensa-esp32-elf/bin/ld: ~/.arduino15/packages/esp32/hardware/esp32/2.0.11/tools/sdk/esp32/lib/libapp_update.a(esp_ota_ops.c.obj): in function `esp_ota_get_running_partition': /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/app_update/esp_ota_ops.c:533: multiple definition of `esp_ota_get_running_partition'; /tmp/arduino/sketches/2EA640295D81B35604C751A774B1A1FE/sketch/hello-serial.ino.cpp.o:~/Arduino/projmch2022/hello-serial/hello-serial.ino:3: first defined here collect2: error: ld returned 1 exit status
That was to be expected. We're not using shared libraries, so we cannot have multiple functions with the same name.
But the verifyRollbackLater()
function we saw earlier
can be overridden. In fact, it's specifically tailored to be,
through __attribute__((weak))
. If we create one and have
it return true
, we'll also skip the call to
esp_ota_get_running_partition()
and achieve our goal:
extern "C" bool verifyRollbackLater() { return true; }
Add the above line to our sketch. This builds and uploads. All is well!
Ending the application
Now we need a way to exit the application and return to the Launcher.
The Badge.team guys already helpfully added the relevant code. Reproduced here with some extra functionality for future reference:
/* Needed for return_to_launcher() */ #include <esp_system.h> #include <soc/rtc.h> #include <soc/rtc_cntl_reg.h> /* Needed for *_inputs() */ #include <Wire.h> /* Workaround to skip esp_ota_get_running_partition() */ extern "C" bool verifyRollbackLater() { return true; } /* See ESP PIN numbers here: * https://github.com/badgeteam/esp32-component-mch2022-bsp/ * blob/2da8dbd50679512058820b40bfd6d7a5c529b28c/include/mch2022_badge.h */ #define GPIO_I2C_SDA 22 #define GPIO_I2C_SCL 21 #define GPIO_INT_RP2040 34 /* active LOW */ /* I2C addresses */ #define RP2040_ADDR 0x17 /* RP2040 co-processor */ /* See RP register addresses here: * https://github.com/badgeteam/esp32-component-mch2022-rp2040/ * blob/45fce00c790a2918bf506dd8af0c0b68db05d031/include/rp2040.h * Feed through: * sed -e '1,/RP2040_REG_FW_VER/d;/}/,$d;s/,//' | awk '{printf "#define %s 0x%02x\n",$1,NR}' */ #define RP2040_REG_INPUT1 0x06 /* Buttons, accessible through the RP2040 through wire I2C */ #define BTN_HOME 0x0001 /* home */ #define BTN_MENU 0x0002 /* menu */ #define BTN_START 0x0004 /* start */ #define BTN_A 0x0008 /* accept */ #define BTN_B 0x0010 /* back */ #define BTN_RES0 0x0020 /* (RP2040_INPUT_FPGA_CDONE) */ #define BTN_RES1 0x0040 /* (RP2040_INPUT_BATTERY_CHARGING) */ #define BTN_SELECT 0x0080 /* select */ #define BTN_LEFT 0x0100 /* joy left */ #define BTN_JOY 0x0200 /* joy press */ #define BTN_DOWN 0x0400 /* joy down */ #define BTN_UP 0x0800 /* joy up */ #define BTN_RIGHT 0x1000 /* joy right */ #define BTN2STATE(button) (button & 0xffff) #define BTN2BTN(button) ((button >> 16) & 0xffff) #define BUTTON_PRESSED(button, compare) \ ((BTN2BTN(button) & compare) && (BTN2STATE(button) & compare) == compare) #define BUTTON_RELEASED(button, compare) \ ((BTN2BTN(button) & compare) && (BTN2STATE(button) & compare) == 0) void init_inputs(); uint32_t read_inputs(); void return_to_launcher(); void setup() { Serial.begin(115200); Serial.println("hello MCH2022"); init_inputs(); } void loop() { if (!digitalRead(GPIO_INT_RP2040)) { uint32_t input = read_inputs(); if (BUTTON_RELEASED(input, BTN_HOME)) { return_to_launcher(); } else { Serial.printf("input: 0x%08x // 0x<button><state>\r\n", input); } } } void init_inputs() { Wire.begin(GPIO_I2C_SDA, GPIO_I2C_SCL); pinMode(GPIO_INT_RP2040, INPUT); read_inputs(); } uint32_t read_inputs() { Wire.beginTransmission(RP2040_ADDR); Wire.write(RP2040_REG_INPUT1); Wire.endTransmission(); Wire.requestFrom(RP2040_ADDR, 4); /* state = 0x20 + sum of all pressed buttons */ uint16_t state = Wire.read() | (Wire.read() << 8); /* which_button = 1, 2, 4, 8, ... */ uint16_t which_button = Wire.read() | (Wire.read() << 8); /* NOTE: which_button may hold more than one if the presses/releases * are registered "simultaneously" */ return (which_button << 16 | state); } void return_to_launcher() { REG_WRITE(RTC_CNTL_STORE0_REG, 0); esp_restart(); }
You can now see buttons being pressed and released:
10:40:24.257248: input: 0x00080028 // 0x<button><state> 10:40:24.497229: input: 0x00080020 // 0x<button><state>
Note that this sketch still lacks the code to interface with the screen. That will be for another day...
sermon (serial-monitor)
As promised above, this sermon
snippet is better than the crippled
built-in Serial Monitor in Arduino IDE 2:
#!/bin/sh # sermon (serial-monitor) -- more convenient for serial logs than screen(1) set -eu test "${1:-}" = '-t' && { with_ts=true; shift; } || with_ts=false dev=${1:-/dev/ttyACM0} baud=${2:-115200} stty -F "$dev" "$baud" raw -clocal -echo || { echo "Usage: ${0##*/} [-t] DEV BAUD" >&2; exit 1; } echo "${0##*/} [-t] DEV BAUD; now listening on $dev with $baud baud" >&2 if $with_ts; then exec perl -e 'use strict;use warnings; use Time::HiRes qw(gettimeofday);use POSIX qw(strftime); my ($nl,$t,$ut)=1;while(sysread STDIN,$b,1){ if($nl){($t,$ut)=gettimeofday;syswrite STDOUT, (strftime "%H:%M:%S",localtime $t).(sprintf ".%06d: ",$ut);$nl=0} $nl=1 if $b eq "\n";syswrite STDOUT,$b}' <"$dev" else exec cat "$dev" fi
Usage example: sermon -t /dev/ttyACM0 115200
2023-07-08 - laptop battery discharge / logging
I recently got a new Framework Laptop. It is generally nice. Apart from the reflective screen, and the excessive battery consumption.
In suspend mode, it draws too much battery: more than a Watt. This is a known issue. The worst offenders are the expansion cards. For instance the USB-A cards consume about 350mW each, just by being plugged in.
To do some testing, I whipped up a script allowing easy access to battery usage logs: discharge-log
Place it in /usr/lib/systemd/system-sleep
with execute permissions.
It will log usage before going into sleep and after waking up. This allows for easy testing of various configurations of expansion cards. The values look like this:
# journalctl -o cat -t discharge-log -b0 --grep dT | tail -n1 (sleep) dT = 362 (s), dQ = -9 (mAh) = 138 (mWh), rate = 0 (plmpt), rate/h = 0 (plmpt/h), dE = -542 (J), P = -1498 (mW)
One can run it manually too:
$ /usr/lib/systemd/system-sleep/discharge-log begin
$ /usr/lib/systemd/system-sleep/discharge-log show '(testing charge)' (testing charge) dT = 185 (s), dQ = 32 (mAh) = 561 (mWh), rate = 10 (pmlpt), rate/h = 194 (pmlpt/h), dE = 2019 (J), P = 10916 (mW)
$ /usr/lib/systemd/system-sleep/discharge-log begin
$ /usr/lib/systemd/system-sleep/discharge-log show '(testing discharge)' (testing discharge) dT = 314 (s), dQ = -50 (mAh) = -854 (mWh), rate = -15 (pmlpt), rate/h = -171 (pmlpt/h), dE = -3074 (J), P = -9791 (mW)
(The plmpt values are promille-points.)
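The linked discharge-log script is the real thing; a heavily simplified sketch of the same idea looks roughly like this (assuming the battery exposes charge_now in µAh under /sys/class/power_supply/BAT1; some batteries report energy_now in µWh instead, so adjust for your hardware):
#!/bin/sh
# Sketch of a systemd system-sleep hook; the real discharge-log computes more.
# systemd-sleep invokes hooks as: $0 pre|post suspend|hibernate|...
BAT=/sys/class/power_supply/BAT1
STATE=/run/discharge-log.state
case "${1:-}" in
pre)
    echo "$(date +%s) $(cat "$BAT/charge_now")" > "$STATE"
    ;;
post)
    read -r t0 q0 < "$STATE"
    t1=$(date +%s); q1=$(cat "$BAT/charge_now")
    logger -t discharge-log \
        "(sleep) dT = $((t1 - t0)) (s), dQ = $(((q1 - q0) / 1000)) (mAh)"
    ;;
esac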
Ejecting all (non USB-C) cards leaves us with a battery usage of about
400mW. Still steep, but a lot better than a Watt. In my tests —
with 2023-07 firmware and a Linux 6.1 kernel (linux-oem-22.04c) —
deep
versus s2idle
sleep did matter a
lot: I could leave the USB-A cards connected (with one mouse
dongle) and get about 520mW discharge rate with s2idle
; a
clear winner.
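To check which suspend mode is active and to switch it, this is all that is needed (a sketch; the bracketed entry is the current one, and the choice resets on reboot unless you add mem_sleep_default=deep to the kernel command line):
$ cat /sys/power/mem_sleep
[s2idle] deep
$ echo deep | sudo tee /sys/power/mem_sleep
deep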
2023-07-03 - viewing unencrypted traffic / ltrace / bpftrace
Can we view TLS-encrypted traffic on the originating or terminating host, without having to decode the data from the wire?
This is a question that comes up every now and then when trying to debug a service by looking at how it communicates.
For the most insight, we should capture the encrypted traffic and use the (logged!) pre-master secret keys. See example details in make-master-secret-log discussing how to have HAProxy log them. Oftentimes this involves a lot of work: capturing the traffic, logging the keys, collecting both and running them through Wireshark on a local machine.
If we have access to the encryption libraries, there can be easier ways.
Using ltrace?
We should be able to use ltrace which traces the invocation of library calls. ltrace is the library counterpart of strace which traces system calls. Looking at the network system calls (recvfrom or sendto) would be useless: we'd see the encrypted traffic. Encryption is done in userland, so we need to be one step closer to the application.
By using ltrace, we can trace function calls before data gets encrypted and after data gets decrypted. In this case we're interested in SSL_read and SSL_write from the OpenSSL library — assuming that that is the TLS library our application is using.
Let's try it on cURL. This cURL build is compiled against OpenSSL, as is common:
$ curl --version | grep -i ssl curl 7.81.0 (x86_64-pc-linux-gnu) libcurl/7.81.0 OpenSSL/3.0.2 ...
Invoke it from ltrace:
$ ltrace curl --http1.1 -I https://www.google.com -sSfLo/dev/null +++ exited (status 0) +++
Nothing! No library calls whatsoever?
It turns out that this version of ltrace (0.7.3-6.1ubuntu6 on Ubuntu/Jammy) is too old and suffers from the bug “ltrace produces no debug output at all”.
The problem is fixed by ec56370 (Add Intel CET support). After fetching and building that, we can try the newer ltrace:
$ ltrace-0.8.x -e SSL_read -e SSL_write \ curl --http1.1 -I https://www.google.com -sSfLo/dev/null libcurl.so.4->SSL_write(0x55ad69380010, 0x55ad693a0f30, 79, 1) = 79 libcurl.so.4->SSL_read(0x55ad69380010, 0x55ad691eaa80, 0x19000, 1) = 1215 libcurl.so.4->SSL_read(0x55ad69380010, 0x7ffd2590e4e0, 32, 0x55ad691e5828) = 0xffffffff +++ exited (status 0) +++
Good! We can see the calls to SSL_write and SSL_read.
(Ignore the mention of libcurl.so.4
here. We are in fact seeing the libssl
calls.)
Unfortunately ltrace doesn't know that the second argument to both functions is a character buffer that often contains readable strings. We see the memory locations of the buffers but we do not get to see their contents. So, using plain ltrace does not help us.
Using latrace?
There exists a similar application that we can use: latrace.
We can convince latrace to print arguments (see -A
and
-a
). But there we'll get the arguments before the
call only, not after: we can see the write data going into
SSL_write but we will only see uninitialized data going into
SSL_read, not the read data coming out of it.
Using bpftrace?
Maybe eBPF can come to the rescue?
First we need to know which library our application is using exactly:
$ ldd `which curl` | grep -E 'ssl|tls' libssl.so.3 => /lib/x86_64-linux-gnu/libssl.so.3 (0x00007feb84d20000) libgnutls.so.30 => /lib/x86_64-linux-gnu/libgnutls.so.30 (0x00007feb843a6000)
We can use bpftrace and attach BPF userland probes
on SSL_read and SSL_write in libssl.so.3
:
$ sudo bpftrace -e ' uprobe:/lib/x86_64-linux-gnu/libssl.so.3:SSL_read, uprobe:/lib/x86_64-linux-gnu/libssl.so.3:SSL_write { @ctx[pid] = arg0; @buf[pid] = arg1; @len[pid] = arg2; } uretprobe:/lib/x86_64-linux-gnu/libssl.so.3:SSL_read, uretprobe:/lib/x86_64-linux-gnu/libssl.so.3:SSL_write { printf("[%d/%s] %s(%p, %p, %d) = %d", pid, comm, probe, @ctx[pid], @buf[pid], @len[pid], retval); if ((int32)retval > 0) { @slen = retval; if (@slen >= 64) { printf(" [[\n%s\n]] (truncated)", str(@buf[pid], @slen)); } else { printf(" [[\n%s\n]]", str(@buf[pid], @slen)); } } printf("\n"); delete(@ctx[pid]); delete(@buf[pid]); delete(@len[pid]); } '
(Our cURL uses OpenSSL. Adapting this for GnuTLS encrypted reads/writes is left as an exercise for the reader.)
eBPF does not support spawning the process itself. We'll have to ensure the sniffed program is started elsewhere:
$ curl --http1.1 -I https://www.google.com -sSfLo/dev/null
The bpftrace probe displays this:
Attaching 4 probes... [2556865/curl] uretprobe:/lib/x86_64-linux-gnu/libssl.so.3:SSL_write(0x55ad2054d010, 0x55ad2056df30, 79) = 79 [[ HEAD / HTTP/1.1 Host: www.google.com User-Agent: curl/7.81.0 ]] (truncated) [2556865/curl] uretprobe:/lib/x86_64-linux-gnu/libssl.so.3:SSL_read(0x55ad2054d010, 0x55ad203b7a80, 102400) = 1214 [[ HTTP/1.1 200 OK Content-Type: text/html; charset=ISO-8859-1 C ]] (truncated) [2556865/curl] uretprobe:/lib/x86_64-linux-gnu/libssl.so.3:SSL_read(0x55ad2054d010, 0x7fff2f584060, 32) = -1 ^C
Looking better! Some contents. Not very conveniently formatted, but it's a start.
The biggest problem here is that we're limited by BPFTRACE_STRLEN
this time. We can get at most 64 characters, or maybe 200 if you push it.
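bpftrace reads that limit from the environment, so pushing it up for a single run looks like this (same probe script as above):
$ sudo BPFTRACE_STRLEN=200 bpftrace -e '...'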
Alternatives?
Are there alternatives?
We could attach a full fledged debugger (gdb), but that might feel a bit heavy.
We could LD_PRELOAD a library that wraps the
SSL_read and SSL_write calls, but this only
works if we start the application with this wrapper.
We could use ptrace(2)
and create a minimal debugger
that does what ltrace does, but then ensure that the buffers
are printed in full.
Options, options.
And then we still haven't tackled the issue of statically compiled programs: many services in today's ecosystem are written in golang. They use no (or very few) dynamic libraries. Trying to trap OpenSSL or GnuTLS function calls there is meaningless. I'm afraid we'll have to dig into golang debugging at some point...
2023-06-30 - removing auditd / disabling logging
After installing auditd for testing purposes and removing it again, my kernel logs got flooded with messages. How do I disable them?
If you happened to have installed auditd, it is likely that
the kernel audit subsystem was enabled. Even when there are no
rules left (auditctl -l
) you can still get more messages
in your kernel logs than before.
For instance, after uninstalling auditd, I still get the following ones:
$ sudo journalctl -o cat --dmesg --grep audit -n2 audit: type=1106 audit(1688134323.226:190): pid=2476506 uid=1000 auid=1000 ses=3 subj=unconfined msg='op=PAM:session_close grantors=pam_limits,.. acct="root" exe="/usr/bin/sudo" hostname=? addr=? terminal=/dev/pts/15 res=success' audit: type=1104 audit(1688134323.226:191): pid=2476506 uid=1000 auid=1000 ses=3 subj=unconfined msg='op=PAM:setcred grantors=pam_permit acct="root" exe="/usr/bin/sudo" hostname=? addr=? terminal=/dev/pts/15 res=success'
The audit rules might be gone, but userland can still log messages, and they are now forwarded through the kernel ring buffer. The pam subsystem does this, for instance:
int _pam_auditlog(pam_handle_t *pamh, int action, int retval, int flags, struct handler *h) /* ... */ case PAM_CLOSE_SESSION: message = "session_close"; /* ... */ if (_pam_audit_writelog(pamh, audit_fd, type, message, grantors ? grantors : "?", retval) < 0) /* ... */
Not expecting those logs after removing auditd?
Here's a quick way to disable audit logging again: compile the following C snippet, and run it to disable.
$ gcc -Wall -o audit_enable_disable audit_enable_disable.c $ sudo ./audit_enable_disable 0
audit_enable_disable.c
/* GPL2.1+ */ #include <asm/types.h> #include <errno.h> #include <fcntl.h> #include <linux/audit.h> #include <linux/netlink.h> #include <stdint.h> #include <stdio.h> #include <stdlib.h> #include <string.h> #include <sys/socket.h> #include <unistd.h> #define MAX_AUDIT_MESSAGE_LENGTH 8970 // PATH_MAX*2+CTX_SIZ*2+11+256+1 struct audit_message { struct nlmsghdr nlh; char data[MAX_AUDIT_MESSAGE_LENGTH]; }; struct audit_reply { int type; int len; struct nlmsghdr *nlh; struct audit_message msg; struct nlmsgerr *error; }; int audit_open(void) { int fd = socket(PF_NETLINK, SOCK_RAW, NETLINK_AUDIT); if (fd < 0) { perror("socket"); return -1; } if (fcntl(fd, F_SETFD, FD_CLOEXEC) == -1) { perror("fcntl"); close(fd); return -1; } return fd; } int audit_recv_error(int fd) { struct audit_message resp = {0}; struct sockaddr_nl nladdr; int ret; socklen_t nladdrlen = sizeof(nladdr); do { ret = recvfrom( fd, &resp, sizeof(resp), 0, (struct sockaddr*)&nladdr, &nladdrlen); } while (ret < 0 && errno == EINTR); if (ret < 0 || nladdrlen != sizeof(nladdr) || nladdr.nl_pid) { perror("recvfrom"); return -1; } if (resp.nlh.nlmsg_type != NLMSG_ERROR) { fprintf( stderr, "audit_recv_error: got unexpected %d\n", resp.nlh.nlmsg_type); return -1; } return ((struct nlmsgerr*)NLMSG_DATA(&resp.nlh))->error == 0 ? 0 : -1; } int audit_send(int fd, int type, const void *data, unsigned int size) { struct audit_message req = {0}; struct sockaddr_nl addr = {.nl_family = AF_NETLINK}; int ret; req.nlh.nlmsg_len = NLMSG_SPACE(size); req.nlh.nlmsg_type = type; req.nlh.nlmsg_flags = NLM_F_REQUEST|NLM_F_ACK; req.nlh.nlmsg_seq = 1; if (size && data) { memcpy(NLMSG_DATA(&req.nlh), data, size); } do { ret = sendto( fd, &req, req.nlh.nlmsg_len, 0, (struct sockaddr*)&addr, sizeof(addr)); } while (ret < 0 && errno == EINTR); if (ret < 0 || ret != (int)req.nlh.nlmsg_len) { perror("sendto"); return -1; } return audit_recv_error(fd); /* because NLM_F_ACK */ } int audit_set_enabled(int fd, uint32_t enabled) { struct audit_status s = { .mask = AUDIT_STATUS_ENABLED, .enabled = enabled}; int ret = audit_send(fd, AUDIT_SET, &s, sizeof(s)); if (ret < 0) { fprintf(stderr, "audit_set_enabled: failed, are you root?\n"); return -1; } return ret; } int main(int argc, const char *argv[]) { int ret = 0; int fd = audit_open(); if (fd < 0) { ret = -1; } else if (argc == 2) { uint32_t enable = atoi(argv[1]); /* allow 2 here to permanently enable until reboot */ if (enable != 1) { enable = 0; } if (audit_set_enabled(fd, enable) < 0) { ret = 2; } } else { printf("Usage: %s 0|1\n", argv[0]); ret = 1; } if (fd >= 0) { close(fd); } return ret; }
Or, if you still have auditctl, just do: auditctl -e 0
2023-06-15 - netplan / docker0 / bind on 172.17.0.1
If you want to bind your host service to the docker IP, to expose it to docker instances, that IP needs to exist first. If it doesn't, your log might look like this:
LOG: listening on IPv4 address "127.0.0.1", port 5432 LOG: could not bind IPv4 address "172.17.0.1": Cannot assign requested address WARNING: could not create listen socket for "172.17.0.1" LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
As you probably know, you cannot bind to an IP that is not
configured on an interface anywhere — barring the
net.ipv4.ip_nonlocal_bind
sysctl setting.
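(If you really wanted to, that sysctl could be flipped as below, but the netplan fix further on is cleaner:)
# sysctl -w net.ipv4.ip_nonlocal_bind=1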
So, if you want to expose, say, your postgresql to a
docker instance, you could try to set:
listen_addresses = '127.0.0.1,172.17.0.1'
postgresql will refuse to start
unless that 172.17.0.1
IP exists on a local interface.
You could reorder your dependencies to have postgresql depend on docker, but that makes no sense from a logical perspective. A better fix is to make sure that the docker IP exists.
Add this in /etc/netplan/60-docker.yaml
:
network: version: 2 renderer: networkd bridges: docker0: addresses: - 172.17.0.1/16 parameters: forward-delay: 0ms stp: false
postgresql will depend on the network being up, so now it should start on boot again without failing.
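To activate the new configuration without a reboot, and to verify that the bridge now carries the address even before the docker daemon starts, something like this should do:
# netplan apply
# ip addr show docker0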
2023-04-13 - ansible / ipv6 addresses / without link local
Trying to get the IPv6 addresses as ansible fact, but getting
unwanted link_local
scope addresses?
Maybe you're landing here because you did not see a solution at “all_ipv6_addresses includes link local addresses”. (I find it most odd that they locked the GitHub ticket, eliminating the possibility for anyone to reply with a fix.)
The
ansible_all_ipv6_addresses | reject("ansible.utils.in_network", "fe80::/10")
construct works for me. For example:
local-address=127.0.0.1, ::1, {{ ( ansible_all_ipv4_addresses + ( ansible_all_ipv6_addresses|reject("ansible.utils.in_network","fe80::/10")) )|join(", ") }}
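With made-up example addresses, that template renders to something like:
local-address=127.0.0.1, ::1, 192.0.2.10, 203.0.113.7, 2001:db8::10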
2023-03-31 - zabbix server / jammy upgrade / missing font
The other day, we upgraded the host OS for our Zabbix Server from Ubuntu/Focal to Ubuntu/Jammy. This caused all text to go missing from the rendered graphs.
The php (uwsgi) logs had the following to say:
PHP Warning: imagettfbbox(): Could not find/open font in /usr/share/zabbix/include/graphs.inc.php on line 600
At least that's a pretty clear message.
Through a quick php live hack we learned that it tried to
open /usr/share/zabbix/assets/fonts/graphfont.ttf
. This
file was a symlink to
/etc/alternatives/zabbix-frontend-font
and that was a
symlink to
/usr/share/fonts/truetype/ttf-dejavu/DejaVuSans.ttf
.
But that symlink target did not exist.
# update-alternatives --config zabbix-frontend-font update-alternatives: warning: /etc/alternatives/zabbix-frontend-font is dangling; it will be updated with best choice There is only one alternative in link group zabbix-frontend-font (providing /usr/share/zabbix/assets/fonts/graphfont.ttf): /usr/share/fonts/truetype/ttf-dejavu/DejaVuSans.ttf Nothing to configure.
That didn't fix anything.
Apparently the fonts-dejavu-core package has moved the fonts
from /usr/share/fonts/truetype/ttf-dejavu
to
/usr/share/fonts/truetype/dejavu
. There we could find
our trusted old DejaVuSans.ttf
.
The fix:
# ln -sf /usr/share/fonts/truetype/dejavu/DejaVuSans.ttf \ /etc/alternatives/zabbix-frontend-font
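Alternatively, registering the new font path with the alternatives system should achieve the same and keeps update-alternatives consistent; a sketch, not what we ran:
# update-alternatives --install \
    /usr/share/zabbix/assets/fonts/graphfont.ttf zabbix-frontend-font \
    /usr/share/fonts/truetype/dejavu/DejaVuSans.ttf 10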
2023-03-02 - postfix / no system resources / proxy protocol
Connecting to Postfix and getting a "421 4.3.2 No system resources"? Maybe you forgot you're using the (HAProxy) Proxy Protocol...
If you're trying to connect to your Postfix mail daemon, and it looks like this:
$ nc localhost 25 ... wait for 5 seconds ... 421 4.3.2 No system resources
Then I bet you're using HAProxy as reverse proxy to your mailserver and you have the following configured:
$ postconf | grep ^postscreen_upstream postscreen_upstream_proxy_protocol = haproxy postscreen_upstream_proxy_timeout = 5s
To test a direct connection, you'll need to prefix your traffic with the
proxy protocol v1 handshake. That can be as simple as pasting
PROXY TCP4 127.1.1.1 127.0.0.1 12345 25
as first line:
$ nc localhost 25 PROXY TCP4 127.0.123.45 127.0.0.1 12345 25 220 the-mail-server ESMTP Postfix
After that, you can resume typing SMTP commands as you're used to.
An alternative solution is to use LD_PRELOAD
ed code that
does this for you. For example libproxyproto by
Michael Santos:
$ git clone https://github.com/msantos/libproxyproto.git ... $ cd libproxyproto $ make ...
That creates a libproxyproto.so
and a
libproxyproto_connect.so
shared library. And now you can do
this:
$ LD_PRELOAD=./libproxyproto_connect.so LIBPROXYPROTO_VERSION=1 nc localhost 25 220 the-mail-server ESMTP Postfix
Or — if you need this a lot — you can move them to
/usr/local/lib/
and add a small
/usr/local/bin/proxyproto_v1
shell wrapper:
#!/bin/sh # Invoke as: proxyproto_v1 nc [args...] LD_PRELOAD=/usr/local/lib/libproxyproto_connect.so \ LIBPROXYPROTO_VERSION=1 \ exec "$@"
2023-02-17 - oneliner / finding fixed kernel bugs
Recently we were bitten by an old kernel bug on Ubuntu that only rarely triggers. Finding out where the problem was is easier if you know where to look.
We had no kernel logs to go on. Only a hanging machine with no output. And a hunch that the running kernel version linux-image-5.4.0-122 was the likely culprit. Was there another way than meticulously reading all changelogs and changes to find out which bug we're dealing with?
For issues that are already fixed in a newer kernel, the following recipe seems to work. (Note that we would like to use commit ids instead of string matching, but this won't work because Ubuntu cherry-picks the commits, altering the commit ids.)
Step 1: get the Ubuntu kernel repo:
$ git clone git://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/focal \ linux-ubuntu-focal
Step 2: check which versions have been released:
$ git -c tag.sort=-v:refname tag | grep Ubuntu-5.4.0 Ubuntu-5.4.0-144.161 Ubuntu-5.4.0-143.160 ... Ubuntu-5.4.0-123.139 Ubuntu-5.4.0-122.138 Ubuntu-5.4.0-121.137 Ubuntu-5.4.0-120.136 ...
Step 3: pick which versions we want to check. In this case, we're
assuming that the bug was introduced in Ubuntu-5.4.0-122.138
, so
we'll plug in Ubuntu-5.4.0-121.137..Ubuntu-5.4.0-122.138
.
And we'll plug in
Ubuntu-5.4.0-122.138..Ubuntu-5.4.0-144.161
so we get all
versions to the end.
$ git log Ubuntu-5.4.0-121.137..Ubuntu-5.4.0-122.138 --format=%s | grep -v ^UBUNTU: | while read -r l; do res=$( git log Ubuntu-5.4.0-122.138..Ubuntu-5.4.0-144.161 \ --grep "Fixes:.*$(printf '%s' "$l" | sed -e 's/[^A-Za-z0-9 ]/./g')" \ --oneline) test -n "$res" && echo "$res (fixes: $l)" done 97842ea930e0 tcp: make sure treq->af_specific is initialized (fixes: tcp: md5: incorrect tcp_header_len for incoming connections)
$ git show 97842ea930e0 commit 97842ea930e0eb94bfdb87beaf87d56224c1e8ad Author: Eric Dumazet <edumazet@google.com> Date: Sun Apr 24 13:35:09 2022 -0700 tcp: make sure treq->af_specific is initialized ... Fixes: 5b0b9e4c2c89 ("tcp: md5: incorrect tcp_header_len for incoming connections")
Nice. One result. Seems to be the one we're looking for.
$ git -c tag.sort=-v:refname tag --contains 97842ea930e0 | tail -n1 Ubuntu-5.4.0-123.139
And it was included in the next release already.
2023-01-27 - windows openvpn / unexpected default route
The other day, I was looking into a VPN client issue. The user could connect, they would get their routes pushed, but they would then proceed to use the VPN for all traffic instead of just the routes we provided them.
We did not push a default route, because this VPN server exposed a small internal network only. Any regular internet surfing should be done directly. So, when I looked at a tcpdump I was baffled when I saw that DNS lookups were attempted through the OpenVPN tunnel:
12:50:45.992684 IP 10.8.8.11.51953 > 8.8.8.8.53: 51928+ A? kv601.prod.do.dsp.mp.microsoft.com. (52)
The server in question runs OpenVPN 2.4.
The client that exhibited this behaviour was OpenVPN Connect v3 for Windows, with the following peer info, according to the server logs:
peer info: IV_VER=3.git::d3f8b18b peer info: IV_PLAT=win peer info: IV_NCP=2 peer info: IV_TCPNL=1 peer info: IV_PROTO=30 peer info: IV_CIPHERS=AES-256-GCM:AES-128-GCM:CHACHA20-POLY1305 peer info: IV_AUTO_SESS=1 peer info: IV_GUI_VER=OCWindows_3.3.6-2752 peer info: IV_SSO=webauth,openurl,crtext
The users with Linux OpenVPN clients had no issues with this VPN. Was there an extra setting in the Windows OpenVPN Connect that we could change? A “Send all traffic over this tunnel” option to uncheck, perhaps? There seemed to be very few settings.
One thing we had changed recently was the DNS.
We had begun pushing 8.8.8.8
as DNS to the
users (to solve a different issue) using the following rules:
push "dhcp-option DNS 8.8.8.8" push "dhcp-option DNS 8.8.4.4" push "dhcp-option DOMAIN-ROUTE one-specific-domain.tld"
This rule was supposed to force lookups for
*.one-specific-domain.tld
to go through the aforementioned
Google DNS servers. Maybe the VPN client secretly added a route
for this under the assumption that if you want a specific DNS
server for VPN, it should be routed through the VPN as well.
This was easy enough to test. I allowed traffic to
8.8.8.8
and 8.8.4.4
to go through the
VPN.
Did that fix the problem? Well, no. DNS resolving worked for the user, and now actual (non-DNS) traffic would be attempted through the VPN as well:
13:02:14.618777 IP 10.8.8.11.52040 > 23.2.214.66.443: Flags [S], seq 932856193, win 64240, options [mss 1289,nop,wscale 8,nop,nop,sackOK], length 0
What is up with this? A route print
on the
Windows side showed nothing out of the ordinary:
============================================================ Active Routes: Network Destination Netmask Gateway Interface 0.0.0.0 0.0.0.0 192.168.1.1 192.168.1.100 <- default 10.55.55.55 255.255.255.255 10.8.8.1 10.8.8.11 <- vpn 10.8.8.0 255.255.255.0 On-link 10.8.8.11 <- vpn 10.8.8.11 255.255.255.255 On-link 10.8.8.11 <- vpn 10.8.8.255 255.255.255.255 On-link 10.8.8.11 <- vpn 127.0.0.0 255.0.0.0 On-link 127.0.0.1 127.0.0.1 255.255.255.255 On-link 127.0.0.1 127.255.255.255 255.255.255.255 On-link 127.0.0.1 192.168.1.0 255.255.255.0 On-link 192.168.1.100 192.168.1.100 255.255.255.255 On-link 192.168.1.100 192.168.1.255 255.255.255.255 On-link 192.168.1.100 123.123.123.123 255.255.255.255 192.168.1.1 192.168.1.100 <- vpn-server 224.0.0.0 240.0.0.0 On-link 127.0.0.1 224.0.0.0 240.0.0.0 On-link 192.168.1.100 224.0.0.0 240.0.0.0 On-link 10.8.8.11 <- vpn 255.255.255.255 255.255.255.255 On-link 127.0.0.1 255.255.255.255 255.255.255.255 On-link 192.168.1.100 255.255.255.255 255.255.255.255 On-link 10.8.8.11 <- vpn ============================================================
Ignoring broadcast and multicast addresses, only
10.55.55.55
and 10.8.8.*
should go through the VPN interface.
The default route 0.0.0.0
is clearly marked to
go through the regular internet via the
192.168.1.1
gateway.
This does not explain at all why traffic to 8.8.8.8
or
23.2.214.66
goes to the VPN.
In a last ditch attempt to fix things, I tried what happens if we
did push 8.8.8.8
and 8.8.4.4
as
routes that should go through the VPN:
push "route 8.8.8.8 255.255.255.255 vpn_gateway" push "route 8.8.4.4 255.255.255.255 vpn_gateway"
Lo and behold! Things started working properly. Traffic to
10.55.55.55
(and to the nameservers) now goes through the
tunnel, but traffic to the rest of the internet properly takes the default
route.
I cannot explain why OpenVPN Connect on Windows would not use the routes it prints. Maybe there is a “Use default gateway on remote network” setting somewhere that got enabled when it received a DNS server IP that was not pushed over the same tunnel. One would think that this would be visible in the routing table though. If anyone reading this can explain this phenomenon, please drop us a line.
2023-01-17 - django 1.8 / python 3.10
After upgrading a machine to Ubuntu/Jammy there was an old Django 1.8 project that refused to run with the newer Python 3.10.
... File "django/db/models/sql/query.py", line 11, in <module> from collections import Iterator, Mapping, OrderedDict ImportError: cannot import name 'Iterator' from 'collections' (/usr/lib/python3.10/collections/__init__.py)
This was relatively straightforward to fix, by using the following patch. Some parts were stolen from a stackoverflow response by Elias Prado.
--- a/django/core/paginator.py 2023-01-11 14:09:04.915505171 +0100 +++ b/django/core/paginator.py 2023-01-11 14:09:29.407130151 +0100 @@ -1,4 +1,4 @@ -import collections +from collections.abc import Sequence from math import ceil from django.utils import six @@ -103,7 +103,7 @@ class Paginator(object): QuerySetPaginator = Paginator # For backwards-compatibility. -class Page(collections.Sequence): +class Page(Sequence): def __init__(self, object_list, number, paginator): self.object_list = object_list --- a/django/db/migrations/writer.py 2023-01-11 14:13:07.507799080 +0100 +++ b/django/db/migrations/writer.py 2023-01-11 14:14:36.978436145 +0100 @@ -1,6 +1,6 @@ from __future__ import unicode_literals -import collections +from collections.abc import Iterable import datetime import decimal import math @@ -434,7 +434,7 @@ class MigrationWriter(object): % (value.__name__, module_name, get_docs_version())) return "%s.%s" % (module_name, value.__name__), {"import %s" % module_name} # Other iterables - elif isinstance(value, collections.Iterable): + elif isinstance(value, Iterable): imports = set() strings = [] for item in value: --- a/django/db/models/base.py 2023-01-11 14:17:13.471982572 +0100 +++ b/django/db/models/base.py 2023-01-11 14:19:38.337720520 +0100 @@ -80,7 +80,12 @@ class ModelBase(type): # Create the class. module = attrs.pop('__module__') - new_class = super_new(cls, name, bases, {'__module__': module}) + new_attrs = {'__module__': module} + classcell = attrs.pop('__classcell__', None) + if classcell is not None: + new_attrs['__classcell__'] = classcell + new_class = super_new(cls, name, bases, new_attrs) + attr_meta = attrs.pop('Meta', None) abstract = getattr(attr_meta, 'abstract', False) if not attr_meta: --- a/django/db/models/fields/__init__.py 2023-01-11 14:12:50.780054102 +0100 +++ b/django/db/models/fields/__init__.py 2023-01-11 14:14:02.290964344 +0100 @@ -1,7 +1,7 @@ # -*- coding: utf-8 -*- from __future__ import unicode_literals -import collections +from collections.abc import Iterable, Iterator import copy import datetime import decimal @@ -417,7 +417,7 @@ class Field(RegisterLookupMixin): for name, default in possibles.items(): value = getattr(self, attr_overrides.get(name, name)) # Unroll anything iterable for choices into a concrete list - if name == "choices" and isinstance(value, collections.Iterable): + if name == "choices" and isinstance(value, Iterable): value = list(value) # Do correct kind of comparison if name in equals_comparison: @@ -852,7 +852,7 @@ class Field(RegisterLookupMixin): return smart_text(self._get_val_from_obj(obj)) def _get_choices(self): - if isinstance(self._choices, collections.Iterator): + if isinstance(self._choices, Iterator): choices, self._choices = tee(self._choices) return choices else: --- a/django/db/models/sql/query.py 2023-01-11 14:07:45.900716653 +0100 +++ b/django/db/models/sql/query.py 2023-01-11 14:08:08.724366450 +0100 @@ -8,7 +8,8 @@ all about the internals of models in ord """ import copy import warnings -from collections import Iterator, Mapping, OrderedDict +from collections import OrderedDict +from collections.abc import Iterator, Mapping from itertools import chain, count, product from string import ascii_uppercase --- a/django/db/models/sql/where.py 2023-01-11 14:13:01.595889201 +0100 +++ b/django/db/models/sql/where.py 2023-01-11 14:14:25.322613605 +0100 @@ -2,7 +2,7 @@ Code to manage the creation and SQL rendering of 'where' constraints. 
""" -import collections +from collections.abc import Iterator import datetime import warnings from itertools import repeat @@ -59,7 +59,7 @@ class WhereNode(tree.Node): if not isinstance(data, (list, tuple)): return data obj, lookup_type, value = data - if isinstance(value, collections.Iterator): + if isinstance(value, Iterator): # Consume any generators immediately, so that we can determine # emptiness and transform any non-empty values correctly. value = list(value)
And to avoid the following warnings, the six module bundled with Django can be patched.
<frozen importlib._bootstrap>:914: ImportWarning: _SixMetaPathImporter.find_spec() not found; falling back to find_module() <frozen importlib._bootstrap>:671: ImportWarning: _SixMetaPathImporter.exec_module() not found; falling back to load_module()
These changes were taken from six 1.16:
--- a/django/utils/six.py 2023-01-17 11:08:00.267645405 +0100 +++ b/django/utils/six.py 2023-01-17 11:12:13.993813451 +0100 @@ -71,6 +71,7 @@ else: MAXSIZE = int((1 << 63) - 1) del X +from importlib.util import spec_from_loader def _add_doc(func, doc): """Add documentation to a function.""" @@ -186,6 +187,11 @@ class _SixMetaPathImporter(object): return self return None + def find_spec(self, fullname, path, target=None): + if fullname in self.known_modules: + return spec_from_loader(fullname, self) + return None + def __get_module(self, fullname): try: return self.known_modules[fullname] @@ -223,6 +229,12 @@ class _SixMetaPathImporter(object): return None get_source = get_code # same as get_code + def create_module(self, spec): + return self.load_module(spec.name) + + def exec_module(self, module): + pass + _importer = _SixMetaPathImporter(__name__) @@ -679,11 +691,15 @@ if PY3: exec_ = getattr(moves.builtins, "exec") def reraise(tp, value, tb=None): - if value is None: - value = tp() - if value.__traceback__ is not tb: - raise value.with_traceback(tb) - raise value + try: + if value is None: + value = tp() + if value.__traceback__ is not tb: + raise value.with_traceback(tb) + raise value + finally: + value = None + tb = None else: def exec_(_code_, _globs_=None, _locs_=None): @@ -699,19 +715,19 @@ else: exec("""exec _code_ in _globs_, _locs_""") exec_("""def reraise(tp, value, tb=None): - raise tp, value, tb + try: + raise tp, value, tb + finally: + tb = None """) -if sys.version_info[:2] == (3, 2): - exec_("""def raise_from(value, from_value): - if from_value is None: - raise value - raise value from from_value -""") -elif sys.version_info[:2] > (3, 2): +if sys.version_info[:2] > (3,): exec_("""def raise_from(value, from_value): - raise value from from_value + try: + raise value from from_value + finally: + value = None """) else: def raise_from(value, from_value): @@ -788,11 +804,10 @@ _add_doc(reraise, """Reraise an exceptio if sys.version_info[0:2] < (3, 4): def wraps(wrapped, assigned=functools.WRAPPER_ASSIGNMENTS, updated=functools.WRAPPER_UPDATES): - def wrapper(f): - f = functools.wraps(wrapped, assigned, updated)(f) - f.__wrapped__ = wrapped - return f - return wrapper + return functools.partial(_update_wrapper, wrapped=wrapped, + assigned=assigned, updated=updated) + wraps.__doc__ = functools.wraps.__doc__ + else: wraps = functools.wraps @@ -802,10 +817,22 @@ def with_metaclass(meta, *bases): # This requires a bit of explanation: the basic idea is to make a dummy # metaclass for one level of class instantiation that replaces itself with # the actual metaclass. - class metaclass(meta): + class metaclass(type): def __new__(cls, name, this_bases, d): - return meta(name, bases, d) + if sys.version_info[:2] >= (3, 7): + # This version introduced PEP 560 that requires a bit + # of extra care (we mimic what is done by __build_class__). + resolved_bases = types.resolve_bases(bases) + if resolved_bases is not bases: + d['__orig_bases__'] = bases + else: + resolved_bases = bases + return meta(name, resolved_bases, d) + + @classmethod + def __prepare__(cls, name, this_bases): + return meta.__prepare__(name, bases) return type.__new__(metaclass, 'temporary_class', (), {}) @@ -821,6 +848,8 @@ def add_metaclass(metaclass): orig_vars.pop(slots_var) orig_vars.pop('__dict__', None) orig_vars.pop('__weakref__', None) + if hasattr(cls, '__qualname__'): + orig_vars['__qualname__'] = cls.__qualname__ return metaclass(cls.__name__, cls.__bases__, orig_vars) return wrapper
Patching is done using patch -p1
— you should know how.
2023-01-10 - sysctl / modules / load order / nf_conntrack
Recently we ran into an issue where connections were unexpectedly aborted. Connections from a NAT-ed client (a K8S pod) to a server would suddenly get an old packet (according to the sequence number) in the middle of the data. This triggered the Linux NAT-box to issue a reset packet (RST). Setting the kernel flag to mitigate this behaviour required some knowledge of module load order during boot.
Spurious retransmits causing connection teardown
To start off: we observed that traffic from a pod to a server got disconnected.
We enabled debug logging on the Kubernetes host where the pod resides.
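In practice that boiled down to something like the following; the conntrack messages end up in the kernel log, which you can follow with journalctl -kf (or dmesg -w):
# modprobe nf_log_ipv4
# sysctl -w net.netfilter.nf_conntrack_log_invalid=255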
With the nf_log_ipv4 module loaded and net.netfilter.nf_conntrack_log_invalid=255 set, we saw this:
kernel: nf_ct_proto_6: SEQ is under the lower bound (already ACKed data retransmitted) IN= OUT= SRC=10.x.x.x DST=10.x.x.x LEN=1480 TOS=0x00 PREC=0x00 TTL=61 ID=53534 DF PROTO=TCP SPT=6379 DPT=26110 SEQ=4213094653 ACK=3402842193 WINDOW=509 RES=0x00 ACK PSH URGP=0 OPT (0101080A084C76F30D12DCAA)
In the middle of a sequence of several data packets from the server, an apparently unrelated packet appeared: it carried data that was not intended for this stream, yet it had the same source/destination tuple and a sequence number that was more than 80K too low.
(Wireshark flags this packet as invalid with a TCP
Spurious Retransmission
message.)
This triggered a reset (RST) by the Linux connection tracking module. And that in turn caused (unexpected) RSTs from the server.
POD <-> NAT <-> SRV
--------------------
<-- TCP seq 2000000 ack 5555 len 1400
<-- TCP seq 2000000 ack 5555 len 1400
<-- TCP seq 1200000 ack 5555 len 1234  (seq is _way_ off)
--> TCP RST seq 5555 len 0
<-- TCP seq 2001400 ack 5555 len 1000
<-- TCP seq 2001400 ack 5555 len 1000
(Made up numbers in the above table, but they illustrate the problem.)
At this point, the non-rejected traffic still got forwarded back to the pod. However, its ACKs back to the server were now rejected by the server with an RST of its own; after all, that end of the connection thinks it was already torn down.
kernel: nf_ct_proto_6: invalid rst IN= OUT= SRC=10.x.x.x DST=10.x.x.x LEN=40 TOS=0x00 PREC=0x00 TTL=61 ID=0 DF PROTO=TCP SPT=6379 DPT=26110 SEQ=4213164625 ACK=0 WINDOW=0 RES=0x00 RST URGP=0
The next packet (sequence 2001400 in the above example) was fine though. So if we could convince the Linux kernel to ignore the packet with the unexpected sequence number, our connections might survive.
Luckily there is such a flag: net.netfilter.nf_conntrack_tcp_be_liberal=1
While this does not explain the root cause, setting said flag mitigates the problem. It makes the kernel ignore all spurious retransmits.
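To try it out before making it permanent, the flag can be flipped on the running system (this does not survive a reboot):
# sysctl net.netfilter.nf_conntrack_tcp_be_liberal
net.netfilter.nf_conntrack_tcp_be_liberal = 0
# sysctl -w net.netfilter.nf_conntrack_tcp_be_liberal=1
net.netfilter.nf_conntrack_tcp_be_liberal = 1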
So, we placed net.netfilter.nf_conntrack_tcp_be_liberal = 1 in /etc/sysctl.conf and assumed the symptoms would be gone.
... or so we thought. Because after a reboot, the flag was unset again.
sysctl.conf not picked up?
That's odd. Setting kernel parameters during boot should be done
in sysctl.conf
(or sysctl.d
). Why did it not get picked up?
The cause turned out to be that this particular setting is not built-in: it belongs to a module, the nf_conntrack module. And that module is not necessarily loaded before the sysctl settings are applied.
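You can see the mechanism on a machine where the module has not been loaded yet: the knob simply does not exist under /proc/sys until it is. Roughly:
# sysctl net.netfilter.nf_conntrack_tcp_be_liberal
sysctl: cannot stat /proc/sys/net/netfilter/nf_conntrack_tcp_be_liberal: No such file or directory
# modprobe nf_conntrack
# sysctl net.netfilter.nf_conntrack_tcp_be_liberal
net.netfilter.nf_conntrack_tcp_be_liberal = 0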
nf_conntrack was loaded on demand, and not in any particularly well-defined order. Luckily, loading modules through /etc/modules-load.d happens at a well-defined point during boot, as you can see:
# systemd-analyze critical-chain systemd-sysctl.service
The time when unit became active or started is printed after the "@" character.
The time the unit took to start is printed after the "+" character.

systemd-sysctl.service +44ms
└─systemd-modules-load.service @314ms +90ms
  └─systemd-journald.socket @242ms
    └─system.slice @220ms
      └─-.slice @220ms
Indeed, sysctl settings are applied after systemd-modules-load.service:
# systemctl show systemd-sysctl.service | grep ^After
After=systemd-journald.socket system.slice systemd-modules-load.service
So, we can use the systemd-modules-load.service to ensure that the conntrack module is loaded before we attempt to set its parameters:
# cat /etc/modules-load.d/modules.conf
# /etc/modules: kernel modules to load at boot time.
#
# This file contains the names of kernel modules that should be loaded
# at boot time, one per line. Lines beginning with "#" are ignored.
nf_conntrack
And that works. Now all settings are properly set during boot.
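For the record, a reboot is not strictly needed to test the combination; something along these lines should apply it immediately:
# modprobe nf_conntrack
# systemctl restart systemd-sysctl.service
# sysctl net.netfilter.nf_conntrack_tcp_be_liberal
net.netfilter.nf_conntrack_tcp_be_liberal = 1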
As for the spurious retransmissions: the last word has not yet been said on that...