on
The tail of an off-by-one error in GHC's linker
This is the story of an odd off-by-one error in GHC’s internal static linker for the Mach-O file format on AArch64. This is our beloved macOS/iOS platform!
For a while we have been observing odd non-deterministic failures in
CI. We’d see a bunch of test for the aarch64-darwin runner fail, but
rarely the same, and if re-run different tests would fail. The only
observable correlation was that the failing tests all somehow had to
do with iserv
, ghc’s external interpreter. This did hint towards an
issue in the linker, but no clear indication what was really the
fundamental problem.
While Matthew Pickering and I eventually managed to isolate tests from the test-suite that had slightly higher failure rates (e.g. they’d fail every other or so time), getting the failure into the debugger didn’t help much. We’d just have a crash in invalid memory. SIGBUS. The program counter would have advanced too far to be useful, we didn’t know where we came from. This is especially the case for code GHC generates for Haskell. We don’t really “return”, but have knowledge of where we’d eventually want to continue, thus assembly generated by GHC makes lots of use of (un)conditional branch instructions, but rarely branch and link instructions that would record the caller in the link register; and permit to return.
We don’t really have rr
yet on aarch64-darwin to my knowledge, nor
does lldb
support something akin to gdb
’s record feature. And of
course we don’t have gcc/gdb
on aarch64-darwin just yet.
So what we really want is to be able to record where we come from, but
we also do not want to mess with the link register, otherwise we’d
likely break unrelated code. Let’s go swimming. While meditating on
this issue for a bit, I did remember that we reserved a register r16
for spill/reload offset computations that were too large to
handle. (AArch64 can’t do arbitrary offsets as immediate, so we
occasionally need to add two registers together to obtain the offset
value we want). We only care about the registers value for a branch
instruction, thus it’s outside of any spill/reload computations.
Looks like we can use r16
. After modifying the aarch64 codegen to
write the current program location into r16
prior to any branch
instruction via ADR x16 .
, we should now be able to inspect r16
when
we hit SIGBUS in the debugger again. After a rebuild, and a few tries
to get the crash into the debugger, we are presented with the
following debug session:
(lldb) p/x $x16
(unsigned long) $11 = 0x000000010ff99d54
(lldb) p/x $pc
(unsigned long) $12 = 0x0000000108384658
(lldb) dis -s $x16-0x20 -c 20
0x10ff99d34: cmp x18, x28
0x10ff99d38: adr x16, #0x0
0x10ff99d3c: b.lo 0x10ff99dbc
0x10ff99d40: adrp x18, 0
0x10ff99d44: add x18, x18, #0xde0 ; =0xde0
0x10ff99d48: stur x18, [x20, #-0x10]
0x10ff99d4c: stur x24, [x20, #-0x8]
0x10ff99d50: sub x20, x20, #0x10 ; =0x10
0x10ff99d54: adr x16, #0x0
0x10ff99d58: b 0x108384658
0x10ff99d5c: mov w18, #0x10
0x10ff99d60: str x18, [x19, #0x388]
0x10ff99d64: adr x16, #0x0
0x10ff99d68: b 0x10ff9e57c
0x10ff99d6c: udf #0x1
0x10ff99d70: udf #0x0
0x10ff99d74: udf #0x1e
0x10ff99d78: udf #0x0
0x10ff99d7c: add x21, x21, #0x10 ; =0x10
0x10ff99d80: ldr x18, [x19, #0x358]
We can see we ended up in 0x108384658
, which is where the
unconditional branch instruction in 0x10ff99d58
, which is following
our adr x16, 0x0
instruction as expected.
I’ve ran the experiment a few more times to make sure all the crashes looked sufficiently similar and we’d unlikely have to deal with a variety of odd crashes that just happened to present themselves quite similar.
Now we have an address, but we don’t know where this actually came
from. GHC’s Run Time System has a function called printLoadedObjects
,
which when invoked will print a table of all object files the internal
linker loaded into memory. (Again I couldn’t get lldb
to execute that
function, so I ended up just printing all loaded objects after each
symbol resolution pass…)
The output of printLoadedObjects
looks like this (just a lot more)
_build/stage1/lib/aarch64-osx-ghc-9.3.20210621/template-haskell-2.18.0.0/libHStemplate-haskell-2.18.0.0.a(Internal.o)
sec 0[alloc: 2; kind: 0]: 0x10ff70000 - 0x10ff9e538; mmaped: 0x10ff70000 - 0x10ffa8000
sec 1[alloc: 2; kind: 1]: 0x10efe0000 - 0x10efe1098; mmaped: 0x10efe0000 - 0x10efe4000
sec 2[alloc: 2; kind: 4]: 0x10efe4000 - 0x10efe40b6; mmaped: 0x10efe4000 - 0x10efe8000
We know we called from 0x10ff99d58
, which is in the first (section 0),
of the Internal.o
file from template-haskell
. Disassembling
Internal.o
to verify we found the right location yields the following:
_templatezmhaskell_LanguageziHaskellziTHziLibziInternal_litE_info:
0000000000029d30 sub x18, x20, #0x10
0000000000029d34 cmp x18, x28 ; Latency: 2
0000000000029d38 adr x16, #0x0
0000000000029d3c b.lo 0x29dbc
0000000000029d40 adrp x18, 0 ; 0x29000
0000000000029d44 add x18, x18, #0x0
0000000000029d48 stur x18, [x20, #-0x10] ; Latency: 4
0000000000029d4c stur x24, [x20, #-0x8] ; Latency: 4
0000000000029d50 sub x20, x20, #0x10
0000000000029d54 adr x16, #0x0
0000000000029d58 b 0x29d58
0000000000029d5c mov w18, #0x10
0000000000029d60 str x18, [x19, #0x388] ; Latency: 4
0000000000029d64 adr x16, #0x0
0000000000029d68 b 0x29d68
0000000000029d6c udf #0x1
0000000000029d70 udf #0x0
0000000000029d74 udf #0x1e
0000000000029d78 udf #0x0
This looks rather promising and is exactly where we had our crash. So what happened?
Interlude: Relocations
Relocations are records in object files that the linker can use to resolve symbols. This is important as we do not know where the final symbol might end up in memory, and thus need a way to reference a symbol for which we do not know its resting place. We therefore record symbolic pointers. The linkers job is then to resolve those pointers. The pointer contains a location where we need to patch up an address, the name of the symbol that we want to point to, and one of a few pre-defined relocation types. The linker has to iterate over all relocation after it loaded an object, for each object it loads.
We can ask otool
for relocations, and providing -v
we’ll get some
easier to read nameing as well.
address pcrel length extern type scattered symbolnum/value
00029d58 True long True BR26 False _stg_gc_unpt_r1
So for 0x29d58
(relative to the resting place of the object file in
memory), there is a BR26 (Branch 26) relocation, that points to the
Extern symbol _stg_gc_unpt_r1
. And this is what we’d end up patching
up in the object file. This somehow goes wrong.
Let’s look at the broken relocation again: We jump to 0x108384658
from
0x10ff99d58
, so the relative value in the branch instruction is
0x108384658 - 0x10ff99d58 = 0xf83EA900
. Note the leading 0xf
?, yes
that’s a negative value. Let’s look at it in binary and we see
1111 1000 0011 1110 1010 1001 0000 0000
| | | | |
31 23 15 7 0
The BR26 relocation can encode 26 bits of relocation information
(e.g. encode them right in the Branch instruction). However due to
guaranteed alignment on 4 bytes, we do not need to encode the last
two bits as they will always be zero. This gives us a range
of 28. Which however needs to include the sign bit. So we can really
only encode values in the range of +-2^27
. For values outside of this
range we’ll create a forward jump to the final destination, that is
reachable. At this point we might already have a good idea why this
failed. And yes, the linker did check for +-2^28
, and thus when we
just happened to hit a positive value with the 28th bit set, we’d
write it into the relocation, and call it a day. Just for the CPU to
read it as a negative jump and jump to a random address.
Thus the final fix was just an additional -1
, to the range check for
relocations of the BR26
type. And thus we can conclude the mystery of
the random crashes on aarch64-darwin (macOS), and the story of an odd
off-by-one error in GHC’s in-memory linker.
On a final note: if you like this write up and want to support this work as well as GHC’s Continuous Integration, and you also happen hold Cardanos Crypto Currency ADA, I’d like to invite you to stake your ADA with the ZW3RK pool. It is a very competitive 1% pool, and operational rewards will go towards operating part of GHC’s CI infrastructure and write ups like these.