Template Haskell and Cross Compilation

Over the last two weeks we have seen how to build a Haskell cross compiler for the Raspberry Pi. We have also seen a high-level overview of Template Haskell. Today we will look at what Template Haskell means in the context of cross compilation.

As pointed out yesterday, Template Haskell requires that GHC be able to run the object code for the Template Haskell function. This in turn requires GHC to be able to load that code into memory and make it executable. Therein lies the first problem: our compiler is a cross compiler and produces code that runs on the host, say the Raspberry Pi, and not on the machine we build it on. Thus we cannot natively load and run the Template Haskell function on the same machine that GHC runs on, because our host is different from our build machine.
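
To make this concrete, here is a minimal sketch using only functions from base and template-haskell: to compile the module below, GHC must load compiled code for product, litE, and integerL and actually run it, so that the splice can be replaced by the literal 3628800.

{-# LANGUAGE TemplateHaskell #-}
module Main where

import Language.Haskell.TH (integerL, litE)

-- GHC evaluates this splice at compile time; the compiled code behind
-- product, integerL, and litE must therefore be loadable and runnable
-- by the compiler itself.
main :: IO ()
main = print $(litE (integerL (product [1..10])))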

Existing Solutions

In The Haskell Cabal and Cross Compilation post, I mentioned the evil splicer and ZeroTH. Both use a strategy where the Template Haskell splices are extracted from the source, compiled on the build machine, and the results are then spliced back into their places. This, however, implies that the splices are evaluated on the build machine with libraries built for the build machine. Subtle differences between build and host can potentially lead to incorrect results (e.g. 32- vs. 64-bit).
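
A rough sketch of that strategy, and of how the build/host mismatch can bite. This is a hypothetical before/after fragment, not the actual output of either tool, and it assumes imports of Data.Bits and Language.Haskell.TH:

-- Before extraction: the source still contains a splice.
hostWordSize :: Integer
hostWordSize = $(litE (integerL (toInteger (finiteBitSize (0 :: Int)))))

-- After extraction: the splice has been evaluated with the build machine's
-- libraries and the result pasted back as plain Haskell -- 64 on a 64-bit
-- build machine, even if the host is a 32-bit Raspberry Pi.
hostWordSize :: Integer
hostWordSize = 64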

GHCJS is a Haskell to JavaScript compiler that uses the GHC API. As such it is also a cross compiler: its target is a JavaScript runtime, while GHCJS itself does not run on that runtime. GHCJS has, however, supported Template Haskell for some time now via its out-of-process Template Haskell solution. GHCJS does this by running a node.js server: it transfers the compiled Template Haskell code to the server process, links and evaluates it there, and ships the result back to the GHCJS process.

Since GHC 8.0.1, GHC has The External Interpreter, which provides support for a very similar mechanism. If GHC is provided with the -fexternal-interpreter flag, it will evaluate interpreted code in a separate process.
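
For a native (non-cross) compiler, using it is just a matter of passing the flag, for example:

ghc -fexternal-interpreter Main.hs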

If we tried an approach similar to GHCJS's with our Raspberry Pi (and we will tomorrow), we would run into an issue that GHCJS does not face. GHCJS runs the node.js server on the same machine. This is important, because it means that the node.js process sees the same file system that GHCJS sees. It can also shell out to any process GHCJS could shell out to, because they run on the same machine.

Therefore, if we transfer, link, and run a Template Haskell function on our host (here: the Raspberry Pi), it will not see the same file system or be able to shell out to the same processes that GHC could. The Template Haskell function will see the host's environment, not the build machine's. This is contrary to what many packages that use Template Haskell assume: they assume they operate in the same environment as GHC, can access the same file system, and can call the same commands that the GHC process can. For example, the gitrev package, which among other things allows embedding the git repository hash into the program, naturally assumes that it operates in the same environment in which GHC is operating; the one that has git available and contains the module that is being compiled, into which the git hash is supposed to be embedded.

File and Process IO Example

An example where both file IO and process IO might be used would be an application that wants to reproduce the licenses of the libraries it uses, and to provide a version that includes a git hash identifying the exact commit from which it was built. While we could ship a separate file containing the licenses, we opt to include the licenses in the binary itself. We'll assume that our Main.hs is in a git repository and that our licenses are aggregated in a LICENSES file.

{-# LANGUAGE TemplateHaskell, LambdaCase #-}
module Main where

import Development.GitRev (gitHash)
import Data.FileEmbed     (embedStringFile)
import System.Environment (getProgName, getArgs)
import System.Exit        (die)

main :: IO ()
main = getArgs >>= \case
  -- the contents of the LICENSES file are read and embedded at compile time
  ["--licenses"] -> putStrLn $(embedStringFile "LICENSES")
  -- the git hash is obtained at compile time by running git
  ["--version"]  -> putStrLn ("Version " ++ $gitHash)
  _otherwise     -> do
    prog <- getProgName
    die $ "usage: " ++ prog ++ " [--licenses] [--version]"

While this is clearly somewhat contrived code, this simple, innocent-looking snippet cannot trivially be cross compiled. When GHC encounters the $(embedStringFile "LICENSES") splice, it builds a bytecode object (BCO) that calls embedStringFile with "LICENSES", loads the file-embed library, and evaluates the BCO. The BCO requests a handle on the embedStringFile function and invokes it with "LICENSES". While this could execute perfectly fine on the host, the IO action hidden behind runIO is opaque to GHC: GHC cannot tell whether it will read or write files or call a program. GHC would simply try to read the LICENSES file on the host. That was not our intention! The same holds for the $gitHash splice, which we expect to run git on the build machine and obtain the git hash of the source directory.
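
To see why GHC cannot reason about that IO, here is a simplified sketch, not the real file-embed implementation, of what an embedStringFile-like splice does internally: the file is read through runIO, so the read happens on whichever machine evaluates the splice.

module EmbedSketch where

import Language.Haskell.TH (Exp, Q, litE, runIO, stringL)

-- Hypothetical helper, sketching the shape of embedStringFile.
embedStringFile' :: FilePath -> Q Exp
embedStringFile' path = do
  contents <- runIO (readFile path) -- compile-time file IO, opaque to GHC
  litE (stringL contents)           -- embed the contents as a string literal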

Summary

We have now seen that Template Haskell poses some interesting problems for cross compilation, because it requires GHC to be able to load, link, and evaluate object code on the host. Due to the environmental differences between the build machine and the host (file system, available programs, …), the inability to execute Template Haskell functions on the build machine makes file and process IO in splices effectively nonsensical.