Template Haskell and Cross Compilation
Over the last two weeks we have seen how to build a Haskell cross compiler for the Raspberry Pi. We have also seen a high-level overview of Template Haskell. Today we will look at what Template Haskell means in the context of cross compilation.
As pointed out yesterday, Template Haskell requires that GHC is able to run the object code for the Template Haskell function. This in turn requires GHC to be able to load this code into memory and make it executable. Therein lies the first problem: our compiler is a cross compiler and produces code that runs on the host, say the Raspberry Pi, and not on the machine we build it on. Thus we cannot natively load and run the Template Haskell function on the same machine that GHC runs on, because our host is different from our build machine.
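To make this concrete, here is a minimal sketch of such a splice function (the name square is made up for illustration): to compile a module that writes $(square 5), GHC has to load the code for square and run it at compile time, on the machine GHC itself is running on.
{-# LANGUAGE TemplateHaskell #-}
module Square where
import Language.Haskell.TH (Q, Exp, litE, integerL)
-- A splice function; another module can write $(square 5), and GHC
-- will have to run this code while compiling that module.
square :: Integer -> Q Exp
square n = litE (integerL (n * n))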
Existing Solutions
In The Haskell Cabal and Cross Compilation post, I mentioned the evil splicer and ZeroTH. Both use a strategy where the Template Haskell splices are extracted from the source, compiled on the build machine, and the results are then spliced back into their places. This however implies that the splices are evaluated on the build machine, with libraries built for the build machine. Subtle differences between build and host can potentially lead to incorrect results (e.g. 32- vs 64-bit).
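As a sketch of such a subtle difference (a made-up example, not taken from either tool), consider a splice that embeds maxBound :: Int as a literal: evaluated by a 64-bit build compiler it bakes in 9223372036854775807, while a 32-bit Raspberry Pi target would expect 2147483647.
{-# LANGUAGE TemplateHaskell #-}
module WordSize where
import Language.Haskell.TH (Q, Exp, litE, integerL)
-- Embeds maxBound :: Int of whatever machine evaluates the splice;
-- on a 64-bit build machine this is 9223372036854775807, even though
-- a 32-bit host has 2147483647.
maxIntLit :: Q Exp
maxIntLit = litE (integerL (toInteger (maxBound :: Int)))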
GHCJS is a Haskell to JavaScript compiler that uses the GHC API. As such it is also a cross compiler: its target is a JavaScript runtime, while GHCJS itself does not run on that runtime. GHCJS has, however, supported Template Haskell via its out-of-process Template Haskell solution for some time now. It does this by running a node.js server: GHCJS transfers the compiled Template Haskell code to the server process, links and evaluates it there, and ships the result back to the GHCJS process.
Since GHC 8.0.1, GHC has The External Interpreter, which provides support for a very similar mechanism. If GHC is provided with the -fexternal-interpreter flag, it will evaluate interpreted code in a separate process.
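For example, with a regular (non-cross) compiler, compiling a module that uses Template Haskell with ghc -fexternal-interpreter Main.hs evaluates the splices in a separate iserv process instead of inside GHC itself.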
If we tried an approach similar to GHCJS's with our Raspberry Pi (and we will tomorrow), we would run into an issue that GHCJS does not face. GHCJS runs the node.js server on the same machine. This is important, because it means that the node.js process sees the same file system that GHCJS sees. It can also shell out to any process GHCJS could shell out to, because they run on the same machine.
Therefore, if we transfer, link and run a Template Haskell function on our host (here: the Raspberry Pi), it will not see the same file system or be able to shell out to the same processes that GHC could. The Template Haskell function will see the host's environment, not the build machine's. This is contrary to what many packages that use Template Haskell assume: that they operate in the same environment as GHC, can access the same file system, and can call the same commands the GHC process can. For example the gitrev package, which allows, among other things, embedding the git repository hash into the program, naturally assumes that it can operate in the same environment in which GHC is operating: the one that has git available and contains the module that is being compiled, and into which the git hash is supposed to be embedded.
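To make that assumption concrete, here is a minimal sketch of a gitrev-style splice (this is not gitrev's actual implementation): it shells out to git at compile time via runIO, and therefore only works if the process evaluating the splice can see the repository and has git available.
{-# LANGUAGE TemplateHaskell #-}
module GitHashSketch where
import Language.Haskell.TH (Q, Exp, runIO, stringE)
import System.Process (readProcess)
-- Runs `git rev-parse HEAD` at compile time, in the environment of
-- whatever process evaluates the splice.
gitHashSketch :: Q Exp
gitHashSketch = do
  hash <- runIO (readProcess "git" ["rev-parse", "HEAD"] "")
  stringE (takeWhile (/= '\n') hash)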
File and Process IO Example
An example where both file IO and process IO might be used would be an application that wants to reproduce the licenses of the libraries it uses, and provide a version with a git hash to identify the exact commit from which it was built. While we could use a separate file containing the licenses, we opt to actually include the licenses in the binary. We'll assume that our Main.hs is in a git repository and our licenses are aggregated in a LICENSES file.
{-# LANGUAGE TemplateHaskell, LambdaCase #-}
module Main where
import Development.GitRev (gitHash)
import Data.FileEmbed (embedStringFile)
import System.Environment (getProgName, getArgs)
import System.Exit (die)
main :: IO ()
main = getArgs >>= \case
  ["--licenses"] -> putStrLn $(embedStringFile "LICENSES")
  ["--version"]  -> putStrLn ("Version " ++ $gitHash)
  _otherwise     -> do
    prog <- getProgName
    die $ "usage: " ++ prog ++ " [--licenses] [--version]"
While this is clearly somewhat contrived code, this simple, innocent-looking snippet cannot trivially be cross compiled. When GHC encounters the $(embedStringFile "LICENSES") splice, it will build a bytecode object (BCO) that calls embedStringFile with "LICENSES", loading the file-embed library and evaluating the BCO. The BCO requests a handle on the embedStringFile function and invokes it with "LICENSES". While this could execute perfectly fine on the host, the function runIO is opaque to GHC, and GHC can't tell whether the IO action will read/write files or call a program. GHC would simply try to read the LICENSES file on the host. That was not our intention! The same holds for the $gitHash splice, from which we expect to run git on the build machine and obtain the git hash of the source directory.
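To make the role of runIO explicit, here is a minimal sketch of a file-embedding splice (not file-embed's actual implementation): the readFile happens at compile time, in whatever environment the splice is evaluated in; in the out-of-process setup, that would be the host.
{-# LANGUAGE TemplateHaskell #-}
module EmbedSketch where
import Language.Haskell.TH (Q, Exp, runIO, stringE)
-- Reads the file at compile time; GHC only sees an opaque IO action
-- wrapped in runIO and cannot tell that it touches the file system.
embedStringSketch :: FilePath -> Q Exp
embedStringSketch path = do
  contents <- runIO (readFile path)
  stringE contents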
Summary
We have now seen that Template Haskell poses some interesting problems for cross compilation: it requires GHC to be able to load, link, and evaluate object code on the host. And because the build and host environments differ (file system, available programs, …), not being able to execute Template Haskell functions on the build machine makes file and process IO in splices effectively nonsensical.