04 Feb 2019
In my last post I first introduced hnock, a little interpreter
for Nock, and then demonstrated it on a hand-rolled decrement function.
In this post I’ll look at how one can handle the same (contrived, but
illustrative) task in Hoon.
Hoon is the higher- or application-level programming language for working with
Arvo, the operating system of Urbit. The best way I can
describe it is something like “Haskell meets C meets J meets the environment is
always explicit.”
As a typed, functional language, Hoon feels surprisingly low-level. One is
never allocating or deallocating memory explicitly when programming in Hoon,
but the experience somehow feels similar to working in C. The idea is that the
language should be simple and straightforward and support a fairly limited
level of abstraction. There are the usual low-level functional idioms (map,
reduce, etc.), as well as a structural type system to keep the programmer
honest, but at its core, Hoon is something of a functional Go (a language
which, I happen to think, is not good).
It’s not a complex language, like Scala or Rust, nor a language that overtly
supports sky-high abstraction, like Haskell or Idris. Hoon is supposed to
exist at a sweet spot for getting work done. And I am at least willing to buy
the argument that it is pretty good for getting work done in Urbit.
Recall our naïve decrement function in Haskell. It looked like this:
dec :: Integer -> Integer
dec m =
let loop n
| succ n == m = n
| otherwise = loop (succ n)
in loop 0
Let’s look at a number of ways to write this in Hoon, showing off some of the
most important Hoon programming concepts in the process.
Cores
Here’s a Hoon version of decrement. Note that to the uninitiated, Hoon looks
gnarly:
|= m=@
=/ n=@ 0
=/ loop
|%
++ recur
?: =(+(n) m)
n
recur(n +(n))
--
recur:loop
We can read it as follows:
- Define a function that takes an argument, ‘m’, having type atom (recall
that an atom is an unsigned integer).
- Define a local variable called ‘n’, having type atom and value 0, and add it
to the environment (or, if you recall our Nock terminology, to the
subject).
- Define a local variable called ‘loop’, with precise definition to follow, and
add it to the environment.
- ‘loop’ is a core, i.e. more or less a named collection of functions.
Define one such function (or arm), ‘recur’, that checks to see if the
increment of ‘n’ is equal to ‘m’, returning ‘n’ if so, and calling itself,
except with the value of ‘n’ in the environment changed to ‘n + 1’, if not.
- Evaluate ‘recur’ as defined in ‘loop’.
(To test this, you can enter the Hoon line-by-line into the Arvo dojo. Just
preface it with something like =core-dec to give it a name, and call it via
e.g. (core-dec 20).)
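For intuition, here is a rough Haskell analogue of the core version – my own
sketch, not anything the Hoon compiler actually produces. The subject is
modelled as an explicit record, and the arm ‘recur’ is a function evaluated
against it:
data Subject = Subject { m :: Integer, n :: Integer }
recur :: Subject -> Integer
recur subject
  | succ (n subject) == m subject = n subject
  | otherwise = recur subject { n = succ (n subject) }
coreDec :: Integer -> Integer
coreDec arg = recur (Subject { m = arg, n = 0 })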
Hoon may appear to be a write-only language, though I’ve found this to not
necessarily be the case (just to note, at present I’ve read more Hoon code than
I’ve written). Good Hoon has a terse and very vertical style. The principle
that keeps it readable is that, roughly, each line should contain one important
logical operation. These operations are denoted by runes, the =/
and ?:
and similar ASCII digraphs sprinkled along the left hand columns of the above
example. This makes it look similar to e.g. J – a language I have
long loved, but never mastered – although in J the rough ‘one operator per
line’ convention is not typically in play.
In addition to the standard digraph runes, there is also a healthy dose of
‘irregular’ syntax in most Hoon code for simple operations that one uses
frequently. Examples used above include =(a b)
for equality testing, +(n)
for incrementing an atom, and foo(a b)
for evaluating ‘foo’ with the value of
‘a’ in the environment changed to ‘b’. Each of these could be replaced with a
more standard rune-based expression, though for such operations the extra
verbosity is not usually warranted.
Cores like ‘loop’ seem, to me, to be the mainstay workhorse of Hoon
programming. A core is more or less a structure, or object, or dictionary, or
whatever, of functions. One defines them liberally, constructs a subject (i.e.
environment) to suit, and then evaluates them, or some part of them, against
the subject.
To be more precise, a core is a Nock expression; like every non-atomic value in
Nock, it is a tree. Starting from the cell [l r], the left subtree, ‘l’, is a
tree of Nock formulas (i.e. the functions, like ‘recur’, defined in the core).
The right subtree, ‘r’, is all the data required to evaluate those Nock
formulas. The traditional name for the left subtree, ‘l’, is the battery of
the core; the traditional name for the right subtree is the payload.
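In the Noun terms from the hnock post, that shape could be sketched like so –
purely my own illustration of the layout, not how the Hoon compiler represents
anything internally. The battery sits at tree address 2, the payload at
address 3:
battery :: Noun -> Maybe Noun
battery core = case core of
  Cell b _ -> Just b    -- address 2: the formulas
  Atom _   -> Nothing
payload :: Noun -> Maybe Noun
payload core = case core of
  Cell _ p -> Just p    -- address 3: the data ('m', 'n', and so on)
  Atom _   -> Nothing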
One is always building up a local environment in Hoon and then evaluating some
value against it. Aside from the arm ‘recur’, the core ‘loop’ also contains in
its payload the values ‘m’ and ‘n’. The expression ‘recur:loop’ – irregular
syntax for =< recur loop
– means “use ‘loop’ as the environment and
evaluate ‘recur’.” Et voilà, that’s how we get our decrement.
You’ll note that this should feel very similar to the way we defined decrement
in Nock. Our hand-assembled Nock code, slightly cleaned up, looked like this:
[8
[1 0]
8
[1
6
[5 [4 0 6] [0 7]]
[0 6]
2 [[0 2] [4 0 6] [0 7]] [0 2]
]
2 [0 1] [0 2]
]
This formula, when evaluated against an atom subject, creates another subject
from it, defining a ‘loop’ analogue that looks in specific addresses in the
subject for itself, as well as the ‘m’ and ‘n’ variables, such that it produces
the decrement of the original subject. Our Hoon code does much the same –
every ‘top-level’ rune expression adds something to the subject, until we get
to the final expression, ‘recur:loop’, which evaluates ‘recur’ against the
subject, ‘loop’.
The advantage of Hoon, in comparison to Nock, is that we can work with names,
instead of raw tree addresses, as well as with higher-level abstractions like
cores. The difference between Hoon and Nock really is like the difference
between C and assembly!
For what it’s worth, here is the compiled Nock corresponding to our above
decrement function:
[8
[1 0]
8
[8
[1
6
[5 [4 0 6] 0 30]
[0 6]
9 2 10 [6 4 0 6] 0 1
]
0 1
]
7 [0 2] 9 2 0 1
]
It’s similar, though not identical, to our hand-rolled Nock. In particular,
you can see that it is adding a constant conditional formula, including the
familiar equality check, to the subject (note that the equality check, using
Nock-5, refers to address 30 instead of 7 – presumably this is because I have
more junk floating around in my dojo subject). Additionally, the formulas
using Nock-9 and Nock-10 reduce to Nock-2 and Nock-0, just like our hand-rolled
code does.
But our Hoon is doing more than the bespoke Nock version did, so we’re not
getting quite the same code. Worth noting is the ‘extra’ use of Nock-8, which
is presumably required because I’ve defined both ‘recur’, the looping function,
and ‘loop’, the core to hold it, and the hand-rolled Nock obviously didn’t
involve a core.
Doors
Here’s another way to write decrement, using another fundamental Hoon
construct, the door:
|= m=@
=/ loop
|_ n=@
++ recur
?: =(+(n) m)
n
~(recur ..recur +(n))
--
~(recur loop 0)
A door is a core that takes an argument. Here we’ve used the |_ rune, instead
of |%, to define ‘loop’, and note that it takes ‘n’ as an argument.
So instead of ‘n’ being defined external to the core, as it was in the previous
example, here we have to specify it explicitly when we call ‘recur’. Note that
this is more similar to our Haskell example, in which ‘loop’ was defined as a
function taking ‘n’ as an argument.
The two other novel things here are the ~(recur ..recur +(n)) and
~(recur loop 0) expressions, which actually turn out to be mostly the same
thing. The syntax ~(arm door argument) is irregular, and means “evaluate ‘arm’
in ‘door’ using ‘argument’”. So in the last line, ~(recur loop 0) means
“evaluate ‘recur’ in ‘loop’ with n set to 0.”
In the definition of ‘recur’, on the other hand, we need to refer to the door
that contains it, but are in the very process of defining that thing. The
‘..recur’ syntax means “the door that contains ‘recur’,” and is useful for
exactly this task, given we can’t yet refer to ‘loop’. The syntax ~(recur
..recur +(n))
means “evaluate ‘recur’ in its parent door with n set to n + 1.”
Let’s check the compiled Nock of this version:
[8
[8
[1 0]
[1
6
[5 [4 0 6] 0 30]
[0 6]
8
[0 1]
9 2 10 [6 7 [0 3] 4 0 6] 0 2
]
0 1
]
8
[0 2]
9 2 10 [6 7 [0 3] 1 0] 0 2
]
There’s even more going on here than in our core-implemented decrement, but
doors are a generalisation of cores, so that’s to be expected.
Hoon has special support, though, for one-armed doors. This is precisely how
functions (also called gates or traps, depending on the context) are
implemented in Hoon. The following is probably the most idiomatic version of
naïve decrement:
|= m=@
=/ n 0
|-
?: =(+(n) m)
n
$(n +(n))
The |= rune that we’ve been using throughout these examples really defines a
door, taking the specified argument, with a single arm called ‘$’. The |-
rune here does the same, except it immediately calls the ‘$’ arm after
defining it. The last line, $(n +(n)), is analogous to the recur(n +(n)) line
in
our first example: it evaluates the ‘$’ arm, except changing the value of ‘n’
to ‘n + 1’ in the environment.
(Note that there are two ‘$’ arms defined in the above code – one via the use
of |=, and one via the use of |-. But there is no confusion as to which one we
mean, since the latter was the most recent to be added to the subject.
Additions to the subject are always prepended in Hoon – i.e. they are placed
at address 2. As the topmost ‘$’ in the subject is the one that corresponds to
|-, it is resolved first.)
The compiled Nock for this version looks like the following:
[8
[1 0]
8
[1
6
[5 [4 0 6] 0 30]
[0 6]
9 2 10 [6 4 0 6] 0 1
]
9 2 0 1
]
And it is possible (see the appendix) to show that, modulo some different
addressing, this reduces exactly to our hand-rolled Nock code.
UPDATE: my colleague Ted Blackman, an actual Hoon programmer, recommended
the following as a slightly more idiomatic version of naïve decrement:
=| n=@
|= m=@
^- @
?: =(+(n) m)
n
$(n +(n))
Note that here we’re declaring ‘n’ outside of the gate itself by using
another rune, =|, that gives the variable a default value based on its type
(an atom’s default value is 0). There’s also an explicit type cast via ^- @,
indicating that the gate produces an atom (like type signatures in Haskell, it
is considered good practice to include these, even though they may not
strictly be required).
Declaring ‘n’ outside the gate is interesting. It has an imperative feel,
as if one were writing the code in Python, or were using a monad like ‘State’
or a ‘PrimMonad’ in Haskell. Like in the Haskell case, we aren’t actually
doing any mutation here, of course – we’re creating new subjects to evaluate
each iteration of our Nock formula against. And the resulting Nock is very
succinct:
[6
[5 [4 0 14] 0 6]
[0 14]
9 2 10 [14 4 0 14] 0 1
]
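As an aside, the ‘State’ analogy can be made concrete with a quick sketch –
my own illustration, using Control.Monad.State from mtl; as with the Hoon,
nothing is actually being mutated:
import Control.Monad.State
decState :: Integer -> Integer
decState m = evalState loop 0 where
  loop = do
    n <- get
    if succ n == m
      then return n
      else put (succ n) >> loop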
Basic Generators
If you tested the above examples, you did so – per my instructions – by
typing them into Arvo’s dojo. I’ve come to believe that, in general, this is a
poor way to teach Hoon. It shouldn’t be used for anything but the most
introductory examples (such as the ones I’ve provided here).
If you’ve learned Haskell, you are familiar with the REPL provided by GHCi, the
Glasgow Haskell Compiler’s interpreter. Code running in GHCi is implicitly
running in the IO monad, and I think this leads to confusion amongst newcomers
who must then mentally separate “Haskell in GHC” from “Haskell in GHCi.”
I think there is a similar problem in Hoon. Expressions entered into the dojo
implicitly grow or shrink or otherwise manipulate the dojo’s subject, which is
not, in general, available to standalone Hoon programs. Such standalone Hoon
programs are called generators. In general, they’re what you will use when
working in Hoon and Arvo.
There are four kinds of generators: naked, %say, %ask, and %get. In this post
we’ll just look at the first two; the last couple are out of scope, for now.
Naked Generators
The simplest kind of generator is the ‘naked’ generator, which just exists in
a file somewhere in your Urbit’s “desk.” If you save the following as
naive-decrement.hoon
in an Urbit’s home/gen
directory, for example:
|= m=@
=/ n 0
|-
?: =(+(n) m)
n
$(n +(n))
Then you’ll be able to run it in a dojo via:
~zod:dojo> +naive-decrement 20
19
A naked generator can only be a simple function (technically, a gate) that
produces a noun. It has no access to any external environment – it’s
basically just a self-contained function in a file. It must have an argument,
and it must have only one argument; to pass multiple values to a naked
generator, one must use a cell.
Say Generators
Hoon is a purely functional language, but, unlike Haskell, it also has no IO
monad to demarcate I/O effects. Hoon programs do not produce effects on their
own at all – instead, they construct nouns that tell Arvo how to produce
some effect or other.
A %say
generator (where %say
is a symbol) produces a noun, but it can also
make use of provided environment data (e.g. date information, entropy, etc.).
The idea is that the generator has a specific structure that Arvo knows how to
handle, in order to supply it with the requisite information. Specifically,
%say
generators have the structure:
:- %say
|= [<environment data> <list of arguments> <list of optional arguments>]
:- %noun
<code>
I’ll avoid discussing what a list is in Hoon at the moment, and we won’t
actually use any environment data in any examples here. But if you dump the
following in home/gen/naive-decrement.hoon
, for example:
:- %say
|= [* [m=@ ~] ~]
:- %noun
=/ n 0
|-
?: =(+(n) m)
n
$(n +(n))
you can call it from the dojo via the same mechanism as before:
~zod:dojo> +naive-decrement 20
19
The generator itself actually returns a particularly-structured noun; a cell
with the symbol %say
as its head, and a gate returning a pair of the symbol
%noun
and a noun as its tail. The %noun
symbol describes the data produced
by the generator. But note that this is not displayed when evaluating the
generator in the dojo – instead, we just get the noun itself, but this
behaviour is dojo-dependent.
I think one should get in the habit of writing %say generators for almost all
Hoon code, even if a simple naked generator or throwaway dojo command would do
the trick. They are so important for getting things done in Hoon that it helps
to learn about and start using them sooner rather than later.
Fin
I’ve introduced Hoon and given a brief tour of what I think are some of the
most important tools for getting work done in the language. Cores, doors, and
gates will get you plenty far, and early exposure to generators, in the form of
the basic naked and %say
variants, will help you avoid the habit of
programming in the dojo, and get you writing more practically-structured Hoon
code from the get-go.
I haven’t had time in this post to describe Hoon’s type system, which is
another very important topic when it comes to getting work done in the
language. I’ll probably write one more to create a small trilogy of sorts –
stay tuned.
Appendix
Let’s demonstrate that the compiled Nock code from our door-implemented
decrement reduces to the same as our hand-rolled Nock, save different address
use. Recall that our compiled Nock code was:
[8
[1 0]
8
[1
6
[5 [4 0 6] 0 30]
[0 6]
9 2 10 [6 4 0 6] 0 1
]
9 2 0 1
]
An easy reduction is from Nock-9 to Nock-2. Note that *[a 9 b c] is the same
as *[*[a c] 2 [0 1] 0 b]. When ‘c’ is [0 1], we have that *[a c] = a, such
that *[a 9 b [0 1]] is the same as *[a 2 [0 1] 0 b] – i.e., the formula
[9 b [0 1]] is the same as the formula [2 [0 1] 0 b]. We can thus reduce the
use of Nock-9 on the last line to:
[8
[1 0]
8
[1
6
[5 [4 0 6] 0 30]
[0 6]
9 2 10 [6 4 0 6] 0 1
]
2 [0 1] 0 2
]
The remaining formula involving Nock-9 evaluates [10 [6 4 0 6] 0 1] against
the subject, and then evaluates [2 [0 1] [0 2]] against the result. Note that,
for some subject ‘a’, we have:
*[a 10 [6 4 0 6] 0 1]
= #[6 *[a 4 0 6] *[a 0 1]]
= #[6 *[a 4 0 6] a]
= #[3 [*[a 4 0 6] /[7 a]] a]
= #[1 [/[2 a] [*[a 4 0 6] /[7 a]]] a]
= [/[2 a] [*[a 4 0 6] /[7 a]]]
= [*[a 0 2] [*[a 4 0 6] *[a 0 7]]]
= *[a [0 2] [4 0 6] [0 7]]
such that [10 [6 4 0 6] 0 1] = [[0 2] [4 0 6] [0 7]]. And for
c = [[0 2] [4 0 6] [0 7]] and some subject ‘a’, we have:
*[a 9 2 c]
= *[*[a c] 2 [0 1] 0 2]
and for b = [2 [0 1] 0 2]:
*[*[a c] b]
= *[a 7 c b]
= *[a 7 [[0 2] [4 0 6] [0 7]] [2 [0 1] 0 2]]
such that:
[9 2 [0 2] [4 0 6] [0 7]] = [7 [[0 2] [4 0 6] [0 7]] [2 [0 1] 0 2]]
Now. Note that for any subject ‘a’ we have:
*[a 7 [[0 2] [4 0 6] [0 7]] [2 [0 1] 0 2]]
= *[a 7 [[0 2] [4 0 6] [0 7]] *[a 0 2]]
since *[a 2 [0 1] 0 2] = *[a *[a 0 2]] (and note that the head of
*[a [0 2] [4 0 6] [0 7]] is itself *[a 0 2], i.e. the head of ‘a’, so the
inner lookup produces the same value against either subject). Thus, we can
reduce:
*[a 7 [[0 2] [4 0 6] [0 7]] *[a 0 2]]
= *[*[a [0 2] [4 0 6] [0 7]] *[a 0 2]]
= *[a 2 [[0 2] [4 0 6] [0 7]] [0 2]]
such that
[7 [[0 2] [4 0 6] [0 7]] [2 [0 1] 0 2]] = [2 [[0 2] [4 0 6] [0 7]] [0 2]]
and so, finally, we can reduce the compiled Nock to:
[8
[1 0]
8
[1
6
[5 [4 0 6] 0 30]
[0 6]
2 [[0 2] [4 0 6] [0 7]] 0 2
]
2 [0 1] 0 2
]
which, aside from the use of the dojo-assigned address 30 (and any reduction
errors on this author’s part), is the same as our hand-rolled Nock.
31 Jan 2019
I wrote a little Nock interpreter called hnock some months ago
and just yesterday updated it to support the latest version of Nock, 4K. Nock
– the base layer VM of Urbit – is a very simple little “functional
assembly language” of sorts. It is of particular interest in that it is
capable of practical computation (indeed, it is Turing complete) but is not
defined in terms of the lambda calculus. So, no variable naming and
capture-avoidance and the like to deal with, which is kind of neat.
Nock (at 4K) itself has a small specification, repeated below. Famously, it
fits on a t-shirt:
nock(a) *a
[a b c] [a [b c]]
?[a b] 0
?a 1
+[a b] +[a b]
+a 1 + a
=[a a] 0
=[a b] 1
/[1 a] a
/[2 a b] a
/[3 a b] b
/[(a + a) b] /[2 /[a b]]
/[(a + a + 1) b] /[3 /[a b]]
/a /a
#[1 a b] a
#[(a + a) b c] #[a [b /[(a + a + 1) c]] c]
#[(a + a + 1) b c] #[a [/[(a + a) c] b] c]
#a #a
*[a [b c] d] [*[a b c] *[a d]]
*[a 0 b] /[b a]
*[a 1 b] b
*[a 2 b c] *[*[a b] *[a c]]
*[a 3 b] ?*[a b]
*[a 4 b] +*[a b]
*[a 5 b c] =[*[a b] *[a c]]
*[a 6 b c d] *[a *[[c d] 0 *[[2 3] 0 *[a 4 4 b]]]]
*[a 7 b c] *[*[a b] c]
*[a 8 b c] *[[*[a b] a] c]
*[a 9 b c] *[*[a c] 2 [0 1] 0 b]
*[a 10 [b c] d] #[b *[a c] *[a d]]
*[a 11 [b c] d] *[[*[a c] *[a d]] 0 3]
*[a 11 b c] *[a c]
*a *a
Perhaps you are a neophyte Haskeller and have never implemented an interpreter
for a language before. Nock makes for an excellent target to practice on.
Expressions
A noun in Nock is either an atom, i.e. an unsigned integer, or a cell,
i.e. an ordered pair of nouns:
data Noun =
Atom Integer
| Cell Noun Noun
deriving Eq
Very simple. A Nock expression is then a noun, or some operator applied to
a noun. Per the spec, the operators are denoted by ?, +, =, /, #, and *, and
the fun, convenient, and authentically Urbit-y ways to pronounce these are
wut, lus, tis, net, hax, and tar respectively:
data Expr =
Noun Noun
| Wut Noun
| Lus Noun
| Tis Noun
| Net Noun
| Hax Noun
| Tar Noun
deriving Eq
So, equipped with the above definitions, you can write Nock expressions in
Haskell. For example, the expression *[57 [4 0 1]] maps to:
Tar
(Cell (Atom 57)
(Cell (Atom 4)
(Cell (Atom 0) (Atom 1))))
Note per the spec that cells associate to the right, thus something like
[4 0 1] maps to [4 [0 1]], and thus Cell (Atom 4) (Cell (Atom 0) (Atom 1)).
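As a small convenience – not part of hnock, just a helper I find handy when
writing nouns by hand – one can build right-associated cells from a list,
mirroring that rule:
cells :: [Noun] -> Noun
cells nouns = case nouns of
  []       -> error "cells: empty list"
  [x]      -> x
  (x : xs) -> Cell x (cells xs)
With it, cells [Atom 4, Atom 0, Atom 1] yields exactly the Cell value above.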
Evaluation and Semantics
For evaluation one can more or less copy out the production rules from the Nock
spec. We can use an Either type to denote a lack of defined semantics:
data Error = Error Noun
deriving Show
type Possibly = Either Error
and then simply define the various evaluation functions appropriately. ‘wut’,
for example, checks if something is a cell:
wut :: Noun -> Possibly Noun
wut noun = return $ case noun of
Cell {} -> Atom 0
Atom {} -> Atom 1
(note that in Nock and Urbit more generally, 0 denotes ‘true’).
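‘lus’ computes the successor of an atom; per the spec it is undefined on
cells. Something like the following should do (the actual hnock source may
differ slightly):
lus :: Noun -> Possibly Noun
lus noun = case noun of
  Cell {} -> Left (Error noun)
  Atom m  -> return (Atom (succ m))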
‘tis’ checks if the elements of a cell are equal:
tis :: Noun -> Possibly Noun
tis noun = case noun of
Atom {} -> Left (Error noun)
Cell m n -> return $
if m == n
then Atom 0
else Atom 1
And so on. There are other operators for lookups, substitution, et cetera.
The most involved operator is ‘tar’, which constitutes the Nock interpreter
proper. One can simply match on the cases appropriately:
tar :: Noun -> Possibly Noun
tar noun = case noun of
Cell a (Cell (Cell b c) d) -> do
tard0 <- tar (Cell a (Cell b c))
tard1 <- tar (Cell a d)
return (Cell tard0 tard1)
Cell a (Cell (Atom 0) b) ->
net (Cell b a)
Cell _ (Cell (Atom 1) b) ->
return b
Cell a (Cell (Atom 2) (Cell b c)) -> do
tard0 <- tar (Cell a b)
tard1 <- tar (Cell a c)
tar (Cell tard0 tard1)
-- ... and so on
It is particularly useful to look at ‘tar’ to get an idea of what is going on
in Nock. One is always evaluating a noun to another noun. Semantics are only
defined when the noun being evaluated is a cell, and that cell is said to have
the structure [subject formula].
The subject is essentially the environment, reified explicitly. The formula
is the code to execute against the subject.
The resulting noun is called a ‘product’. So, evaluation proceeds as:
[subject formula] -> product
In any valid application of ‘tar’, i.e. for some *[x y], ‘x’ is the subject,
and ‘y’ is the formula. The Nock-0 case of ‘tar’, for example, is defined via
the spec’s *[a 0 b] /[b a] rule. Recall that cells in Nock are
right-associative, so [a 0 b] is really [a [0 b]]. In other words: ‘a’ is the
subject, [0 b] is the formula. The formula [0 b], read as ‘Nock-zero’, means
“look up the value at address ‘b’ in the subject,” the subject here being ‘a’.
So *[a 0 b] means “look up the value at address ‘b’ in ‘a’.”
Other formulas in the various ‘tar’ cases denote additional useful language
features. A formula [1 b]
is the constant function, [6 b c d]
is a
conditional, [7 b c]
is function (viz. formula) composition, and so on.
There is a Turing-complete amount of power packed onto that t-shirt!
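To make those concrete, here are a few reductions expressed as the hnock
calls used later in the post – the values follow directly from the spec, and
only the display format is hnock’s:
~ hnock "*[42 [1 99]]"
99
~ hnock "*[42 [7 [4 0 1] [4 0 1]]]"
44
~ hnock "*[[0 43] [6 [0 2] [1 11] [1 22]]]"
11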
In any case, once one has constructed all of these requisite evaluation
functions, we can stitch them together in a single evaluator:
eval :: Expr -> Possibly Noun
eval expr = case expr of
Noun noun -> return noun
Wut e -> wut e
Lus e -> lus e
Tis e -> tis e
Net e -> net e
Hax e -> hax e
Tar e -> tar e
Parsing
Writing a combinator-based parser is an easy job. You can use parsec,
for example, and define your combinators as follows.
Given low-level ‘atom’ and ‘cell’ parsers, a noun is just an atom or a cell:
noun :: Monad m => P.ParsecT T.Text u m Noun
noun =
P.try cell
<|> atom
and an expression is either an operator, followed by a noun, or just a noun:
expr :: Monad m => P.ParsecT T.Text u m Expr
expr =
P.try operator
<|> fmap Noun noun
operator :: Monad m => P.ParsecT T.Text u m Expr
operator = do
op <- P.oneOf "?+=/#*"
case op of
'?' -> fmap Wut noun
'+' -> fmap Lus noun
'=' -> fmap Tis noun
'/' -> fmap Net noun
'#' -> fmap Hax noun
'*' -> fmap Tar noun
_ -> fail "op: bad token"
Note that we don’t allow expressions like the following, taken from the Nock
spec:
*[a *[[c d] 0 *[[2 3] 0 *[a 4 4 b]]]]
These are used to define the production rules, but are not valid Nock
expressions per se.
The end parser is simply:
parse :: T.Text -> Either P.ParseError Expr
parse = P.runParser expr [] "input"
Example
The final interpreter is pretty simple. Here we distinguish between cases in
order to report either parsing or evaluation errors in basic fashion,
respectively:
hnock :: T.Text -> Noun
hnock input = case parse input of
Left perr -> error (show perr)
Right ex -> case eval ex of
Left err -> error (show err)
Right e -> e
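One small aside: the examples below display nouns in bracketed form. hnock’s
own instances may differ, but a hand-rolled Show instance along these lines is
enough to produce that output:
instance Show Noun where
  show (Atom a)   = show a
  show (Cell l r) = "[" ++ show l ++ " " ++ show r ++ "]"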
It’s easy to test. Take the simple Nock-zero example given previously, where
*[a 0 b]
means “look up the value at address ‘b’ in ‘a’”. Nock’s addressing
scheme is simple; address ‘2’ means the leftmost component of a cell, and ‘3’
means the rightmost component. For every additional cell you want to recurse
into, you just multiply by two, adding one to select the right-hand side, as
necessary. Address ‘1’ is
just the cell itself.
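That addressing rule translates directly into code. Here is a sketch of how
the ‘net’ (slot) operator might implement it, following the /[...] rules of
the spec – the actual hnock source may differ:
net :: Noun -> Possibly Noun
net noun = case noun of
  Cell (Atom 1) a          -> return a
  Cell (Atom 2) (Cell a _) -> return a
  Cell (Atom 3) (Cell _ b) -> return b
  Cell (Atom i) a | i > 3  -> do
    let (q, r) = i `divMod` 2
    inner <- net (Cell (Atom q) a)
    net (Cell (Atom (2 + r)) inner)
  _ -> Left (Error noun)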
So, let’s test:
~ hnock "*[[[1 2] [3 4]] [0 1]]"
[[1 2] [3 4]]
Again, the formula [0 1]
means look up at address one, which is the cell
itself.
~ hnock "*[[[1 2] [3 4]] [0 2]]"
[1 2]
The formula [0 2]
means look up at address 2, which is the leftmost component
of the cell.
~ hnock "*[[[1 2] [3 4]] [0 7]]"
4
To understand [0 7]
, we first look up the rightmost component [3 4]
at
address 3, then multiply by two and add one for the rightmost component of
that.
The most famous Nock learner’s problem is to implement a function for
decrementing an atom (if you could do this in ~2010 you could receive an
Urbit galaxy for it, which is now probably worth a serious chunk of change).
Nock’s only arithmetic operator is increment, so the trick is to decrement by
actually counting up from zero.
(N.b. in practice this is not how one actually evaluates decrement in Nock.
The interpreter is ‘jetted’, such that alternate implementations – e.g. one
using the CPU’s decrement instruction directly – can be used to evaluate
semantically-identical
expressions.)
But the idea is simple enough. To warm up, we can implement this kind of
decrement in Haskell. Don’t handle the underflow condition, just to keep
things simple:
dec :: Integer -> Integer
dec m =
let loop n
| succ n == m = n
| otherwise = loop (succ n)
in loop 0
Look at the language features we’re using to accomplish this. There’s:
- Variable naming / declaration, for the ‘m’, ‘loop’, and ‘n’ arguments
- Integer increment, via the built-in ‘succ’
- Equality testing, via ‘==’
- Conditional expressions, via the ‘|’ guards
- A function call, in ‘loop 0’
Let’s try and duplicate this in Nock, step by step. We’ll want analogues for
all of our declared variables, to start.
A Nock formula that simply returns the subject it’s evaluated against is
easy: [0 1]. We used it above.
That’s the equivalent of our ‘m’ declaration in the Haskell code.
Similarly, we can just ignore the subject and return a ‘0’ via [1 0].
That’s the initial value, 0, that appears in ‘loop 0’.
We can use Nock-8 to pair these together. It evaluates a formula against the
subject, and then uses the original subject, augmented with the product, to
evaluate another one:
~ hnock "*[10 [8 [1 0] [0 1]]]"
[0 10]
The last thing we need to define is the ‘loop’ function itself. We’re going to
place it at the head of the subject, i.e. at address two. Our ‘n’ and ‘m’
variables will then be at addresses six and seven, respectively. When we
define our ‘loop’ analogue, we just need to refer to those addresses, instead
of our variable names.
(N.b. Nock’s addressing scheme seems more or less equivalent to a
tree-flavoured variant of de Bruijn indices.)
The ‘loop’ function in the Haskell code tests to see the successor of ‘n’ is
equal to ‘m’, and if so, returns ‘n’. Otherwise it loops on the successor of
‘n’.
Nock-4 computes the successor. Remember that ‘n’ is at address six, so our
analogue to ‘succ n’ is [4 0 6]. We want to check if it’s equal to ‘m’, at
address seven, gotten via [0 7]. The equality check is then handled by Nock-5,
giving [5 [4 0 6] [0 7]].
Nock-6 is the conditional operator. If the equality check is true, we just
want to return ‘n’ via [0 6]. The whole formula will look like this:
[6 [5 [4 0 6] [0 7]] [0 6] <loop on successor of n>]
To loop, we’ll use Nock-2. We’ll compute a new subject – consisting of our
‘loop’ analogue at address two, an updated ‘n’ at address six, and our
original ‘m’ at address seven – and then evaluate the same looping formula
against it. In Nock, the formula is:
[2 [[0 2] [4 0 6] [0 7]] [0 2]]
And our ‘loop’ analogue in full is thus:
[6 [5 [4 0 6] [0 7]] [0 6] [2 [[0 2] [4 0 6] [0 7]] [0 2]]]
Finally, we now need to actually store this in the subject. Again, we’ll
ignore the subject with Nock-1, and use Nock-8 to tuple the formula up with the
‘n’ and ‘m’ analogues:
(N.b. at this point we’re pretty much at line noise – spacing it out doesn’t
make it that much more readable, but it does avoid word wrap.)
[8
[1 0]
[8
[1
[6
[5 [4 0 6] [0 7]]
[0 6]
[2 [[0 2] [4 0 6] [0 7]] [0 2]]
]
]
[2 [0 1] [0 2]]
]
]
Let’s check if it works. We’re embedded in Haskell, after all, so let’s get
some support from our host, for convenience:
~ let dec = \subj -> "*[" <> subj <> " [8 [1 0] [8 [1 [6 [5 [4 0 6] [0 7]] [0 6] [2 [[0 2] [4 0 6] [0 7]] [0 2]]]] [2 [0 1] [0 2]]]]]"
~ hnock (dec "10")
9
~ hnock (dec "25")
24
~ hnock (dec "200")
199
Phew!
Fin
Nock makes for an interesting “functional assembly language” of sorts. Its
spec is so simple that writing a naïve interpreter is more or less a process of
copying out the production rules. You can do it without a lot of thought.
The time-honoured decrement-writing exercise, on the other hand, really makes
one think about Nock: that the environment is always explicit, that you can
access it via a nice addressing scheme, that you can refer to functions via
pointers (viz. addresses), etc. It’s pretty intuitive, when you dig into the
mechanics.
In any case – writing a decrement (or similar function) and writing your own
interpreter to test it is a fun exercise. From a practical perspective, it
also gives one a lot of intuition about programming in Hoon – it’s
probably not too far from the truth to compare the relationship between Nock
and Hoon to that between, say, x86 and C. If one is programming in Hoon, it
can’t hurt to know your Nock!
07 Oct 2018
I recently needed a good cryptographically-secure and seedable pseudorandom
number generator for Javascript. This didn’t turn out to be as trivial a
procedure as I figured it’d be: most Javascript CSPRNGs I found didn’t appear
to be manually seedable, instead automatically seeding themselves
behind-the-scenes using a trusted high-quality entropy source like /dev/urandom
(as per the WebCryptoAPI spec).
But! I want to use my own entropy source. So I could either implement my own
cryptographically-secure PRNG, which I would obviously need to test rigorously,
or I could find an existing implementation that had already been vetted widely
by use. I settled on the ISAAC PRNG/stream cipher, both because it
seemed like a reasonable choice of PRNG (it is used in things like coreutils’
shred, and there are no known practical attacks against it), and also because
there was a Javascript implementation floating around on the internet. But the
interesting thing was that the implementation did not really seem to be
vetted widely – it hadn’t been updated in five years or so, and both the
Github repository and the npm package seem to show very little
activity around it in general.
I was going to be using this thing in an application that made some nontrivial
demands in terms of security. So while the PRNG itself seemed to fit the bill,
I wanted to be assured that the implementation satisfied at least some basic
criteria for pseudorandomness. I assumed that it probably works – I just
wanted to have some reasonable confidence in making that assumption.
There are a number of statistical suites for testing PRNGs out there: I am
aware of at least the NIST statistical test suite, the DieHard
suite, and TestU01, being most familiar with the latter (this is not
saying much). The TestU01 suite is implemented in C; you provide it with a
PRNG represented as a function pointer that returns 32-bit integers, and it
will run the desired battery of tests (SmallCrush, Crush, or BigCrush – each
of increasing size and scope) for you. These more or less consist of
frequentist tests involving the chi-square distribution, with more specialised
test statistics appearing on occasion. Results are provided in terms of
p-values on these statistics.
I found this page that gave some instructions for using TestU01, but
the ISAAC implementation I wanted to test is of course written in Javascript.
So I knew there was some FFI hackery to be done in order to get the two
codebases to play nicely together. I also discovered that Jan de Mooij did
some work on testing JS’s basic ‘Math.random’ generator with TestU01
using Emscripten, an LLVM-to-JS compiler, so these two resources seemed
a useful place to start.
After several hours of bashing my way through emcc compilation and linker
errors (read: careful and methodical programming), I managed to get everything
to
work. Since the documentation for these tools can be kind of sparse, I figured
I’d outline the steps to run the tests here, and hopefully save somebody else a
little bit of time and effort in the future. But that said, this is inevitably
a somewhat tricky procedure – when trying to reproduce my steps, I ran into a
few strange errors that required additional manual fiddling. The following
should be about right, but may require some tweaking on your end.
Emscripten and TestU01
Install and activate Emscripten in the current terminal. To go from
the official git repo:
$ git clone https://github.com/juj/emsdk.git
$ cd emsdk
$ ./emsdk install latest
$ ./emsdk activate latest
$ source ./emsdk_env.sh
Emscripten provides a couple of wrappers over existing tools; notably, there’s
emconfigure
and emmake
for specialising make builds for compilation via
emcc
, the Emscripten compiler itself.
In some other directory, grab the TestU01 suite from the Université de
Montréal website and extract it:
$ wget http://simul.iro.umontreal.ca/testu01/TestU01.zip
$ unzip -q TestU01.zip
This is some oldish, gnarly academic C code that uses a very wonky
autoconf/automake-based build system. There is probably a better way to do it,
but the easiest way to get this thing built without too much grief is to build
it twice – once as per normal, specifying the appropriate base directory,
and once again to specialise it for Emscripten’s use:
$ basedir=`pwd`
$ cd TestU01-1.2.3
$ ./configure --prefix="$basedir"
$ make -j 2
$ make -j 2 install
If all goes well you’ll see ‘bin’, ‘include’, ‘lib’, and ‘share’ directories
pop up in ‘basedir’. Repeat the analogous steps for emscripten; note that
you’ll get some “no input files” errors here when the configure script checks
dynamic linker characteristics, but these are inconsequential:
$ emconfigure ./configure --prefix="$basedir"
$ emmake make -j 2
$ emmake make -j 2 install
Similarly, you’ll notice some warnings re: dynamic linking when building.
We’ll handle these later. In the meantime, you can return to your ‘basedir’ to
continue working.
A Test Shim for ISAAC
Check out the raw ISAAC code. It’s structured in a sort-of-object-y
way; the state of the PRNG is held in a bunch of opaque internal variables, and
the whole thing is initialised by calling the ‘isaac’ function and binding the
result as a variable. One then iterates the PRNG by calling either the ‘rand’
or ‘random’ property of that variable for a random integer or double,
respectively.
We need to write the actual testing code in C. You can get away with the
following, which I’ve adapted from M.E. O’Neill’s code – call it
something like ‘test-isaac.c’:
#include <emscripten.h>
#include "TestU01.h"
extern void isaac_init(void);
extern unsigned int isaac_rand(void);
int main()
{
isaac_init();
unif01_Gen* gen = unif01_CreateExternGenBits("ISAAC", isaac_rand);
bbattery_SmallCrush(gen);
unif01_DeleteExternGenBits(gen);
return 0;
}
Note the two external functions I’m declaring here: the first mimics calling
the ‘isaac’ function in Javascript and binding it to a variable, ‘isaac’, and
the second mimics a call to isaac.rand()
. The testing code follows the same
pattern: isaac_init()
initialises the generator state, and isaac_rand
produces a value from it. The surrounding code passes isaac_rand
in as the
generator to use for the SmallCrush battery of tests.
C to LLVM Bitcode
We can compile this to LLVM IR as it is, via Emscripten. But first, recall
those dynamic linker warnings from the initial setup step. Emscripten deals
with a lot of files, compile targets, etc. based on file extension. We thus
need to rename all the .dylib files in the ‘lib’ directory, which are in fact
all LLVM bitcode, to have the appropriate .bc extension. From the ‘lib’
directory itself, one can just do the following in bash:
$ for old in *.dylib; do mv $old `basename $old .dylib`.bc; done
When that’s done, we can compile the C code to LLVM with emcc
:
$ emcc -O3 -o test-isaac.bc \
-I/$basedir/include \
-L/$basedir/lib \
-ltestu01 -lprobdist -lmylib -lm -I/usr/local/include \
test-isaac.c
Again, Emscripten decides what its compile target should be via its file
extension – thus here, an output with the .bc
extension means we’re
compiling to LLVM IR.
LLVM to Javascript and WASM
Now, to provide the requisite isaac_init
and isaac_rand
symbols to the
compiled LLVM bitcode, we need to pass the ISAAC library itself. This is
another finicky procedure, but there is a method to the madness, and one can
find a bit of documentation on it here.
Helpfully, Evan Wallace at Figma wrote an emscripten JS library generation
helper that makes this task a little less painful. Install that via npm
as follows:
$ npm install -g emscripten-library-generator
To wrap the ISAAC code up in the appropriate format, one needs to make a few
small modifications to it. I won’t elaborate on this too much, but one needs
to:
Then we can package it up in an emscripten-friendly format as follows:
$ emscripten-library-generator isaac-mod.js > isaac-lib.js
You’ll need to make one last adjustment. In the isaac-lib.js
file just
generated for us, add the following emscripten ‘postset’ instruction
above the ‘isaac’ property:
isaac__postset: 'var isaac_initialised = _isaac();', // add me
isaac: function () { // leave me alone
This makes sure that the isaac_initialised
variable referred to in
isaac_rand
is accessible.
Whew. Ok, with all that done, we’ll compile our LLVM bitcode to a Javascript
and wasm pair. You’ll need to bump up the total memory option in order
to run the resulting code – I think I grabbed the amount I used from Jan de
Mooij’s example:
$ emcc --js-library isaac-lib.js \
lib/libtestu01.0.0.1.bc \
-o test-isaac.js \
-s TOTAL_MEMORY=536870912 \
test-isaac.bc
Running SmallCrush
That’s about it. If all has gone well, you should have seen no compilation or
linking errors when running emcc, and you should have a couple of
‘test-isaac.js’ and ‘test-isaac.wasm’ files kicking around in your ‘basedir’.
To (finally) run the Small Crush suite, execute ‘test-isaac.js’ with node:
$ node test-isaac.js
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Starting SmallCrush
Version: TestU01 1.2.3
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
***********************************************************
HOST = emscripten, Emscripten
ISAAC
smarsa_BirthdaySpacings test:
-----------------------------------------------
N = 1, n = 5000000, r = 0, d = 1073741824, t = 2, p = 1
Number of cells = d^t = 1152921504606846976
Lambda = Poisson mean = 27.1051
----------------------------------------------------
Total expected number = N*Lambda : 27.11
Total observed number : 27
p-value of test : 0.53
-----------------------------------------------
CPU time used : 00:00:00.00
Generator state:
<more output omitted>
========= Summary results of SmallCrush =========
Version: TestU01 1.2.3
Generator: ISAAC
Number of statistics: 15
Total CPU time: 00:00:00.00
All tests were passed
Et voilà, your SmallCrush battery output is dumped to stdout. You need only to
tweak the C code shim and recompile if you want to run the more intensive Crush
or BigCrush suites. Similarly, you can dump generator output to stdout with
console.log()
if you want to reassure yourself that the generator is running
correctly.
Fin
So: the Javascript ISAAC PRNG indeed passes TestU01. Nice! It was satisfying
enough to get this hacky sequence of steps to actually run, but it was even
better to see the tests actually pass, as I’d both hoped and expected.
I did a few extra things to convince myself that ISAAC was really passing the
tests as it seemed to be doing. I ran TestU01 on a cheap little xorshift
generator (which failed several tests) and also did some ad-hoc analysis of
values that I had ISAAC log to stdout. And, I even looked at the code, and
compared it with a reference implementation by eye. At this point, I’m
convinced it operates as advertised, and I feel very comfortable using it in my
application.
04 Aug 2018
I recently picked up Appel’s classic Compiling with Continuations and
have been refreshing my continuation-fu more generally.
Continuation-passing style (CPS) itself is nothing uncommon to the
functional programmer; it simply involves writing in a manner such that
functions never return, instead passing control over to something else (a
continuation) to finish the job. The simplest example is just the identity
function, which in CPS looks like this:
id :: a -> (a -> b) -> b
id x k = k x
The first argument is the conventional identity function argument – the second
is the continuation. I wrote a little about continuations in the context of
the Giry monad, which is a somewhat unfamiliar setting, but one that
follows the same principles as anything else.
In this post I just want to summarise a few useful CPS transforms and related
techniques in one place.
Consider a binary tree type. We’ll keep things simple here:
data Tree a =
Leaf a
| Branch a (Tree a) (Tree a)
Calculating the depth of a tree is done very easily:
depth :: Tree a -> Int
depth = loop where
loop tree = case tree of
Leaf _ -> 1
Branch _ l r ->
let dl = loop l
dr = loop r
in succ (max dl dr)
Note however that this is not a tail-recursive function – that is, it does not
end with a call to itself (instead it ends with a call to something like ‘succ
. uncurry max’). This isn’t necessarily a big deal – the function is easy to
read and write and everything, and certainly has fine performance
characteristics in Haskell – but it is less easy to deal with for, say, an
optimising compiler that may want to handle evaluation in this or that
alternative way (primarily related to memory management).
One can construct a tail-recursive (depth-first) version of ‘depth’ via a
manual CPS transformation. The looping function is simply augmented to take an
additional continuation argument, like so:
depth :: Tree a -> Int
depth tree = loop tree id where
loop cons k = case cons of
Leaf _ -> k 1
Branch _ l r ->
loop l $ \dl ->
loop r $ \dr ->
k (succ (max dl dr))
Notice now that the ‘loop’ function terminates with a call to itself (or just
passes control to a supplied continuation), and is thus tail-recursive.
Due to the presence of the continuation argument, ‘loop’ is a higher-order
function. This is fine and dandy in Haskell, but there is a neat technique
called defunctionalisation that allows us to avoid the jump to
higher-order and makes sure things stay KILO (“keep it lower order”), which can
be simpler to deal with more generally.
The idea is just to reify the continuations as abstract syntax, and then
evaluate them as one would any embedded language. Note the continuation
\dl -> .., for example – the free parameters ‘r’ and ‘k’ occurring in the
function body correspond to a tree (the right subtree) and another
continuation, respectively. And in \dr -> .. one has the free parameters ‘dl’
and ‘k’ –
now the depth of the left subtree, and the other continuation again. We also
have ‘id’ used on the initial call to ‘loop’. These can all be reified via the
following data type:
data DCont a =
DContL (Tree a) (DCont a)
| DContR Int (DCont a)
| DContId
Note that this is a very simple recursive type – it has a simple list-like
pattern of recursion, in which each ‘level’ of a value is either a constructor,
carrying both a field of some type and a recursive point, or is the ‘DContId’
constructor, which simply terminates the recursion. The reified continuations
are, on a suitable level of abstraction, more or less the sequential operations
to be performed in the computation. In other words: by reifying the
continuations, we also reify the stack of the computation.
Now ‘depth’ can be rewritten such that its looping function is not
higher-order; the cost is that another function is needed, one that lets us
evaluate items (again, reified continuations) on the stack:
depth :: Tree a -> Int
depth tree = loop tree DContId where
loop cons k = case cons of
Leaf _ -> eval k 1
Branch _ l r -> loop l (DContL r k)
eval cons d = case cons of
DContL r k -> loop r (DContR d k)
DContR dl k -> eval k (succ (max dl d))
DContId -> d
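(As a quick sanity check – my own example, not from the original post – every
version of ‘depth’ above agrees on something like
example :: Tree Int
example = Branch 0 (Leaf 1) (Branch 2 (Leaf 3) (Leaf 4))
for which the depth is 3.)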
The resulting function is mutually tail-recursive in terms of both ‘loop’
and ‘eval’, neither of which are higher-order.
One can do a little better in this particular case and reify the stack using an
actual Haskell list, which simplifies evaluation somewhat – it just requires
that the list elements have a type along the lines of ‘(Tree a, Int)’ rather
than something like ‘Either (Tree a) Int’, which is effectively what we get
from ‘DCont a’. You can see an example of this in this StackOverflow
answer by Chris Taylor.
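One way that might look – a sketch of my own, not necessarily identical to
the linked answer – is to carry (subtree, depth-so-far) pairs on an explicit
list and fold a running maximum over them:
depth :: Tree a -> Int
depth tree = go [(tree, 1)] 0 where
  go stack acc = case stack of
    []               -> acc
    (cons, d) : rest -> case cons of
      Leaf _       -> go rest (max acc d)
      Branch _ l r -> go ((l, d + 1) : (r, d + 1) : rest) acc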
“Mechanical CPS transformation” might be translated as simply “compiling with
continuations.” Matt Might has quite a few posts on this topic; in
particular he has one very nice post on mechanical CPS conversion that
summarises various transformations described in Appel, etc.
Matt describes three transformations that I think illustrate the general
mechanical CPS business very well (he describes more, but they are more
specialised). The first is a “naive” transformation, which is simple, but
produces a lot of noisy “administrative redexes” that must be cleaned up in
another pass. The second is a higher-order transformation, which makes use of
the host language’s facilities for function definition and application – it
produces simpler code, but some unnecessary noise still leaks through. The
last is a “hybrid” transformation, which makes use of both the naive and
higher-order transformations, depending on which is more appropriate.
Let’s take a look at these in Haskell. First let’s get some imports out of the
way:
{-# LANGUAGE OverloadedStrings #-}
import Data.Monoid
import Data.Text (Text)
import qualified Data.Text as T
import Data.Unique
import qualified Text.PrettyPrint.Leijen.Text as PP
I’ll also make use of a simple, Racket-like ‘gensym’ function:
gensym :: IO Text
gensym = fmap render newUnique where
render u =
let hu = hashUnique u
in T.pack ("$v" <> show hu)
We’ll use a bare-bones lambda calculus as our input language. Many examples –
Appel’s especially – use significantly more complex languages when
illustrating CPS transforms, but I think this distracts from the meat of the
topic. Lambda does just fine:
data Expr =
Lam Text Expr
| Var Text
| App Expr Expr
I want to render expressions in my input and output languages in a Lisp-like
manner. This is very easy to do using a good pretty-printing library;
here I’m using the excellent wl-pprint-text, and will omit the ‘Pretty’
instances in the body of my post. But I’ll link to a gist including them at
the bottom.
When performing a mechanical CPS transform, one targets both “atomic”
expressions – i.e., variables and lambda abstractions – and “complex”
expressions, i.e. function application. The target language is thus a
combination of the ‘AExpr’ and ‘CExpr’ types:
data AExpr =
AVar Text
| ALam [Text] CExpr
data CExpr =
CApp AExpr [AExpr]
All the mechanical CPS transformations use variants on two functions going by
the cryptic names m and t. m is responsible for converting
atomic expressions in the input languages (i.e., variables and lambda
abstractions) into atomic expressions in the target language (an atomic CPS
expression). t is the actual CPS transformation; it converts an expression
in the input language into CPS, invoking a specified continuation (already in
the target language) on the result.
Let’s look at the naive transform. Here are m and t, prefixed by ‘n’
to indicate that they are naive. First, m:
nm :: Expr -> IO AExpr
nm expr = case expr of
Lam var cexpr0 -> do
k <- gensym
cexpr1 <- nt cexpr0 (AVar k)
return (ALam [var, k] cexpr1)
Var var -> return (AVar var)
App {} -> error "non-atomic expression"
(N.b. you almost never want to use ‘error’ in a production implementation of
anything. It’s trivial to wrap e.g. ‘MaybeT’ around the appropriate
functions to handle the bogus pattern match on ‘App’ totally, but I just want
to keep the types super simple here.)
The only noteworthy thing that m does here is in the case of a lambda
abstraction: a new abstract continuation is generated, and the body of the
abstraction is converted to CPS via t, such that the freshly-generated
continuation is called on the result. Remember, m is really just mapping
atomic expressions in the input language to atomic expressions in the target
language.
Here’s t for the naive transform. Remember, t is responsible for
converting expressions to continuation-passing style:
nt :: Expr -> AExpr -> IO CExpr
nt expr cont = case expr of
Lam {} -> do
aexpr <- nm expr
return (CApp cont [aexpr])
Var _ -> do
aexpr <- nm expr
return (CApp cont [aexpr])
App f e -> do
fs <- gensym
es <- gensym
let aexpr0 = ALam [es] (CApp (AVar fs) [AVar es, cont])
cexpr <- nt e aexpr0
let aexpr1 = ALam [fs] cexpr
nt f aexpr1
For both kinds of atomic expressions (lambda and variable), the expression is
converted to the target language via m, and then the supplied continuation
is applied to it. Very simple.
In the case of function application (a “complex”, or non-atomic expression),
both the function to be applied, and the argument it is to be applied to, must
be converted to CPS. This is done by generating two fresh continuations,
transforming the argument, and then transforming the function. The control
flow here is always handled by stitching continuations together; notice when
transforming the function ‘f’ that the continuation to be applied has already
handled its argument.
Next, the higher-order transform. Here are m and t:
hom :: Expr -> IO AExpr
hom expr = case expr of
Lam var e -> do
k <- gensym
ce <- hot e (\rv -> return (CApp (AVar k) [rv]))
return (ALam [var, k] ce)
Var n -> return (AVar n)
App {} -> error "non-atomic expression"
hot :: Expr -> (AExpr -> IO CExpr) -> IO CExpr
hot expr k = case expr of
Lam {} -> do
aexpr <- hom expr
k aexpr
Var {} -> do
aexpr <- hom expr
k aexpr
App f e -> do
rv <- gensym
xformed <- k (AVar rv)
let cont = ALam [rv] xformed
cexpr fs = hot e (\es -> return (CApp fs [es, cont]))
hot f cexpr
Both of these have the same form as they do in the naive transform – the
difference here is simply that the continuation to be applied to a transformed
expression is expressed in the host language – i.e., here, Haskell. Thus the
transform is “higher-order,” in exactly the same sense that higher-order
abstract syntax is higher-order.
The final transformation I’ll illustrate here, the hybrid transform, applies
the naive transformation to lambda and variable expressions, and applies the
higher-order transformation to function applications. Here t is split up
into tc and tk to handle these cases accordingly:
m :: Expr -> IO AExpr
m expr = case expr of
Lam var cexpr -> do
k <- gensym
xformed <- tc cexpr (AVar k)
return (ALam [var, k] xformed)
Var n -> return (AVar n)
App {} -> error "non-atomic expression"
tc :: Expr -> AExpr -> IO CExpr
tc expr c = case expr of
Lam {} -> do
aexpr <- m expr
return (CApp c [aexpr])
Var _ -> do
aexpr <- m expr
return (CApp c [aexpr])
App f e -> do
let cexpr fs = tk e (\es -> return (CApp fs [es, c]))
tk f cexpr
tk :: Expr -> (AExpr -> IO CExpr) -> IO CExpr
tk expr k = case expr of
Lam {} -> do
aexpr <- m expr
k aexpr
Var {} -> do
aexpr <- m expr
k aexpr
App f e -> do
rv <- gensym
xformed <- k (AVar rv)
let cont = ALam [rv] xformed
cexpr fs = tk e (\es -> return (CApp fs [es, cont]))
tk f cexpr
Matt illustrates these transformations on a simple expression: (g a)
. We can
do the same:
test :: Expr
test = App (Var "g") (Var "a")
First, the naive transform. Note all the noisy administrative redexes that
come along with it:
> cexpr <- nt test (AVar "halt")
> PP.pretty cexpr
((λ ($v1).
((λ ($v2).
($v1 $v2 halt)) a)) g)
The higher-order transform does better, containing only one such redex (an
eta-expansion). Note that since the supplied continuation must be expressed
in terms of a Haskell function, we need to write it in a more HOAS-y style:
> cexpr <- hot test (\ans -> return (CApp (AVar "halt") [ans]))
> PP.pretty cexpr
(g a (λ ($v3).
(halt $v3)))
Finally the hybrid transform, which, here, is literally perfect. We don’t even
need to deal with the minor annoyance of the HOAS-style continuation when
calling it:
> cexpr <- tc test (AVar "halt")
> PP.pretty cexpr
(g a halt)
Matt goes on to describe a “partitioned CPS transform” that can be used to
recover a stack, in (seemingly) much the same manner that the defunctionalised
manual CPS transform worked in the previous section. Very neat, but something
I’ll have to look at in another post.
Fin
CPS is pretty gnarly. My experience in compiling with continuations is not
substantial, but I dig learning it. Appel’s book, in particular, is meaty –
expect more posts on the subject here eventually, probably.
‘Til next time! I’ve dumped the code from the latter section into a
gist.
02 Jul 2018
Why does my blog often feature its typical motley mix of probability,
functional programming, and computer science anyway?
From 2011 through 2017 I slogged through a Ph.D. in statistics, working on it
full time in 2012, and part-time in every other year. It was an interesting
experience. Although everything worked out for me in the end – I managed to
do a lot of good and interesting work in industry while still picking up a
Ph.D. on the side – it’s not something I’d necessarily recommend to others.
The smart strategy is surely to choose one thing and give it one’s maximum
effort; by splitting my time between work and academia, both obviously suffered
to some degree.
That said, at the end of the day I was pretty happy with the results on both
fronts. On the academic side of things, the main product was a dissertation,
Embedded Domain-Specific Languages for Bayesian Modelling and
Inference, supporting my thesis: that novel and useful DSLs for solving
problems in Bayesian statistics can be embedded in statically-typed, purely
functional programming languages.
It helps to remember that in this day and age, one can still typically graduate
by, uh, “merely” submitting and defending a dissertation. Publishing in
academic venues certainly helps focus one’s work, and is obviously necessary
for a career in academia (or, increasingly, industrial research). But it’s
optional when it comes to getting your degree, so if it doesn’t help you
achieve your goals, you may want to reconsider it, as I did.
The problem with the dissertation-first approach, of course, is that nobody
reads your work. To some extent I think I’ve mitigated that; most of the
content in my dissertation is merely a fleshed-out version of various ideas
I’ve written about on this blog. Here I’ll continue that tradition and write a
brief, informal summary of my dissertation and Ph.D. more broadly – what I
did, how I approached it, and what my thoughts are on everything after the
fact.
The Idea
Following the advice of Olin Shivers (by way of Matt Might), I
oriented my work around a concrete thesis, which wound up more or less
being that embedding DSLs in a Haskell-like language can be a useful technique
for solving statistical problems. This thesis wasn’t born into the world
fully-formed, of course – it began as quite a vague (or misguided) thing, but
matured naturally over time. Using the tools of programming languages and
compilers to do statistics and machine learning is the motivation behind
probabilistic programming in general; what I was interested in was exploring
the problem in the setting of languages embedded in a purely functional host.
Haskell was the obvious choice of host for all of my implementations.
It may sound obvious that putting together a thesis is a good strategy for a
Ph.D. But here I’m talking about a thesis in the original (Greek) sense
of a proposition, i.e. a falsifiable idea or claim (in contrast to a
dissertation, from the Latin disserere, i.e. to examine or to discuss).
Having a central idea to orient your work around can be immensely useful in
terms of focus. When you read a dissertation with a clear thesis, it’s easy to
know what the writer is generally on about – without one it can (increasingly)
be tricky.
My thesis is pretty easy to defend in the abstract. A DSL really exposes the
structure of one’s problem while also constraining it appropriately, and
embedding one in a host language means that one doesn’t have to implement an
entire compiler toolchain to support it. I reckoned that simply pointing the
artillery of “language engineering” at the statistical domain would lead to
some interesting insight on structure, and maybe even produce some useful
tools. And it did!
The Contributions
Of course, one needs to do a little more defending than that to satisfy his or
her examination committee. Doctoral research is supposed to be substantial and
novel. In my experience, reviewers are concerned with your answers to the
following questions:
- What, specifically, are your claims?
- Are they novel contributions to your field?
- Have you backed them up sufficiently?
At the end of the day, I claimed the following advances from my work. (I’ve included a few minimal code sketches after the list to make the more technical claims concrete.)
-
Novel probabilistic interpretations of the Giry monad’s algebraic
structure. The Giry monad (Lawvere, 1962; Giry, 1981) is the
“canonical” probability monad, in a meaningful sense, and I demonstrated that
one can characterise the measure-theoretic notion of image measure by its
functorial structure, as well as the notion of product measure by its
monoidal structure. Having the former around makes it easy to transform the
support of a probability distribution while leaving its density structure
invariant, and the latter lets one encode probabilistic independence, enabling
things like measure convolution. What’s more, the analogous
semantics carry over to other probability monads – for example the well-known
sampling monad, or more abstract variants.
-
A novel characterisation of the Giry monad as a restricted continuation
monad. Ramsey & Pfeffer (2002) discussed an “expectation monad,”
and I had independently come up with my own “measure monad” based on
continuations. But I showed that both reduce to a restricted form of the
continuation monad of Wadler (1994) – and that indeed, when the
return type of Wadler’s continuation monad is restricted to the reals, it is
the Giry monad.
To be precise, it’s actually somewhat more general – it permits integration
with respect to any measure, not only a probability measure – but that
definition strictly subsumes the Giry monad. I also showed that product
measure, via the applicative instance, yields measure convolution and
associated operations.
-
A novel technique for embedding a statically-typed probabilistic
programming language in a purely functional language. The general idea
itself is well-known to those who have worked with DSLs in Haskell: one
constructs a base functor and wraps it in the free monad. But the reason
that technique is appropriate in the probabilistic programming domain is that
probabilistic models are fundamentally monadic constructs – merely recall
the existence of the Giry monad for proof!
To construct the requisite base functor, one maps some core set of concrete
probability distributions denoted by the Giry monad to a collection of
abstract probability distributions represented only by unique names. These
constitute the branches of one’s base functor, which is then wrapped in the
familiar ‘Free’ machinery that gives one access to the functorial,
applicative, and monadic structure that I talked about above. This abstract
representation of a probabilistic model allows one to implement other
probability monads, such as the well-known sampling monad (Ramsey & Pfeffer,
2002; Park et al., 2008) or the Giry monad, by way of interpreters.
(N.b. Ścibior et al. (2015) did some very similar work to this,
although the monad they used was arguably more operational in its flavour.)
-
A novel characterisation of execution traces as cofree comonads. The
idea of an “execution trace” is that one runs a probabilistic program
(typically generating a sample) and then records how it executed – what
randomness was used, the execution path of the program, etc. To do Bayesian
inference, one then runs a Markov chain over the space of possible execution
traces, calculating statistics about the resulting distribution in trace
space (Wingate et al., 2011).
Remarkably, a cofree comonad over the same abstract probabilistic base
functor described above allows us to represent an execution trace at the
embedded language level itself. In practical terms, that means one can
denote a probabilistic model, and then run a Markov chain over the space of
possible ways it could have executed, without leaving GHCi. You can
alternatively examine and perturb the way the program executes, stepping
through it piece by piece, as I believe was originally a feature in Venture
(Mansinghka et al., 2014).
(N.b. this really blew my mind when I first started toying with it,
converting programs into execution traces and then manipulating them as
first-class values, defining other probabilistic programs over spaces of
execution traces, etc. Meta.)
-
A novel technique for statically encoding conditional independence of
terms in this kind of embedded probabilistic programming language. If you
recall that I previously demonstrated that the monoidal (i.e. applicative)
structure of the Giry monad encodes the notion of product measure, it will
not be too surprising to hear that I used the free applicative functor
(Capriotti & Kaposi, 2014) (again, over the same kind of abstract
probabilistic base functor) to reify applicative expressions such that they can
be identified statically.
-
A novel shallowly-embedded language for building custom transition
operators for use in Markov chain Monte Carlo. MCMC is the de facto
standard way to perform inference on Bayesian models (although it is by no
means limited to Bayesian models). By wrapping a simple state monad
transformer around a probability monad, one can denote Markov transition
operators, combine them, and transform them in a few ways that are useful for
doing MCMC.
The framework here was inspired by the old parallel “strategies” idea of
Trinder et al. (1998). The idea is that you want to “evaluate” a
posterior via MCMC, and want to choose a strategy by which to do so – e.g.
Metropolis (Metropolis, 1953), slice sampling (Neal, 2003),
Hamiltonian (Neal, 2011), etc. Since Markov transition operators
are closed under composition and convex combinations, it is easy to write a
little shallowly-embedded combinator language for working with them –
effectively building evaluation strategies in a manner familiar to those
who’ve worked with Haskell’s parallel library.
(N.b. although this was the most trivial part of my research, both theory-
and implementation-wise, it remains the most useful for day-to-day practical
work.)
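To make some of the above concrete, here are a few minimal Haskell sketches. None of them is the exact code from the dissertation or my libraries – the types and names are illustrative placeholders. First, the restricted-continuation view of a measure, in the spirit of the measurable library: a measure is anything that integrates real-valued functions over its support, and the functorial, applicative, and monadic structure fall out as image measure, product measure, and marginalisation respectively.

-- A measure as a restricted continuation: something that integrates
-- real-valued functions over its support.
newtype Measure a = Measure { integrate :: (a -> Double) -> Double }

-- Functorial structure: image (pushforward) measure.  The support is
-- transformed; the underlying density structure is left alone.
instance Functor Measure where
  fmap f nu = Measure (\g -> integrate nu (g . f))

-- Monoidal/applicative structure: product measure, i.e. independence.
instance Applicative Measure where
  pure x    = Measure (\f -> f x)
  mu <*> nu = Measure (\f -> integrate mu (\g -> integrate nu (f . g)))

-- Monadic structure: integrate out (marginalise) the intermediate value.
instance Monad Measure where
  return   = pure
  mu >>= f = Measure (\g -> integrate mu (\x -> integrate (f x) g))

-- A discrete example: a fair 0/1 coin, plus the convolution of two of
-- them via the applicative structure.
weighted :: [(a, Double)] -> Measure a
weighted xs = Measure (\f -> sum [w * f x | (x, w) <- xs])

coin :: Measure Double
coin = weighted [(0, 0.5), (1, 0.5)]

mean :: Measure Double -> Double
mean mu = integrate mu id

convolved :: Measure Double
convolved = (+) <$> coin <*> coin   -- mean convolved == 1.0

Nothing here forces the measures involved to be probability measures – restricting the continuation monad’s return type to the reals is the whole trick – which is the sense in which this definition is a little more general than the Giry monad.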
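Second, the free-monad embedding. Again, only a sketch: the ModelF constructors below and the use of System.Random are placeholders of mine, not anything from the dissertation. The essentials are a tiny base functor of abstract, named distributions, the free monad over it, and a forward (sampling) interpreter.

{-# LANGUAGE DeriveFunctor #-}

import Control.Monad.Free (Free (..), liftF)
import System.Random (StdGen, randomR)

-- Abstract, named distributions, each carrying its continuation.
data ModelF k
  = BernoulliF Double (Bool -> k)
  | UniformF Double Double (Double -> k)
  deriving Functor

-- A model is the free monad over the base functor.
type Model = Free ModelF

bernoulli :: Double -> Model Bool
bernoulli p = liftF (BernoulliF p id)

uniform :: Double -> Double -> Model Double
uniform a b = liftF (UniformF a b id)

-- Forward interpretation: fold the free structure into concrete draws.
sample :: StdGen -> Model a -> (a, StdGen)
sample g (Pure a)  = (a, g)
sample g (Free op) = case op of
  BernoulliF p k ->
    let (u, g') = randomR (0, 1) g
    in  sample g' (k (u < p))
  UniformF lo hi k ->
    let (x, g') = randomR (lo, hi) g
    in  sample g' (k x)

-- A model built monadically: a simple two-component mixture.
mixture :: Model Double
mixture = do
  heads <- bernoulli 0.7
  if heads then uniform 0 1 else uniform 9 10

A measure interpretation is just another fold over the same structure, targeting something like the Measure type above instead of a source of randomness.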
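Third, execution traces via the cofree comonad, here only as a type-level sketch that reuses the hypothetical ModelF above; the machinery in the dissertation carries more information around at each node.

import Control.Comonad.Cofree (Cofree)

-- An execution trace: every node of the (unrolled) program is annotated
-- with a value, and the remaining possible executions hang off the base
-- functor (ModelF as defined in the previous sketch).
type Execution a = Cofree ModelF a

-- The comonadic interface from Control.Comonad is what makes traces
-- first-class and perturbable.  Specialised to this type:
--
--   extract :: Execution a -> a
--   extend  :: (Execution a -> b) -> Execution a -> Execution b
--
-- 'extend' rewrites every node of a trace in terms of the entire
-- sub-trace rooted there, which is exactly the shape of the local
-- perturbations one makes when walking over trace space.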
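Fourth, conditional independence via the free applicative, again over the same hypothetical ModelF. Terms combined only through the applicative interface cannot depend on one another, and the free applicative reifies that fact so it can be inspected statically.

import Control.Applicative.Free (Ap, liftAp)

-- Independent terms over the same base functor (ModelF as above).
type Independent = Ap ModelF

bernoulliAp :: Double -> Independent Bool
bernoulliAp p = liftAp (BernoulliF p id)

-- Two draws combined applicatively: the structure itself witnesses that
-- neither draw can inspect the result of the other.
jointly :: Independent (Bool, Bool)
jointly = (,) <$> bernoulliAp 0.5 <*> bernoulliAp 0.5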
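Finally, the shallowly-embedded transition language. This is a stripped-down sketch rather than the declarative API proper – Transition, convex, and chain are placeholder names. The point is just that a transition operator is a stateful action over the chain’s position, living over whatever monad supplies the randomness, so sequential composition comes for free from the Monad instance and convex combination is a coin flip.

import Control.Monad (replicateM_)
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.State.Strict (StateT, execStateT)

-- A transition operator: a stateful action over the current position of
-- the chain, in some underlying probability monad m.
type Transition m a = StateT a m ()

-- Sequential composition is just (>>), e.g. 'metropolis >> slice' for
-- hypothetical primitive transitions 'metropolis' and 'slice'.

-- Convex combination: run one of two transitions according to a coin
-- flip, given some way of drawing a uniform variate in m.
convex
  :: Monad m
  => m Double        -- a uniform draw on [0, 1], supplied by the caller
  -> Double          -- mixing probability
  -> Transition m a
  -> Transition m a
  -> Transition m a
convex unif p t u = do
  z <- lift unif
  if z < p then t else u

-- Run a transition n times from an initial position, returning the
-- final position of the chain.
chain :: Monad m => Int -> a -> Transition m a -> m a
chain n x0 t = execStateT (replicateM_ n t) x0

A real implementation needs to thread more through the state (the target density, tuning parameters, and so on), but the compositional story – transitions closed under sequencing and convex combination – is unchanged.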
The Execution
One needs to stitch his or her contributions together in some kind of
over-arching narrative that supports the underlying thesis. Mine went
something like this:
The Giry monad is appropriate for denoting probabilistic semantics in
languages with purely-functional hosts. Its functorial, applicative, and
monadic structure denote probability distributions, independence, and
marginalisation, respectively, and these are necessary and sufficient for
encoding probabilistic models. An embedded language based on the Giry monad is
type-safe and composable.
Probabilistic models in an embedded language, semantically denoted in terms of
the Giry monad, can be made abstract and interpretation-independent by defining
them in terms of a probabilistic base functor and a free monad instead. They
can be forward-interpreted using standard free monad recursion schemes in order
to compute probabilities (via a measure interpretation) or samples (via a
sampling interpretation); the latter interpretation is useful for performing
limited forms of Bayesian inference, in particular. These free-encoded models
can also be transformed into cofree-encoded models, under which they represent
execution traces that can be perturbed arbitrarily by standard comonadic
machinery. This representation is amenable to more elaborate forms of Bayesian
inference. To accurately denote conditional independence in the embedded
language, the free applicative functor can also be used.
One can easily construct a shallowly-embedded language for building custom
Markov transitions. Markov chains that use these compound transitions can
outperform those that use only “primitive” transitions in certain settings.
The shallowly embedded language guarantees that transitions can only be
composed in well-defined, type-safe ways that preserve the properties desirable
for MCMC. What’s more, one can write “transition transformers” that implement
still more complex inference techniques, e.g. annealing or tempering, over
existing transitions.
Thus: novel and useful domain-specific languages for solving problems in
Bayesian statistics can be embedded in statically-typed, purely-functional
programming languages.
I used the twenty-minute talk period of my defence to go through this
narrative and point out my claims, after which I was grilled on them for an
hour or two. The defence was probably the funnest part of my whole Ph.D.
The Product
In the end, I mainly produced a dissertation, a few blog posts, and
some code. By my count a good number of repos came out of the work; I’ll
mention the notable ones below.
If any of this stuff is or was useful to you, that’s great! I still use the
declarative libraries, flat-mcmc, mwc-probability, and sampling pretty
regularly. They’re fast and convenient for practical work.
Some of the other stuff, e.g. measurable, is useful for building intuition,
but not so much in practice, and deanie, for example, is a work-in-progress
that will probably not see much more progress (from me, at least). Continuing
from where I left off might be a good idea for someone who wants to explore
problems in this kind of setting in the future.
General Thoughts
When I first read about probabilistic (functional) programming in Dan Roy’s
2011 dissertation I was absolutely blown away by the idea. It seemed
that, since there was such an obvious connection between the structure of
Bayesian models and programming languages (via the underlying semantic graph
structure, something that has been exploited to some degree as far back as
BUGS), it was only a matter of time until someone was able to really
create a tool that would revolutionize the practice of Bayesian statistics.
Now I’m much more skeptical. It’s true that probabilistic programming tends to
expose some beautiful structure in statistical models, and that a probabilistic
programming language that was easy to use and “just worked” for inference would
be a very useful tool. But putting something expressive and usable together
that also “just works” for that inference step is very, very difficult. Very
difficult indeed.
Almost every probabilistic programming framework of the past ten years, from
Church down to my own stuff, has more or less wound up as “thesisware,” or
remains the exclusive publication-generating mechanism of a single research
group. The exceptions are almost unexceptional in and of themselves: JAGS and Stan
are probably the most-used such frameworks, certainly in statistics (I will
mention the very honourable PyMC here as well), but they innovate little, if at
all, over the original BUGS in terms of expressiveness. Similarly, it’s very
questionable whether the fancy MCMC algo du jour is really any better than
some combination of Metropolis-Hastings (even plain Metropolis), Gibbs (or its
approximate variant, slice sampling), or nested sampling in anything
outside of favourably-engineered examples (I will note that Hamiltonian Monte
Carlo could probably be counted in there too, but it can still be quite a pain
to use, its variants are probably overrated, and it is comparatively
expensive).
Don’t get me wrong. I am a militant Bayesian. Bayesian statistics, i.e., as
far as I’m concerned, probability theory, describes the world accurately.
And there’s nothing wrong with thesisware, either. Research is research, and
this is a very thorny problem area. I hope to see more abandoned, innovative
software that moves the ball up the field, or kicks it into another stadium
entirely. Not less. The more ingenious implementations and sampling schemes
out there, the better.
But more broadly, I often find myself in the camp of Leo Breiman, who
in 2001 characterised the two predominant cultures in statistics as those of
data modelling and algorithmic modelling respectively, the latter now known
as machine learning, of course. The crux of the data modelling argument,
which is of course predominant in probabilistic programming research and
Bayesian statistics more generally, is that a practitioner, by means of his or
her ingenuity, is able to suss out the essence of a problem and distill it into
a useful equation or program. Certainly there is something to this: science is
a matter of creating hypotheses, testing them against the world, and iterating
on that, and the “data modelling” procedure is absolutely scientific in
principle. Moreover, with a hat tip to Max Dama, one often wants to
impose a lot of structure on a problem, especially if the problem is in a
domain where there is a tremendous amount of noise. There are many areas where
this approach is just the thing one is looking for.
That said, it seems to me that a lot of the data modelling-flavoured side of
probabilistic programming, Bayesian nonparametrics, etc., is to some degree
geared more towards being, uh, “research paper friendly” than anything else.
These are extremely seductive areas for curious folks who like to play at the
intersection of math, statistics, and computer science (raises hand), and one
can easily spend a lifetime chasing this or that exquisite theoretical
construct into any number of rabbit holes. But at the end of the day, the data
modelling culture, per Breiman:
… has at its heart the belief that a statistician, by imagination
and by looking at the data, can invent a reasonably good parametric class of
models for a complex mechanism devised by nature.
Certainly the traditional statistics that Breiman wrote about in 2001 was very
different from probabilistic programming and similar fields in 2018. But I
think there is the same element of hubris in them, and to some extent, a
similar dissociation from reality. I have cultivated some of the applied bent
of a Breiman, or a Dama, or a Locklin, so perhaps this should not be
too surprising.
I feel that the 2012-ish resurgence of neural networks jolted the machine
learning community out of a large-scale descent into some rather dubious
Bayesian nonparametrics research, which, much as I enjoy that subject area,
seemed more geared towards generating fun machine learning summer school
lectures and NIPS papers than actually getting much practical work done. I
can’t help but feel that probabilistic programming may share a few of those
same characteristics. When all is said and done, answering the question “is
this stuff useful?” often feels like a stretch.
So: onward & upward and all that, but my enthusiasm has been tempered somewhat,
is all.
Fini
Administrative headaches and the existential questions associated with grad
school aside, I had a great time working in this area for a few years, if in my
own aloof and eccentric way.
If you ever interacted with this area of my work, I hope you got some utility
out of it: research ideas, use of my code, or just some blog post that you
thought was interesting during a slow day at the office. If you’re working in
the area, or are considering it, I wish you success, whether your goal is to
build practical tools, or to publish sexy papers. :-)