A Nock Interpreter

31 Jan 2019

I wrote a little Nock interpreter called hnock some months ago and just yesterday updated it to support the latest version of Nock, 4K. Nock – the base layer VM of Urbit – is a very simple little “functional assembly language” of sorts. It is of particular interest in that it is capable of practical computation (indeed, it is Turing complete) but is not defined in terms of the lambda calculus. So, no variable naming and capture-avoidance and the like to deal with, which is kind of neat.

Nock (at 4K) itself has a small specification, repeated below. Famously, it fits on a t-shirt:

nock(a)             *a
[a b c]             [a [b c]]

?[a b]              0
?a                  1
+[a b]              +[a b]
+a                  1 + a
=[a a]              0
=[a b]              1

/[1 a]              a
/[2 a b]            a
/[3 a b]            b
/[(a + a) b]        /[2 /[a b]]
/[(a + a + 1) b]    /[3 /[a b]]
/a                  /a

#[1 a b]            a
#[(a + a) b c]      #[a [b /[(a + a + 1) c]] c]
#[(a + a + 1) b c]  #[a [/[(a + a) c] b] c]
#a                  #a

*[a [b c] d]        [*[a b c] *[a d]]

*[a 0 b]            /[b a]
*[a 1 b]            b
*[a 2 b c]          *[*[a b] *[a c]]
*[a 3 b]            ?*[a b]
*[a 4 b]            +*[a b]
*[a 5 b c]          =[*[a b] *[a c]]

*[a 6 b c d]        *[a *[[c d] 0 *[[2 3] 0 *[a 4 4 b]]]]
*[a 7 b c]          *[*[a b] c]
*[a 8 b c]          *[[*[a b] a] c]
*[a 9 b c]          *[*[a c] 2 [0 1] 0 b]
*[a 10 [b c] d]     #[b *[a c] *[a d]]

*[a 11 [b c] d]     *[[*[a c] *[a d]] 0 3]
*[a 11 b c]         *[a c]

*a                  *a

Perhaps you are a neophyte Haskeller and have never implemented an interpreter for a language before. Nock makes for an excellent target to practice on.

Expressions

A noun in Nock is either an atom, i.e. an unsigned integer, or a cell, i.e. an ordered pair of nouns:

data Noun =
    Atom Integer
  | Cell Noun Noun
  deriving Eq

Very simple. A Nock expression is then a noun, or some operator applied to a noun. Per the spec, the operators are denoted by ?, +, =, /, #, and *, and the fun, convenient, and authentically Urbit-y way to pronounce these are wut, lus, tis, net, hax, and tar respectively:

data Expr =
    Noun Noun
  | Wut Noun
  | Lus Noun
  | Tis Noun
  | Net Noun
  | Hax Noun
  | Tar Noun
  deriving Eq

So, equipped with the above definitions, you can write Nock expressions in Haskell. For example, the following expression:

*[57 [4 0 1]]

maps to:

Tar
  (Cell (Atom 57)
  (Cell (Atom 4)
  (Cell (Atom 0) (Atom 1))))

Note per the spec that cells associate to the right, thus something like [4 0 1] maps to [4 [0 1]], and thus Cell (Atom 4) (Cell (Atom 0) (Atom 1)).

Evaluation and Semantics

For evaluation one can more or less copy out the production rules from the Nock spec. We can use an Either type to denote a lack of defined semantics:

data Error = Error Noun
  deriving Show

type Possibly = Either Error

and then simply define the various evaluation functions appropriately. ‘wut’, for example, checks if something is a cell:

wut :: Noun -> Possibly Noun
wut noun = return $ case noun of
  Cell {} -> Atom 0
  Atom {} -> Atom 1

(note that in Nock and Urbit more generally, 0 denotes ‘true’).

‘tis’ checks if the elements of a cell are equal:

tis :: Noun -> Possibly Noun
tis noun = case noun of
  Atom {}  -> Left (Error noun)
  Cell m n -> return $
    if   m == n
    then Atom 0
    else Atom 1

And so on. There are other operators for lookups, substitution, et cetera. The most involved operator is ‘tar’, which constitutes the Nock interpreter proper. One can simply match on the cases appropriately:

tar :: Noun -> Possibly Noun
tar noun = case noun of
  Cell a (Cell (Cell b c) d) -> do
    tard0 <- tar (Cell a (Cell b c))
    tard1 <- tar (Cell a d)
    return (Cell tard0 tard1)

  Cell a (Cell (Atom 0) b) ->
    net (Cell b a)

  Cell _ (Cell (Atom 1) b) ->
    return b

  Cell a (Cell (Atom 2) (Cell b c)) -> do
    tard0 <- tar (Cell a b)
    tard1 <- tar (Cell a c)
    tar (Cell tard0 tard1)

  -- ... and so on

It is particularly useful to look at ‘tar’ to get an idea of what is going on in Nock. One is always evaluating a noun to another noun. Semantics are only defined when the noun being evaluated is a cell, and that cell is said to have the structure:

[subject formula]

The subject is essentially the environment, reified explicitly. The formula is the code to execute against the subject.

The resulting noun is called a ‘product’. So, evaluation proceeds as:

[subject formula] -> product

In any valid application of ‘tar’, i.e. for some *[x y], ‘x’ is the subject, and ‘y’ is the formula. The first case of ‘tar’, for example, is defined via:

*[a 0 b]            /[b a]

Recall that cells in Nock are right-associative, so [a 0 b] is really [a [0 b]]. In other words: a is the subject, [0 b] is the formula. The formula [0 b], read as ‘Nock-zero’, means “look up the value at address ‘b’ in the subject,” the subject here being ‘a’. So *[a 0 b] means “look up the value at address ‘b’ in ‘a’.”

Other formulas in the various ‘tar’ cases denote additional useful language features. A formula [1 b] is the constant function, [6 b c d] is a conditional, [7 b c] is function (viz. formula) composition, and so on. There is a Turing-complete amount of power packed onto that t-shirt!

In any case, once one has constructed all of these requisite evaluation functions, we can stitch them together in a single evaluator:

eval :: Expr -> Possibly Noun
eval expr = case expr of
  Noun noun -> return noun
  Wut  e    -> wut e
  Lus  e    -> lus e
  Tis  e    -> tis e
  Net  e    -> net e
  Hax  e    -> hax e
  Tar  e    -> tar e

Parsing

Writing a combinator-based parser is an easy job. You can use parsec, for example, and define your combinators as follows.

Given low-level ‘atom’ and ‘cell’ parsers, a noun is just an atom or a cell:

noun :: Monad m => P.ParsecT T.Text u m Noun
noun =
      P.try cell
  <|> atom

and an expression is either an operator, followed by an noun, or just a noun:

expr :: Monad m => P.ParsecT T.Text u m Expr
expr =
      P.try operator
  <|> fmap Noun noun

operator :: Monad m => P.ParsecT T.Text u m Expr
operator = do
  op <- P.oneOf "?+=/#*"
  case op of
    '?' -> fmap Wut noun
    '+' -> fmap Lus noun
    '=' -> fmap Tis noun
    '/' -> fmap Net noun
    '#' -> fmap Hax noun
    '*' -> fmap Tar noun
    _   -> fail "op: bad token"

Note that we don’t allow expressions like the following, taken from the Nock spec:

*[a *[[c d] 0 *[[2 3] 0 *[a 4 4 b]]]]

These are used to define the production rules, but are not valid Nock expressions per se.

The end parser is simply:

parse :: T.Text -> Either P.ParseError Expr
parse = P.runParser expr [] "input"

Example

The final interpreter is pretty simple. Here we distinguish between cases in order to report either parsing or evaluation errors in basic fashion, respectively:

hnock :: T.Text -> Noun
hnock input = case parse input of
  Left perr -> error (show perr)
  Right ex  -> case eval ex of
    Left err -> error (show err)
    Right e  -> e

It’s easy to test. Take the simple Nock-zero example given previously, where *[a 0 b] means “look up the value at address ‘b’ in ‘a’”. Nock’s addressing scheme is simple; address ‘2’ means the leftmost component of a cell, and ‘3’ means the rightmost component. For every additional cell you want to recurse into, you just multiply by two and add one as is necessary. Address ‘1’ is just the cell itself.

So, let’s test:

~ hnock "*[[[1 2] [3 4]] [0 1]"
[[1 2] [3 4]]

Again, the formula [0 1] means look up at address one, which is the cell itself.

~ hnock "*[[[1 2] [3 4]] [0 2]]"
[1 2]

The formula [0 2] means look up at address 2, which is the leftmost component of the cell.

~ hnock "*[[[1 2] [3 4]] [0 7]]"
4

To understand [0 7], we first look up the rightmost component [3 4] at address 3, then multiply by two and add one for the rightmost component of that.

The most famous Nock learner’s problem is to implement a function for decrementing an atom (if you could do this in ~2010 you could receive an Urbit galaxy for it, which is now probably worth a serious chunk of change). Nock’s only arithmetic operator is increment, so the trick is to decrement by actually counting up from zero.

(N.b. in practice this is not how one actually evaluates decrement in Nock. The compiler is ‘jetted’, such that alternate implementations, e.g. the CPU’s decrement instruction directly, can be used to evaluate semantically-identical expressions.)

But the idea is simple enough. To warm up, we can implement this kind of decrement in Haskell. Don’t handle the underflow condition, just to keep things simple:

dec :: Integer -> Integer
dec m =
  let loop n
        | succ n == m = n
        | otherwise   = loop (succ n)
  in  loop 0

Look at the language features we’re using to accomplish this. There’s:

Variable naming / declaration, for the ‘m’, ‘loop’, and ‘n’ arguments
Integer increment, via the built-in ‘succ’
Equality testing, via ‘==’
Conditional expressions, via the ‘|’ guards
A function call, in ‘loop 0’

Let’s try and duplicate this in Nock, step by step. We’ll want analogues for all of our declared variables, to start.

A Nock formula that simply returns the subject it’s evaluated against is easy: [0 1]. We used it above.

~ hnock "*[10 [0 1]]"
10

That’s the equivalent of our ‘m’ declaration in the Haskell code.

Similarly, we can just ignore the subject and return a ‘0’ via [1 0].

~ hnock "*[10 [1 0]]"
0

That’s the initial value, 0, that appears in ‘loop 0’.

We can use Nock-8 to pair these together. It evaluates a formula against the subject, and then uses the original subject, augmented with the product, to evaluate another one:

~ hnock "*[10 [8 [1 0] [0 1]]]"
[0 10]

The last thing we need define is the ‘loop’ function itself. We’re going to place it at the head of the subject, i.e. at address two. Our ‘n’ and ‘m’ variables will then be at addresses six and seven, respectively. When we define our ‘loop’ analogue, we just need to refer to those addresses, instead of our variable names.

(N.b. Nock’s addressing scheme seems more or less equivalent to a tree-flavoured variant of de Bruijn indices.)

The ‘loop’ function in the Haskell code tests to see the successor of ‘n’ is equal to ‘m’, and if so, returns ‘n’. Otherwise it loops on the successor of ‘n’.

Nock-4 computes the successor. Remember that ‘n’ is at address six, so our analogue to ‘succ n’ is [4 0 6]. We want to check if it’s equal to ‘m’, at address seven, gotten via [0 7]. The equality check is then handled by Nock-5:

[5 [4 0 6] [0 7]]

Nock-6 is the conditional operator. If the equality check is true, we just want to return ‘n’ via [0 6]. The whole formula will look like this:

[6 [5 [4 0 6] [0 7]] [0 6] <loop on successor of n>]

To loop, we’ll use Nock-2. We’ll compute a new subject – consisting of our ‘loop’ analogue at address two, an updated ‘n’ at address six, and our original ‘m’ at address seven – and then evaluate the same looping formula against it. In Nock, the formula is:

[2 [[0 2] [4 0 6] [0 7]] [0 2]]

And our ‘loop’ analogue in full is thus:

[6 [5 [4 0 6] [0 7]] [0 6] [2 [[0 2] [4 0 6] [0 7]] [0 2]]]

Finally, we now need to actually store this in the subject. Again, we’ll ignore the subject with Nock-1, and use Nock-8 to tuple the formula up with the ‘n’ and ‘m’ analogues:

(N.b. at this point we’re pretty much at line noise – spacing it out doesn’t make it that much more readable, but it does avoid word wrap.)

[8
  [1 0]
  [8
    [1
      [6
        [5 [4 0 6] [0 7]]
        [0 6]
        [2 [[0 2] [4 0 6] [0 7]] [0 2]]
      ]
    ]
    [2 [0 1] [0 2]]
  ]
]

Let’s check if it works. We’re embedded in Haskell, after all, so let’s get some support from our host, for convenience:

~ let dec = \subj -> "*[" <> subj <> " [8 [1 0] [8 [1 [6 [5 [4 0 6] [0 7]] [0 6] [2 [[0 2] [4 0 6] [0 7]] [0 2]]]] [2 [0 1] [0 2]]]]]"
~ hnock (dec "10")
9
~ hnock (dec "25")
24
~ hnock (dec "200")
199

Phew!

Fin

Nock makes for an interesting “functional assembly language” of sorts. Its spec is so simple that writing a naïve interpreter is more or less a process of copying out the production rules. You can do it without a lot of thought.

The time-honoured decrement-writing exercise, on the other hand, really makes one think about Nock. That the environment is always explicit, you can access it via a nice addressing scheme, that you can refer to functions via pointers (viz. addresses), etc. It’s pretty intuitive, when you dig into the mechanics.

In any case – writing a decrement (or similar function) and writing your own interpreter to test it is a fun exercise. From a practical perspective, it also gives one a lot of intuition about programming in Hoon – it’s probably not too far from the truth to compare the relationship between Nock and Hoon to that between, say, x86 and C. If one is programming in Hoon, it can’t hurt to to know your Nock!

jtobin.io

A Nock Interpreter

Expressions

Evaluation and Semantics

Parsing

Example

Fin