Kelvin Versioning

Long ago, in the distant past, Curtis introduced the idea of kelvin versioning in an informal blog post about Urbit. Imagining the idea of an ancient and long-frozen form of Martian computing, he described this versioning scheme as follows:

Some standards are extensible or versionable, but some are not. ASCII, for instance, is perma-frozen. So is IPv4 (its relationship to IPv6 is little more than nominal - if they were really the same protocol, they’d have the same ethertype). Moreover, many standards render themselves incompatible in practice through excessive enthusiasm for extensibility. They may not be perma-frozen, but they probably should be.

The true, Martian way to perma-freeze a system is what I call Kelvin versioning. In Kelvin versioning, releases count down by integer degrees Kelvin. At absolute zero, the system can no longer be changed. At 1K, one more modification is possible. And so on. For instance, Nock is at 9K. It might change, though it probably won’t. Nouns themselves are at 0K - it is impossible to imagine changing anything about those three sentences.

Understood in this way, kelvin versioning is very simple. One simply counts downwards, and at absolute zero (i.e. 0K) no further releases are legal. It is no more than a versioning scheme designed for abstract components that should eventually freeze.

Many years later, the Urbit blog described kelvin versioning once more in the post Toward a Frozen Operating System. This presented a significant refinement of the original scheme, introducing both recursive and so-called “telescoping” mechanics to it:

The right way for this trunk to approach absolute zero is to “telescope” its Kelvin versions. The rules of telescoping are simple:

If tool B sits on platform A, either both A and B must be at absolute zero, or B must be warmer than A.

Whenever the temperature of A (the platform) declines, the temperature of B (the tool) must also decline.

B must state the version of A it was developed against. A, when loading B, must state its own current version, and the warmest version of itself with which it’s backward-compatible.

Of course, if B itself is a platform on which some higher-level tool C depends, it must follow the same constraints recursively.

This is more or less a complete characterisation of kelvin versioning, but it’s still not quite precise enough. If one looks at other versioning schemes that try to communicate some specific semantic content (the most obvious example being semver), it’s clear that they take great pains to be formal and precise about their mechanics.

Experience has demonstrated to me that such formality is necessary. Even the excerpt above has proven to be ambiguous or underspecified regarding various situations and corner cases that one might run into. These confusions can be resolved by a rigorous protocol specification, which, in this case, isn’t very difficult to put together.

Kelvin versioning and its use in Urbit is the subject of the currently-evolving UP9, recent proposed updates to which have not yet been ratified. The following is my own personal take on kelvin versioning, in the form of a simple formal specification – I believe it resolves any major ambiguities that the original descriptions may have introduced.

Kelvin Versioning (Specification)

(The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119.)

For any component A following kelvin versioning,

  1. A’s version SHALL be a nonnegative integer.

  2. A, at any specific version, MUST NOT be modified after release.

  3. At version 0, new versions of A MUST NOT be released.

  4. New releases of A MUST be assigned a new version, and this version MUST be strictly less than the previous one.

  5. If A supports another component B that also follows kelvin versioning, then:

    • Either both A and B MUST be at version 0, or B’s version MUST be strictly greater than A’s version.
    • If a new version of A is released and that version supports B, then a new version of B MUST be released.

These rules apply recursively for any kelvin-versioned component C that is supported by B, and so on.
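To make the mechanics concrete, here is a minimal Haskell sketch of rules 1, 4, and 5 (the Component representation and function names are my own, purely for illustration – they’re not part of any spec):

data Component = Component
  { version  :: Integer       -- current kelvin version (rule 1: nonnegative)
  , supports :: [Component]   -- components this one directly supports
  }

-- rule 4: a new release must carry a strictly cooler (and nonnegative) version
validRelease :: Integer -> Integer -> Bool
validRelease old new = new >= 0 && new < old

-- rule 5: a supporter must be strictly cooler than anything it supports,
-- unless both are at absolute zero; this applies recursively
validStack :: Component -> Bool
validStack a = all ok (supports a) && all validStack (supports a)
  where
    ok b = (version a == 0 && version b == 0) || version b > version a

In the running example below, for instance, validStack holds for A at 10K supporting B at 20K, and fails whenever a supporter is at least as cool as something it supports.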

Examples

Examples are particularly useful here, so let me go through a few.

Let’s take the following four components, sitting in three layers, as a running example. Here’s our initial state:

A       10K
  B     20K
    C   21K
    D   30K

So we have A at 10K supporting B at 20K. B in turn supports both C at 21K and D at 30K.

State 1

Imagine we have some patches lying around for D and want to release a new version of it. That’s easy to do; we push out a new version of D. In this case it will have version one less than 30, i.e. 29K:

A       10K
  B     20K
    C   21K
    D   29K  <-- cools from 30K to 29K

Easy peasy. This is the most trivial example.

The only possible point of confusion here is: well, what kind of change warrants a version decrement? And the answer is: any (released) change whatsoever. Anything with an associated kelvin version is immutable after being released at that version, analogous to how things are done in any other versioning scheme.

State 2

For a second example, imagine that we now have completed a major refactoring of A and want to release a new version of that.

Since A supports B, releasing a new version of A obligates us to release a new version of B as well. And since B supports both C and D, we are obligated, recursively, to release new versions of those to boot.

The total effect of a new A release is thus the following:

A       9K   <-- cools from 10K to 9K
  B     19K  <-- cools from 20K to 19K
    C   20K  <-- cools from 21K to 20K
    D   28K  <-- cools from 29K to 28K

This demonstrates the recursive mechanic of kelvin versioning.

An interesting effect of the above mechanic, as described in Toward a Frozen Operating System, is that anything that depends on (say) A, B, and C only needs to express its dependency on some version of C. Depending on C at e.g. 20K implicitly specifies a dependency on its supporting component, B, at 19K, and then A at 9K as well (since any change to A or B must also result in a change to C).

State 3

Now imagine that someone has contributed a performance enhancement to C, and we’d like to release a new version of that.

The interesting thing here is that we’re prohibited from releasing a new version of C. Recall our current state:

A       9K
  B     19K
    C   20K  <-- one degree K warmer than B
    D   28K

Releasing a new version of C would require us to cool it by at least one kelvin, resulting in the warmest possible version of 19K. But since its supporting component, B, is already at 19K, this would constitute an illegal state under kelvin versioning. A supporting component must always be strictly cooler than anything it supports, or be at absolute zero conjointly with anything it supports.

This illustrates the so-called telescoping mechanic of kelvin versioning – one is to imagine one of those handheld telescopes made of segments that flatten into each other when collapsed.

State 4

But now, say that we’re finally going to release our new API for B. We release a new version of B, this one at 18K, which obligates us to in turn release new versions of C and D:

A       9K
  B     18K  <-- cools from 19K to 18K
    C   19K  <-- cools from 20K to 19K
    D   27K  <-- cools from 28K to 27K

In particular, the new version of B gives us the necessary space to release a new version of C, and, indeed, obligates us to release a new version of it. In releasing C at 19K, presumably we’d include the performance enhancement that we were prohibited from releasing in State 3.

State 5

A final example that’s simple, but useful to illustrate explicitly, involves introducing a new component, or replacing a component entirely.

For example: say that we’ve decided to deprecate C and D and replace them with a single new component, E, supported by B. This is as easy as it sounds:

A       9K
  B     18K
    E   40K  <-- initial release at 40K

We just swap in E at the desired initial kelvin version. The initial kelvin can be chosen arbitrarily; the only restriction is that it be warmer than the component that supports it (or be at absolute zero conjointly with it).

It’s important to remember that, at this component-level resolution of kelvin versioning, there is no notion of the “total temperature” of the stack. Some third party could write another component, F, supported by E, with an initial version of 1000K, for example. This doesn’t introduce any extra burden or responsibility on the maintainers of components A through E.

Collective Kelvin Versioning

So – all that is well and good for what I’ll call the component-level mechanics of kelvin versioning. But it’s useful to touch on a related construct, that of collectively versioning a stack of kelvin-versioned components. This minor innovation on Curtis’s original idea was put together by me and my colleague Philip Monk.

If you have a collection of kelvin-versioned things, e.g. the things in our initial state from the prior examples:

A       10K
  B     20K
    C   21K
    D   30K

then you may want to release all these things, together, as some abstract thing. Notably, this happens in the case of the Urbit kernel, where the stack consists of a functional VM, an unapologetically amathematical purely functional programming language, special-purpose kernel modules, etc. It’s useful to be able to describe the whole kernel with a single version number.

To do this in a consistent way, you can select one component in your stack to serve as a primary index of sorts, and then capture everything it supports via a patch-like, monotonically decreasing “fractional temperature” suffix.

This is best illustrated via example. If we choose B as our primary index in the initial state above, for example, we could version the stack collectively as 20.9K. B provides the 20K, and everything it supports is just lumped into the “patch version” 9.

If we then consider the example given in State 1, i.e.:

A       10K
  B     20K
    C   21K
    D   29K

in which D has cooled by a degree kelvin, then we can version this stack collectively as 20.8K. If we were to then release a new version of D at 28K, then we could release the stack collectively as 20.7K. And so on.

There is no strictly prescribed schedule for decreasing the fractional temperature, but the following is recommended:

.9, .8, .7, .., .1, .01, .001, .0001, ..

Similarly, the fractional temperature should reset to .9 whenever the primary index cools. If we consider State 2, for example, where a new release of A led to every other component in the stack cooling, we had this:

A       9K
  B     19K
    C   20K
    D   28K

Note that B has cooled by a kelvin, so we would version this stack collectively as 19.9K. The primary index has decreased by a kelvin, and the fractional temperature has been reset to .9.
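For concreteness, here’s a tiny Haskell sketch of this bookkeeping (the (index, fraction) representation and the function name are mine – a Rational fractional temperature is just one convenient encoding):

import Data.Ratio ((%))

-- cooling the primary index resets the fraction to .9; otherwise the
-- fraction steps down through .9, .8, .., .1, .01, .001, and so on
nextCollective :: (Integer, Rational) -> Bool -> (Integer, Rational)
nextCollective (k, f) primaryCooled
  | primaryCooled = (k - 1, 9 % 10)
  | f > 1 % 10    = (k, f - 1 % 10)
  | otherwise     = (k, f / 10)

From 20.9K, a release that doesn’t touch the primary index yields 20.8K, while one that cools it yields 19.9K – matching the examples above.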

While I think examples illustrate this collective scheme most clearly, after my spiel about the pitfalls of ambiguity it would be remiss of me not to include a more formal spec:

Collective Kelvin Versioning (Specification)

(The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119.)

For a collection of kelvin-versioned components K:

  1. K’s version SHALL be characterised by a primary index, chosen from the components in K, and a real number in the interval [0, 1) (the “fractional temperature”), determined by all components that the primary index component supports.

    The fractional temperature MAY be 0 only if the primary index’s version is 0.

  2. K, at any particular version, MUST NOT be modified after release.

  3. At primary index version 0 and fractional temperature 0, new versions of K MUST NOT be released.

  4. New releases of K MUST be assigned a new version, and this version MUST be strictly less than the previous one.

  5. When a new release of K includes new versions of any component supported by the primary index, but not a new version of the primary index proper, its fractional temperature MUST be less than the previous version.

    Given constant primary index versions, fractional temperatures corresponding to new releases SHOULD decrease according to the following schedule:

    .9, .8, .7, .., .1, .01, .001, .0001, ..
    
  6. When a new release of K includes a new version of the primary index, the fractional temperature SHOULD be reset to .9.

  7. New versions of K MAY be indexed by components other than the primary index (i.e., K may be “reindexed” at any point). However, the new chosen component MUST either be colder than the primary index it replaces, or be at version 0 conjointly with the primary index it replaces.

And so on – the remaining component-level rules carry over in the obvious way.

In my experience, the major concern in adopting a kelvin versioning scheme is that one will accidentally initialise everything with a set of temperatures (i.e. versions) that are too cold (i.e. too close to 0), and thus burn through too many version numbers too quickly on the path to freezing. To alleviate this, it helps to remember that one has an infinite number of release candidates available for every component at every temperature.

The convention around release candidates is just to append a suffix to the next release version along the lines of .rc1, .rc2, etc. – so a component at 20K might cycle through 19.rc1 and 19.rc2 before releasing 19K proper. One should feel comfortable using these liberally, iterating through release candidates as necessary before finally committing to a new version at a properly cooler temperature.

The applications that might want to adopt kelvin versioning are probably pretty limited, and may indeed even be restricted to the Urbit kernel itself (Urbit has been described by some as “that operating system with a kernel that eventually reaches absolute zero under kelvin versioning”). Nonetheless: I believe this scheme is certainly more than a mere marketing gimmick or what have you, and, at minimum, it makes for an interesting change of pace from semver.

Email for l33t h4x0rz

A couple of people recently asked about my email setup, so I figured it might be best to simply document some of it here.

I run my own mail server for jtobin.io, plus another domain or two, and usually wind up interacting with gmail for work. I use offlineimap to fetch and sync mail with these remote servers, msmtp and msmtpq to send mail, mutt as my MUA, notmuch for search, and tarsnap for backups.

There are other details; vim for writing emails, urlview for dealing with URLs, w3m for viewing HTML, pass for password storage, etc. etc. But the mail setup proper is as above.

I’ll just spell out some of the major config below, and will focus on the configuration that works with gmail, since that’s probably of broader appeal. You can get all the software for it via the following in nixpkgs:

mutt offlineimap msmtp notmuch notmuch-mutt

offlineimap

offlineimap is used to sync local and remote email; I use it to manually grab emails occasionally throughout the day. You could of course set it up to run automatically as a cron job or what have you, but I like deliberately fetching my email only when I actually want to deal with it.

Here’s a tweaked version of one of my .offlineimaprc files:

[general]
accounts = work

[Account work]
localrepository = work-local
remoterepository = work-remote
postsynchook = notmuch new

[Repository work-local]
type = Maildir
localfolders = ~/mail/work
sep = /
restoreatime = no

[Repository work-remote]
type = Gmail
remoteuser = FIXME_user@domain.tld
remotepass = FIXME_google_app_password
realdelete = no
ssl = yes
sslcacertfile = /usr/local/etc/openssl/cert.pem
folderfilter = lambda folder: folder not in\
  ['[Gmail]/All Mail', '[Gmail]/Important', '[Gmail]/Starred']

You should be able to figure out the gist of this. Pay particular attention to the ‘remoteuser’, ‘remotepass’, and ‘folderfilter’ options. For ‘remotepass’ in particular you’ll want to generate an app-specific password from Google. The ‘folderfilter’ option lets you specify the gmail folders that you actually want to sync; folder in [..] and folder not in [..] are probably all you’ll want here.

If you don’t want to store your password in cleartext, and instead want to grab it from an encrypted store, you can use the ‘remotepasseval’ option. I don’t bother with this for Google accounts that have app-specific passwords, but do for others.

This involves a little bit of extra setup. First, you can make some Python functions available to the config file with ‘pythonfile’:

[general]
accounts = work
pythonfile = ~/.offlineimap.py

Here’s a version of that file that I keep, which grabs the desired password from pass(1):

#! /usr/bin/env python2
from subprocess import check_output

def get_pass():
    # fetch the password via pass(1), stripping the trailing newline
    return check_output("pass FIXME_PASSWORD", shell=True).strip("\n")

Then you can just call the get_pass function in ‘remotepasseval’ back in .offlineimaprc:

[Repository work-remote]
type = Gmail
remoteuser = FIXME_user@domain.tld
remotepasseval = get_pass()
realdelete = no
ssl = yes
sslcacertfile = /usr/local/etc/openssl/cert.pem
folderfilter = lambda folder: folder not in\
  ['[Gmail]/All Mail', '[Gmail]/Important', '[Gmail]/Starred']

When you’ve got this set up, you should just be able to run offlineimap to fetch your email. If you maintain multiple configuration files, it’s helpful to specify a specific one using -c, e.g. offlineimap -c .offlineimaprc-foo.

msmtp, msmtpq

msmtp is used to send emails. It’s a very simple SMTP client. Here’s a version of my .msmtprc:

defaults
auth           on
tls            on
tls_starttls   on
tls_trust_file /usr/local/etc/openssl/cert.pem
logfile        ~/.msmtp.log

account work
host           smtp.gmail.com
port           587
from           FIXME_user@domain.tld
user           FIXME_user@domain.tld
password       FIXME_google_app_password

account default: work

Again, very simple.

You can do a similar thing here if you don’t want to store passwords in cleartext. Just use ‘passwordeval’ and the desired shell command directly, e.g.:

account work
host           smtp.gmail.com
port           587
from           FIXME_user@domain.tld
user           FIXME_user@domain.tld
passwordeval   "pass FIXME_PASSWORD"

I occasionally like to work offline, so I use msmtpq to queue up emails to send later. Normally you don’t have to deal with any of this directly, but occasionally it’s nice to be able to check the queue. You can do that with msmtp-queue -d:

$ msmtp-queue -d

  no mail in queue

If there is something stuck in the queue, you can force it to send with msmtp-queue -r or -R. FWIW, this has happened to me while interacting with gmail under a VPN in the past.

mutt

Mutt is a fantastic MUA. Its tagline is “all mail clients suck, this one just sucks less,” but I really love mutt. It may come as a surprise that working with email can be a pleasure, especially if you’re accustomed to working with clunky webmail UIs, but mutt makes it so.

Here’s a pruned-down version of one of my .muttrc files:

set realname      = "MyReal Name"
set from          = "user@domain.tld"
set use_from      = yes
set envelope_from = yes

set mbox_type     = Maildir
set sendmail      = "~/.nix-profile/bin/msmtpq -a work"
set sendmail_wait = -1
set folder        = "~/mail/work"
set spoolfile     = "+INBOX"
set record        = "+[Gmail]/Sent Mail"
set postponed     = "+[Gmail]/Drafts"

set smtp_pass = "FIXME_google_app_password"
set imap_pass = "FIXME_google_app_password"

set signature = "~/.mutt/.signature-work"

set editor = "vim"

set sort     = threads
set sort_aux = reverse-last-date-received

set pgp_default_key          = "my_default_pgp@key"

set crypt_use_gpgme          = yes
set crypt_autosign           = yes
set crypt_replysign          = yes
set crypt_replyencrypt       = yes
set crypt_replysignencrypted = yes

bind index gg first-entry
bind index G last-entry
bind index B imap-fetch-mail

bind index - collapse-thread
bind index _ collapse-all

set alias_file = ~/.mutt/aliases
set sort_alias = alias
set reverse_alias = yes
source $alias_file

auto_view text/html
alternative_order text/plain text/enriched text/html

subscribe my_favourite@mailing.list

macro index <F8> \
"<enter-command>set my_old_pipe_decode=\$pipe_decode my_old_wait_key=\$wait_key nopipe_decode nowait_key<enter>\
<shell-escape>notmuch-mutt -r --prompt search<enter>\
<change-folder-readonly>`echo ${XDG_CACHE_HOME:-$HOME/.cache}/notmuch/mutt/results`<enter>\
<enter-command>set pipe_decode=\$my_old_pipe_decode wait_key=\$my_old_wait_key<enter>i" \
"notmuch: search mail"

macro index <F9> \
"<enter-command>set my_old_pipe_decode=\$pipe_decode my_old_wait_key=\$wait_key nopipe_decode nowait_key<enter>\
<pipe-message>notmuch-mutt -r thread<enter>\
<change-folder-readonly>`echo ${XDG_CACHE_HOME:-$HOME/.cache}/notmuch/mutt/results`<enter>\
<enter-command>set pipe_decode=\$my_old_pipe_decode wait_key=\$my_old_wait_key<enter>" \
"notmuch: reconstruct thread"

macro index l "<enter-command>unset wait_key<enter><shell-escape>read -p 'notmuch query: ' x; echo \$x >~/.cache/mutt_terms<enter><limit>~i \"\`notmuch search --output=messages \$(cat ~/.cache/mutt_terms) | head -n 600 | perl -le '@a=<>;chomp@a;s/\^id:// for@a;$,=\"|\";print@a'\`\"<enter>" "show only messages matching a notmuch pattern"

# patch rendering
# https://karelzak.blogspot.com/2010/02/highlighted-patches-inside-mutt.html
color body green default "^diff \-.*"
color body green default "^index [a-f0-9].*"
color body green default "^\-\-\- .*"
color body green default "^[\+]{3} .*"
color body cyan default "^[\+][^\+]+.*"
color body red  default "^\-[^\-]+.*"
color body brightblue default "^@@ .*"

# vim: ft=muttrc:

Some comments on all that:

set mbox_type     = Maildir
set sendmail      = "~/.nix-profile/bin/msmtpq -a work"
set sendmail_wait = -1
set folder        = "~/mail/work"
set spoolfile     = "+INBOX"
set record        = "+[Gmail]/Sent Mail"
set postponed     = "+[Gmail]/Drafts"

Note here that we’re specifying msmtpq as our sendmail program. The -a work flag refers to the account defined in your .msmtprc file, so if you change the account name there, you have to change it here as well. Ditto for the folder.

(If you’re tweaking these config files for your own use, I’d recommend just substituting all instances of ‘work’ with your own preferred account name.)

The negative ‘sendmail_wait’ value handles queueing mails up appropriately when offline, IIRC.

set smtp_pass = "FIXME_google_app_password"
set imap_pass = "FIXME_google_app_password"

Here are the usual cleartext app passwords. If you want to store them encrypted, there’s a usual method for doing that: add the following to the top of your .muttrc:

source "gpg -d ~/.mutt/my-passwords.gpg |"

where ~/.mutt/my-passwords.gpg should contain the above smtp_pass and imap_pass assignments, encrypted with your desired private key.

Continuing with the file at hand:

set signature = "~/.mutt/.signature-work"

set editor = "vim"

These should be self-explanatory. The signature file should just contain the signature you want appended to your mails (it will be appended under a pair of dashes). And if you want to use some other editor to compose your emails, just specify it here.

set pgp_default_key          = "my_default_pgp@key"

set crypt_use_gpgme          = yes
set crypt_autosign           = yes
set crypt_replysign          = yes
set crypt_replyencrypt       = yes
set crypt_replysignencrypted = yes

Mutt is one of the few programs that has great built-in support for PGP. It can easily encrypt, decrypt, and sign messages, grab public keys, etc. Here you can see that I’ve set it to autosign messages, reply to encrypted messages with encrypted messages, and so on.

bind index gg first-entry
bind index G last-entry
bind index B imap-fetch-mail

bind index - collapse-thread
bind index _ collapse-all

These are a few key bindings that I find helpful. The first bunch are familiar to vim users and are useful for navigating around; the second two are really useful for compressing or expanding the view of your mailbox.

set alias_file = ~/.mutt/aliases
set sort_alias = alias
set reverse_alias = yes
source $alias_file

auto_view text/html
alternative_order text/plain text/enriched text/html

The alias file lets you define common shortcuts for single or multiple addresses. I get a lot of use out of multiple address aliases, e.g.:

alias chums socrates@ago.ra, plato@acade.my, aristotle@lyce.um

The MIME type stuff below the alias config is just a sane set of defaults for viewing common mail formats.

subscribe my_favourite@mailing.list

Mutt makes interacting with mailing lists very easy by default, but you can also indicate addresses that you’re subscribed to, as above, to unlock a few extra features for them (‘list reply’ being a central one). To tell mutt that you’re subscribed to haskell-cafe, for example, you’d use:

subscribe haskell-cafe@haskell.org

The three longer macros that follow are for notmuch. I really only find myself using the last one, ‘l’, for search. FWIW, notmuch’s search functionality is fantastic; I’ve found it to be more useful than gmail’s web UI search.

The patch rendering stuff at the end is just a collection of heuristics for rendering in-body .patch files well. This is useful if you subscribe to a patch-heavy mailing list, e.g. LKML or git@vger.kernel.org, or if you just want to communicate better about diffs in your day-to-day emails with your buddies.

Fin

There are obviously endless ways you can configure all this stuff, especially mutt, and common usage patterns that you’ll quickly find yourself falling into. But whatever you find those to be, the above should at least get you up and running pretty quickly with 80% of the desired feature set.

Basic Hoonery

In my last post I first introduced hnock, a little interpreter for Nock, and then demonstrated it on a hand-rolled decrement function. In this post I’ll look at how one can handle the same (contrived, but illustrative) task in Hoon.

Hoon is the higher- or application-level programming language for working with Arvo, the operating system of Urbit. The best way I can describe it is something like “Haskell meets C meets J meets the environment is always explicit.”

As a typed, functional language, Hoon feels surprisingly low-level. One is never allocating or deallocating memory explicitly when programming in Hoon, but the experience somehow feels similar to working in C. The idea is that the language should be simple and straightforward and support a fairly limited level of abstraction. There are the usual low-level functional idioms (map, reduce, etc.), as well as a structural type system to keep the programmer honest, but at its core, Hoon is something of a functional Go (a language which, I happen to think, is not good).

It’s not a complex language, like Scala or Rust, nor a language that overtly supports sky-high abstraction, like Haskell or Idris. Hoon is supposed to exist at a sweet spot for getting work done. And I am at least willing to buy the argument that it is pretty good for getting work done in Urbit.

Recall our naïve decrement function in Haskell. It looked like this:

dec :: Integer -> Integer
dec m =
  let loop n
        | succ n == m = n
        | otherwise   = loop (succ n)
  in  loop 0

Let’s look at a number of ways to write this in Hoon, showing off some of the most important Hoon programming concepts in the process.

Cores

Here’s a Hoon version of decrement. Note that to the uninitiated, Hoon looks gnarly:

|=  m=@
=/  n=@  0
=/  loop
|%
  ++  recur
    ?:  =(+(n) m)
      n
    recur(n +(n))
  --
recur:loop

We can read it as follows:

  • Define a function that takes an argument, ‘m’, having type atom (recall that an atom is an unsigned integer).
  • Define a local variable called ‘n’, having type atom and value 0, and add it to the environment (or, if you recall our Nock terminology, to the subject).
  • Define a local variable called ‘loop’, with precise definition to follow, and add it to the environment.
  • ‘loop’ is a core, i.e. more or less a named collection of functions. Define one such function (or arm), ‘recur’, that checks to see if the increment of ‘n’ is equal to ‘m’, returning ‘n’ if so, and calling itself, except with the value of ‘n’ in the environment changed to ‘n + 1’, if not.
  • Evaluate ‘recur’ as defined in ‘loop’.

(To test this, you can enter the Hoon line-by-line into the Arvo dojo. Just preface it with something like =core-dec to give it a name, and call it via e.g. (core-dec 20).)

Hoon may appear to be a write-only language, though I’ve found this to not necessarily be the case (just to note, at present I’ve read more Hoon code than I’ve written). Good Hoon has a terse and very vertical style. The principle that keeps it readable is that, roughly, each line should contain one important logical operation. These operations are denoted by runes, the =/ and ?: and similar ASCII digraphs sprinkled down the left-hand column of the above example. This makes it look similar to e.g. J – a language I have long loved, but never mastered – although in J the rough ‘one operator per line’ convention is not typically in play.

In addition to the standard digraph runes, there is also a healthy dose of ‘irregular’ syntax in most Hoon code for simple operations that one uses frequently. Examples used above include =(a b) for equality testing, +(n) for incrementing an atom, and foo(a b) for evaluating ‘foo’ with the value of ‘a’ in the environment changed to ‘b’. Each of these could be replaced with a more standard rune-based expression, though for such operations the extra verbosity is not usually warranted.

Cores like ‘loop’ seem, to me, to be the mainstay workhorse of Hoon programming. A core is more or less a structure, or object, or dictionary, or whatever, of functions. One defines them liberally, constructs a subject (i.e. environment) to suit, and then evaluates them, or some part of them, against the subject.

To be more precise, a core is a Nock expression; like every non-atomic value in Nock, it is a tree. Starting from the cell [l r], the left subtree, ‘l’, is a tree of Nock formulas (i.e. the functions, like ‘recur’, defined in the core). The right subtree, ‘r’, is all the data required to evaluate those Nock formulas. The traditional name for the left subtree, ‘l’, is the battery of the core; the traditional name for the right subtree is the payload.

One is always building up a local environment in Hoon and then evaluating some value against it. Aside from the arm ‘recur’, the core ‘loop’ also contains in its payload the values ‘m’ and ‘n’. The expression ‘recur:loop’ – irregular syntax for =< recur loop – means “use ‘loop’ as the environment and evaluate ‘recur’.” Et voilà, that’s how we get our decrement.

You’ll note that this should feel very similar to the way we defined decrement in Nock. Our hand-assembled Nock code, slightly cleaned up, looked like this:

[8
  [1 0]
  8
    [1
      6
        [5 [4 0 6] [0 7]]
        [0 6]
        2 [[0 2] [4 0 6] [0 7]] [0 2]
    ]
    2 [0 1] [0 2]
]

This formula, when evaluated against an atom subject, creates another subject from it, defining a ‘loop’ analogue that looks in specific addresses in the subject for itself, as well as the ‘m’ and ‘n’ variables, such that it produces the decrement of the original subject. Our Hoon code does much the same – every ‘top-level’ rune expression adds something to the subject, until we get to the final expression, ‘recur:loop’, which evaluates ‘recur’ against the subject, ‘loop’.

The advantage of Hoon, in comparison to Nock, is that we can work with names, instead of raw tree addresses, as well as with higher-level abstractions like cores. The difference between Hoon and Nock really is like the difference between C and assembly!

For what it’s worth, here is the compiled Nock corresponding to our above decrement function:

[8
  [1 0]
  8
    [8
      [1
        6
          [5 [4 0 6] 0 30]
          [0 6]
          9 2 10 [6 4 0 6] 0 1
      ]
      0 1
    ]
    7 [0 2] 9 2 0 1
]

It’s similar, though not identical, to our hand-rolled Nock. In particular, you can see that it is adding a constant conditional formula, including the familiar equality check, to the subject (note that the equality check, using Nock-5, refers to address 30 instead of 7 – presumably this is because I have more junk floating around in my dojo subject). Additionally, the formulas using Nock-9 and Nock-10 reduce to Nock-2 and Nock-0, just like our hand-rolled code does.

But our Hoon is doing more than the bespoke Nock version did, so we’re not getting quite the same code. Worth noting is the ‘extra’ use of Nock-8, which is presumably required because I’ve defined both ‘recur’, the looping function, and ‘loop’, the core to hold it, and the hand-rolled Nock obviously didn’t involve a core.

Doors

Here’s another way to write decrement, using another fundamental Hoon construct, the door:

|=  m=@
=/  loop
|_  n=@
  ++  recur
    ?:  =(+(n) m)
      n
    ~(recur ..recur +(n))
  --
~(recur loop 0)

A door is a core that takes an argument. Here we’ve used the |_ rune, instead of |%, to define ‘loop’, and note that it takes ‘n’ as an argument. So instead of ‘n’ being defined outside the core, as it was in the previous example, here we have to specify it explicitly when we call ‘recur’. Note that this is more similar to our Haskell example, in which ‘loop’ was defined as a function taking ‘n’ as an argument.

The two other novel things here are the ~(recur ..recur +(n)) and ~(recur loop 0) expressions, which actually turn out to be mostly the same thing. The syntax:

~(arm door argument)

is irregular, and means “evaluate ‘arm’ in ‘door’ using ‘argument’”. So in the last line, ~(recur loop 0) means “evaluate ‘recur’ in ‘loop’ with n set to 0.” In the definition of ‘recur’, on the other hand, we need to refer to the door that contains it, but are in the very process of defining that thing. The ‘..recur’ syntax means “the door that contains ‘recur’,” and is useful for exactly this task, given we can’t yet refer to ‘loop’. The syntax ~(recur ..recur +(n)) means “evaluate ‘recur’ in its parent door with n set to n + 1.”

Let’s check the compiled Nock of this version:

[8
  [8
    [1 0]
    [1
      6
        [5 [4 0 6] 0 30]
        [0 6]
        8
          [0 1]
          9 2 10 [6 7 [0 3] 4 0 6] 0 2
    ]
    0 1
  ]
  8
    [0 2]
    9 2 10 [6 7 [0 3] 1 0] 0 2
]

There’s even more going on here than in our core-implemented decrement, but doors are a generalisation of cores, so that’s to be expected.

Hoon has special support, though, for one-armed doors. This is precisely how functions (also called gates or traps, depending on the context) are implemented in Hoon. The following is probably the most idiomatic version of naïve decrement:

|=  m=@
=/  n  0
|-
?:  =(+(n) m)
  n
$(n +(n))

The |= rune that we’ve been using throughout these examples really defines a door, taking the specified argument, with a single arm called ‘$’. The |- rune here does the same, except it immediately calls the ‘$’ arm after defining it. The last line, $(n +(n)), is analogous to the recur(n +(n)) line in our first example: it evaluates the ‘$’ arm, except changing the value of ‘n’ to ‘n + 1’ in the environment.

(Note that there are two ‘$’ arms defined in the above code – one via the use of |=, and one via the use of |-. But there is no confusion as to which one we mean, since the latter has been the latest to be added to the subject. Additions to the subject are always prepended in Hoon – i.e. they are placed at address 2. As the topmost ‘$’ in the subject is the one that corresponds to |-, it is resolved first.)

The compiled Nock for this version looks like the following:

[8
  [1 0]
  8
    [1
      6
        [5 [4 0 6] 0 30]
        [0 6]
        9 2 10 [6 4 0 6] 0 1
    ]
    9 2 0 1
]

And it is possible (see the appendix) to show that, modulo some different addressing, this reduces exactly to our hand-rolled Nock code.

UPDATE: my colleague Ted Blackman, an actual Hoon programmer, recommended the following as a slightly more idiomatic version of naïve decrement:

=|  n=@
|=  m=@
^-  @
?:  =(+(n) m)
  n
$(n +(n))

Note that here we’re declaring ‘n’ outside of the gate itself by using another rune, =|, that gives the variable a default value based on its type (an atom’s default value is 0). There’s also an explicit type cast via ^- @, indicating that the gate produces an atom (like type signatures in Haskell, it is considered good practice to include these, even though they may not strictly be required).

Declaring ‘n’ outside the gate is interesting. It has an imperative feel, as if one were writing the code in Python, or were using a monad like ‘State’ or a ‘PrimMonad’ in Haskell. Like in the Haskell case, we aren’t actually doing any mutation here, of course – we’re creating new subjects to evaluate each iteration of our Nock formula against. And the resulting Nock is very succinct:

[6
  [5 [4 0 14] 0 6]
  [0 14]
  9 2 10 [14 4 0 14] 0 1
]

Basic Generators

If you tested the above examples, I instructed you to do so by typing them into Arvo’s dojo. I’ve come to believe that, in general, this is a poor way to teach Hoon. It should be avoided for all but the most introductory examples (such as the ones I’ve provided here).

If you’ve learned Haskell, you are familiar with the REPL provided by GHCi, the Glasgow Haskell Compiler’s interpreter. Code running in GHCi is implicitly running in the IO monad, and I think this leads to confusion amongst newcomers who must then mentally separate “Haskell in GHC” from “Haskell in GHCi.”

I think there is a similar problem in Hoon. Expressions entered into the dojo implicitly grow or shrink or otherwise manipulate the dojo’s subject, which is not, in general, available to standalone Hoon programs. Such standalone Hoon programs are called generators. In general, they’re what you will use when working in Hoon and Arvo.

There are four kinds of generators: naked, %say, %ask, and %get. In this post we’ll just look at the first two; the last couple are out of scope, for now.

Naked Generators

The simplest kind of generator is the ‘naked’ generator, which just exists in a file somewhere in your Urbit’s “desk.” If you save the following as naive-decrement.hoon in an Urbit’s home/gen directory, for example:

|=  m=@
=/  n  0
|-
?:  =(+(n) m)
  n
$(n +(n))

Then you’ll be able to run it in a dojo via:

~zod:dojo> +naive-decrement 20
19

A naked generator can only be a simple function (technically, a gate) that produces a noun. It has no access to any external environment – it’s basically just a self-contained function in a file. It must take exactly one argument; to pass multiple values to a naked generator, one must use a cell.

Say Generators

Hoon is a purely functional language, but, unlike Haskell, it also has no IO monad to demarcate I/O effects. Hoon programs do not produce effects on their own at all – instead, they construct nouns that tell Arvo how to produce some effect or other.

A %say generator (where %say is a symbol) produces a noun, but it can also make use of provided environment data (e.g. date information, entropy, etc.). The idea is that the generator has a specific structure that Arvo knows how to handle, in order to supply it with the requisite information. Specifically, %say generators have the structure:

:-  %say
|=  [<environment data> <list of arguments> <list of optional arguments>]
:-  %noun
<code>

I’ll avoid discussing what a list is in Hoon at the moment, and we won’t actually use any environment data in any examples here. But if you dump the following in home/gen/naive-decrement.hoon, for example:

:-  %say
|=  [* [m=@ ~] ~]
:-  %noun
=/  n  0
|-
?:  =(+(n) m)
  n
$(n +(n))

you can call it from the dojo via the same mechanism as before:

~zod:dojo> +naive-decrement 20
19

The generator itself actually returns a particularly-structured noun: a cell with the symbol %say as its head, and a gate returning a pair of the symbol %noun and a noun as its tail. The %noun symbol describes the data produced by the generator. Note that none of this structure is displayed when evaluating the generator in the dojo – we just get the noun itself – though this behaviour is dojo-specific.

I think one should get in the habit of writing %say generators for most Hoon code, even if a simple naked generator or throwaway dojo command would do the trick. They are so important for getting things done in Hoon that it helps to learn about and start using them sooner rather than later.

Fin

I’ve introduced Hoon and given a brief tour of what I think are some of the most important tools for getting work done in the language. Cores, doors, and gates will get you plenty far, and early exposure to generators, in the form of the basic naked and %say variants, will help you avoid the habit of programming in the dojo, and get you writing more practically-structured Hoon code from the get-go.

I haven’t had time in this post to describe Hoon’s type system, which is another very important topic when it comes to getting work done in the language. I’ll probably write one more to create a small trilogy of sorts – stay tuned.

Appendix

Let’s demonstrate that the compiled Nock code from our door-implemented decrement reduces to the same as our hand-rolled Nock, save different address use. Recall that our compiled Nock code was:

[8
  [1 0]
  8
    [1
      6
        [5 [4 0 6] 0 30]
        [0 6]
        9 2 10 [6 4 0 6] 0 1
    ]
    9 2 0 1
]

An easy reduction is from Nock-9 to Nock-2. Note that *[a 9 b c] is the same as *[*[a c] 2 [0 1] 0 b]. When ‘c’ is [0 1], we have that *[a c] = a, such that *[a 9 b [0 1]] is the same as *[a 2 [0 1] 0 b], i.e. that the formula [9 b c] is the same as the formula [2 [0 1] 0 b]. We can thus reduce the use of Nock-9 on the last line to:

[8
  [1 0]
  8
    [1
      6
        [5 [4 0 6] 0 30]
        [0 6]
        9 2 10 [6 4 0 6] 0 1
    ]
    2 [0 1] 0 2
]

The remaining formula involving Nock-9 evaluates [10 [6 4 0 6] 0 1] against the subject, and then evaluates [2 [0 1] [0 2]] against the result. Note that, for some subject ‘a’, we have:

*[a 10 [6 4 0 6] 0 1]
= #[6 *[a 4 0 6] *[a 0 1]]
= #[6 *[a 4 0 6] a]
= #[3 [*[a 4 0 6] /[7 a]] a]
= #[1 [/[2 a] [*[a 4 0 6] /[7 a]]] a]
= [/[2 a] [*[a 4 0 6] /[7 a]]]
= [*[a 0 2] [*[a 4 0 6] *[a 0 7]]]
= *[a [0 2] [4 0 6] [0 7]]

such that [10 [6 4 0 6] 0 1] = [[0 2] [4 0 6] [0 7]]. And for c = [[0 2] [4 0 6] [0 7]] and some subject ‘a’, we have:

*[a 9 2 c]
= *[*[a c] 2 [0 1] 0 2]

and for b = [2 [0 1] 0 2]:

*[*[a c] b]
= *[a 7 c b]
= *[a 7 [[0 2] [4 0 6] [0 7]] [2 [0 1] 0 2]]

such that:

[9 2 [0 2] [4 0 6] [0 7]] = [7 [[0 2] [4 0 6] [0 7]] [2 [0 1] 0 2]]

Now. Note that for any subject ‘a’ we have:

*[a 7 [[0 2] [4 0 6] [0 7]] [2 [0 1] 0 2]]
= *[a 7 [[0 2] [4 0 6] [0 7]] *[a 0 2]]

since *[a 2 [0 1] 0 2] = *[a *[a 0 2]]. Thus, we can reduce:

*[a 7 [[0 2] [4 0 6] [0 7]] *[a 0 2]]
= *[*[a [0 2] [4 0 6] [0 7]] *[a 0 2]]
= *[a 2 [[0 2] [4 0 6] [0 7]] [0 2]]

such that

[7 [[0 2] [4 0 6] [0 7]] [2 [0 1] 0 2]] = [2 [[0 2] [4 0 6] [0 7]] [0 2]]

and so, finally, we can reduce the compiled Nock to:

[8
  [1 0]
  8
    [1
      6
        [5 [4 0 6] 0 30]
        [0 6]
        2 [[0 2] [4 0 6] [0 7]] 0 2
    ]
    2 [0 1] 0 2
]

which, aside from the use of the dojo-assigned address 30 (and any reduction errors on this author’s part), is the same as our hand-rolled Nock.

A Nock Interpreter

I wrote a little Nock interpreter called hnock some months ago and just yesterday updated it to support the latest version of Nock, 4K. Nock – the base layer VM of Urbit – is a very simple little “functional assembly language” of sorts. It is of particular interest in that it is capable of practical computation (indeed, it is Turing complete) but is not defined in terms of the lambda calculus. So, no variable naming and capture-avoidance and the like to deal with, which is kind of neat.

Nock (at 4K) itself has a small specification, repeated below. Famously, it fits on a t-shirt:

nock(a)             *a
[a b c]             [a [b c]]

?[a b]              0
?a                  1
+[a b]              +[a b]
+a                  1 + a
=[a a]              0
=[a b]              1

/[1 a]              a
/[2 a b]            a
/[3 a b]            b
/[(a + a) b]        /[2 /[a b]]
/[(a + a + 1) b]    /[3 /[a b]]
/a                  /a

#[1 a b]            a
#[(a + a) b c]      #[a [b /[(a + a + 1) c]] c]
#[(a + a + 1) b c]  #[a [/[(a + a) c] b] c]
#a                  #a

*[a [b c] d]        [*[a b c] *[a d]]

*[a 0 b]            /[b a]
*[a 1 b]            b
*[a 2 b c]          *[*[a b] *[a c]]
*[a 3 b]            ?*[a b]
*[a 4 b]            +*[a b]
*[a 5 b c]          =[*[a b] *[a c]]

*[a 6 b c d]        *[a *[[c d] 0 *[[2 3] 0 *[a 4 4 b]]]]
*[a 7 b c]          *[*[a b] c]
*[a 8 b c]          *[[*[a b] a] c]
*[a 9 b c]          *[*[a c] 2 [0 1] 0 b]
*[a 10 [b c] d]     #[b *[a c] *[a d]]

*[a 11 [b c] d]     *[[*[a c] *[a d]] 0 3]
*[a 11 b c]         *[a c]

*a                  *a

Perhaps you are a neophyte Haskeller and have never implemented an interpreter for a language before. Nock makes for an excellent target to practice on.

Expressions

A noun in Nock is either an atom, i.e. an unsigned integer, or a cell, i.e. an ordered pair of nouns:

data Noun =
    Atom Integer
  | Cell Noun Noun
  deriving Eq

Very simple. A Nock expression is then a noun, or some operator applied to a noun. Per the spec, the operators are denoted by ?, +, =, /, #, and *; the fun, convenient, and authentically Urbit-y pronunciations are wut, lus, tis, net, hax, and tar, respectively:

data Expr =
    Noun Noun
  | Wut Noun
  | Lus Noun
  | Tis Noun
  | Net Noun
  | Hax Noun
  | Tar Noun
  deriving Eq

So, equipped with the above definitions, you can write Nock expressions in Haskell. For example, the following expression:

*[57 [4 0 1]]

maps to:

Tar
  (Cell (Atom 57)
  (Cell (Atom 4)
  (Cell (Atom 0) (Atom 1))))

Note per the spec that cells associate to the right, thus something like [4 0 1] maps to [4 [0 1]], and thus Cell (Atom 4) (Cell (Atom 0) (Atom 1)).
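(The bracketed rendering used in the examples below comes from a Show instance along these lines – a guess at the obvious implementation, which the Error type in the next section needs in scope anyway:)

instance Show Noun where
  show noun = case noun of
    Atom m   -> show m
    Cell l r -> "[" ++ show l ++ " " ++ show r ++ "]"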

Evaluation and Semantics

For evaluation one can more or less copy out the production rules from the Nock spec. We can use an Either type to denote a lack of defined semantics:

data Error = Error Noun
  deriving Show

type Possibly = Either Error

and then simply define the various evaluation functions appropriately. ‘wut’, for example, checks if something is a cell:

wut :: Noun -> Possibly Noun
wut noun = return $ case noun of
  Cell {} -> Atom 0
  Atom {} -> Atom 1

(note that in Nock and Urbit more generally, 0 denotes ‘true’).

‘tis’ checks if the elements of a cell are equal:

tis :: Noun -> Possibly Noun
tis noun = case noun of
  Atom {}  -> Left (Error noun)
  Cell m n -> return $
    if   m == n
    then Atom 0
    else Atom 1
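‘lus’ and ‘net’ are similarly direct transcriptions of the spec – hedged sketches, since hnock’s actual code may differ cosmetically:

lus :: Noun -> Possibly Noun
lus noun = case noun of
  Atom m  -> return (Atom (succ m))
  Cell {} -> Left (Error noun)  -- +[a b] has no defined semantics

net :: Noun -> Possibly Noun
net noun = case noun of
  Cell (Atom 1) a          -> return a
  Cell (Atom 2) (Cell a _) -> return a
  Cell (Atom 3) (Cell _ b) -> return b
  Cell (Atom n) a | n > 3  -> do
    -- /[2n b] = /[2 /[n b]] and /[2n+1 b] = /[3 /[n b]]
    inner <- net (Cell (Atom (n `div` 2)) a)
    net (Cell (Atom (2 + n `mod` 2)) inner)
  _ -> Left (Error noun)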

And so on – the substitution operator ‘hax’ follows from the spec in the same way. The most involved operator is ‘tar’, which constitutes the Nock interpreter proper. One can simply match on the cases appropriately:

tar :: Noun -> Possibly Noun
tar noun = case noun of
  Cell a (Cell (Cell b c) d) -> do
    tard0 <- tar (Cell a (Cell b c))
    tard1 <- tar (Cell a d)
    return (Cell tard0 tard1)

  Cell a (Cell (Atom 0) b) ->
    net (Cell b a)

  Cell _ (Cell (Atom 1) b) ->
    return b

  Cell a (Cell (Atom 2) (Cell b c)) -> do
    tard0 <- tar (Cell a b)
    tard1 <- tar (Cell a c)
    tar (Cell tard0 tard1)

  -- ... and so on
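The remaining cases are equally mechanical transcriptions of the spec; the Nock-3 and Nock-4 reductions, for instance, just delegate to ‘wut’ and ‘lus’:

  Cell a (Cell (Atom 3) b) -> do
    tard <- tar (Cell a b)
    wut tard

  Cell a (Cell (Atom 4) b) -> do
    tard <- tar (Cell a b)
    lus tard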

It is particularly useful to look at ‘tar’ to get an idea of what is going on in Nock. One is always evaluating a noun to another noun. Semantics are only defined when the noun being evaluated is a cell, and that cell is said to have the structure:

[subject formula]

The subject is essentially the environment, reified explicitly. The formula is the code to execute against the subject.

The resulting noun is called a ‘product’. So, evaluation proceeds as:

[subject formula] -> product

In any valid application of ‘tar’, i.e. for some *[x y], ‘x’ is the subject, and ‘y’ is the formula. The first case of ‘tar’, for example, is defined via:

*[a 0 b]            /[b a]

Recall that cells in Nock are right-associative, so [a 0 b] is really [a [0 b]]. In other words: a is the subject, [0 b] is the formula. The formula [0 b], read as ‘Nock-zero’, means “look up the value at address ‘b’ in the subject,” the subject here being ‘a’. So *[a 0 b] means “look up the value at address ‘b’ in ‘a’.”

Other formulas in the various ‘tar’ cases denote additional useful language features. A formula [1 b] is the constant function, [6 b c d] is a conditional, [7 b c] is function (viz. formula) composition, and so on. There is a Turing-complete amount of power packed onto that t-shirt!

In any case, once we’ve constructed all of the requisite evaluation functions, we can stitch them together in a single evaluator:

eval :: Expr -> Possibly Noun
eval expr = case expr of
  Noun noun -> return noun
  Wut  e    -> wut e
  Lus  e    -> lus e
  Tis  e    -> tis e
  Net  e    -> net e
  Hax  e    -> hax e
  Tar  e    -> tar e

Parsing

Writing a combinator-based parser is an easy job. You can use parsec, for example, and define your combinators as follows.

Given low-level ‘atom’ and ‘cell’ parsers, a noun is just an atom or a cell:

noun :: Monad m => P.ParsecT T.Text u m Noun
noun =
      P.try cell
  <|> atom
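(The low-level parsers are themselves simple; here’s a hedged sketch of what they might look like – hnock’s actual versions may differ:)

atom :: Monad m => P.ParsecT T.Text u m Noun
atom = fmap (Atom . read) (P.many1 P.digit)

cell :: Monad m => P.ParsecT T.Text u m Noun
cell = do
  _  <- P.char '['
  ns <- noun `P.sepBy1` P.skipMany1 P.space
  _  <- P.char ']'
  -- cells associate to the right: [a b c] parses as [a [b c]]
  return (foldr1 Cell ns)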

An expression, in turn, is either an operator followed by a noun, or just a noun:

expr :: Monad m => P.ParsecT T.Text u m Expr
expr =
      P.try operator
  <|> fmap Noun noun

operator :: Monad m => P.ParsecT T.Text u m Expr
operator = do
  op <- P.oneOf "?+=/#*"
  case op of
    '?' -> fmap Wut noun
    '+' -> fmap Lus noun
    '=' -> fmap Tis noun
    '/' -> fmap Net noun
    '#' -> fmap Hax noun
    '*' -> fmap Tar noun
    _   -> fail "op: bad token"

Note that we don’t allow expressions like the following, taken from the Nock spec:

*[a *[[c d] 0 *[[2 3] 0 *[a 4 4 b]]]]

These are used to define the production rules, but are not valid Nock expressions per se.

The end parser is simply:

parse :: T.Text -> Either P.ParseError Expr
parse = P.runParser expr [] "input"

Example

The final interpreter is pretty simple. Here we distinguish between the two failure modes, reporting parsing and evaluation errors separately (if in basic fashion):

hnock :: T.Text -> Noun
hnock input = case parse input of
  Left perr -> error (show perr)
  Right ex  -> case eval ex of
    Left err -> error (show err)
    Right e  -> e

It’s easy to test. Take the simple Nock-zero example given previously, where *[a 0 b] means “look up the value at address ‘b’ in ‘a’”. Nock’s addressing scheme is simple: address ‘1’ is the noun itself, and for the noun at any address ‘n’, its leftmost component sits at address ‘2n’ and its rightmost at address ‘2n + 1’. So ‘2’ means the leftmost component of a cell, and ‘3’ the rightmost.

So, let’s test:

~ hnock "*[[[1 2] [3 4]] [0 1]"
[[1 2] [3 4]]

Again, the formula [0 1] means look up at address one, which is the cell itself.

~ hnock "*[[[1 2] [3 4]] [0 2]]"
[1 2]

The formula [0 2] means look up at address 2, which is the leftmost component of the cell.

~ hnock "*[[[1 2] [3 4]] [0 7]]"
4

To understand [0 7], note that 7 = 2 × 3 + 1: we first look up the rightmost component, [3 4], at address 3, and then take the rightmost component of that.

The most famous Nock learner’s problem is to implement a function for decrementing an atom (if you could do this in ~2010 you could receive an Urbit galaxy for it, which is now probably worth a serious chunk of change). Nock’s only arithmetic operator is increment, so the trick is to decrement by actually counting up from zero.

(N.b. in practice this is not how one actually evaluates decrement in Nock. The interpreter is ‘jetted’, such that alternate implementations, e.g. the CPU’s decrement instruction directly, can be used to evaluate semantically-identical expressions.)

But the idea is simple enough. To warm up, we can implement this kind of decrement in Haskell. We won’t handle the underflow condition, just to keep things simple:

dec :: Integer -> Integer
dec m =
  let loop n
        | succ n == m = n
        | otherwise   = loop (succ n)
  in  loop 0

Look at the language features we’re using to accomplish this. There’s:

  • Variable naming / declaration, for the ‘m’, ‘loop’, and ‘n’ arguments
  • Integer increment, via the built-in ‘succ’
  • Equality testing, via ‘==’
  • Conditional expressions, via the ‘|’ guards
  • A function call, in ‘loop 0’

Let’s try and duplicate this in Nock, step by step. We’ll want analogues for all of our declared variables, to start.

A Nock formula that simply returns the subject it’s evaluated against is easy: [0 1]. We used it above.

~ hnock "*[10 [0 1]]"
10

That’s the equivalent of our ‘m’ declaration in the Haskell code.

Similarly, we can just ignore the subject and return a ‘0’ via [1 0].

~ hnock "*[10 [1 0]]"
0

That’s the initial value, 0, that appears in ‘loop 0’.

We can use Nock-8 to pair these together. It evaluates a formula against the subject, and then uses the original subject, augmented with the product, to evaluate another one:

~ hnock "*[10 [8 [1 0] [0 1]]]"
[0 10]

The last thing we need to define is the ‘loop’ function itself. We’re going to place it at the head of the subject, i.e. at address two. Our ‘n’ and ‘m’ variables will then be at addresses six and seven, respectively. When we define our ‘loop’ analogue, we just need to refer to those addresses, instead of our variable names.

(N.b. Nock’s addressing scheme seems more or less equivalent to a tree-flavoured variant of de Bruijn indices.)

The ‘loop’ function in the Haskell code tests to see if the successor of ‘n’ is equal to ‘m’, and if so, returns ‘n’. Otherwise it loops on the successor of ‘n’.

Nock-4 computes the successor. Remember that ‘n’ is at address six, so our analogue to ‘succ n’ is [4 0 6]. We want to check if it’s equal to ‘m’, at address seven, gotten via [0 7]. The equality check is then handled by Nock-5:

[5 [4 0 6] [0 7]]

Nock-6 is the conditional operator. If the equality check is true, we just want to return ‘n’ via [0 6]. The whole formula will look like this:

[6 [5 [4 0 6] [0 7]] [0 6] <loop on successor of n>]

To loop, we’ll use Nock-2. We’ll compute a new subject – consisting of our ‘loop’ analogue at address two, an updated ‘n’ at address six, and our original ‘m’ at address seven – and then evaluate the same looping formula against it. In Nock, the formula is:

[2 [[0 2] [4 0 6] [0 7]] [0 2]]

And our ‘loop’ analogue in full is thus:

[6 [5 [4 0 6] [0 7]] [0 6] [2 [[0 2] [4 0 6] [0 7]] [0 2]]]

Finally, we now need to actually store this in the subject. Again, we’ll ignore the subject with Nock-1, and use Nock-8 to tuple the formula up with the ‘n’ and ‘m’ analogues:

(N.b. at this point we’re pretty much at line noise – spacing it out doesn’t make it that much more readable, but it does avoid word wrap.)

[8
  [1 0]
  [8
    [1
      [6
        [5 [4 0 6] [0 7]]
        [0 6]
        [2 [[0 2] [4 0 6] [0 7]] [0 2]]
      ]
    ]
    [2 [0 1] [0 2]]
  ]
]

Let’s check if it works. We’re embedded in Haskell, after all, so let’s get some support from our host, for convenience:

~ let dec = \subj -> "*[" <> subj <> " [8 [1 0] [8 [1 [6 [5 [4 0 6] [0 7]] [0 6] [2 [[0 2] [4 0 6] [0 7]] [0 2]]]] [2 [0 1] [0 2]]]]]"
~ hnock (dec "10")
9
~ hnock (dec "25")
24
~ hnock (dec "200")
199

Phew!

Fin

Nock makes for an interesting “functional assembly language” of sorts. Its spec is so simple that writing a naïve interpreter is more or less a process of copying out the production rules. You can do it without a lot of thought.

The time-honoured decrement-writing exercise, on the other hand, really makes one think about Nock: the environment is always explicit, you can access it via a nice addressing scheme, you can refer to functions via pointers (viz. addresses), and so on. It’s pretty intuitive, when you dig into the mechanics.

In any case – writing a decrement (or similar function) and writing your own interpreter to test it is a fun exercise. From a practical perspective, it also gives one a lot of intuition about programming in Hoon – it’s probably not too far from the truth to compare the relationship between Nock and Hoon to that between, say, x86 and C. If one is programming in Hoon, it can’t hurt to know your Nock!

Crushing ISAAC

(UPDATE 2020/06/30: the good people at tweag.io have since published a Nix shell environment that appears to make testing arbitrary PRNGs much less of a pain. I recommend you check it out!)

I recently needed a good cryptographically-secure and seedable pseudorandom number generator for Javascript. This didn’t turn out to be as trivial a task as I figured it’d be: most Javascript CSPRNGs I found didn’t appear to be manually seedable, instead automatically seeding themselves behind the scenes using a trusted high-quality entropy source like /dev/urandom (as per the WebCryptoAPI spec).

But! I want to use my own entropy source. So I could either implement my own cryptographically-secure PRNG, which I would obviously need to test rigorously, or I could find an existing implementation that had already been vetted widely by use. I settled on the ISAAC PRNG/stream cipher, both because it seemed like a reasonable choice of PRNG (it is used in things like coreutils’ shred, and there are no known practical attacks against it), and also because there was a Javascript implementation floating around on the internet. The interesting thing, though, was that the implementation did not really seem to be vetted widely – it hadn’t been updated in five years or so, and both the Github repository and the npm package seemed to show very little activity around it in general.

(Update: Stefano, the author of the fork I used, later emailed me to point out that the original version of this ISAAC code is located here. His fork, on the other hand, is node-friendly.)

I was going to be using this thing in an application that made some nontrivial demands in terms of security. So while the PRNG itself seemed to fit the bill, I wanted to be assured that the implementation satisfied at least some basic criteria for pseudorandomness. I assumed that it probably worked – I just wanted some reasonable confidence in making that assumption.

There are a number of statistical suites for testing PRNGs out there: I am aware of at least the NIST statistical test suite, the DieHard suite, and TestU01, being most familiar with the latter (this is not saying much). The TestU01 suite is implemented in C; you provide it with a PRNG represented as a function pointer that returns 32-bit integers, and it will run the desired battery of tests (SmallCrush, Crush, or BigCrush – each of increasing size and scope) for you. These more or less consist of frequentist tests involving the chi-square distribution, with more specialised test statistics appearing on occasion. Results are provided in terms of p-values on these statistics.

I found this page that gave some instructions for using TestU01, but the ISAAC implementation I wanted to test is of course written in Javascript. So I knew there was some FFI hackery to be done in order to get the two codebases to play nicely together. I also discovered that Jan de Mooij did some work on testing JS’s basic ‘Math.random’ generator with TestU01 using Emscripten, an LLVM-to-JS compiler, so these two resources seemed a useful place to start.

After several hours of bashing my way through emcc compilation and linker errors – ahem, of careful and methodical programming – I managed to get everything to work. Since the documentation for these tools can be kind of sparse, I figured I’d outline the steps to run the tests here, and hopefully save somebody else a little bit of time and effort in the future. That said, this is inevitably a somewhat tricky procedure – when trying to reproduce my steps, I ran into a few strange errors that required additional manual fiddling. The following should be about right, but may require some tweaking on your end.

Emscripten and TestU01

Install and activate Emscripten in the current terminal. To go from the official git repo:

$ git clone https://github.com/juj/emsdk.git
$ cd emsdk
$ ./emsdk install latest
$ ./emsdk activate latest
$ source ./emsdk_env.sh

Emscripten provides a couple of wrappers over existing tools; notably, there’s emconfigure and emmake for specialising make builds for compilation via emcc, the Emscripten compiler itself.

In some other directory, grab the TestU01 suite from the Université de Montréal website and extract it:

$ wget http://simul.iro.umontreal.ca/testu01/TestU01.zip
$ unzip -q TestU01.zip

This is some oldish, gnarly academic C code that uses a very wonky autoconf/automake-based build system. There is probably a better way to do it, but the easiest way to get this thing built without too much grief is to build it twice – once as per normal, specifying the appropriate base directory, and once again to specialise it for Emscripten’s use:

$ basedir=`pwd`
$ cd TestU01-1.2.3
$ ./configure --prefix="$basedir"
$ make -j 2
$ make -j 2 install

If all goes well, you’ll see ‘bin’, ‘include’, ‘lib’, and ‘share’ directories pop up in ‘basedir’. Repeat the analogous steps for Emscripten; note that you’ll get some “no input files” errors here when the configure script checks dynamic linker characteristics, but these are inconsequential:

$ emconfigure ./configure --prefix="$basedir"
$ emmake make -j 2
$ emmake make -j 2 install

Similarly, you’ll notice some warnings re: dynamic linking when building. We’ll handle these later. In the meantime, you can return to your ‘basedir’ to continue working:

$ cd $basedir

A Test Shim for ISAAC

Check out the raw ISAAC code. It’s structured in a sort-of-object-y way; the state of the PRNG is held in a bunch of opaque internal variables, and the whole thing is initialised by calling the ‘isaac’ function and binding the result as a variable. One then iterates the PRNG by calling either the ‘rand’ or ‘random’ property of that variable for a random integer or double, respectively.

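In other words, usage looks something like the following (illustrative only – the names are those of the library as described above):

var rng = isaac();      // initialise and bind the generator state
var i = rng.rand();     // a random 32-bit integer
var d = rng.random();   // a random double
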
We need to write the actual testing code in C. You can get away with the following, which I’ve adapted from M.E. O’Neill’s code – call it something like ‘test-isaac.c’:

#include <emscripten.h>
#include "TestU01.h"

/* provided at link time by the Emscripten-wrapped ISAAC library */
extern void isaac_init(void);
extern unsigned int isaac_rand(void);

int main()
{
    /* set up the generator state */
    isaac_init();

    /* wrap the generator in the form TestU01 expects */
    unif01_Gen* gen = unif01_CreateExternGenBits("ISAAC", isaac_rand);

    /* run the SmallCrush battery on it */
    bbattery_SmallCrush(gen);

    unif01_DeleteExternGenBits(gen);

    return 0;
}

Note the two external functions I’m declaring here: the first mimics calling the ‘isaac’ function in Javascript and binding it to a variable, ‘isaac’, and the second mimics a call to isaac.rand(). The testing code follows the same pattern: isaac_init() initialises the generator state, and isaac_rand produces a value from it. The surrounding code passes isaac_rand in as the generator to use for the SmallCrush battery of tests.

C to LLVM Bitcode

We can compile this to LLVM IR as it is, via Emscripten. But first, recall those dynamic linker warnings from the initial setup step. Emscripten deals with a lot of files, compile targets, etc. based on file extension. We thus need to rename all the .dylib files in the ‘lib’ directory, which are in fact all LLVM bitcode, to have the appropriate .bc extension. From the ‘lib’ directory itself, one can just do the following in bash:

$ for old in *.dylib; do mv $old `basename $old .dylib`.bc; done

When that’s done, we can compile the C code to LLVM with emcc:

$ emcc -O3 -o test-isaac.bc \
    -I$basedir/include \
    -L$basedir/lib \
    -ltestu01 -lprobdist -lmylib -lm -I/usr/local/include \
    test-isaac.c

Again, Emscripten decides what its compile target should be via its file extension – thus here, an output with the .bc extension means we’re compiling to LLVM IR.

LLVM to Javascript and WASM

Now, to provide the requisite isaac_init and isaac_rand symbols to the compiled LLVM bitcode, we need to pass the ISAAC library itself. This is another finicky procedure, but there is a method to the madness, and one can find a bit of documentation on it here.

Helpfully, Evan Wallace at Figma wrote an emscripten JS library generation helper that makes this task a little less painful. Install that via npm as follows:

$ npm install -g emscripten-library-generator

To wrap the ISAAC code up in the appropriate format, one needs to make a few small modifications to it. I won’t elaborate on this too much, but one needs to (the first two changes are sketched just after this list):

  • Change the String.prototype.toIntArray function declaration to a simple function toIntArray(string) declaration, and alter its use in the code appropriately,
  • Change the var isaac = ... function declaration/execution binding to a simple function isaac() declaration, and,
  • Declare the two functions used in our C shim:

    function isaac_init() {
      var isaac_initialised = isaac();
    }
    
    function isaac_rand() {
      return isaac_initialised.rand();
    }
    

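Concretely, the first two changes look something like this (abridged – the elided function bodies are unchanged):

// before:
String.prototype.toIntArray = function() { /* ... */ };
var isaac = (function() { /* ... */ })();

// after:
function toIntArray(string) { /* ... */ }
function isaac() { /* ... */ }
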
Then we can package it up in an emscripten-friendly format as follows:

$ emscripten-library-generator isaac-mod.js > isaac-lib.js

You’ll need to make one last adjustment. In the isaac-lib.js file just generated for us, add the following emscripten ‘postset’ instruction above the ‘isaac’ property:

isaac__postset: 'var isaac_initialised = _isaac();',   // add me
isaac: function () {                                   // leave me alone

This makes sure that the isaac_initialised variable referred to in isaac_rand is accessible.

Whew. Ok, with all that done, we’ll compile our LLVM bitcode to a Javascript and wasm pair. You’ll need to bump up the total memory option in order to run the resulting code – I think I grabbed the amount I used from Jan de Mooij’s example:

$ emcc --js-library isaac-lib.js \
    lib/libtestu01.0.0.1.bc \
    -o test-isaac.js \
    -s TOTAL_MEMORY=536870912 \
    test-isaac.bc

Running SmallCrush

That’s about it. If all has gone well, you should have seen no compilation or linking errors when running emcc, and you should have a pair of files, ‘test-isaac.js’ and ‘test-isaac.wasm’, kicking around in your ‘basedir’.

To (finally) run the SmallCrush suite, execute ‘test-isaac.js’ with node:

$ node test-isaac.js
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
                 Starting SmallCrush
                 Version: TestU01 1.2.3
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx


***********************************************************
HOST = emscripten, Emscripten

ISAAC


smarsa_BirthdaySpacings test:
-----------------------------------------------
   N =  1,  n = 5000000,  r =  0,    d = 1073741824,    t = 2,    p = 1


      Number of cells = d^t = 1152921504606846976
      Lambda = Poisson mean =      27.1051


----------------------------------------------------
Total expected number = N*Lambda      :      27.11
Total observed number                 :      27
p-value of test                       :    0.53


-----------------------------------------------
CPU time used                    :  00:00:00.00

Generator state:

<more output omitted>

========= Summary results of SmallCrush =========

 Version:          TestU01 1.2.3
 Generator:        ISAAC
 Number of statistics:  15
 Total CPU time:   00:00:00.00

 All tests were passed

Et voilà, your SmallCrush battery output is dumped to stdout. You need only tweak the C code shim and recompile if you want to run the more intensive Crush or BigCrush suites. Similarly, you can dump generator output to stdout with console.log() if you want to reassure yourself that the generator is running correctly.

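For example, running the full Crush battery is a one-line change to the shim (bbattery_BigCrush being the analogous call for BigCrush):

bbattery_Crush(gen);
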
Fin

So: the Javascript ISAAC PRNG indeed passes TestU01. Nice! It was satisfying enough to get this hacky sequence of steps to run at all, but it was even better to see the tests actually pass, as I’d both hoped and expected.

I did a few extra things to convince myself that ISAAC was really passing the tests as it seemed to be. I ran TestU01 on a cheap little xorshift generator (which failed several tests) and also did some ad-hoc analysis of values that I had ISAAC log to stdout. I even looked at the code, and compared it with a reference implementation by eye. At this point, I’m convinced it operates as advertised, and I feel very comfortable using it in my application.