Crushing ISAAC

07 Oct 2018

(UPDATE 2020/06/30: the good people at tweag.io have since published a Nix shell environment that appears to make testing arbitrary PRNGs much less of a pain. I recommend you check it out!)

I recently needed a good cryptographically-secure and seedable pseudorandom number generator for Javascript. This didn’t turn out to be as trivial a procedure as I figured it’d be: most Javascript CSPRNGs I found didn’t appear to be manually seedable, instead automatically seeding themselves behind-the-scenes using a trusted high-quality entropy source like /dev/urandom (as per the WebCryptoAPI spec).

But! I want to use my own entropy source. So I could either implement my own cryptographically-secure PRNG, which I would obviously need to test rigorously, or I could find an existing implementation that had already been vetted widely by use. I settled on the ISAAC PRNG/stream cipher, both because it seemed like a reasonable choice of PRNG (it is used in things like coreutils’ shred, and there are no known practical attacks against it), and also because there was a Javascript implementation floating around on the internet. But the interesting thing was that the implementation did not really seem to be vetted widely – it hadn’t been updated in five years or so, and both the Github repository and the npm package seem to show very little activity around it in general.

(Update: Stefano, the author of the fork I used, later emailed me to point out that the original version of this ISAAC code is located here. His fork, on the other hand, is node-friendly.)

I was going to be using this thing in an application that made some nontrivial demands in terms of security. So while the PRNG itself seemed to fit the bill, I wanted to be assured that the implementation satisfied at least some basic criteria for pseudorandomness. I assumed that it probably works – I just wanted to have some reasonable confidence in making that assumption.

There are a number of statistical suites for testing PRNGs out there: I am aware of at least the NIST statistical test suite, the DieHard suite, and TestU01, being most familiar with the latter (this is not saying much). The TestU01 suite is implemented in C; you provide it with a PRNG represented as a function pointer that returns 32-bit integers, and it will run the desired battery of tests (SmallCrush, Crush, or BigCrush – each of increasing size and scope) for you. These more or less consist of frequentist tests involving the chi-square distribution, with more specialised test statistics appearing on occasion. Results are provided in terms of p-values on these statistics.

I found this page that gave some instructions for using TestU01, but the ISAAC implementation I wanted to test is of course written in Javascript. So I knew there was some FFI hackery to be done in order to get the two codebases to play nicely together. I also discovered that Jan de Mooij did some work on testing JS’s basic ‘Math.random’ generator with TestU01 using Emscripten, an LLVM-to-JS compiler, so these two resources seemed a useful place to start.

After several hours of ~~bashing my way through emcc compilation and linker errors~~ careful and methodical programming, I managed to get everything to work. Since the documentation for these tools can be kind of sparse, I figured I’d outline the steps to run the tests here, and hopefully save somebody else a little bit of time and effort in the future. But that said, this is inevitably a somewhat tricky procedure – when trying to reproduce my steps, I ran into a few strange errors that required additional manual fiddling. The following should be about right, but may require some tweaking on your end.

Emscripten and TestU01

Install and activate Emscripten in the current terminal. To go from the official git repo:

$ git clone https://github.com/juj/emsdk.git
$ cd emsdk
$ ./emsdk install latest
$ ./emsdk activate latest
$ source ./emsdk_env.sh

Emscripten provides a couple of wrappers over existing tools; notably, there’s emconfigure and emmake for specialising make builds for compilation via emcc, the Emscripten compiler itself.

In some other directory, grab the TestU01 suite from the Université de Montréal website and extract it:

$ wget http://simul.iro.umontreal.ca/testu01/TestU01.zip
$ unzip -q TestU01.zip

This is some oldish, gnarly academic C code that uses a very wonky autoconf/automake-based build system. There is probably a better way to do it, but the easiest way to get this thing built without too much grief is to build it twice – once as per normal, specifying the appropriate base directory, and once again to specialise it for Emscripten’s use:

$ basedir=`pwd`
$ cd TestU01-1.2.3
$ ./configure --prefix="$basedir"
$ make -j 2
$ make -j 2 install

If all goes well you’ll see ‘bin’, ‘include’, ‘lib’, and ‘share’ directories pop up in ‘basedir’. Repeat the analogous steps for emscripten; note that you’ll get some “no input files” errors here when the configure script checks dynamic linker characteristics, but these are inconsequential:

$ emconfigure ./configure --prefix="$basedir"
$ emmake make -j 2
$ emmake make -j 2 install

Similarly, you’ll notice some warnings re: dynamic linking when building. We’ll handle these later. In the meantime, you can return to your ‘basedir’ to continue working:

$ cd $basedir

A Test Shim for ISAAC

Check out the raw ISAAC code. It’s structured in a sort-of-object-y way; the state of the PRNG is held in a bunch of opaque internal variables, and the whole thing is initialised by calling the ‘isaac’ function and binding the result as a variable. One then iterates the PRNG by calling either the ‘rand’ or ‘random’ property of that variable for a random integer or double, respectively.

We need to write the actual testing code in C. You can get away with the following, which I’ve adapted from M.E. O’Neill’s code – call it something like ‘test-isaac.c’:

#include <emscripten.h>
#include "TestU01.h"

extern void isaac_init(void);
extern unsigned int isaac_rand(void);

int main()
{
    isaac_init();

    unif01_Gen* gen = unif01_CreateExternGenBits("ISAAC", isaac_rand);

    bbattery_SmallCrush(gen);

    unif01_DeleteExternGenBits(gen);

    return 0;
}

Note the two external functions I’m declaring here: the first mimics calling the ‘isaac’ function in Javascript and binding it to a variable, ‘isaac’, and the second mimics a call to isaac.rand(). The testing code follows the same pattern: isaac_init() initialises the generator state, and isaac_rand produces a value from it. The surrounding code passes isaac_rand in as the generator to use for the SmallCrush battery of tests.

C to LLVM Bitcode

We can compile this to LLVM IR as it is, via Emscripten. But first, recall those dynamic linker warnings from the initial setup step. Emscripten deals with a lot of files, compile targets, etc. based on file extension. We thus need to rename all the .dylib files in the ‘lib’ directory, which are in fact all LLVM bitcode, to have the appropraite .bc extension. From the ‘lib’ directory itself, one can just do the following in bash:

$ for old in *.dylib; do mv $old `basename $old .dylib`.bc; done

When that’s done, we can compile the C code to LLVM with emcc:

$ emcc -O3 -o test-isaac.bc \
    -I/$basedir/include \
    -L/$basedir/lib \
    -ltestu01 -lprobdist -lmylib -lm -I/usr/local/include \
    test-isaac.c

Again, Emscripten decides what its compile target should be via its file extension – thus here, an output with the .bc extension means we’re compiling to LLVM IR.

LLVM to Javascript and WASM

Now, to provide the requisite isaac_init and isaac_rand symbols to the compiled LLVM bitcode, we need to pass the ISAAC library itself. This is another finicky procedure, but there is a method to the madness, and one can find a bit of documentation on it here.

Helpfully, Evan Wallace at Figma wrote an emscripten JS library generation helper that makes this task a little less painful. Install that via npm as follows:

$ npm install -g emscripten-library-generator

To wrap the ISAAC code up in the appropriate format, one needs to make a few small modifications to it. I won’t elaborate on this too much, but one needs to:

Change the String.prototype.toIntArray function declaration to a simple function toIntArray(string) declaration, and alter its use in the code appropriately,
Change the var isaac = ... function declaration/execution binding to a simple function isaac() declaration, and,

Declare the two functions used in our C shim:

function isaac_init() {
  var isaac_initialised = isaac();
}

function isaac_rand() {
  return isaac_initialised.rand();
}

Then we can package it up in an emscripten-friendly format as follows:

$ emscripten-library-generator isaac-mod.js > isaac-lib.js

You’ll need to make one last adjustment. In the isaac-lib.js file just generated for us, add the following emscripten ‘postset’ instruction above the ‘isaac’ property:

isaac__postset: 'var isaac_initialised = _isaac();',   // add me
isaac: function () {                                   // leave me alone

This makes sure that the isaac_initialised variable referred to in isaac_rand is accessible.

Whew. Ok, with all that done, we’ll compile our LLVM bytecode to a Javascript and wasm pair. You’ll need to bump up the total memory option in order to run the resulting code – I think I grabbed the amount I used from Jan de Mooij’s example:

$ emcc --js-library isaac-lib.js \
    lib/libtestu01.0.0.1.bc \
    -o test-isaac.js \
    -s TOTAL_MEMORY=536870912 \
    test-isaac.bc

Running SmallCrush

That’s about it. If all has gone well, you should have seen no compilation or linking errors when running emcc, and you should have a couple of ‘test-isaac.js’ and ‘test-isaac.wasm’ files kicking around in your ‘basedir’.

To (finally) run the Small Crush suite, execute ‘test-isaac.js’ with node:

$ node test-isaac.js
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
                 Starting SmallCrush
                 Version: TestU01 1.2.3
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx


***********************************************************
HOST = emscripten, Emscripten

ISAAC


smarsa_BirthdaySpacings test:
-----------------------------------------------
   N =  1,  n = 5000000,  r =  0,    d = 1073741824,    t = 2,    p = 1


      Number of cells = d^t = 1152921504606846976
      Lambda = Poisson mean =      27.1051


----------------------------------------------------
Total expected number = N*Lambda      :      27.11
Total observed number                 :      27
p-value of test                       :    0.53


-----------------------------------------------
CPU time used                    :  00:00:00.00

Generator state:

<more output omitted>

========= Summary results of SmallCrush =========

 Version:          TestU01 1.2.3
 Generator:        ISAAC
 Number of statistics:  15
 Total CPU time:   00:00:00.00

 All tests were passed

Et voilà, your SmallCrush battery output is dumped to stdout. You need only to tweak the C code shim and recompile if you want to run the more intensive Crush or BigCrush suites. Similarly, you can dump generator output to stdout with console.log() if you want to reassure yourself that the generator is running correctly.

Fin

So: the Javascript ISAAC PRNG indeed passes TestU01. Nice! It was satisfying enough to get this hacky sequence of steps to actually run, but it was even better to see the tests actually pass, as I’d both hoped and expected.

I did a few extra things to convince myself that ISAAC was really passing the tests as it seemed to be doing. I ran TestU01 on a cheap little xorshift generator (which failed several tests) and also did some ad-hoc analysis of values that I had ISAAC log to stdout. And, I even looked at the code, and compared it with a reference implementation by eye. At this point, I’m convinced it operates as advertised, and I feel very comfortable using it in my application.

jtobin.io