Much Faster Signatures on secp256k1

I’ve just released a new version of ppad-secp256k1 that supports dramatically faster Schnorr and ECDSA signature schemes, powered by faster elliptic curve arithmetic under the hood.

This follows on my previous posts in what are starting to constitute a sort of secp256k1 series. In the last post I had used the wNAF method for elliptic curve scalar multiplication to glean a large performance boost across the board. This time there have been two significant changes that have resulted in performance wins: first, the use of fixed-width, register-allocated words exclusively, in place of GHC’s heap-allocated ‘Integer’, and second, the use of Montgomery-form arithmetic to avoid performing expensive division operations when working over the secp256k1 fields.

Both of these features are provided by a new library I’ve released, ppad-fixed, which provides such stuff, as well as constant-time selection and arithmetic. ppad-fixed is essentially a Haskell translation of parts of Rust’s crypto-bigint library, except specialized to ‘wide’ (two-limb) and ‘wider’ (four-limb) words, as, unlike in Rust or Golang, we don’t have stack-allocated arrays in Haskell, and so everything is implemented in terms of fixed-size unboxed tuples. I’ve carefully verified that the assembly for many of the arithmetic operations is effectively optimal, at least for darwin-aarch64, when produced with GHC’s LLVM backend, so expect speediness – typically a few nanoseconds for addition and subtraction, a few more for multiplication, and a few microseconds for modular exponentiation, inversion, and square root.

I’d originally mentioned my desire for better fixed-width support in my first post on secp256k1:

Arbitrary-precision Integers are still slow, compared to fixed-width stuff that modern computers can positively chew through. I achieved a staggering speedup on some basic integer parsing by using a custom Word256 type (built from a bunch of Word64 values) under the hood, and converting to Integer only at the end.

They can also be annoying when one wants to achieve constant-time execution of cryptographically-sensitive functions, for obvious reasons. It would be nice to have good fixed-width support for stuff like Word128, Word256, Word512, and so on – I briefly considered implementing and using a custom Word256 type for everything, but this would be a ton of work, and I’m not sure I could beat GHC’s native bigint support for e.g. modular multiplication, exponentiation, and inversion anyway. We’ll stick with plain-old Integer for the time being – it’s still no slouch.

Well, at this point we beat Integer very handily, and provide solid guarantees of constant-time and constant-allocation execution to boot. We’re now getting into the realm of libsecp256k1-tier performance:

  benchmarking schnorr/sign_schnorr' (large)
  time                 43.34 μs   (43.21 μs .. 43.47 μs)
                       1.000 R²   (1.000 R² .. 1.000 R²)
  mean                 43.21 μs   (43.13 μs .. 43.30 μs)
  std dev              280.8 ns   (221.0 ns .. 357.9 ns)

  benchmarking schnorr/verify_schnorr'
  time                 94.57 μs   (94.27 μs .. 94.90 μs)
                       1.000 R²   (1.000 R² .. 1.000 R²)
  mean                 94.13 μs   (93.92 μs .. 94.38 μs)
  std dev              777.4 ns   (644.8 ns .. 925.8 ns)

  benchmarking ecdsa/sign_ecdsa' (large)
  time                 35.21 μs   (35.11 μs .. 35.32 μs)
                       1.000 R²   (1.000 R² .. 1.000 R²)
  mean                 35.14 μs   (35.09 μs .. 35.20 μs)
  std dev              184.4 ns   (135.9 ns .. 241.5 ns)

  benchmarking ecdsa/verify_ecdsa'
  time                 88.70 μs   (88.44 μs .. 88.93 μs)
                       1.000 R²   (1.000 R² .. 1.000 R²)
  mean                 88.93 μs   (88.75 μs .. 89.13 μs)
  std dev              635.0 ns   (517.3 ns .. 963.9 ns)

And yeah, I also implemented Montgomery arithmetic for two of the secp256k1 domains of interest, i.e. those with moduli defined by the secp256k1 field prime and scalar group order, respectively. This allowed me to cut out a bunch of expensive modP/modQ operations everywhere in the Schnorr and ECDSA implementations.

Haskell’s much-maligned ‘Num’ typeclass actually makes these Montgomery form numbers a joy to work with. To crib an illustrative GHCi session from the ppad-fixed README:

  > import qualified Numeric.Montgomery.Secp256k1.Curve as C
  >
  > let one = 1 :: C.Montgomery
  > one
  1
  > putStrLn (C.render one)
  (4294968273, 0, 0, 0)
  >
  > let two = one + one
  > putStrLn (C.render two)
  (8589936546, 0, 0, 0)
  >
  > let big = 2 ^ (128 :: Int) :: C.Montgomery
  > big
  340282366920938463463374607431768211456
  > putStrLn (C.render big)
  (0, 0, 4294968273, 0)
  >
  > let inv = C.inv big
  > inv
  85349562743316995932995116683053049354367560536510302240860302699983992117553
  > putStrLn (C.render inv)
  (0, 0, 1, 0)
  >
  > inv * big
  1

(N.b., I’ve been writing Haskell for something like fourteen years, but to this day I still become amazed by what you can do with Num. It’s great to be able to write in terms of plain integers, but get fixed-width Montgomery stuff automatically handled under the hood, similar to the application I described for automatic differentiation some years ago.)

There’s still room to squeeze out further performance gains here. Some stuff that comes to mind: using Barrett reduction for my modQ operation, in place of the cute Montgomery round-trip I presently do to calculate it; employing the GLV endomorphism that secp256k1 admits in wNAF scalar multiplication; probably going for Bernstein-Yang inversion instead of the fixed-schedule Fermat inversion that I presently use. I’ll get to all of them eventually, as the need or interest arises.

Check it out for all of your Haskell secp256k1 needs, and merely let me know if there are any additional features you could use!