On Measurability

…this one is pretty dry, I’ll admit. David Williams said it best:

“Measure theory, that most arid of subjects when done for its own sake, becomes amazingly more alive when used in probability, not only because it is then applied, but also because it is immensely enriched.”

Unfortunately for you, dear reader, we won’t be talking about probability.

Moving on. What does it mean for something to be measurable in the mathematical sense? Take some arbitrary collection \(X\) and slap an appropriate algebraic structure \(\mathcal{X}\) on it - usually an algebra or \(\sigma\)-algebra, etc. Then we can refer to a few different objects as ‘measurable’, going roughly as follows.

The elements of the structure \(\mathcal{X}\) are called measurable sets. They’re called this because they can literally be assigned a notion of measure, which is a kind of generalized volume. If we’re just talking about some subset of \(X\) out of the context of its structure then we can be pedantic and call it measurable with respect to \(\mathcal{X}\), say. You could also call a set \(\mathcal{X}\)-measurable, to be similarly precise.

The pair consisting of the original collection and its associated structure, \((X, \mathcal{X})\), is called a measurable space. It’s called that because it can be completed with a measuring function \(\mu\) - itself called a measure - that assigns notions of measure to measurable sets.

Now take some other measurable space \((Y, \mathcal{Y})\) and consider a function \(f\) from \(X\) to \(Y\). This is a measurable function if it satisfies the following technical requirement: that for any \(\mathcal{Y}\)-measurable set, its preimage under \(f\) is an element of \(\mathcal{X}\) (so: the preimage under \(f\) is \(\mathcal{X}\)-measurable).

The concept of measurability for functions probably feels the least intuitive of the three - like one of those dry taxonomical classifications that we just have to keep on the books. The ‘make sure your function is measurable and everything will be ok’ heuristic will get you pretty far. But there is some good intuition available, if you want to look for it.

Here’s an example: define a set \(X\) that consists of the elements \(A\), \(B\), and \(C\). To talk about measurable functions, we first need to define our measurable sets. The de facto default structure used for this is a \(\sigma\)-algebra, and we can always generate one from some underlying class of sets. Let’s do that from the following plain old partition that splits the original collection into a couple of disjoint ‘slices’:

\[H = \{\{A, B\}, \{C\}\}\]

The \(\sigma\)-algebra \(\mathcal{X}\) generated from this partition will just be the partition itself, completed with the whole set \(X\) and the empty set. To be clear, it’s the following:

\[\mathcal{X} = \left\{\{A, B, C\}, \{A, B\}, \{C\}, \emptyset\right\}\]

The resulting measurable space is \((X, \mathcal{X})\). So we could assign a notion of generalized volume to any element of \(\mathcal{X}\), though I won’t actually worry about doing that here.
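For a finite collection like this one, generating the \(\sigma\)-algebra is entirely mechanical: start from the generating sets, then close under complement and union until nothing new appears. Here’s a sketch in Python (the helper name `generate_sigma_algebra` is mine, not any library’s API):

```python
X = frozenset({"A", "B", "C"})
H = [frozenset({"A", "B"}), frozenset({"C"})]

def generate_sigma_algebra(space, generators):
    """Close the generators under complement and pairwise union.
    On a finite space this fixed point is the generated
    sigma-algebra (countable unions reduce to finite ones)."""
    sets = set(generators) | {frozenset(), space}
    while True:
        new = {space - s for s in sets} | {s | t for s in sets for t in sets}
        if new <= sets:
            return sets
        sets |= new

sigma = generate_sigma_algebra(X, H)
print(sorted(sorted(s) for s in sigma))
# four sets: the empty set, {A, B}, {C}, and X itself
```

Running it on the partition above reproduces exactly the four sets listed for \(\mathcal{X}\).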

Now. Let’s think about some functions from \(X\) to the real numbers, which we’ll assume to be endowed with a suitable \(\sigma\)-algebra of their own (one typically assumes the standard topology on \(\mathbb{R}\) and the associated Borel \(\sigma\)-algebra).

How about this - a simple indicator function on the slice containing \(C\):

\[f(x) = \begin{cases} 0, \, x \in \{A, B\} \\ 1, \, x \in \{C\} \end{cases}\]

Is it measurable? That’s easy to check. The preimage of \(\{0\}\) is \(\{A, B\}\), the preimage of \(\{1\}\) is \(\{C\}\), and the preimage of \(\{0, 1\}\) is \(X\) itself. Those are all in \(\mathcal{X}\), and the preimage of the empty set is the empty set, so we’re good.

Ok. What about this one:

\[g(x) = \begin{cases} 0, \, x \in \{A\} \\ 1, \, x \in \{B\} \\ 2, \, x \in \{C\} \end{cases}\]

Check the preimage of \(\{1, 2\}\) and you’ll find it’s \(\{B, C\}\). But that’s not a member of \(\mathcal{X}\), so \(g\) is not measurable!
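Both checks can be automated. For a finite domain and finitely many output values, it’s enough to examine the preimage of every subset of the range; the helper below is a sketch of that brute-force check, not anyone’s library API:

```python
from itertools import combinations

X = frozenset({"A", "B", "C"})
sigma = {frozenset(), frozenset({"A", "B"}), frozenset({"C"}), X}

f = {"A": 0, "B": 0, "C": 1}
g = {"A": 0, "B": 1, "C": 2}

def is_measurable(func, domain, sigma):
    """A function on a finite space is measurable iff the preimage
    of every subset of its (finite) range is a measurable set."""
    values = sorted(set(func.values()))
    for r in range(len(values) + 1):
        for chosen in combinations(values, r):
            preimage = frozenset(x for x in domain if func[x] in chosen)
            if preimage not in sigma:
                return False
    return True

print(is_measurable(f, X, sigma))  # True
print(is_measurable(g, X, sigma))  # False: e.g. the preimage of {1, 2} is {B, C}
```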

What happened here? Failing to satisfy technical requirements aside: what, intuitively, made \(f\) measurable where \(g\) wasn’t?

The answer is a problem of resolution. Look again at \(\mathcal{X}\):

\[\left\{\{A, B, C\}, \{A, B\}, \{C\}, \emptyset\right\}\]

The structure \(\mathcal{X}\) that we’ve endowed our collection \(X\) with is too coarse to permit distinguishing between elements of the slice \(\{A, B\}\). There is no measurable set \(A\), nor a measurable set \(B\) - just a measurable set \(\{A, B\}\). And as a result, if we define a function that says something about either \(A\) or \(B\) without saying the same thing about the other, that function won’t be measurable. The function \(f\) assigned the same value to both \(A\) and \(B\), so we didn’t have any problem there.

If we want to be able to distinguish between \(A\) and \(B\), we’ll need to equip \(X\) with some structure that has a finer resolution. You can check that if you make a measurable space out of \(X\) and its power set (the set of all subsets of \(X\)) then \(g\) will be measurable there, for example.
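You can verify that claim directly: every preimage is some subset of \(X\), and the power set contains all of those by definition. A quick sketch:

```python
from itertools import chain, combinations

X = frozenset({"A", "B", "C"})

def power_set(space):
    """All subsets of a finite set, as frozensets."""
    xs = list(space)
    return {frozenset(c) for c in chain.from_iterable(
        combinations(xs, r) for r in range(len(xs) + 1))}

full = power_set(X)
g = {"A": 0, "B": 1, "C": 2}

# the preimage that sank g under the coarser structure is now measurable
pre = frozenset(x for x in X if g[x] in {1, 2})
print(pre in full)        # True
print(len(full))          # 2^3 = 8 measurable sets
```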

So if we’re using partitions to define our measurable sets, we get a neat little property: for any measurable function, elements in the same slice of the partition must be assigned the same value by that function. So if you have a function \(h : X \to H\) that takes an element to its respective slice in the partition, you know that, for any \(x_{0}\), \(x_{1}\) in \(X\), \(h(x_{0}) = h(x_{1})\) implies \(f(x_{0}) = f(x_{1})\) for every measurable function \(f\).
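On this small finite example you can even brute-force the property: enumerating every function from \(X\) into \(\{0, 1, 2\}\) confirms that measurability with respect to the partition-generated \(\sigma\)-algebra coincides exactly with being constant on each slice. A sketch, reusing the same brute-force measurability check as before:

```python
from itertools import combinations, product

X = ("A", "B", "C")
sigma = {frozenset(), frozenset({"A", "B"}), frozenset({"C"}), frozenset(X)}
h = {"A": 0, "B": 0, "C": 1}  # sends each element to (the index of) its slice

def is_measurable(func):
    values = sorted(set(func.values()))
    for r in range(len(values) + 1):
        for chosen in combinations(values, r):
            preimage = frozenset(x for x in X if func[x] in chosen)
            if preimage not in sigma:
                return False
    return True

def constant_on_slices(func):
    return all(func[x] == func[y] for x in X for y in X if h[x] == h[y])

# check all 27 functions from X into {0, 1, 2}
for values in product(range(3), repeat=3):
    func = dict(zip(X, values))
    assert is_measurable(func) == constant_on_slices(func)

print("measurable <=> constant on each slice of the partition")
```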

Addendum

Whipping together a measurable space using a \(\sigma\)-algebra generated by a partition of sets occurs naturally when we talk about correlated equilibrium, a solution concept in non-cooperative game theory. It’s common to say a function - in that context a correlated strategy - must be measurable ‘with respect to the partition’, which sort of elides the fact that we still need to generate a \(\sigma\)-algebra from it anyway.

Some old-school authors (Halmos, at least) developed their measure theory using \(\sigma\)-rings, but this doesn’t seem very popular nowadays. Since a ring isn’t required to contain the entire set \(X\), you need to jump through an awkward extra hoop when defining measurability for functions. But regardless, it’s interesting to think about what happens when one uses different structures to define measurable sets!

Making a Market

Suppose you’re in the derivatives business. You are interested in making a market on some events; say, whether or not your friend Jay will win tomorrow night’s poker game, or that the winning pot will be at least USD 100. Let’s examine some rules about how you should do business if you want this venture to succeed.

What do I mean by ‘make a market’? I mean that you will be willing to buy and sell units of a particular security that will be redeemable from the seller at some particular value after tomorrow’s poker game has ended (you will be making a simple prediction market, in other words). You can make bid offers to buy securities at some price, and ask offers to sell securities at some price.

To keep things simple let’s say you’re doing this gratis; society rewards you extrinsically for facilitating the market - your friends will give you free pizza at the game, maybe - so you won’t levy any transaction fees for making trades. Also scarcity isn’t a huge issue, so you’re willing to buy or sell any finite number of securities.

Consider the possible outcomes of the game (one and only one of which must occur):

  1. (A) Jay wins and the pot is at least USD 100.
  2. (B) Jay wins and the pot is less than USD 100.
  3. (C) Jay loses and the pot is at least USD 100.
  4. (D) Jay loses and the pot is less than USD 100.

The securities you are making a market on pay USD 1 if an event occurs, and USD 0 otherwise. So: if I buy 5 securities on outcome \(A\) from you, and outcome \(A\) occurs, I’ll be able to go to you and redeem my securities for a total of USD 5. Alternatively, if I sell you 5 securities on outcome \(A\), and outcome \(A\) occurs, you’ll be able to come to me and redeem your securities for a total of USD 5.

Consider what that implies: as a market maker, you face the prospect of making hefty payments to customers who redeem valuable securities. For example, imagine the situation where you charge USD 0.50 for a security on outcome \(A\), but outcome \(A\) is almost certain to occur in some sense (Jay is a beast when it comes to poker and a lot of high rollers are playing); if your customers load up on 100 of those cheap securities on outcome \(A\), and outcome \(A\) occurs, then you stand to owe them a total payment of USD 100 against the USD 50 they paid for the securities. You thus have a heavy incentive to price your securities as accurately as possible - where ‘accurately’ means so as to minimize your expected loss.

It may well be the case, however, that it is difficult to price your securities accurately. For example, if some customer has more information than you (say, she privately knows that Jay is unusually bad at poker) then she potentially stands to profit from that information, given your ignorance on the matter (and that of your prices). Such is life for a market maker. But there are particular prices you could offer - independent of any participant’s private information - that are plainly stupid or ruinous for you (a set of prices like this is sometimes called a Dutch book). Consider selling securities on outcome \(A\) for the price of USD -1; then anyone who buys one of these securities not only stands to redeem USD 1 in the event outcome \(A\) occurs, but also gains USD 1 simply from the act of buying the security in the first place.

Setting a negative price like this is irrational on your part; customers will realize an arbitrage opportunity on securities for outcome \(A\) and will happily buy as many as they can get their hands on, to your ruin. In other words - and to nobody’s surprise - by setting a negative price, you can be made a sure loser in the market.
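The arithmetic of that sure loss is worth spelling out. With the hypothetical USD -1 price, a customer who buys \(n\) securities on outcome \(A\) profits in every state of the world:

```python
price = -1.0    # the irrational ask on outcome A, in USD
n = 100         # securities the customer buys

cost = n * price                 # -100: the customer is paid 100 up front
profit_if_A = n * 1.0 - cost     # redeem 1 per security, plus the upfront gain
profit_if_not_A = 0.0 - cost     # securities worthless, but the gain remains

print(profit_if_A, profit_if_not_A)  # 200.0 100.0 -- positive either way
```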

There are other prices you should avoid setting as well, if you want to avoid arbitrage opportunities like these. For starters:

  • For any outcome \(E\), you must set the price of a security on \(E\) to be at least USD 0.
  • For any certain outcome \(E\), you must set the price of a security on \(E\) to be exactly USD 1.

The first condition rules out negative prices, and the second ensures that your books balance when it comes time to settle payment for securities on a certain event.

What’s more, the price that you set on any given security doesn’t exist in isolation. Given the outcomes \(A\), \(B\), \(C\), and \(D\) listed previously, at least one must occur. So as per the second rule, the price of a synthetic derivative on the outcome “Jay wins or loses, and the pot is any value” must be set to USD 1. This places constraints on the prices that you can set for individual securities. It suffices that:

  • For any countable set of mutually exclusive outcomes \(E_{1}, E_{2}, \ldots\), you must set the price of the security on the outcome “\(E_{1}\) or \(E_{2}\) or ..” to exactly the sum of the prices of the securities on the individual outcomes.

This eliminates the possibility that your customers will make you a certain loser by buying elaborate combinations of securities on different outcomes.

There are other rules that your prices must obey as well, but they fall out as corollaries of these three. If you broke any of them you’d also be breaking one of these.

It turns out that you cannot be made a sure loser if, and only if, your prices obey these three rules. That is:

  • If your prices follow these rules, then you will offer customers no arbitrage opportunities.
  • Any market absent of arbitrage opportunities must have prices that conform to these rules.

These prices are called coherent, and absence of coherence implies the existence of arbitrage opportunities for your customers.
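With prices quoted only on the four mutually exclusive outcomes, the third rule prices every compound security additively, so checking coherence reduces to the first rule plus ‘the sure event costs exactly USD 1’. A sketch of such a checker (the function and the example prices are mine, not from the source):

```python
def coherent(prices, tol=1e-9):
    """Check prices quoted on an exhaustive set of mutually
    exclusive outcomes against the no-arbitrage rules."""
    # rule 1: no security is priced below USD 0
    if any(p < -tol for p in prices.values()):
        return False
    # rules 2 & 3 combined: priced additively, the certain compound
    # event ("some outcome occurs") must cost exactly USD 1
    return abs(sum(prices.values()) - 1.0) <= tol

good = {"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}
bad  = {"A": 0.6, "B": 0.3, "C": 0.2, "D": 0.1}  # prices sum to 1.2

print(coherent(good), coherent(bad))  # True False
# against 'bad', a customer who sells you one security on each outcome
# collects USD 1.20 up front and pays out exactly USD 1 at settlement
```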

But Why Male Models

The trick, of course, is that these prices correspond to probabilities, and the rules for avoiding arbitrage correspond to the standard Kolmogorov axioms of probability theory.

The consequence is that if your description of uncertain phenomena does not involve probability theory, or does not behave exactly like probability theory, then it is an incoherent representation of information you have about those phenomena.

As a result, probability theory should be your tool of choice when it comes to describing uncertain phenomena. Granted you may not have to worry about market making in return for pizza, but you’d like to be assured that there are no structural problems with your description.

Comments

This is a summary of the development of probability presented in Jay Kadane’s brilliant Principles of Uncertainty. The original argument was developed by de Finetti and Savage in the mid-20th century.

Kadane’s book makes for an exceptional read, and it’s free to boot. I recommend checking it out if it has flown under your radar.

An interesting characteristic of this development of probability is that there is no way to guarantee the nonexistence of arbitrage opportunities for a countably infinite number of purchased securities. That is: if you’re a market maker, you could be made a sure loser in the market when it came time for you to settle a countably infinite number of redemption claims. The quirk here is that you could also be made a sure winner as well; whether you win or lose with certainty depends on the order in which the claims are settled! (Fortunately this doesn’t tend to be an issue in practice.)

Thanks to Fredrik Olsen for reviewing a draft of this post.

flat-mcmc Update and v1.0.0 Release

I’ve updated my old flat-mcmc library for ensemble sampling in Haskell and have pushed out a v1.0.0 release.

History

I wrote flat-mcmc in 2012, and it was the first serious-ish size project I attempted in Haskell. It’s an implementation of Goodman & Weare’s affine invariant ensemble sampler, a Monte Carlo algorithm that works by running a Markov chain over an ensemble of particles. It’s easy to get started with (there are no tuning parameters, etc.) and is sufficiently robust for a lot of purposes. The algorithm became somewhat famous in the astrostatistics community, where some of its members implemented it via the very nice and polished Python library, emcee.

The library has become my second-most starred repo on GitHub, with a whopping 10 stars as of this writing (the Haskell MCMC community is pretty niche, bro). Recently someone emailed me and asked if I wouldn’t mind pushing it to Stackage, so I figured it was due for an update and gave it a little modernizing along the way.

I’m currently on sabbatical and am traveling through Vietnam; I started the rewrite in Hanoi and finished it in Saigon, so it was a kind of nice side project to do while sipping coffees and the like during downtime.

What Is It

I wrote a little summary of the library in 2012, which you can still find tucked away on my personal site. Check that out if you’d like a description of the algorithm and why you might want to use it.

Since I wrote the initial version my astrostatistics-inclined friends David Huijser and Brendon Brewer wrote a paper about some limitations they discovered when using this algorithm in high-dimensional settings. So caveat emptor, buyer beware and all that.

In general this is an extremely easy-to-use algorithm that will probably get you decent samples from arbitrary targets without tedious tuning/fiddling.

What’s New

I’ve updated and standardized the API in line with my other MCMC projects huddled around the declarative library. That means that, like the others, there are two primary ways to use the library: via an mcmc function that will print a trace to stdout, or a flat transition operator that can be used to work with chains in memory.

Regrettably you can’t use the flat transition operator with others in the declarative ecosystem as it operates over ensembles, whereas the others are single-particle algorithms.

The README over at the GitHub repo contains a brief usage example. If there’s some feature you’d like to see or documentation/examples you could stand to have added then don’t hesitate to ping me and I’ll be happy to whip something up.

In the meantime I’ve pushed a new version to Hackage and added the library to Stackage, so it should show up in an LTS release soon enough.

Cheers!