The Applicative Structure of the Giry Monad26 Feb 2017
In my last two posts about the Giry monad I derived the thing from its categorical and measure-theoretic foundations. I kind of thought that those posts wouldn’t be of much interest to people but they turned out to be a hit. I clearly can’t tell what the internet likes.
Anyway, something I left out was the theoretical foundation of the Giry monad’s Applicative instance, which seemed a bit odd. I also pointed out that applicativeness in the context of probability implies independence between probability measures.
In this article I’m going to look at each of these issues. After playing around with the foundations, it looks like the applicative instance for the Giry monad can be put on a sound footing in terms of the standard measure-theoretic concept of product measure. Also, it turns out that the claim I made of applicativeness independence is somewhat ill-posed. But, using the shiny new intuition we’ll glean from a better understanding of the applicative structure, we can put that on a solid footing too.
So let’s look at both of these things and see what’s up.
The foundational categorical concept behind applicative functors is the monoidal functor, which is a functor between monoidal categories that preserves monoidal structure.
Formally: for monoidal categories and , a monoidal functor is a functor and associated natural transformations and that satisfy some coherence conditions that I won’t mention here. Notably, if and are isomorphisms (i.e. are invertible) then is called a strong monoidal functor. Otherwise it’s called lax. Applicative functors in particular are lax monoidal functors.
This can be made much clearer for endofunctors on a monoidal category . Then you only have and to worry about. If we sub in the Giry monad from the last couple of posts, we’d want and .
Does the category of measurable spaces have a monoidal structure? Yup. Take measurable spaces and . From the Giry monad derivation we already have that the monoidal identity corresponds to a Dirac measure at a point, so that’s well and good. And we can define the tensor product between and as follows: let be the standard Cartesian product on and and let be the smallest -algebra generated by the Cartesian product of measurable sets and . Then is a measurable space, and so is monoidal.
Recall that and - the space of measures over and respectively - are themselves objects in . So, clearly is a measurable space, and if is monoidal then there must exist a natural transformation that can take us from there to . This is the space of measures over the product .
So the question is: does have the required monoidal structure?
Yes. It must, since is a monad, and any monad can generate the required natural transformation. Let be the monadic ‘join’ operator and be the monadic identity . We have, evaluating right-to-left:
Using makes this much easier to read:
or in code, just:
phi :: Monad m => (m a, m b) -> m (a, b) phi (m, n) = liftM2 (,) m n
So with that we have that is a (lax) monoidal
functor. And you can glean a monad-generated applicative operator from
that immediately (this leads to the function called ‘ap’ in
ap :: Monad m => m (a -> b) -> m a -> m b ap f x = fmap (\(g, z) -> g z) (phi f x)
(Note: I won’t refer to as the join operator from this point out in order to free it up for denoting measures.)
The correct probabilistic interpretation here is that takes a pair of probability measures to the product measure over the appropriate product space. For probability measures and on measurable spaces and respectively, the product measure is the (unique) measure on such that:
for a measurable set in .
Going through the monoidal functor route seems to put the notion of the Giry applicative instance on a more firm measure-theoretic foundation. Instead of considering the following from the Giry monad foundations article:
which is defined in terms of the dubious space of measures over measurable functions , we can better view things using the monoidal structure-preserving natural transformation . For measures and on and respectively, we have:
and then for we can use the functor structure of to do:
which corresponds to a standard applicative expression
g <$> mu <*> nu. I
suspect there’s then some sort of Yoneda argument or something that makes
currying and partial function application acceptable.
Now. What does this have to say about independence?
In particular, it’s too fast and loose to claim measures can be ‘independent’ at all. Independence is a property of measurable sets, measurable functions, and -algebras. Not of measures! But there is a really useful connection, so let’s illustrate that.
First, let’s define independence formally as follows. Take a probability space . Then any measurable sets and in are independent if
That’s the simplest notion.
Next, consider two sub--algebras and of (a sub--algebra is just a a subset of a -algebra that itself happens to be a algebra). Then and are independent if, for any in and any in , we have that and are independent.
The final example is independence of measurable functions. Take measurable functions and both from to the real numbers equipped with some appropriate -algebra . Then each of these functions generates a sub- algebra of as follows:
Then and are independent if the generated -algebras and are independent.
Note that in every case independence is defined in terms of a single measure, . We can’t talk about different measures being independent. To paraphrase Terry Tao here:
The notion of independence between [measurable functions] does not make sense if the [measurable functions] are being modeled by separate probability spaces; they have to be coupled together into a single probability space before independence becomes a meaningful notion.
To be pedantic and explicitly specify the measure by which some things are independent, some authors state that measurable functions and are -independent, for example.
We can see a connection to independence when we look at convolution and associated operators. Recall that for measures and on the same measurable space that supports some notion of addition, their convolution looks like:
The probabilistic interpretation here (see Terry Tao on this too) is that is the measure corresponding to the sum of independent measurable functions and with corresponding measures and respectively.
That looks weird though, since we clearly defined independence between measurable functions using a single probability measure. How is it we can talk about independent measurable functions and having different corresponding measures?
We first need to couple everything together into a single probability space as per Terry’s quote. Complete with some abstract probability measure to form the probability space . Now we have and measurable functions from to .
To say that and are independent is to say that their generated -algebras are -independent. And the measures that they correspond to are the pushforwards of under and respectively. So, and . The result is that the measurable functions correspond to different (pushforward) measures and , but are independent with respect to the same underlying probability measure .
The monoidal structure of then gets us to convolution. Given a product of measures and each on we can immediately retrieve their product measure via . And from there we can get to via the functor structure of - we just find the pushforward of with respect to a function that collapses a product via addition. So is defined as:
and then the convolution is thus:
Other operations can be defined similarly, e.g. for we get:
The crux of all this is whenever we apply a measurable function to a product measure, we can always extract notions of independent measurable functions from the result. And the measures corresponding to those independent measurable functions will be the components of the product measure respectively.
This is super useful and lets one claim something stronger than what the
monadic structure gives you. In an expression like
g <$> mu <*> nu <*> rho,
you are guaranteed that the corresponding random variables ,
, (for suitable projections) are independent. The same
cannot be said if you use the monadic structure to do something like
g mu nu
rho where the product structure is not enforced - in that case you’re not
guaranteed anything of the sort. This is why the applicative structure is
useful for encoding independence in a way that the monadic structure is
So there you have it. Applicativeness can seemingly be put on a straightforward measure-theoretic grounding and has some useful implications for independence.
It’s worth noting that, in the case of the Giry monad, we don’t need to go through its monadic structure in order to recover an applicative instance. We can do so entirely by hacking together continuations without using a single monadic bind. This is actually how I defined the applicative instance in the Giry monad implementation article previously:
instance Applicative Measure where pure x = Measure (\f -> f x) Measure g <*> Measure h = Measure $ \f -> g (\k -> h (f . k))
Teasing out the exact structure of this and its relation to the codensity monad is again something I’ll leave to others.