## Thursday, April 16, 2015

### Interpretations of Probability

I've been doing some reading (at Stanford's philosophy portal, among other places) and thinking about the meaning of probability (well, on and off for at least 15 years, but a bit more "on" again in the last month).  The page I linked to sorts the concepts into three groups, which it describes as "a quasi-logical concept", "an agent's ... graded belief", and "an objective concept"; I will conflate the last of these with one of its examples, the "frequentist" idea.  My own interpretation of these ideas is that they form a nexus around "subjective" and "frequentist" ideas, with the formal mathematical calculus of probability connecting ideas to each other in important ways.  What follows are mostly my own thoughts, though clearly building on the ideas of others; that said, I'm sure there is a lot of relevant philosophical literature that I have never seen, and even much that I have seen that I have not understood the way it was meant.

I'll start by referencing a theorem related to rational agent behavior.  The upshot is that under reasonable assumptions, rational agents behave in such a way as to maximize "expected utility", where by "reasonable" I mean not that anybody behaves that way, but that if you could demonstrate to a reasonably intelligent person that they had not behaved that way, they would tend to agree that they had made a mistake somewhere. "Utility" is some numerical assignment of values to outcomes, and "expected utility" is its "expected value" under some mathematically consistent system of probabilities. The theorem, then, is that if a person's decisions are all in some normatively-appealing sense consistent with each other, there is some way of assigning probabilities and some way of assigning "utility" values such that those decisions maximized expected utility as calculated with those probabilities and utilities.
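To make "expected utility" concrete, here is a minimal sketch of the calculation itself. The actions, outcomes, probabilities, and utility values are all made up for illustration; the theorem only says that some such assignment exists for a consistent decision-maker, not that it looks like this one.

```python
# Hypothetical probabilities over outcomes (illustrative numbers only).
probabilities = {"rain": 0.3, "dry": 0.7}

# Hypothetical utilities: utilities[action][outcome].
utilities = {
    "take umbrella": {"rain": 0.0, "dry": -1.0},    # mild nuisance if it stays dry
    "leave umbrella": {"rain": -10.0, "dry": 0.0},  # soaked if it rains
}


def expected_utility(action_utilities, probs):
    """Probability-weighted average of the utilities of one action."""
    return sum(probs[outcome] * action_utilities[outcome] for outcome in probs)


def best_action(utils, probs):
    """The action with the highest expected utility under the given probabilities."""
    return max(utils, key=lambda action: expected_utility(utils[action], probs))


for action in utilities:
    print(action, expected_utility(utilities[action], probabilities))
print("chosen:", best_action(utilities, probabilities))
```

Changing either the probabilities or the utilities can change the chosen action, which is why observed decisions alone pin down the pair only jointly.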

A related result that gets a lot of use in finance is that if "the market" isn't making any gross mistakes — again, in a normatively-appealing way, but also in a way that seems likely to at least approximately hold in practice — then there is some system of probabilities and payouts such that the price of an asset is the expected value of the discounted future cash flows associated with that asset.  In finance texts it is often emphasized that this system of probabilities — often called the "risk-neutral measure" — need not be the "physical measure", and indeed most practitioners expect that it will put a higher probability on "bad outcomes" than the "physical measure" would.  The "physical measure" here is often spoken of as an objective probability system in a way that perhaps sits closer to the "frequentist" idea, but if the market functions well and is made up mostly of rational agents whose behaviors are governed by similar probability measures, the "physical measure" used in models will tend to be similar to those.  The point I like to make is that the "physical measure", in a lot of applications, turns out not to matter for finance; the risk-neutral measure is all you need.  Further, the risk-neutral measure seems philosophically clearer; it's a way of describing the prices of assets in the market, and, implicitly, even prices of assets that aren't in the market.[1] It should be noted, though, that the "physical measure" is what people prefer for econometrics, so when one is doing financial econometrics one often needs both.
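A toy version of this can be sketched with a one-period, two-state market. Everything here is an assumption made for illustration: the stock prices, the interest rate, the 0.7 "physical" probability, and the function names. It is not a model of any real market, just the smallest setting I know of in which the risk-neutral measure can be read off from prices.

```python
def risk_neutral_up_probability(s0, s_up, s_down, r):
    """Probability q of the up state that makes the discounted expected stock price equal s0."""
    q = (s0 * (1 + r) - s_down) / (s_up - s_down)
    if not 0.0 < q < 1.0:
        raise ValueError("these prices allow arbitrage; no risk-neutral measure exists")
    return q


def discounted_expectation(payoff_up, payoff_down, prob_up, r):
    """Expected payoff under prob_up, discounted one period at rate r."""
    return (prob_up * payoff_up + (1 - prob_up) * payoff_down) / (1 + r)


def replication_cost(payoff_up, payoff_down, s0, s_up, s_down, r):
    """Cost today of a stock-plus-bond portfolio that matches the payoff in both states."""
    shares = (payoff_up - payoff_down) / (s_up - s_down)
    bond_today = (payoff_down - shares * s_down) / (1 + r)
    return shares * s0 + bond_today


# Hypothetical market: stock at 100 goes to 120 or 90; risk-free rate of 5%.
S0, S_UP, S_DOWN, R = 100.0, 120.0, 90.0, 0.05
PHYSICAL_UP = 0.7  # assumed "physical" probability; never used in pricing below

q = risk_neutral_up_probability(S0, S_UP, S_DOWN, R)

# A call option struck at 100 pays 20 in the up state and 0 in the down state.
call_price = discounted_expectation(20.0, 0.0, q, R)
hedge_cost = replication_cost(20.0, 0.0, S0, S_UP, S_DOWN, R)
naive_value = discounted_expectation(20.0, 0.0, PHYSICAL_UP, R)

print("risk-neutral up probability:", q)
print("call price under risk-neutral measure:", call_price)
print("cost of replicating portfolio:", hedge_cost)
print("discounted expectation under physical measure:", naive_value)
```

With these numbers the risk-neutral measure puts probability 0.5 on the bad state against the assumed physical 0.3, and the risk-neutral price agrees with the replication cost, which is computed without any probabilities at all. In a two-state market with a stock and a bond the measure is also unique, which is not guaranteed in general.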

These contexts, in which a set of numbers on possible events has all of the mathematical properties of a probability system but need not correspond tightly to what we think of as "probability", play a role in my thinking.[2]

I think the most common definitions you would get for "probability" from the educated layman would fit into the frequentist school; the "probability" of an event is how often it would occur if you ran the same experiment many times.  Now, the law of large numbers is an inevitable mathematical consequence of just the mathematical axioms of probability; if a "draw" from a distribution has a particular value with an assigned probability, then enough independent draws will, with a probability as close to 1 as you like, give that particular value with a frequency as close to the assigned probability as you like.  If you and I assign different probabilities to the event but use the laws of probability correctly, then if we do the experiment enough times, I will think it "almost impossible" that the observed frequency will be close to your prediction, and you will think it "almost impossible" that it will be close to my prediction.  Unless one of us assigns a probability of 0 or 1, though, any result based on a finite number of repetitions cannot be completely ruled out; inferring that one of us was wrong requires at some point deciding that (say) 1×10⁻²⁵ is "practically 0" or 1 - 1×10⁻²⁵ is "practically 1". For any level of precision you want (but not perfect precision), and for as small a probability (but not actually 0) as you insist before declaring a probability "practically zero", there is some finite sample size that will allow you to "practically" determine the probability with that precision. So this is how I view the "frequentist" interpretation of probability: the laws of probability are augmented by a willingness to act as though events with sufficiently low probability are actually impossible.[3]
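The disagreement described above can be put into numbers with a small sketch. The two assigned probabilities (0.5 for me, 0.6 for you), the tolerance of 0.03, and the sample sizes are arbitrary choices of mine; the calculation is just the binomial distribution summed over the relevant counts, done in log space so that large sample sizes don't overflow.

```python
from math import exp, lgamma, log


def binomial_pmf(n, k, p):
    """Probability of exactly k successes in n independent draws; requires 0 < p < 1."""
    log_comb = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_comb + k * log(p) + (n - k) * log(1 - p))


def prob_frequency_near(n, p, target, eps):
    """Probability, under success probability p, that the observed frequency
    in n draws lands within eps of target."""
    slack = 1e-9  # guards against floating-point noise at the interval's edges
    return sum(
        binomial_pmf(n, k, p)
        for k in range(n + 1)
        if abs(k - target * n) <= eps * n + slack
    )


MY_P, YOUR_P, EPS = 0.5, 0.6, 0.03  # hypothetical assignments and tolerance

for n in (100, 1000, 10000):
    near_mine = prob_frequency_near(n, MY_P, MY_P, EPS)
    near_yours = prob_frequency_near(n, MY_P, YOUR_P, EPS)
    print(n, "near my prediction:", near_mine, "near yours:", near_yours)
```

As the number of draws grows, the first probability heads toward 1 and the second toward 0 without ever reaching it, so declaring one of us wrong still depends on where we draw the "practically 0" line.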

More often, my own way of thinking about probabilities is closer to the "subjective" probability; "a probability" is a measure of my uncertain belief, and the tools of probability are a set of tools for managing my ignorance.  It is necessarily a function of the information I do have; if you and I have different information, the "correct" probability to assign to an event will be different for me than for you.[4]  If one of us regularly has more (or more useful) information than the other, then one of us will almost certainly, over the course of many probability assessments, put a higher probability on the series of outcomes that actually occurs than the other will; that is to be expected, insofar as my ignorance as to whether it will rain is in part an ignorance of information that would allow me to make a better forecast.  There is a tie-in here to the frequentist interpretation as I cast it in the previous paragraph, related to the saying often attributed to Mark Twain that "history doesn't repeat itself, but it rhymes": not only is it impossible to take an infinite number of independent draws from a distribution, it is impossible to take more than one with any reliability. At least sometimes, however, we may do multiple experiments that are the same as far as we know; that is, we can't tell the difference between them, aside from the result. If we count as a "repetition" those events that looked the same in terms of the information we have[5], then we might have enough "repetitions" to declare that it is "practically impossible" that the probability of an observation, conditional on the known information, lies outside of a particular range.

One last interpretation of probability, though, is on some level not to interpret probability.  (One might call this the "nihilist interpretation".)  A fair amount of the "interpretations of probability" program seems oriented around the idea that whether an event "happens" or not, or whether something is "true" or not, is readily and cleanly understood, and there is some push to get probabilities close to 0 or 1, since we feel like we understand those special cases.  We know, though, that our senses and minds are unreliable; everything we know about the world outside ourselves is with a probability that is, in honesty, strictly between 0 and 1.  As we get close to 0, or close to 1, as a practical matter, the remaining distance will make no practical difference — it can't.  But those parts of the world that are practically described by probabilities are in reality on a continuum with those we can practically treat differently, and consistently follow the laws of mathematics and nature, with 0 and 1 as, at the very best, special cases.

[1] If there are a lot of relevant possible assets that "aren't in the market", the risk-neutral measure may not be unique, i.e. there may be several different systems of probability that are consistent with the mathematical rules of probability and market prices; the conditions for existence are more practically plausible than the conditions required for uniqueness. Sometimes you might wish to price a hypothetical asset whose price depends on which of the available risk-neutral measures you use, in which case existing prices will not fully guide you.

[2] As is noted at the Stanford link, there is some sense in which mass and volume can be made to behave according to the laws of probability; it is probably important to my philosophical development that the systems of "probability" I give in the text are closer in ineffable spirit to the common idea of "probability" than that.

[3] To some extent I'm restricting my discussion to "discrete" probability distributions to avoid having to talk much about "measurability", and to some extent I have failed here; if you flip a fair coin 100 times, any series of outcomes has a probability of less than 1×10⁻²⁵. There are 161700 different series that contain 3 heads and 97 tails; if I don't distinguish between any of those 161700 different outcomes, then the probability of that single aggregate "3 heads" event is bigger than 10⁻²⁵, even though any single way of doing it is not. If I insist on rounding the probability of each possible outcome to 0, then it is certain that an "impossible" outcome will result, but if I say "there are 101 measurable events, one for each possible number of heads," then the probability of an "impossible" outcome is extremely low (in this case, there are 6 such "impossible" outcomes, and they are, taken together, "impossible"). Ultimately you would probably want to take account of how many different events you want to distinguish when you're deciding what threshold you're rounding to 0; if you want to distinguish 10²⁵ different events, then a probability threshold substantially smaller than 10⁻²⁵ should be used.
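The arithmetic in this footnote can be checked exactly with a short script. It uses exact fractions rather than floating point; the variable names and the choice of threshold (the same "practically 0" cutoff used in the text) are mine.

```python
from fractions import Fraction
from math import comb

N_FLIPS = 100
THRESHOLD = Fraction(1, 10**25)  # the "practically 0" cutoff from the text

# Every individual sequence of 100 fair flips has the same probability.
single_sequence = Fraction(1, 2**N_FLIPS)


def prob_heads(k):
    """Exact probability of seeing exactly k heads in N_FLIPS fair flips."""
    return Fraction(comb(N_FLIPS, k), 2**N_FLIPS)


# Head counts whose aggregate probability still falls below the cutoff.
impossible_counts = [k for k in range(N_FLIPS + 1) if prob_heads(k) < THRESHOLD]
combined = sum(prob_heads(k) for k in impossible_counts)

print("single sequence below cutoff:", single_sequence < THRESHOLD)
print("sequences with exactly 3 heads:", comb(N_FLIPS, 3))
print("'3 heads' event above cutoff:", prob_heads(3) > THRESHOLD)
print("head counts below cutoff:", impossible_counts)
print("all of them together below cutoff:", combined < THRESHOLD)
```

The head counts that come out below the cutoff are 0, 1, 2, 98, 99, and 100, which is where the "6" above comes from.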

[4] In some sense, this is what the Stanford site calls "objective probability", insofar as I'm asserting a "right" and "wrong" notion. What might be a conditional probability from the standpoint of the "objective" probability idea — that is, the probability conditional on the information we know — is what I'm thinking of here as my "prior" probability, along with an assertion that what from the "objective probability" standpoint would be a "prior" probability isn't actually meaningful.

[5] This, too, is basically "measurability", which is perhaps unavoidable in any non-trivial treatment of probability, even with finite "sample spaces".