Tuesday, August 9, 2016

simplified heuristics and Bellman equations

An idea I've probably mentioned before is that certain behavioral biases may be simplifications that, on average, work very well, at least in the sort of environment in which the human species largely evolved.  We can write down our von Neumann / Morgenstern / Friedman / Savage axioms and argue that a decision-maker that is not maximizing expected utility (for some utility function and some probability measure) is, by its own standards, making mistakes.  But the actual optimization, to whatever extent it is theoretically possible given the agent's information, may be very complicated, and simple heuristics may be much more practical, even if they occasionally produce some apparent inconsistencies.

Consider a standard dynamic-programming (Bellman-style) set-up: there's a state space, and the system moves around within it, with a transition function specifying how the agent's actions affect the change in state; the agent gets a utility that is a function of the state and the agent's action, and a rational agent attempts to choose actions to optimize not just the current utility but the long-run utility.  Solving the problem typically involves (at least in principle) finding the value function, viz. the long-run utility associated with each state.  Where one action yields a higher immediate utility than another but leads toward states with lower long-run utility, the value function puts the two effects on a common scale, so their magnitudes can be compared directly.  The value function thus captures all the long-run considerations you need, and the decision-making process at that point is superficially an entirely myopic one: at each step the agent maximizes the short-run utility plus the appropriately discounted value of the state it leads to, rather than the utility alone.
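A minimal sketch of this set-up, assuming a made-up finite problem with three states and two actions (the transition probabilities `P`, utilities `U` and discount factor `BETA` are all hypothetical): value iteration finds the value function, and the "myopic" step then just picks, in each state, the action that maximizes current utility plus discounted expected value.

```python
# A made-up 3-state, 2-action problem; every number here is illustrative.
BETA = 0.9  # discount factor (assumed)

# P[a][s][t]: probability of moving from state s to state t under action a
P = [
    [[0.9, 0.1, 0.0], [0.0, 0.9, 0.1], [0.0, 0.0, 1.0]],  # action 0: mostly stay put
    [[0.2, 0.8, 0.0], [0.0, 0.2, 0.8], [0.5, 0.0, 0.5]],  # action 1: push toward the next state
]
# U[s][a]: immediate utility of taking action a in state s
U = [[1.0, 0.0], [0.5, -0.5], [2.0, 1.5]]
N_STATES, N_ACTIONS = len(U), len(P)


def q_values(V, s):
    """Immediate utility plus discounted expected next-state value, per action."""
    return [
        U[s][a] + BETA * sum(p * v for p, v in zip(P[a][s], V))
        for a in range(N_ACTIONS)
    ]


def value_iteration(tol=1e-10):
    """Iterate the Bellman operator to its fixed point, the value function."""
    V = [0.0] * N_STATES
    while True:
        V_new = [max(q_values(V, s)) for s in range(N_STATES)]
        if max(abs(x - y) for x, y in zip(V_new, V)) < tol:
            return V_new
        V = V_new


def greedy_policy(V):
    """The superficially myopic step: in each state, maximize q_values under V."""
    policy = []
    for s in range(N_STATES):
        q = q_values(V, s)
        policy.append(max(range(N_ACTIONS), key=lambda a: q[a]))
    return policy


V = value_iteration()
policy = greedy_policy(V)
# Choosing on immediate utility alone, ignoring V:
myopic = [max(range(N_ACTIONS), key=lambda a: U[s][a]) for s in range(N_STATES)]
print(policy, myopic)  # prints [1, 1, 0] [0, 0, 0]
```

In this toy example, maximizing immediate utility alone would pick action 0 everywhere, while the value-based choice takes action 1 in states 0 and 1, giving up some current utility to move toward the high-value state 2.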

A problem that I investigated a couple of years ago, at least in a fairly simple setting, was whether the reverse problem could be solved: given a value function and a transition rule, can I back out the utility function?  It turns out that, subject to certain regularity conditions, the answer is yes, and that it's generally mathematically easier than going in the usual direction.  So here's a project that occurs to me: take such a problem with a somewhat complex transition rule, and suppose I can work out the value function, at least approximately.  Then I pair that value function with a much simpler transition function and try to find a utility function that yields the same value function under the simpler transition function.  I have a feeling I would tend to reach a contradiction; the demonstration that I can recover the utility function supposed that it was in fact there, and if there is no such utility function I might find that the math raises some objection.  If there is a utility function that exactly solves the problem, of course, I ought to find it, but there seems to me at least some hope that, even if there isn't, the math along the way will hint at how to find a utility function, preferably a simple one, that gives approximately the same value function.  This, then, would suggest that a seemingly goal-directed agent pursuing a comparatively simple goal would behave much the same way as the agent pursuing the more complicated goal.
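A minimal sketch of the reverse direction, under a simplifying assumption that is mine rather than the post's: a finite state space with utility depending on the state only, so that V(s) = u(s) + BETA * max_a E[V(s') | s, a].  Given V and a transition rule, u then falls out in a single pass with no fixed-point iteration, which is one concrete sense in which going backwards is easier.  The transition rules `P_COMPLEX` and `P_SIMPLE`, the utility `U_TRUE`, `BETA`, and the helper names are all hypothetical.

```python
# Hypothetical example: utility depends on the state only, so that
#   V(s) = u(s) + BETA * max_a sum_t P[a][s][t] * V(t),
# and u can be read off directly once V and P are known.
BETA = 0.9  # discount factor (assumed)

# "Complex" transition rule, P[a][s][t] (made-up numbers)
P_COMPLEX = [
    [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3], [0.2, 0.2, 0.6]],
    [[0.1, 0.5, 0.4], [0.3, 0.1, 0.6], [0.6, 0.3, 0.1]],
]
# "Simple" deterministic rule: action 0 stays put, action 1 steps to the next state
P_SIMPLE = [
    [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
    [[0, 1, 0], [0, 0, 1], [1, 0, 0]],
]
U_TRUE = [1.0, 0.0, 3.0]  # hypothetical state-only utility


def continuation(V, P, s):
    """Best discounted expected next-period value from state s under rule P."""
    return BETA * max(
        sum(p * v for p, v in zip(P[a][s], V)) for a in range(len(P))
    )


def solve_value(u, P, tol=1e-10):
    """Forward direction: value iteration to a fixed point."""
    V = [0.0] * len(u)
    while True:
        V_new = [u[s] + continuation(V, P, s) for s in range(len(u))]
        if max(abs(x - y) for x, y in zip(V_new, V)) < tol:
            return V_new
        V = V_new


def recover_utility(V, P):
    """Reverse direction: a single pass, no fixed-point iteration needed."""
    return [V[s] - continuation(V, P, s) for s in range(len(V))]


V = solve_value(U_TRUE, P_COMPLEX)
u_back = recover_utility(V, P_COMPLEX)     # round trip: recovers U_TRUE
u_simple = recover_utility(V, P_SIMPLE)    # utility rationalizing V under the simple rule
V_check = solve_value(u_simple, P_SIMPLE)  # reproduces V
print(u_back, u_simple)
```

In this stripped-down case, `recover_utility` always returns some `u_simple`, and V satisfies the Bellman equation under `P_SIMPLE` by construction, so the contradiction anticipated above cannot arise here; it would have to come from a richer setting, for instance one where the utility must take a simple functional form or depends on the action as well.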

cf. Samuelson and Swinkels (2006): "Information, evolution and utility," Theoretical Economics, 1(1): 119–142, which pursues the idea that if complexity in an animal's design is costly, evolution may favor an animal programmed directly to seek calorie-rich food, for example, over one that assesses on each occasion whether doing so is the best way to maximize long-run fecundity.