Saturday, August 2, 2014

productivity and labor compensation

This post will be much more practically oriented than most of this blog, and a premium is placed on brevity.

It is occasionally noted (though often in a way that obscures important details) that
  • cash income
  • per household
  • deflated with consumer prices (especially if badly chained)
has, in my lifetime, lagged badly behind
  • economic production
  • per hour worked
  • deflated with the GDP deflator
in the United States.  Especially through 2008, almost all of this is because of
  • an increasing portion of labor compensation becoming "in-kind", in part for tax reasons,
  • decreasing household size, and
  • chaining effects and increasing prices of imports relative to exports.
Our worsening terms of trade are certainly interesting, and a reasonable target for policy attention, but most sources that compare the first series to the second series try to imply that the difference means something very different from what it does.

Now, in the last five years labor compensation has lagged behind production in common units of account; this is frequently true early in the business cycle, and the "beginning" of this business cycle has been frustratingly longer than usual, but it's premature to diagnose a secular shift.

Monday, July 7, 2014

the value of markets

One of my favorite financial writers writes:
The job of equity markets is to provide liquidity and price discovery. An efficient market will provide liquidity at a very low cost, and will adjust prices very quickly to respond to changes in demand.
He goes on to mostly ignore the tension between the two.

Suppose you own a large block of stock and decide to sell it to build a deck on the back of your house.  What should a well-functioning market do?  I think most people in the field would say that a perfect market — in particular, a very liquid one — will allow you to sell it all quickly for very close to its current price.  The purpose of the market, after all, is to allow you to buy and sell stocks when you want to, ideally in a cheap and efficient manner.

Now suppose you own a large block of stock and decide to sell it because you've found out that the CEO is laundering money for a drug cartel and most of the reported "profits" of the company are fraudulent.  What should a well-functioning market do?  I think most people in the field would say that a perfect market — in particular, one that is doing its price-discovery job well — should drop precipitously, incorporating the information that your trade conveys about the value of the stock into its price even before you finish trading — such that the last few shares you sell will be at a substantial discount to the earlier, less informed price.

Finally, suppose you own a large block of stock and decide to sell it.  What should a well-functioning market do?  Well, ideally it would know why you're selling.  In practice the best you might expect is for some of the market participants to eventually get a sense of whether certain kinds of trades (placed at certain times of day, in certain size, perhaps in sequences of orders that seem connected to each other) are more likely to indicate that the price is too high than other kinds of trades, which mostly just mean that somebody lacks a back deck and wants one.  If the market gets pretty good at this, you might even hear the guy who got the scoop on the CEO's corruption complain about the "lack of liquidity" in the market; he might even badly abuse the term "front running".  For more on that, go read Levine's column.

There is one more important wrinkle to add here, which is time-horizon.  Suppose we're talking about a small cap stock that trades 50,000 shares a day, and you're trying to sell 100,000 shares (for deck-like reasons).  You decide to break it up into smaller orders to sell over the course of two weeks.  After you sell 10,000 shares the first day and 10,000 shares the next day, traders in the stock (such as there are) figure out that there's probably a bunch more sell orders coming over the next several days.  Is that information about the value of the stock or isn't it?

Well, ideally again, perhaps someone would step in and do a block trade with you at (close to) the initial price.  Insofar as the market isn't likely to be perfectly liquid, though, your trade can be expected to lower the price at least a bit, at least for a while. A trader with a time-horizon of months will regard this as a temporary "liquidation" that doesn't really concern the long-term (even months-long) value of the stock, but to a trader with a time-horizon of a day or two, "something's going to push the price down over the next day or two" is as informative as information comes.

So at this point the approximately ideal market with some first-order approximation of reality layered onto it probably drops a little bit for now, with the common expectation that it will bounce back in the next month or two, with the drop at such a size that some extra buyers are willing to come in and in some aggregate sense spread the sale out over the next few months, making a bit of extra profit on the deal, but small enough (and with a small enough expected attendant rebound) that you're willing to forgo that "bit of extra profit" in order to get your hands on the cash now.  On some level, perhaps you might as well call up Goldman Sachs and pay them the fee for doing a block trade with you.

Tuesday, June 17, 2014

instruments of Fed policy

If the Fed raised the reserve requirement (not now; under the sort of circumstances that we persist in calling "normal" more than five years since they were last seen), that should steepen the yield curve as long-term credit becomes scarcer relative to the supply of demand deposits and other short-term highly liquid investment. Historically, in periods where the yield curve becomes inverted I imagine some benefit might have been derived from somewhat tighter constraints, and where it's steepest perhaps somewhat looser; perhaps it would make sense to change reserve requirements in tandem with a target for the steepness of the yield curve.

The tl;dr version of the previous post is that, in the short term (on the order of an eighth of a year), the FOMC is likely to continue to ask the New York Fed to aim at something that is readily monitored in something like real-time, and it seems like the difference between long-term rates and short-term rates is a better target over that kind of period than short-term rates alone. In particular, if long-term rates go up over the weeks after an FOMC meeting, presumably that means the market has come to believe that inflation and/or returns to sunk capital will be higher than was believed at the last meeting, and a somewhat higher short-term rate is appropriate.

So I'm now suggesting that the FOMC set a target for the steepness of the yield curve, and, just as it customarily used to change the discount rate in lockstep with the FOMC federal funds target, the board of governors would then customarily change the required reserve ratio in lockstep with the target for the steepness of the yield curve. There are clearer reasons for deviating from this from time to time than was the case with the discount rate, and I'm not denying the board of governors the ability to do that, but rather than "leave them unchanged" as the default, I would suggest something slightly procyclical as the default instead.

Monday, May 19, 2014

long term impacts and empirics

There has been an accumulation of evidence in the past several years that labor markets respond to prices slowly and, at least at the first incidence, through flows rather than stocks; an exogenous increase in wages causes employers to pull back in hiring and increase layoffs, not by a lot (in terms of rate) but for a long time. A lot of previous studies had missed these effects because they looked at employment levels and included "corrections" for trend — when in fact the relevant "trend" was the very signal for which they were looking.
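To make the point about trend "corrections" concrete, here is a minimal sketch with made-up numbers (the 40 periods, the change at period 20, and the drift `delta` are all assumptions of mine, not estimates from any study). Relative employment has no jump at the policy change; it just drifts down afterward, which is the flow-type response described above. The sketch compares the estimated "level effect" with and without a linear trend among the regressors.

```python
import numpy as np

# Hypothetical setup: 40 periods, a wage increase at period 20.
# y is log employment relative to a comparison group; after the change it
# drifts down by `delta` per period (a flow effect), with no jump in the level.
T, T0, delta = 40, 20, 0.01
t = np.arange(T)
post = (t >= T0).astype(float)
y = -delta * np.maximum(0, t - T0)

def post_coefficient(columns):
    """OLS coefficient on the last regressor in `columns`."""
    X = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[-1]

ones = np.ones(T)
level_no_trend = post_coefficient([ones, post])       # post-period dummy only
level_with_trend = post_coefficient([ones, t, post])  # dummy plus a linear trend
true_gap_at_end = y[-1]                               # actual shortfall in the last period

print(level_no_trend, level_with_trend, true_gap_at_end)
```

In this toy case the linear trend soaks up nearly all of the drift, so the coefficient on the post-period dummy ends up close to zero even though the gap keeps widening through the end of the sample.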

Labor is perhaps the strongest example, but there are a lot of markets in which we participate in which we establish, in some meaningful sense, relationships; if there are five supermarkets nearby, you may almost always go to one or two of them, such that you wouldn't respond to a sale at one of the others.  In other contexts, too, long-term behavior may obey very different rules than short-term behavior, but a lot of the most popular empirical techniques right now involve the use of changes shortly after other changes to infer causal relationships; long-term interrelationships are a lot harder to tease out causally.  (A change that happens in many places two or three months after legislation is announced, passed, or goes into effect is easy to attribute as an effect of the legislation; changes over the course of ensuing years are harder to distinguish from underlying trends that may, in fact, have motivated the legislation in the first place.)  When long-term and short-term impacts differ, I think there's a consensus that the long-term impacts are, from a policy standpoint, generally more important (the long-term is longer than the short-term, after all), but I worry that a lot of our empirical studies now are trumpeting results about short-term relationships because teasing out causal directions can be done more convincingly.  Certainly there is something to be said for doing the possible rather than the impossible, but I think more studies of long-term behavior for which the "identification" [technical word] is less compelling should be encouraged, even if they might be harder to interpret in a crystal clear way.

Monday, May 5, 2014

A simple identity for Bayesian updating

For random variables A and X, consider the relationship
$$E\{XA\} = E\{X\}E\{A\} + \rho_{XA}\,\sigma_X\sigma_A$$
which, up to a bit of arithmetic, is basically the definition of correlation. If A is a binary variable, though, we can do more with this; among other things, in this case $\sigma_A^2 = E\{A\}(1 - E\{A\})$. Conflating the variable A a bit with the "event" A=1, so that $E\{A\} = P(A)$ and $E\{XA\} = P(A)\,E\{X \mid A\}$, and doing a bit of algebra, we get
$$E\{X \mid A\} - E\{X\} = \rho_{XA}\,\sigma_X \sqrt{\frac{1 - P(A)}{P(A)}}$$
The effect of the arrival of new information on the expected value of a variable is proportional to the square root of the odds ratio (here, the odds against A). Among other things, it can't be more than $\sigma_X$ times the square root of the odds ratio, though this bound, which (obviously?) is reached when X is a linear function of A and therefore is a binary variable, can be more directly derived in that context.
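As a quick numerical check, the identity that falls out of the relationship above, E{X|A} − E{X} = ρ·σ_X·√((1 − P(A))/P(A)), can be verified on a small joint distribution. The distribution below is made up purely for illustration; nothing about the particular numbers matters.

```python
import math

# Hypothetical joint distribution over (x, a): a is binary, x takes three values.
# Keys are (x, a) pairs; values are probabilities summing to 1.
joint = {
    (0.0, 0): 0.30, (1.0, 0): 0.25, (4.0, 0): 0.05,
    (0.0, 1): 0.05, (1.0, 1): 0.15, (4.0, 1): 0.20,
}

def expect(f):
    """Expected value of f(x, a) under the joint distribution."""
    return sum(p * f(x, a) for (x, a), p in joint.items())

p_a = expect(lambda x, a: a)                      # P(A) = E{A}
mean_x = expect(lambda x, a: x)                   # E{X}
sd_x = math.sqrt(expect(lambda x, a: (x - mean_x) ** 2))
sd_a = math.sqrt(p_a * (1 - p_a))                 # binary-variable standard deviation
rho = (expect(lambda x, a: x * a) - mean_x * p_a) / (sd_x * sd_a)

update = expect(lambda x, a: x * a) / p_a - mean_x    # E{X|A} - E{X}, computed directly
identity = rho * sd_x * math.sqrt((1 - p_a) / p_a)    # right-hand side of the identity
bound = sd_x * math.sqrt((1 - p_a) / p_a)             # the most the update can be

print(update, identity, bound)
```

The directly computed update and the right-hand side agree up to floating-point error, and the update stays within the bound.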

Saturday, April 5, 2014

fair prices, non-pecuniary exchange, and bargaining costs

I've mentioned earlier my notion that the popularity both of focal prices in very non-competitive markets and of in-kind exchange (rather than the "usual" semi-monetary exchange) may reduce bargaining costs. The former story especially seems to work best where there isn't that much incomplete information; in particular, there should be a common belief that everyone is probably gaining at the likely focal point.  Attempts to negotiate or insist on a price different from the focal point are then interpreted as antisocial attempts to claim more of the surplus.

The case of non-pecuniary exchange, though, seems to be importantly driven by incomplete information, especially of how the thing being exchanged would compare (either in terms of cost or benefit) with dollars, and the fact that it's easier to find a focal point for in-kind exchange that seems obviously "fair" and mutually beneficial without the need for costly bargaining.

The main thought that I've been dwelling on more recently is that, while we often talk about "price discovery" as a function of the market, in both of these situations the cost of performing "price discovery" is being avoided. The market is not, in its ultimate sense, "failing"; the market is ensuring that mutually beneficial trades take place, and in fact take place relatively efficiently. Instead of working out the price on the way to a solution to the ultimate problem, it simply routes around the hard part and jumps straight toward the end.

Thursday, January 30, 2014

overfitting and regularization

I'm trying to think through and recast some of the ideas around regularization from fields that do mostly atheoretic modeling of largish data sets.  The general setup is that we have a set of models ℋ, e.g. $\{y_i = m x_i + b + \sigma\varepsilon_i \mid m, b, \sigma \in \mathbb{R},\ \sigma > 0\}$ where ε follows some distribution (though typically we're imagining a set of models that requires far more than 3 real numbers to naturally parameterize it), and we're looking for the one* that best describes the population from which the data are sampled.  Now this really is kind of key; if you mistake your problem for "find the one that best describes the data", that's when you're going to get overfitting.  If you have 1000 data points that basically follow $y = x^2 + \varepsilon$ and you try to fit a 100th-order polynomial to the data, your model is going to depend on the noise in that particular data set and will do less well at fitting "out of sample" (i.e. at describing data from the population that aren't in your sample) than if you had used a simpler model.
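A scaled-down sketch of that polynomial example, under assumptions of my own: 40 points instead of 1000, degree 15 instead of 100, Gaussian noise with standard deviation 0.1, and a fine grid on [-1, 1] standing in for "the population". It uses NumPy's `Polynomial.fit` for the least-squares fits.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

# Hypothetical sample: 40 points that basically follow y = x^2 + noise.
n, noise_sd = 40, 0.1
x = rng.uniform(-1.0, 1.0, n)
y = x**2 + rng.normal(0.0, noise_sd, n)

# A fine grid stands in for the population the sample was drawn from.
grid = np.linspace(-1.0, 1.0, 401)

def fit_and_score(degree):
    """Least-squares polynomial fit; returns (in-sample MSE, MSE against the true curve)."""
    model = Polynomial.fit(x, y, degree, domain=[-1, 1])
    in_sample = np.mean((model(x) - y) ** 2)
    population = np.mean((model(grid) - grid**2) ** 2)
    return in_sample, population

simple_in, simple_pop = fit_and_score(2)
flexible_in, flexible_pop = fit_and_score(15)

print("degree 2: ", simple_in, simple_pop)
print("degree 15:", flexible_in, flexible_pop)
```

The higher-degree fit can never do worse in sample, since the quadratic is one of its options, but it tracks the noise and so describes the underlying curve less well.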

On some level it might seem hopeless to account for the data you can't see, but regularization can work quite well, and even makes a certain amount of intuitive sense.  The way it's usually done, I have a set of subsets of ℋ that is much smaller than ℋ (in some sense — typically the set of subsets is of much smaller dimension than ℋ itself, i.e. I can specify a subset with only a couple of parameters even if specifying a particular point in ℋ requires many parameters).  Now I ask, for each subset H, if I randomly select (say) 80% of my data sample and pick the model h in H that best describes that 80% of the data, how well will it tend to fit the other 20% of the data?  Often some of the subsets turn out to do much, much better than others.  It seems reasonable to think that if H does a poor job at this exercise, then even if you pick a model in H that fits all of the data you have, it's going to be hard to trust that that model is a good description of the data you don't see; there's perhaps something about H that makes it prone to pay too much attention to "noise", i.e. to the things about the sample that are not representative of the population.  So you try instead to restrict yourself to subsets of ℋ that seem to do well out-of-sample in-sample, and hope that this implies that they're likely to do well out-of-sample out-of-sample as well.
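The 80/20 exercise described above might be sketched as follows. The choices here are mine, for illustration only: the subsets H are taken to be "polynomials of degree d" for a handful of values of d, the data are simulated from y = x² + noise, and the split is repeated 50 times per subset.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(1)

# Hypothetical data sample: 200 points from y = x^2 + noise.
n = 200
x = rng.uniform(-1.0, 1.0, n)
y = x**2 + rng.normal(0.0, 0.1, n)

def holdout_error(degree, repeats=50, train_frac=0.8):
    """Average squared error on the held-out 20%, for H = polynomials of `degree`."""
    n_train = int(train_frac * n)
    errors = []
    for _ in range(repeats):
        order = rng.permutation(n)                      # random 80/20 split
        train, held = order[:n_train], order[n_train:]
        h = Polynomial.fit(x[train], y[train], degree, domain=[-1, 1])  # best h in H on the 80%
        errors.append(np.mean((h(x[held]) - y[held]) ** 2))            # fit on the other 20%
    return float(np.mean(errors))

# One score per subset H; the subset is indexed by a single parameter (the degree).
scores = {d: holdout_error(d) for d in (0, 1, 2, 5, 10, 20)}
best_degree = min(scores, key=scores.get)

print(scores)
print("best subset: degree", best_degree)
```

Note that each subset is specified by one number even though a model within it may need many parameters, which is the sense in which the set of subsets is "much smaller" than ℋ.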

I've already perhaps recast it slightly from its usual presentation, but I'm trying to recast it further, and look for a way of doing something like regularization but without resort to this set of subsets.  To get there, though, I want to remain focussed on the effect of a single observation on the choice of model within each H.  To some extent, we can take a point x in the population and break down the extent to which it will tend to be poorly fit "out of sample" into two parts:
  • how poorly does it typically fit when x is included in the sample? I.e., for a sample that includes x, if we look at the model in H that best fits that sample, how badly does it fit x?
  • how much worse does it fit when x is not in the sample than when it is?
I wish to emphasize at this point that, even if this depends to some extent on x (i.e. if some points have a greater tendency to be hard to fit out-of-sample than other points), it will still also tend to depend on H, i.e. some sets of models will be more prone to producing bad out-of-sample fits than other sets of models. Standard optimization techniques allow us to minimize (and observe) how poorly a model fits in sample; I'm looking, then, for an indication of how poorly a model tends to fit out of sample.
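The two-part breakdown above can be computed directly by leaving out one point at a time. This is only a sketch under assumptions of mine: squared error as the measure of misfit, least-squares polynomial fits as the sets H, and a simulated sample of 30 points.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical sample: 30 points from y = x^2 + noise.
n = 30
x = rng.uniform(-1.0, 1.0, n)
y = x**2 + rng.normal(0.0, 0.1, n)

def decompose(degree):
    """For H = polynomials of `degree`, return two per-point arrays:
    (squared error when the point is in the sample,
     extra squared error when the point is left out)."""
    X = np.vander(x, degree + 1, increasing=True)
    beta_all, *_ = np.linalg.lstsq(X, y, rcond=None)
    in_loss = (y - X @ beta_all) ** 2
    out_loss = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                        # drop point i and refit
        beta_loo, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        out_loss[i] = (y[i] - X[i] @ beta_loo) ** 2
    return in_loss, out_loss - in_loss

in_small, penalty_small = decompose(2)
in_large, penalty_large = decompose(8)

print("degree 2: in-sample", in_small.mean(), "penalty", penalty_small.mean())
print("degree 8: in-sample", in_large.mean(), "penalty", penalty_large.mean())
```

For least squares the second part is never negative for any point, and comparing its average across the two sets of models gives a rough sense of how much it depends on H rather than on x.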

Well, here's one potential little gem: if the log-likelihood function of a sample is additively separable in its data points and we can parameterize H such that the log-likelihood functions are continuously differentiable and at least prone toward concavity, then the optimization procedure is fairly straightforward: take derivatives and set to 0.  Well, I think that if, further, the log-likelihood functions associated with different potential observations all have more or less the same second derivatives — in particular, if it is very uncommon that an observation would have a second derivative that was a large (positive or negative) multiple of its average value at that point in H and points somewhat near that point in H — then there shouldn't be much of an overfitting problem; the amount worse that a point tends to fit when it's not in the sample than when it's in the sample is going to be constrained by those second derivatives.
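A one-parameter sketch of why the second derivatives would matter, under the simplifying assumption (mine) that each observation's log-likelihood is exactly quadratic: $\ell_i(\theta) = -\tfrac{a_i}{2}(\theta - c_i)^2 + \text{const}$ with $a_i > 0$, so that $-a_i$ is the second derivative contributed by observation $i$ and $c_i$ is the parameter value that observation would most prefer. Writing $\hat\theta$ for the maximizer on the full sample and $\hat\theta_{-j}$ for the maximizer with observation $j$ removed:

```latex
\begin{align*}
\hat\theta &= \frac{\sum_i a_i c_i}{S}, \qquad S = \sum_i a_i, \\
\hat\theta_{-j} &= \frac{S\hat\theta - a_j c_j}{S - a_j}, \\
\hat\theta - \hat\theta_{-j} &= \frac{a_j}{S - a_j}\,\bigl(c_j - \hat\theta\bigr).
\end{align*}
```

If $a_j$ equals the average $S/n$, the factor $a_j/(S - a_j)$ is $1/(n-1)$, so including the observation pulls the estimate only slightly toward $c_j$. If $a_j$ is $k$ times the average, the factor is $k/(n-k)$, which is where a single observation starts to drag the fit toward itself, and where its in-sample and out-of-sample fit can differ a lot.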

I don't know whether this goes anywhere, but if I can find a reasonable way of looking at an ℋ and constructing a reasonably rich subset that satisfies the second-derivative constraint on a reasonable "support" in the space of potential observations, then that would appeal to me as somewhat less arbitrary than imposing a system of subsets at the beginning.

Insofar as the matching of the second derivatives is exact, this would mean the likelihood functions would only differ from each other or some common standard by functions that are linear in the parameters.  Particularly where ℋ lacks a natural parameterization, but even where it does not, this tempts me to try to use these deviations themselves as a parameterization.  Along manifolds in ℋ that are not robust in this way to overfitting, this parameterization won't work; it might be that this could be put in terms of this parameterization itself, allowing us essentially to carve out a subset of ℋ on the basis of what we can coherently parameterize in terms of the differences between likelihood functions at different potential data points.

* This is probably the usual setup. On some level I'd prefer to work with sets or posterior probability distributions or some such, but I think the ideas are best worked out with "point estimates" anyway.

I don't know whether this will be useful or confusing, but I record here that an element of ℋ can, for the most part, be viewed as a map from the space of potential observations (say X) to the space of log-likelihood functions; this can get confusing insofar as a log-likelihood function is itself a map from measurable subsets of X to ℋ, which is a bit self-referential.