Saturday, July 25, 2009

smoothing data

One of the more interesting developments this summer in my own set of intellectual tics is that I've become increasingly enthusiastic about using atheoretic time series models as smoothing functions. If I want the "smoothed" value of a series at a particular time, I use the atheoretic model to predict its value a few periods out, and I take that prediction as the smoothed value. This works particularly well with a model that predicts a constant value for the series once it's taken more than a couple of periods out; an ARIMA(0,1,n) model, for example, gives the same prediction for n periods from now as for n+1, n+2, etc. Any change in that "long-run" value then represents "innovation", i.e. a surprise; a large rise in unemployment claims that results in very little change in the prediction is mostly not new economic news, but simply an expression of the short-term dynamics that were anticipated from previous data points. A model that does a good job of capturing these short-term dynamics should therefore produce predictions that change much less than the series itself does, and so provides a smoother series than the input.
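A minimal sketch of what this might look like in Python, using statsmodels' ARIMA class; the (0,1,2) order, the two-step horizon, the minimum sample size, and the function name are my own illustrative choices, not anything prescribed above:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA


def forecast_smooth(series, order=(0, 1, 2), horizon=2, min_obs=30):
    """Smooth a series by re-fitting an atheoretic ARIMA model at each point
    and recording its `horizon`-step-ahead forecast as the smoothed value."""
    smoothed = pd.Series(index=series.index, dtype=float)
    for i in range(min_obs, len(series)):
        window = series.iloc[:i + 1]              # only data available at time i
        fit = ARIMA(window, order=order).fit()
        smoothed.iloc[i] = fit.forecast(steps=horizon).iloc[-1]
    return smoothed


# Toy example: a noisy random walk with drift.
rng = np.random.default_rng(0)
y = pd.Series(np.cumsum(rng.normal(0.1, 1.0, 200)))
smooth = forecast_smooth(y)
```

With d=1 and no trend term, forecasts from this model are constant beyond the MA order, so the two-step-ahead value is already the "long-run" prediction; the smoothed series only moves when new data actually changes that prediction.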

For longer-term horizons, there's probably some philosophical value to distinguishing the short-term smoothed data from a prediction of where the data will be later; in particular, a model that did a very good job of predicting the data five years out would not be suitable for "smoothing" if I'm hoping to use the smoothed data to observe the business cycle. Measurement errors aside, each time scale has some fluctuations that are to be viewed as material and others that are shorter-term "noise"; the real purpose of a smoothing function is to eliminate the noise while preserving as much of the "signal" as possible. As long as my projections only go a few periods out, I imagine that's what I'm doing; again, changes in my projection represent changes in inferred "signal", while fully anticipated changes in the data series are attributed to noise.

I have, in the past, looked at smoothing functions that require future data points to construct today's value; for example, if I look at data from 2008 and I wish to smooth stock-market prices, my smoothed function might start decreasing substantially in August or September because of the lower values it needs to reach to match the data in November. If the point is to look at data as it comes in and identify trends early, though, that doesn't work so well; hence my preference for purely backward-looking measures, even when I have forward data sets available to me. There are good economic contexts in which it makes sense to use all available data to separate signal from noise and to look for dynamics that may not have been ascertainable in real time; in those situations, atheoretic ARIMA models are probably not your best choice.
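As a toy illustration of the distinction, assuming a simple moving average stands in for both kinds of filter (the 13-period window is arbitrary):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
y = pd.Series(np.cumsum(rng.normal(0.0, 1.0, 200)))

# One-sided (real-time) smoother: each value uses only data up to that point.
trailing = y.rolling(window=13).mean()

# Two-sided (retrospective) smoother: the centered window leans on future data,
# so its late-summer values would already reflect the declines that come later.
centered = y.rolling(window=13, center=True).mean()
```

The centered version is the sort of filter that makes sense when all the data are in hand and the goal is retrospective analysis; the trailing version, or the forecast-based smoother sketched earlier, is the one that can be computed as the data arrive.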