Data: big and small

Everyone is talking about “big data” – the use of massive quantities of information to analyze everything from the weather to the concentration of matter in the universe. For several years, economists have been getting into the big data act, too. One example is nowcasting – using vast arrays of electronic data to assess the current state of the economy (“now”) rather than waiting for official data and its inevitable delays. In addition, with the help of computer scientists, they have been collecting price data from the web. Then there is Google Trends, used to track everything from unemployment to flu epidemics – with varying degrees of accuracy!

It doesn’t take much imagination to think of ways to use more and more data to track economic activity more and more accurately. Presumably, we could use the billions of noncash payment transactions – credit card, debit card, automated clearing house (ACH), and so on – to track virtually any sort of activity we might want, even at a daily frequency. Would it be worth it? The jury is still out.

Let’s focus on GDP, where revisions to preliminary estimates tend to be quite large.  Unfortunately, it’s questionable whether big data can help us to anticipate these revisions.  There are various reasons for this. Some of the data that we would need – such as measures of income and corporate profits – are only available with a substantial lag. For example, tax information for the economy as a whole is produced with a lag of roughly two years. In addition, because the composition of the economy changes, we would have to wait for the periodic surveys conducted by government statisticians (such as the business census) to weight all this information together properly.
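To see why those survey-based weights matter, consider a minimal sketch of how stale weights distort an aggregate growth estimate when the economy's composition shifts. The sectors, shares, and growth rates below are illustrative assumptions, not actual figures:

```python
# Sketch: why outdated weights distort an aggregate growth estimate.
# Sectors, shares, and growth rates are illustrative assumptions.

def aggregate_growth(shares, growth):
    """Weighted-average growth across sectors (shares sum to 1)."""
    return sum(shares[s] * growth[s] for s in shares)

growth = {"services": 3.0, "goods": -1.0}        # sector growth rates, %
old_shares = {"services": 0.60, "goods": 0.40}   # stale census weights
new_shares = {"services": 0.70, "goods": 0.30}   # current economic structure

print(f"with stale weights:   {aggregate_growth(old_shares, growth):.1f}%")
print(f"with current weights: {aggregate_growth(new_shares, growth):.1f}%")
```

With identical sector data, the aggregate differs depending solely on the weights – which is why statisticians must wait for periodic surveys before they can combine high-frequency indicators properly.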

Another difficulty arises from the fact that the largest GDP revisions often occur in components like inventories and trade.  To anticipate these changes, firms would have to provide someone – presumably the government – with their supply chain information.  But that information is likely to be highly proprietary.

The largest revisions also tend to occur at what in retrospect turn out to be economic turning points – that is, the beginnings of recessions and recoveries. These cyclical reversals affect investment spending more than consumer spending. For example, the biggest revision on record concerned the fourth quarter of 2008 – just after the Lehman debacle – when by mid-2011 we knew that the economy had contracted at an 8.9% annual rate, rather than at the 3.8% annual rate initially reported in early 2009. The revision to investment, which constitutes only 15% of GDP, accounted for nearly 90% of the GDP revision.
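The arithmetic behind that outsized contribution is straightforward: a component's contribution to the headline revision is its own revision weighted by its share of GDP. A small sketch, using illustrative numbers chosen to roughly match the 2008Q4 episode (the component-level revisions here are assumptions, not the official BEA figures):

```python
# Sketch: how a small GDP component can dominate a headline revision.
# Component-level numbers are illustrative assumptions, not BEA data.

def revision_contribution(share, revision_pp):
    """Contribution (percentage points) of one component to the headline
    GDP growth revision: component share x that component's own revision."""
    return share * revision_pp

# Suppose investment (15% of GDP) has its annualized growth estimate
# revised down by 30 percentage points, while the remaining 85% of GDP
# is revised down by only 0.7 points.
inv = revision_contribution(0.15, -30.0)   # investment's contribution
rest = revision_contribution(0.85, -0.7)   # everything else
total = inv + rest                         # roughly -8.9 - (-3.8) = -5.1

print(f"investment share of total revision: {inv / total:.0%}")
```

A 30-point swing in a 15% component moves the headline by 4.5 points on its own – which is how a small slice of GDP can account for nearly all of a record revision.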

In theory, we might do better in the area of consumer spending, where credit card information allows us to track the pace and scale of outlays at high frequency, while pay stubs and government transfers offer guides to personal income. However, the collectors of that information must also safeguard the privacy of their clients – both the vendors and the spenders. And in practice, there is no evidence that the intermediaries receiving this information produce better economic nowcasts or forecasts.

And yet, if we examine the early GDP estimates themselves, we see that their accuracy has improved over time. Suppose we treat the estimate published five years after the fact as the “true” GDP for a period; we can then compare each preliminary growth estimate with the one published five years later. For example, we can compare the first-quarter 2009 growth rate as reported during 2009 with the rate reported in 2014. Looking at the data this way, we see that the scale of the revisions has trended down over the past decade. What we don’t know is whether this trend reflects better measurement, or whether the reduced volatility of the economy from the mid-1980s until 2007 made GDP easier to measure even with the limited information initially available for the preliminary estimate.
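The vintage comparison described above can be sketched in a few lines. The growth figures below are hypothetical; a real exercise would draw on an archive of published data vintages rather than these stand-in numbers:

```python
# Sketch: gauging the size of GDP revisions by comparing each quarter's
# preliminary growth estimate with the estimate published five years
# later (treated here as the "true" value).
# All growth figures are hypothetical, for illustration only.

preliminary = {"2009Q1": -6.1, "2009Q2": -1.0, "2009Q3": 3.5}
five_years_later = {"2009Q1": -5.4, "2009Q2": -0.5, "2009Q3": 1.3}

# Revision = later ("true") estimate minus the preliminary one.
revisions = {q: five_years_later[q] - preliminary[q] for q in preliminary}
mean_abs_revision = sum(abs(r) for r in revisions.values()) / len(revisions)

print(f"mean absolute revision: {mean_abs_revision:.2f} percentage points")
```

Tracking this mean absolute revision across successive years is one simple way to check whether preliminary estimates have in fact been getting closer to the mark.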

Our conclusion: The burden of proof is on the big-data proponents to demonstrate that collecting lots of small pieces of data will give us a faster, more accurate big picture of the economy. We’re not really from Missouri, but you’ll still have to show us.