August 21, 2012

Policy

The Use (Not Abuse) of Statistics

By: Lynn Scarlett

Economist Thomas Sowell quipped in his book Knowledge and Decisions that information is everywhere, but knowledge is rare. Sowell may have had in mind the welter of statistical data that accompanies so many public policy discussions. No matter what the issue, no matter what the philosophical or political perspective, combatants in policy debates arm themselves with data.

Data constitute information; they do not constitute knowledge. Knowledge results from organizing data and other information into useful sets and situating that information into a broader interpretive context. Statistical analysis is a tool for finding patterns amid informational “noise.” How meaningful those patterns are and whether the purported patterns actually exist depend on the methodological care with which an analyst proceeds. Data can be diced, sliced, merged, correlated, averaged, and extrapolated in endless ways. Not all these efforts are equally illuminating.

Statistical data are seductive. They lend an aura of credibility to what would otherwise appear as simple matters of opinion. But statistics can mislead. Even casual observers of public policy debates need to be wary of the perils that lurk behind numbers. Six perils stand out.

Beware the Timeframe.

Policy analysts often proclaim success or failure of public policies, or point to the emergence over time of new problems. These proclamations require showing how circumstances have changed from one moment in time to another. We see claims that things are getting worse—illegitimate births are increasing, the weather is hotter, students are performing worse than a generation ago. Or, we see claims that some policy has made things worse or better. Comparison of circumstances across time is a legitimate exercise. But not any old timeframe will do. Which timeframe is appropriate depends upon the topic.

Ten years of data may be an acceptable timeframe for assessing whether California’s mandatory class-size reduction has improved student performance. A month, six months, or even a year is too short a span to support any conclusion about class size and student performance. But a year may be an appropriate timeframe for discerning whether, say, privatizing waste collection systems is increasing or decreasing program costs. Seemingly impressive statistics can quickly become misleading or meaningless if the data cover a timespan unsuited to distinguishing real patterns from spurious ones.
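A minimal simulation makes the point concrete. The sketch below (in Python, with invented numbers) models a program whose true cost is flat; over a short window, random noise can masquerade as a trend, while a longer window recovers the truth.

```python
# A minimal sketch of the timeframe peril, using invented numbers: the
# program's true cost is flat, but a short window of noisy data can look
# like a strong trend.
import numpy as np

rng = np.random.default_rng(seed=1)
months = 120
costs = 100 + rng.normal(0, 5, size=months)  # flat "program cost" plus noise

def fitted_slope(series):
    """Least-squares slope of a series against time."""
    t = np.arange(len(series))
    slope, _ = np.polyfit(t, series, deg=1)
    return slope

print(f"Slope over 6 months:   {fitted_slope(costs[:6]):+.2f} per month")
print(f"Slope over 120 months: {fitted_slope(costs):+.2f} per month")
# The short window routinely shows a sizable (spurious) slope; the long
# window's slope sits near zero, matching the true process.
```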

Beware of Apples and Oranges.

How many times have we heard the “apples and oranges” caveat? Yet would-be pundits often ignore it, rushing to find meaning by comparing unlike situations. We often see this sleight of hand. Student test scores are getting worse, say pundits comparing SAT scores of the 1980s with those of the 1950s. But are we talking about comparable sets of students? A much larger percentage of students take the SAT today than 30 years ago. Perhaps only those most likely to do well took the test 30 years ago. If so, mean scores then would be higher than mean scores now, even if the typical student performed no better then than today. Students may, on average, be performing worse today than 30 years ago, but aggregate SAT scores, with no adjustment to compare matched groups of students, cannot confirm that conclusion.
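A small simulation, using an invented score scale and invented take-up rates, shows how pure selection can move a reported mean with no change at all in underlying performance.

```python
# A hedged sketch of the selection effect: if only the strongest students
# take a test, the mean score is higher even when the underlying
# population is identical. The score scale and take-up rates are invented.
import numpy as np

rng = np.random.default_rng(seed=2)
ability = rng.normal(500, 100, size=100_000)  # same population in both eras

def mean_of_takers(scores, top_fraction):
    """Mean score when only the top `top_fraction` of students take the test."""
    cutoff = np.quantile(scores, 1 - top_fraction)
    return scores[scores >= cutoff].mean()

print(f"Mean, top 25% take the test: {mean_of_takers(ability, 0.25):.0f}")
print(f"Mean, top 75% take the test: {mean_of_takers(ability, 0.75):.0f}")
# The reported mean falls as participation widens, with zero change in
# the population's actual ability.
```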

Beware of Total Context, or Confounding Variables.

In a world of so much specialization, scientists often “know” a great deal about a narrow set of phenomena. They may be excellent scientists; they may control for many variables that they can conceive might be important. But their scope of knowledge may exclude critical factors. An air chemist may know little about astrophysics; a biologist may know little about nuclear physics; a meteorologist may know nothing about biology. This challenge is especially important when observed correlations among different phenomena are, though statistically significant, still rather weak. What else is going on?

The problem of confounding variables can haunt even fairly simple questions. Does the advent of mandated recycling account for recent reductions in per capita waste disposal? Or are the reductions the consequence of ever-improving materials use that cuts down on the amount of material in each item consumed? Perhaps the introduction of user fees or waste-education programs accounts for the decline. Well-designed studies attempt to account for these competing explanations.
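The sketch below illustrates the trap with made-up data: user fees are the real driver of lower waste, and cities that charge fees also tend to mandate recycling, so a naive correlation credits recycling. Holding fees constant in a simple regression makes the spurious effect collapse.

```python
# A toy confounding example, under invented numbers: only user fees cut
# waste, but fees and recycling mandates travel together, so recycling
# correlates with lower waste even though its true effect here is zero.
import numpy as np

rng = np.random.default_rng(seed=3)
n = 5000
fees = rng.binomial(1, 0.5, size=n)                 # city charges user fees?
recycling = rng.binomial(1, 0.2 + 0.6 * fees)       # fees make mandates likelier
waste = 100 - 15 * fees + rng.normal(0, 5, size=n)  # only fees reduce waste

print(f"corr(recycling, waste): {np.corrcoef(recycling, waste)[0, 1]:.2f}")

# Regress waste on both variables; the recycling coefficient collapses
# toward zero once fees are held constant.
X = np.column_stack([np.ones(n), recycling, fees])
coef, *_ = np.linalg.lstsq(X, waste, rcond=None)
print(f"coef on recycling (fees controlled): {coef[1]:+.2f}")
print(f"coef on fees: {coef[2]:+.2f}")
```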

Beware: Correlation Is Not Causation.

Incidence of heat stroke correlates highly with ice cream consumption. The higher the consumption, the more cases of heat stroke. Ice cream consumption also correlates with outdoor temperature—the greater the consumption, the higher the temperature. So, does eating ice cream cause heat stroke or high temperatures? A silly example, of course. We know that correlation between these phenomena does not mean one caused the other.

Maybe one did cause the other; maybe not. Perhaps causation runs in the opposite direction: high temperatures may cause people to want more ice cream. In fact, these correlations alone simply cannot tell us anything about causation.
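A short simulation of the ice cream example (with invented numbers) shows how a common cause manufactures the correlation, and how holding temperature fixed makes it disappear.

```python
# A small common-cause simulation: temperature drives both ice cream
# consumption and heat stroke, so the two correlate strongly even though
# neither causes the other. All coefficients are made up for illustration.
import numpy as np

rng = np.random.default_rng(seed=4)
n = 365
temperature = rng.normal(20, 8, size=n)                     # daily temp, C
ice_cream = 2 * temperature + rng.normal(0, 5, size=n)      # driven by temp
heat_stroke = 0.5 * temperature + rng.normal(0, 2, size=n)  # driven by temp

r = np.corrcoef(ice_cream, heat_stroke)[0, 1]
print(f"corr(ice cream, heat stroke): {r:.2f}")  # strong, yet not causal

# Residualizing both series on temperature (holding the common cause
# fixed) makes the association vanish.
def residuals(y, x):
    slope, intercept = np.polyfit(x, y, deg=1)
    return y - (slope * x + intercept)

r_partial = np.corrcoef(residuals(ice_cream, temperature),
                        residuals(heat_stroke, temperature))[0, 1]
print(f"partial corr, temperature held fixed: {r_partial:.2f}")  # near zero
```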

Yet, in less obvious contexts, we see observations of correlation translated into statements of causation all the time. Raising speed limits may be correlated with fewer accidents, but did the higher speeds cause the improved safety? Or was it drunk-driving pullover programs? Better driving habits? Improved roads? Citing a single correlative relationship is insufficient to draw conclusions about cause and effect.

Beware the Relativity Problem.

How many times do we see pundits announce that Town X privatized its trash collection system and saved $750,000, or $1 million, or whatever? Great data? Interesting data? Important data? I don’t know. These pronouncements give me no context. The $750,000 might be a lot; it might represent minimal savings.

Unless I know how much was previously spent on the trash system, I have no idea whether $750,000 is a 1 percent savings, a 10 percent savings, or a 90 percent savings. And unless I have some idea of savings ranges achieved by other practices or for other services, I don’t even know if a 1 percent savings is noteworthy or not.

There is, of course, another “relativity” problem. We repeatedly see announcements that “Study Y found a 3 percent (or 4, 8, or whatever percent) increase in tumors among frogs (for example) exposed to some chemical.” Sounds impressive, but is it a statistically significant finding? Without more information, we don’t know. We need to know the size of the population tested; we need to know how much random variation in tumors occurs in given frog populations.
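A standard two-proportion test, run on invented tumor counts, shows why sample size matters here: the same observed increase can be indistinguishable from noise in a small study and decisive in a large one. The rates and study sizes below are assumptions for illustration only.

```python
# A hedged sketch for the frog example: suppose tumors rise from 10% in
# controls to 13% in exposed frogs. Whether that is statistically
# significant depends almost entirely on how many frogs were tested.
from math import erfc, sqrt

def two_proportion_pvalue(x1, n1, x2, n2):
    """Two-sided z-test p-value for a difference in two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    return erfc(abs(z) / sqrt(2))  # P(|Z| >= |z|) under the null

# Same 10% -> 13% jump, at two study sizes:
print(f"n=100 per group:  p = {two_proportion_pvalue(10, 100, 13, 100):.2f}")
print(f"n=5000 per group: p = {two_proportion_pvalue(500, 5000, 650, 5000):.1e}")
# The small study cannot distinguish the increase from random variation
# (p is around 0.5); the large one can (p is far below 0.05).
```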

Beware the Effects of Dynamism.

Policy pundits often warn, “if present trends continue,” some outcome will ensue. Human population will explode to the point of crisis; traffic congestion will snarl city roads, slowing us to perpetual jams in which we move along at five miles per hour. Extrapolating from present trends to future outcomes is tantalizing and often produces specters of future disaster. But, of course, present trends often don’t continue. In modern bumper sticker parlance, “stuff happens.” Birth rates fall; people move, change their driving hours, hop on a train, carpool, telecommute—they react and respond, and trends alter. Thus, a challenge for the analyst is to recognize the pitfalls of extrapolation.
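The toy example below, with made-up growth parameters, shows how badly naive extrapolation can miss when the underlying process responds to changing conditions and saturates instead of continuing its early trend.

```python
# An extrapolation sketch with invented parameters: a process that looks
# exponential early on (here, logistic growth) is wildly overshot by an
# "if present trends continue" projection.
import numpy as np

def logistic(t, k=0.5, capacity=100.0, start=1.0):
    """Logistic curve: early growth looks exponential, then saturates."""
    return capacity / (1 + (capacity / start - 1) * np.exp(-k * t))

observed_years = np.arange(0, 8)  # the span the analyst has actually seen
actual = logistic(observed_years)

# Fit an exponential ("the present trend") to the early data, project it.
growth_rate = np.polyfit(observed_years, np.log(actual), deg=1)[0]
future = 25
projected = actual[-1] * np.exp(growth_rate * (future - observed_years[-1]))

print(f"Trend-extrapolated value at year {future}: {projected:,.0f}")
print(f"Actual (saturating) value at year {future}: {logistic(future):,.0f}")
# The extrapolation races past the ceiling the real process respects,
# because behavior adjusts as conditions change.
```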

Lynn Scarlett is the former Executive Director of the Reason Public Policy Institute. This post is an excerpt from the Institute for Humane Studies “Creating Your Path to a Policy Career” guide.