examine the totality of the evidence and determine which studies, if any, can lend the subject clarity. This conversation begins with a consideration of the difference between good science and bad science.
THE PROBLEM OF CREATING GOOD SCIENCE
A general definition of science is the asking and answering of questions through a controlled process of testing, repeating, and confirming results. The scientific method we all learned in grade school covers the basic contours: to know if a thing is true, one must isolate that thing and test it without changing anything else. Change two or more things, and you introduce the possibility that your results might come from any or all of them; no definitive answers can emerge until the universe of possible explanations is winnowed down to a final candidate.
To deal with this problem in human studies, scientists usually create a comparison group alongside the group being tested—the control group. This group, as the name suggests, helps scientists control for factors that might be acting on the experiment from elsewhere, offering a simple way to tell if the results are due to the experiment or something else. Without a control group, it is dismayingly easy to produce a “finding” that cannot withstand further scrutiny. Say, for instance, the test group in a drug experiment develops a rash. One might assume that the drug causes the rash. But if the untreated control group develops the same rash, then it is most likely due to something unseen that’s influencing both groups, and not the drug.
Human research tends to cleave into two major “kingdoms”: observational studies and controlled studies. Observational studies observe and compare groups of people. This research is conducted passively: in other words, without interventions or controls. Any significant differences that emerge between the populations studied—say, finding that people who drink more diet soda tend to have a higher incidence of depression than people who don’t—can’t prove anything, but may be used to generate hypotheses about what is causing the difference.
Yet people still assume the obvious when confronted with a correlation of this sort. In the diet soda study, which was actually run by the National Institutes of Health and widely reported, many people jumped to the conclusion that depression must be caused by something in the soda.1 But a moment of creative consideration turns up several other plausible possibilities. What if the people who drink diet soda are simply more judgmental about their body appearance and generally more prone to self-criticism? What if, since drinking more diet soda correlates with a history of being overweight, the depression arises physiologically from the effects of obesity, or as a result of the cluster of health problems that go along with it, such as obstructive sleep apnea and diabetes? What if people who are depressed simply crave sweet things, as evidence suggests? And what of the fact that diet soda drinkers tend to cluster more in urban areas: is there something about this environment that promotes depression?
Strong correlation is tantalizing, a just-so homily that satisfies our need for simple explanations. It feels definitive and self-evident, especially given the huge number of subjects typically involved in such studies. The NIH study that produced the diet soda finding, for instance, had 260,000 subjects. Headlines follow, and public health advice is dispensed, whenever a major observational study unearths a provocative new correlation. But it turns out that the record of observational studies like these for generating accurate medical advice is, in a word, abysmal. Award-winning science journalist Gary Taubes described the issue in the New York Times Magazine:
Stephen Pauker, a professor of medicine at Tufts University and a pioneer in the field of clinical decision making, says, “Epidemiologic studies, like diagnostic tests, are probabilistic