
It is human nature to want to know if some exposure — some medicine, treatment, policy intervention — causes some outcome. What are the root drivers of crime? Does homework improve educational performance? Does acetaminophen increase the risk of autism? How much alcohol is “too much,” in terms of cancer risk? Identifying causes helps us make decisions — both as individuals and as a society.

Determining causes, however, is often tricky, especially for complex or common exposures. Correlations are easy. But as the saying goes, correlation does not equal causation.

Often, there is a factor (known as a “confounder”) that causes both a certain exposure and a certain outcome. For example, crime and ice cream consumption tend to move together, but one is presumably not causing the other. Both tend to increase in the summer, with time off from school or heat as potential underlying common causes.

Other times, in what’s known as selection bias, those who are exposed — who experience an intervention or take a treatment — differ systematically from those who do not. Schools that give more homework may also have other policies that improve academics (or academically inclined families may select schools with more homework), so an association between homework and educational performance may reflect that confounding rather than the effect of homework itself.

So how do researchers go from correlation to causation? Many researchers have been taught that the “gold standard” for establishing causality is a randomized controlled trial. By randomly assigning groups to either receive the exposure or not, researchers can ensure that the groups are, on average, similar on everything except whether they received the exposure of interest or the comparison condition. If there is a difference in outcomes, then it must be due to the exposure and not to other preexisting differences.

But researchers can’t always randomize. There are ethical concerns — researchers should not randomize when doing so requires withholding an intervention or treatment with proven benefits, or exposing people to a known risk. And there are feasibility concerns — when an exposure is widely available, it can be hard to find a comparison group that is not already exposed. When considering potential links between acetaminophen and autism, for example, it would be difficult at this point to randomize pregnant individuals to take or not take acetaminophen.

In cases like these, when an experiment is not possible, researchers must implement clever designs to analyze non-randomized data, such as electronic health records or large-scale studies such as the Nurses’ Health Study. These designs typically aim to reduce confounding by identifying sources of randomness that exist naturally in the world or by making comparisons that adjust for observed confounders as well as possible. And luckily, there are methodological fields, including parts of statistics, economics, epidemiology, and more, focused on how to do such studies well.

One set of designs aims to create or identify some induced randomness in the world. Known as “randomized encouragement” or “instrumental variables” designs, they might, for example, randomly select people to be encouraged to increase their fruit and vegetable consumption, such as through coupons or messaging. Or they might aim to identify different policy or practice environments that make it easier or harder for some people to obtain fresh fruits and vegetables, like varying access to grocery stores.
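To see the logic of an encouragement design in miniature, consider a small simulated sketch. Everything in it (the coupon, the servings, the effect sizes) is invented for illustration, not taken from any real study. Because the coupon is randomized, it is unrelated to the unmeasured traits that bias a naive comparison, and it can be used to back out the causal effect of diet on the outcome:

```python
# A minimal simulation of a randomized-encouragement (instrumental variable)
# design. All variable names and numbers are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Unobserved confounder: say, health-consciousness, which raises both
# vegetable consumption and the health outcome.
u = rng.normal(size=n)

# Randomized encouragement: half of participants get a produce coupon.
coupon = rng.binomial(1, 0.5, size=n)

# Servings of fruits/vegetables: shifted by the coupon and by the confounder.
servings = 2.0 + 1.0 * coupon + 1.5 * u + rng.normal(size=n)

# Health outcome: the true causal effect of one extra serving is 0.3.
outcome = 0.3 * servings + 2.0 * u + rng.normal(size=n)

# Naive comparison: regressing outcome on servings is biased by the confounder.
naive = np.cov(outcome, servings)[0, 1] / np.var(servings, ddof=1)

# Instrumental-variable (Wald) estimate: the coupon moves servings but,
# because it was randomized, is unrelated to the confounder.
iv = np.cov(outcome, coupon)[0, 1] / np.cov(servings, coupon)[0, 1]

print(f"naive estimate: {naive:.2f}   IV estimate: {iv:.2f}   truth: 0.30")
```

In this toy setup the naive estimate lands well above the truth, because health-conscious people both eat more vegetables and fare better anyway, while the coupon-based estimate recovers the true effect (for those whose behavior the coupon actually changed).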
Other designs, known as difference-in-differences or comparative interrupted time series, aim to take advantage of policy or practice changes to compare groups before and after a policy change, like a change in access to medications or treatments. A nice feature of these designs is that they can often use publicly available data, such as monthly state-level mortality counts. These studies are strongest when there are also comparison groups that did not experience the policy change, to account for other underlying time trends in the outcome, such as changing diagnostic criteria. (A toy numerical sketch of this logic appears at the end of this piece.)

Finally, comparison group designs, implemented in many cohort studies such as the Nurses’ Health Study or the All of Us cohort, aim to adjust for as many characteristics as possible, to remove confounding due to those observed factors. This includes propensity score methods, which allow for comparison of individuals who are similar on a wide range of variables, including medical history, family history, and other contextual factors. These designs are strongest when accompanied by an assessment of how robust the results are to a potential unobserved confounding variable.

A variety of strong randomized and non-randomized designs exist, and each is appropriate in different settings and contexts and for different research questions. We encourage researchers to learn about these designs, to have a broad set of tools in their methodological toolbox.

This is also why we might see a variety of study designs used for similar research questions, including sibling-controlled studies using cohort data, policy evaluation studies taking advantage of policy changes, and natural experiments that take advantage of some random “shock” to isolate effects. And this is a good thing. For nuanced causal questions for which it is difficult to think of the one ideal randomized trial that would definitively answer the question of interest (and on a reasonable time scale and budget), often the most appropriate approach is to conduct a variety of studies, each with its own strengths and limitations, and then to assess the bulk of the evidence. This evidence synthesis is ideally done through a careful and thoughtful approach, with insights from a range of substantive and methodological experts.

Science is often a process of building up evidence over time and across contexts. Knowing “what causes what” is a journey, not a destination. We will learn more answers to our causal questions — but there are not always clear yes/no answers. Determining what might increase the risk of autism — and more broadly what the causes and risk factors of autism are — will take more research across different disciplines and will likely open up more questions. We must learn to be comfortable with that uncertainty while continuing to strive to learn. Continuing to ask the questions, and to ensure they are studied as rigorously as possible, will get us closer to a more complete understanding.

Cordelia Kwon, M.P.H., is a Ph.D. student in health policy at Harvard University. Elizabeth A. Stuart, Ph.D., is professor and chair in the Department of Biostatistics at the Johns Hopkins Bloomberg School of Public Health. These are the views of the authors and do not represent the views of their organizations.
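For readers who want to see the difference-in-differences comparison concretely, here is the toy sketch referenced above. Everything in it (the states, the policy, and the mortality rates) is invented for illustration; it is not real data or any particular study’s analysis:

```python
# A toy difference-in-differences calculation with made-up state-level
# mortality rates. States, policy, and numbers are illustrative assumptions.
# Suppose State A expands access to a medication and State B does not.

# Average monthly mortality rate per 100,000, before and after the change.
state_a_pre, state_a_post = 52.0, 48.0   # adopted the policy
state_b_pre, state_b_post = 50.0, 49.0   # comparison state, no policy change

# Change over time within each state.
change_a = state_a_post - state_a_pre    # -4.0: policy effect plus time trend
change_b = state_b_post - state_b_pre    # -1.0: time trend alone

# Difference-in-differences: subtracting the comparison state's change
# removes shared time trends (e.g., changing diagnostic criteria).
did_estimate = change_a - change_b       # -3.0 per 100,000

print(f"estimated policy effect: {did_estimate:+.1f} deaths per 100,000 per month")
```

The key assumption is parallel trends: absent the policy, the two states’ mortality would have moved similarly, so the comparison state’s change stands in for what the adopting state would have experienced anyway.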