As the United States grapples with the national reckoning over race prompted by the killing of George Floyd, it has become increasingly clear that even a pandemic does not strike equally. Nationwide, Black people have been 3.7 times as likely as white people to die of COVID-19, taking age into account; in some states, Black people have died of COVID-19 at age-adjusted rates five to nine times higher than those of white people.

Against this backdrop, the importance of studying racial disparities in social, economic, and public health outcomes has rarely been clearer. But researchers, and the journalists who report on their findings, should exercise caution in trying to uncover the sources of these stark discrepancies. Studying race, and in particular the relationship between race and social outcomes like health or police violence, comes with both statistical and conceptual challenges, which make understanding exactly why Black people are dying from COVID-19 at higher rates harder than it might seem.

Perhaps the biggest issue arises out of what statisticians call “post-treatment bias.” Because racial identity is assigned at birth, it affects a wide range of other aspects of people’s lives—where someone lives, how they’re educated, the sorts of opportunities they have, and how much money they earn. To understand the effect of race on a certain outcome—say, police violence, or the likelihood of death among COVID-19 patients—scholars will often control for factors like education, income, health status or occupation. But all these variables are “post-treatment,” or downstream, of race, in the sense that race itself can shape how a person is raised, educated and employed. Controlling for these variables can distort any results that scholars may find.

Consider the following analogy: if researchers set out to investigate whether smoking leads to death, but controlled for whether someone gets lung cancer, they might find that smoking doesn’t increase mortality—because they’ve effectively removed an important pathway by which smoking influences health. What’s more, by controlling for lung cancer, they’re now comparing the life spans of smokers who don’t get lung cancer, who are likely to be unusually healthy, to nonsmokers without lung cancer (and comparing nonsmokers who get lung cancer, a highly unusual group, to smokers who get lung cancer). In the context of race, controlling for almost any socioeconomic or health variable—as most studies on ethnic disparities in COVID-19 deaths do—can create serious biases in an analysis, calling many empirical results into question.
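To make the smoking analogy concrete, here is a minimal simulation sketch in Python. Every probability in it is invented for illustration, and it assumes, for simplicity, that smoking affects death only through lung cancer. It shows only the first problem described above (controlling for lung cancer removes the pathway); it does not model the second problem of comparing unusually healthy smokers to ordinary nonsmokers.

```python
import random


def simulate(n=200_000, seed=0):
    """Simulate smoking -> lung cancer -> death with made-up probabilities."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        smoker = rng.random() < 0.30
        # Hypothetical: smoking raises lung-cancer risk from 1% to 15%.
        cancer = rng.random() < (0.15 if smoker else 0.01)
        # Hypothetical: death depends only on cancer, not directly on smoking.
        dead = rng.random() < (0.60 if cancer else 0.10)
        rows.append((smoker, cancer, dead))
    return rows


def death_rate(rows, smoker, cancer=None):
    """Death rate for a smoking status, optionally within one cancer stratum."""
    group = [d for s, c, d in rows
             if s == smoker and (cancer is None or c == cancer)]
    return sum(group) / len(group)


rows = simulate()

# Unadjusted comparison: smokers versus nonsmokers.
total_gap = death_rate(rows, True) - death_rate(rows, False)

# "Controlling for" lung cancer: compare within each cancer stratum.
gap_no_cancer = death_rate(rows, True, cancer=False) - death_rate(rows, False, cancer=False)
gap_cancer = death_rate(rows, True, cancer=True) - death_rate(rows, False, cancer=True)

print(f"unadjusted gap in death rates: {total_gap:.3f}")
print(f"gap among people without lung cancer: {gap_no_cancer:.3f}")
print(f"gap among people with lung cancer: {gap_cancer:.3f}")
```

Under these assumptions the unadjusted gap is clearly positive, while the gap within each lung-cancer stratum hovers near zero: the adjustment has hidden the harm rather than explained it.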

Similar issues abound in the study of race and policing. Take the recent debate over whether there is evidence of racism in American policing. Roland Fryer, an economist at Harvard, found that police shoot white, Black and Hispanic Americans whom they’ve stopped at equal rates. However, as political scientists Dean Knox, Will Lowe, and Jonathan Mummolo point out, if there is initial discrimination in who gets stopped in the first place, estimating racial disparities in how people are treated once they’ve been stopped becomes much more complicated—especially since police officers are more likely to stop Black and Hispanic people than white people, and more of those stops are unjustified. If Black people are stopped by police for lesser (or nonexistent) offenses, “equal treatment” in terms of the use of force would actually indicate deeply unequal policing overall. Being stopped by the police is “post-treatment” to race, and failing to account for this bias can lead to erroneous conclusions that may mask the extent of racism in American institutions.
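The arithmetic behind this point can be sketched with a few lines of Python. The counts below are hypothetical and chosen only to illustrate the logic; they are not drawn from Fryer's data or from the Knox, Lowe and Mummolo paper, and the sketch captures only one facet of their argument (differences in who gets stopped), not their full method.

```python
def force_rates(population, stops, force_incidents):
    """Return (force per stop, force per resident) for one group."""
    return force_incidents / stops, force_incidents / population


# Hypothetical counts: two groups of equal size, one stopped three times as often.
white_per_stop, white_per_resident = force_rates(
    population=10_000, stops=500, force_incidents=25)
black_per_stop, black_per_resident = force_rates(
    population=10_000, stops=1_500, force_incidents=75)

# Conditional on being stopped, the rates look identical...
print(f"force per stop: white {white_per_stop:.3f}, Black {black_per_stop:.3f}")

# ...but per resident, one group experiences force three times as often.
print(f"force per resident: white {white_per_resident:.4f}, "
      f"Black {black_per_resident:.4f}")
print(f"per-resident ratio: {black_per_resident / white_per_resident:.1f}")
```

An analysis restricted to people who were stopped would see only the first comparison and could report "no disparity," even though the second comparison shows a threefold gap in this made-up example.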

A second challenge, as the political scientists Maya Sen and Omar Wasow point out, comes from the instability of racial labels. As one study concluded, “No two measures of race will capture the same information.” In one 19-year survey of thousands of Americans, a full 20 percent of the sample changed either how they were racially classified by others or how they identified themselves. Survey respondents even changed identification in response to life events: incarceration, unemployment or having an income below the poverty line made respondents more likely to identify as Black, while people who got married were more likely to be seen as, and to identify as, white.

Between the 2000 and 2010 censuses, nearly 10 million respondents altered their self-identified race or response regarding Hispanic origin; only 41 percent of Hispanics identified their race and ethnic origin the same way in both censuses. Another study found that homicide victims were more likely to be classified as Black on their death certificates, while people who died of cirrhosis of the liver were more likely to be classified as Native American—even accounting for the race of the victim as given by their next of kin.

Throughout American history, racial boundaries have shifted considerably. Chinese Americans in the Mississippi Delta were once classified as almost Black, while a 1974 U.S. federal committee on racial and ethnic definitions struggled with how to categorize people of South Asian origin: it initially recommended they be labeled white/Caucasian before classifying them as Asian or Pacific Islanders.

And there are big differences in racial definitions across countries: in the United States, thanks to the so-called one-drop rule, a person with any Black heritage has historically been categorized as Black; in Brazil, an individual is not “Black” if he or she has any European ancestry.

In other words, racial identities are largely not biologically determined, but are instead the product of social forces. Yet as sociologist Ann Morning has documented, the biological view of race still dominates in biology textbooks and among biology undergraduates and biology professors in the U.S. And when quantitative social scientists study race, they often just include a binary variable for Black or white in an equation. When this sort of research finds racial disparities in outcomes like COVID-19 mortality rates or police killings, it often raises more questions than it answers: it doesn’t explain why or how race affects life outcomes, nor does it shed much light on potential policy interventions that could help.

But there are better ways to study race. In a 2016 paper, Sen and Wasow propose that researchers should think about race not as an essential set of unchangeable biological characteristics but rather as what they call “a bundle of sticks” that includes factors like skin color, dialect, neighborhood, genes, class, names, and region of ancestry. While “race” itself cannot be manipulated in a study, many of these traits, which are closely linked with what we mean by race, can be. By focusing only on one stick in the bundle at a time—rather than on the combination of ancestry, neighborhood, socioeconomic status, skin color, names, and the like that would be conveyed by a simple “race” variable—researchers can attempt to isolate exactly which factors lie behind the racial disparities we observe.

For example, researchers can manipulate the name on a resume to study how the perceived race of a job applicant affects their likelihood of getting hired. The late Harvard sociologist Devah Pager found that employers responded as much to an applicant with a stereotypically white name and a criminal record as they did to a Black applicant without one. Or researchers can try to isolate the role of perceived skin color, another stick in the bundle, in racial disparities in policing. One recent paper in Nature Human Behaviour, which examined police stops around 7 P.M., found that Black drivers were more likely to be stopped when it was still light out; after sunset, a “veil of darkness” protected Black drivers from being racially profiled. Again, by focusing on one factor in the bundle of sticks (perceived skin color, in this case), the researchers were able to set aside other factors that could drive disparities in traffic stops, such as socioeconomic and neighborhood characteristics.
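The logic of a “veil of darkness” comparison can be sketched as a simple difference in proportions. The Python below is a deliberately simplified illustration, not the method of the Nature Human Behaviour paper: the stop counts are hypothetical, and the two-proportion z-statistic stands in for whatever more elaborate model a real analysis would use to account for time and location.

```python
import math


def black_share_gap(daylight, darkness):
    """Compare the Black share of stops in daylight versus darkness.

    Each argument is a (black_stops, total_stops) pair for stops made at
    the same clock time. Returns (gap in shares, two-proportion z-statistic).
    """
    b_day, n_day = daylight
    b_dark, n_dark = darkness
    p_day, p_dark = b_day / n_day, b_dark / n_dark
    pooled = (b_day + b_dark) / (n_day + n_dark)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_day + 1 / n_dark))
    return p_day - p_dark, (p_day - p_dark) / se


# Hypothetical counts of stops around 7 P.M. on light and dark evenings.
gap, z = black_share_gap(daylight=(300, 1_000), darkness=(250, 1_000))
print(f"Black share of stops is {gap:.3f} higher in daylight (z = {z:.2f})")
```

Because the clock time is held fixed, who is on the road should be roughly similar on light and dark evenings; what changes is how easily an officer can perceive a driver's skin color before making the stop.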

Another way to avoid issues of post-treatment bias and the instability of racial labels is to focus on differences within a racial group, rather than attempting comparisons between groups, to see which mechanisms might be driving differences in outcomes. To study educational disparities between Black and white Americans, for example, instead of simply conducting an analysis of educational outcomes across the population at large and controlling for race and socioeconomic factors (a common approach that is both contaminated by post-treatment bias and conceptually unclear), researchers could try to isolate within-group variation in one of the sticks in the bundle that is plausibly related to education.

Take the Moving to Opportunity experiment, for example, which involved the random assignment of housing vouchers. Researchers have analyzed this experiment to compare the academic performance of Black youngsters from high-poverty neighborhoods to similarly situated Black youngsters in moderate-poverty neighborhoods, finding that neighborhoods can substantially influence socioeconomic outcomes.

Some of the pitfalls of across-group analysis can be seen in The Bell Curve, the notorious and deeply flawed book by Richard Herrnstein and Charles Murray, which asserted that white Americans possessed a genetic advantage in cognitive ability over Black Americans. The authors simply compared Black and white differences in IQ tests and concluded that these differences must be, at least in part, genetic. This analysis views race almost exclusively as an essential biological category, fails to address the complex mix of social factors underpinning racial identity, and is vulnerable to all the statistical biases discussed above. A more sophisticated research design examined Black individuals who varied in levels of European ancestry but shared a similar social environment and identity, and found no relationship between genes associated with white ancestry and cognitive ability. “By identifying meaningful within-group differences,” Sen and Wasow commented, “scholars can narrow the causal mechanisms that explain disparate across-race outcomes.”

In the context of COVID-19, researchers should avoid drawing too many conclusions about the underlying causes of racial disparities from studies that simply include a binary variable for race—whose “effect” is conceptually unclear—and that control for socioeconomic characteristics and health conditions that are likely downstream of race. A more promising approach would involve isolating one possible explanation for this gap—say, differences in neighborhood health institutions, incarceration rates, or particular occupational roles and health conditions—and studying differences within racial groups to gain insight into which factors might be driving racial disparities in COVID-19 mortality.

To create a more just society, we must understand the underlying causes of racial disparities in social, economic and public health outcomes. Studying the extent of disparities between groups, such as the Black-white wealth gap or disparities in COVID-19 death rates, is essential. But researchers who wish to explain the causes of these disparities need to remain sensitive to the statistical, conceptual and historical complexities associated with race. They are likely to make more progress if they approach race as a composite measure of linked characteristics like dialect, ancestry, neighborhood, class and skin color, rather than as a fixed biological category.