Data Disparities and Heterogeneous Treatment Effects

(kind of technical)

I recently finished Invisible Women by Caroline Criado Perez. This is an amazingly researched and ambitious book that documents disparities in outcomes between men and women in transportation access, leisure, workplace accessibility, venture financing, product selection, health, safety, political access, disaster relief, and many other facets of life. Criado Perez attributes such discrepancies in part to “male-unless-otherwise-indicated” modes of research and thinking that fail to account for women, despite women constituting roughly half of the population.

A particularly egregious example of this is the large share of drug trials conducted disproportionately or entirely on male animals and human subjects, even for diseases that are more prevalent in women. Further, when both sexes are included in studies, results are often not disaggregated by sex, which prevents researchers from measuring heterogeneous effects for males and females.1

Such discrepancies can cause both false positives and false negatives for women: drugs on the market may not be as effective in women as trials suggest, and many potential drugs that would have worked for women may be discarded because they do not work in men. Women consequently have 50% more adverse drug reactions than men, including a high incidence of drugs not working at all.2

Social science research is not exempt from such disparities. I have noticed papers, particularly older papers in the field of labor economics, that conduct their analysis exclusively on men. Papers that include both men and women often do not sex-disaggregate results, even though doing so would be fairly low effort in many cases.

There are statistical justifications that may be made for such practices, but I would argue they are misguided. Women and men had different labor market trajectories over the latter half of the twentieth century as women’s labor force participation increased, so restricting a sample to men can look like a way to keep that structural change from confounding the analysis. Suffice it to say that this should not justify looking only at male outcomes without a very strong a priori reason for doing so, such as the study measuring an intervention that can plausibly affect only men (likely not a common scenario).

Not reporting sex-disaggregated data is a trickier case. Pooling data often increases statistical power and can lead to more precise estimates. Conversely, sex-disaggregating the data reduces sample sizes and can lead to more variable results that fail to replicate in other studies. A cash-strapped researcher running an RCT may not have the resources to recruit a large enough sample for reliable estimation of sex-disaggregated results.3
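
To make the power penalty concrete, here is a minimal simulation sketch in Python. The sample size and effect sizes are purely illustrative; the point is only to compare the standard error of a pooled difference-in-means estimate with the standard errors of sex-disaggregated estimates from the same sample.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 400                       # total sample size (illustrative)
sex = rng.integers(0, 2, n)   # 0 = male, 1 = female
treat = rng.integers(0, 2, n)

# Illustrative heterogeneous effect: larger for men than for women
effect = np.where(sex == 0, 0.5, 0.1)
y = effect * treat + rng.normal(0, 1, n)

def diff_in_means(y, t):
    """Treatment-control difference in means and its standard error."""
    y1, y0 = y[t == 1], y[t == 0]
    se = np.sqrt(y1.var(ddof=1) / len(y1) + y0.var(ddof=1) / len(y0))
    return y1.mean() - y0.mean(), se

print("pooled:", diff_in_means(y, treat))
for code, label in [(0, "men"), (1, "women")]:
    mask = sex == code
    print(f"{label}:", diff_in_means(y[mask], treat[mask]))
```

In this toy setup the sample splits roughly in half, so the disaggregated standard errors come out roughly 1.4 times (√2) larger than the pooled one, which is exactly the trade-off the cash-strapped researcher faces.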

However, not reporting such estimates at all can stymie scientific progress. Accumulated underpowered sex-disaggregated results can be analyzed in meta-analyses and literature reviews, which are more appropriate ways of evaluating parameter estimates than relying on individual studies anyway. As long as journal publication outcomes do not select on such underpowered estimates (specifically, underpowered estimates should be interpreted in the text with appropriate caution, should often belong in appendices, and should not drive journal acceptances), such practices should improve the stock of scientific knowledge and help reduce data disparities.
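
As a sketch of how accumulated underpowered estimates can still be informative, here is a simple fixed-effect (inverse-variance weighted) meta-analysis. The estimates and standard errors are made up for illustration; a random-effects model would be more appropriate when effects genuinely differ across studies.

```python
import numpy as np

# Hypothetical sex-disaggregated effect estimates and standard errors for
# women, taken from several small studies that are individually underpowered
estimates = np.array([0.12, 0.30, -0.05, 0.22, 0.18])
std_errors = np.array([0.15, 0.20, 0.18, 0.16, 0.14])

# Fixed-effect meta-analysis: weight each study by the inverse of its variance
weights = 1.0 / std_errors**2
pooled_effect = (weights * estimates).sum() / weights.sum()
pooled_se = np.sqrt(1.0 / weights.sum())

print(f"pooled effect: {pooled_effect:.3f} (SE {pooled_se:.3f})")
```

The pooled standard error is smaller than that of any individual study, which is the sense in which a meta-analysis recovers the power that each underpowered study lacks on its own.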

Of course, disaggregating data and reporting heterogeneous treatment effects matters for more than just gender differences. Such considerations also apply to other dimensions, including but not limited to race, income, and geography, along which our priors suggest policies may plausibly have different effects.4 Even considering heterogeneous treatment effects across all of these variables may still mask policy-relevant heterogeneity; a nice method for measuring heterogeneous effects in a data-driven way is to use causal forests.5
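
For the curious, below is a minimal sketch of estimating heterogeneous effects with a causal forest. It assumes the Python econml package and its CausalForestDML estimator (grf in R is another standard implementation); the covariates, treatment, and data-generating process are made up purely for illustration, and API details may differ across package versions.

```python
import numpy as np
from econml.dml import CausalForestDML

rng = np.random.default_rng(0)
n = 2000

# Illustrative covariates: sex, income (in thousands), and age
X = np.column_stack([
    rng.integers(0, 2, n),     # sex: 0 = male, 1 = female
    rng.normal(50, 15, n),     # income
    rng.integers(20, 65, n),   # age
])
T = rng.integers(0, 2, n)      # randomized binary treatment

# True treatment effect varies with sex and income (purely illustrative)
tau = 0.5 * X[:, 0] + 0.01 * (X[:, 1] - 50)
Y = tau * T + rng.normal(0, 1, n)

# Fit a causal forest and recover conditional average treatment effects
est = CausalForestDML(discrete_treatment=True, random_state=0)
est.fit(Y, T, X=X)
cate = est.effect(X)

print("mean estimated effect, women:", cate[X[:, 0] == 1].mean())
print("mean estimated effect, men:  ", cate[X[:, 0] == 0].mean())
```

The appeal of this approach is that the forest searches over covariates for policy-relevant heterogeneity rather than requiring the researcher to pre-specify every subgroup.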

Such research practices may be a drop in the bucket relative to many of the policy issues raised in the book that disadvantage women, but they are a pretty low-cost way for social scientists to help address disparities.

  1. In recent years regulations in the United States and elsewhere have compelled federally funded clinical trials to include women. However, independent drug manufacturers are exempt, and legacy drugs that predate recent regulations still often lack clinical trial results for women. Criado Perez also references other loopholes that lead to continued disparities.
  2. There is an important economic question about how firms can reasonably produce products that fail to adequately cater to half of the population if industries are at all competitive. Having just read Douglass North’s Institutions, Institutional Change & Economic Performance, one explanation I can think of is the high persistence of inefficient formal and informal institutions because of network effects and coordination failures that prevent organizations from achieving superior equilibria. However, this question is beyond my expertise, and Criado Perez offers some alternatives, such as a paucity of existing basic research on women’s health issues that in turn stymies pharmaceutical investment.
  3. The statistical penalty of reporting sex-disaggregated results is twofold: smaller sample sizes in subpopulations increase standard errors, and reporting treatment effects across multiple subpopulations requires corrections for multiple hypothesis testing (a small illustration of such a correction follows these notes).
  4. Relatedly, a reasonable criticism of Invisible Women raised by a PhD student friend of mine is that it could have spent more time considering heterogeneity across women along dimensions of race, class, geography, and gender identity.
  5. See also this paper for an application.
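
On the multiple hypothesis testing point in note 3, the correction can be as simple as a Bonferroni adjustment. A minimal sketch with made-up p-values:

```python
import numpy as np

# Hypothetical p-values for the same treatment effect estimated in several
# subgroups (pooled, men, women, income terciles); values are made up
p_values = np.array([0.012, 0.048, 0.034, 0.20, 0.009])
alpha = 0.05

# Bonferroni correction: compare each p-value to alpha divided by the
# number of hypotheses tested
reject = p_values < alpha / len(p_values)
print(reject)
```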