A new study highlights how social determinants of health (SDoH)—such as poverty, education, and racism—are often buried in clinical notes, and how this lack of structured documentation may deepen health disparities, especially for minority communities. Researchers from Yale and collaborating institutions used large language models (LLMs) to extract SDoH from unstructured clinical notes across four U.S. healthcare systems.
The study notes that social factors such as low education, racism, and poverty contribute to hundreds of thousands of deaths annually, numbers comparable to those for the leading causes of disease-related mortality. Yet these factors are inconsistently recorded in electronic health records (EHRs). “The lack of specific EHR fields to record SDoH information and the lack of standards for collecting data related to SDoH are some of the major reasons for insufficient SDoH documentation,” the authors noted.
Minority patients are particularly affected. For example, adverse childhood experiences, financial instability, and social isolation—factors disproportionately impacting Black and Hispanic populations—were more frequently documented in mental health settings but often overlooked elsewhere. The study warns that models trained on data from one institution may not generalize well to others, especially when patient populations differ in race, ethnicity, or socioeconomic status.
The researchers stress that models trained on biased data may perpetuate or amplify existing disparities. They call for more diverse datasets and better documentation practices to ensure AI tools support equitable care. “Differences in race, ethnicity, sex, age distributions and other traits across sites could contribute to language biases,” they concluded.
See “Social determinants of health extraction from clinical notes across institutions using large language models” (May 17, 2025).