A new study published in npj Digital Medicine warns that large language models (LLMs), like GPT-4, may inadvertently reinforce healthcare inequities when non-decisive socio-demographic factors such as race, sex, and income are included in clinical inputs.
Researchers introduced EquityGuard, a contrastive learning framework that detects and mitigates bias in medical applications such as Clinical Trial Matching (CTM) and Medical Question Answering (MQA).
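The study's released implementation is not reproduced here, but the general idea of contrastive bias mitigation can be sketched as follows: embeddings of the same clinical query written with and without non-decisive socio-demographic attributes are treated as a positive pair and pulled together, while other queries in the batch act as negatives, so the task representation becomes insensitive to those attributes. Everything below (the function name, the toy embeddings, the temperature value) is an illustrative assumption, not EquityGuard's actual code.

```python
# Illustrative sketch of a contrastive invariance objective (InfoNCE-style),
# NOT the authors' released implementation.
import torch
import torch.nn.functional as F


def contrastive_invariance_loss(anchor: torch.Tensor,
                                positive: torch.Tensor,
                                temperature: float = 0.1) -> torch.Tensor:
    """anchor:   embeddings of queries with socio-demographic attributes removed
    positive: embeddings of the same queries with the attributes included
    Other items in the batch serve as in-batch negatives."""
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    logits = anchor @ positive.T / temperature          # pairwise cosine similarities
    targets = torch.arange(anchor.size(0), device=anchor.device)  # matching pair = positive
    return F.cross_entropy(logits, targets)


if __name__ == "__main__":
    # Toy usage: random vectors stand in for an encoder's output embeddings.
    torch.manual_seed(0)
    plain = torch.randn(8, 256)                       # e.g. query without demographics
    attributed = plain + 0.05 * torch.randn(8, 256)   # same query with demographics added
    print(contrastive_invariance_loss(plain, attributed).item())
```

Minimizing a loss of this shape pushes the two versions of each query toward the same point in embedding space, which is one plausible way a framework could discourage downstream predictions from keying on race, sex, or income.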
In the researchers' evaluations, GPT-4 exhibited greater fairness across demographic groups, while other models such as Gemini and Claude showed notable disparities. EquityGuard improved the equity of model outputs and is seen as particularly promising for low-resource settings, where fairness is most critical.