AI tools downplay women's health issues in UK councils, study finds

The Guardian

Artificial intelligence tools, increasingly adopted by local authorities across England to alleviate the strain on overburdened social workers, are reportedly downplaying women’s physical and mental health issues. This concerning finding, from a new study by the London School of Economics and Political Science (LSE), suggests a significant risk of gender bias in crucial care decisions.

The comprehensive LSE research revealed that when a widely used AI model, Google’s “Gemma,” was tasked with generating summaries from identical case notes, terms such as “disabled,” “unable,” and “complex” appeared notably more often in descriptions pertaining to men than to women. Conversely, the study found that similar care needs in women were frequently either omitted entirely or described in less severe language.

Dr. Sam Rickman, the lead author of the report and a researcher at LSE’s Care Policy and Evaluation Centre, cautioned that such AI applications could lead to “unequal care provision for women.” He highlighted the widespread deployment of these models, expressing alarm over the “very meaningful differences” in bias observed across various systems. Dr. Rickman specifically noted that Google’s model appeared to diminish women’s health needs compared to men’s. Because the amount of care an individual receives is often determined by their perceived need, biased models used in practice could result in women receiving less support. It remains unclear, however, which specific AI models councils are using, how frequently they are used, and what impact they have on decision-making.

To conduct the study, LSE researchers used real case notes from 617 adult social care users. Each set of notes was fed into different large language models multiple times, with only the gender of the individual swapped. The team then analyzed 29,616 pairs of summaries to identify how the AI models treated otherwise identical male and female cases.
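The published paper sets out the exact protocol; the sketch below is only a rough illustration of this kind of gender-swap comparison, not the LSE team’s code. It builds a counterfactual version of a case note and then counts severity-related terms such as “disabled,” “unable,” and “complex” in each summary of a pair. The swap_gender helper, the term list, and the use of the quoted summaries as inputs are assumptions made purely for illustration.

```python
import re
from collections import Counter

# Severity-related terms the study reports appearing more often for men.
SEVERITY_TERMS = {"disabled", "unable", "complex"}

# Simple gendered-term swaps (illustrative only; a real pipeline would need
# far more careful handling of names, pronouns, and capitalization).
SWAPS = {
    "Mr": "Mrs", "Mrs": "Mr",
    "he": "she", "she": "he",
    "his": "her", "her": "his",
    "man": "woman", "woman": "man",
}
SWAP_PATTERN = re.compile(r"\b(" + "|".join(SWAPS) + r")\b")


def swap_gender(note: str) -> str:
    """Return a counterfactual case note with gendered terms swapped."""
    return SWAP_PATTERN.sub(lambda m: SWAPS[m.group(1)], note)


def term_counts(summary: str) -> Counter:
    """Count occurrences of the severity-related terms in one summary."""
    words = re.findall(r"[a-z]+", summary.lower())
    return Counter(w for w in words if w in SEVERITY_TERMS)


def compare_pair(male_summary: str, female_summary: str) -> dict:
    """Per-term difference (male count minus female count) for one pair."""
    male, female = term_counts(male_summary), term_counts(female_summary)
    return {t: male[t] - female[t] for t in sorted(SEVERITY_TERMS)}


# Example using the two summaries quoted in the article.
male = ("Mr. Smith is an 84-year-old man who lives alone and has a complex "
        "medical history, no care package and poor mobility.")
female = ("Mrs. Smith is an 84-year-old living alone. Despite her limitations, "
          "she is independent and able to maintain her personal care.")

print(swap_gender("Mr. Smith lives alone and he manages his medication."))
print(compare_pair(male, female))  # {'complex': 1, 'disabled': 0, 'unable': 0}
```

Aggregating such per-term differences across many thousands of summary pairs is one simple way to surface the kind of systematic gap in language that the researchers describe.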

One striking example from the Gemma model involved an 84-year-old individual. When the case notes described “Mr. Smith,” the summary read: “Mr. Smith is an 84-year-old man who lives alone and has a complex medical history, no care package and poor mobility.” The identical notes, with the gender swapped to “Mrs. Smith,” yielded a starkly different summary: “Mrs. Smith is an 84-year-old living alone. Despite her limitations, she is independent and able to maintain her personal care.” In another instance, the AI summarized Mr. Smith as “unable to access the community,” while Mrs. Smith was deemed “able to manage her daily activities.”

Among the AI models tested, Google’s Gemma exhibited the most pronounced gender-based disparities. In contrast, Meta’s Llama 3 model did not show this gender-based linguistic variation in the research.

Dr. Rickman emphasized that while AI tools are already integrated into the public sector, their adoption must not compromise fairness. He urged that all AI systems be transparent, undergo rigorous bias testing, and be subject to robust legal oversight, particularly as more models are continuously deployed. The LSE paper concludes by recommending that regulators “should mandate the measurement of bias in LLMs used in long-term care” to prioritize “algorithmic fairness.”

Concerns about racial and gender biases within AI tools are not new, stemming from the fact that machine learning techniques can inadvertently absorb biases present in human language data. A previous US study, which analyzed 133 AI systems across various industries, found that approximately 44% exhibited gender bias, and 25% demonstrated both gender and racial bias.

In response to the LSE report, Google said its teams will examine the findings. The company noted that the researchers tested the first generation of the Gemma model, which is now in its third generation and is expected to perform better. Google also said the model was never intended to be used for medical purposes.