Doctors Gave AI Chatbots a Cognitive Test. Their Diagnosis? Early Dementia.

Israeli neurologists gave leading AI chatbots the cognitive exam used to assess U.S. presidents' mental fitness, in a study conceived as a joke. But it found "real flaws" in technology increasingly used to guide clinical decision-making, study coauthor Dr. Roy Dayan told Newsweek.

Dayan, a senior neurologist at Jerusalem's Hadassah Medical Center, said he and his colleagues were inspired by the steady stream of studies showing AI outperforming doctors. Over the past two years, research has shown that ChatGPT can ace the MCAT and the United States Medical Licensing Exam.

Large language models, or LLMs, can produce more accurate diagnoses than physicians in certain specialties and even garner higher patient satisfaction scores when responding to digital inquiries.

International medical journals and major media organizations have deliberated whether artificial intelligence will eventually replace doctors. It's not an unreasonable question: 10 percent of consumers believe it should in the foreseeable future, according to a June 2024 survey from IT services and consulting firm CustomerTimes.

MRIs like these can help diagnose dementia, along with cognitive tests like the MoCA. Peter Garrard Beck, Getty Images

If AI is going to take charge, it should be put through its paces, Dayan reasoned: "We thought it would be interesting to examine ChatGPT with our tools, the way we check patients if we suspect cognitive degeneration."

Does AI Have Dementia?

Dayan worked with two colleagues: Dr. Benjamin Uliel, a senior neurologist and cognitive specialist at Hadassah Medical Center, and Gal Koplewitz, a senior data scientist at Tel Aviv University and London-based QuantumBlack Analytics. Together they administered the Montreal Cognitive Assessment, or MoCA, to five leading LLMs: ChatGPT-4, ChatGPT-4o, Claude, Gemini 1.0 and Gemini 1.5.

The MoCA screens for cognitive impairment by giving patients a variety of simple tasks. For example: copy this drawing of a cube. Give as many words as you can that begin with the letter "F." Starting at 100, subtract seven, then keep subtracting seven from each answer.
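The researchers posed such items to the chatbots as prompts. As a rough illustration of the general approach, here is a minimal Python sketch of how text-friendly MoCA-style items might be put to a chat model through the OpenAI API. The item wordings, the gpt-4o model name and the administer helper are illustrative assumptions for this sketch, not the study's actual protocol.

```python
"""Minimal sketch: posing MoCA-style text items to a chat model.

Illustrative only; the item texts, model name and scoring notes are
assumptions for this example, not the BMJ study's protocol. Requires
the `openai` package and an OPENAI_API_KEY environment variable.
"""
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Text-friendly items; visuospatial items such as cube copying require
# image input and are omitted from this sketch.
ITEMS = [
    "Name as many words as you can that begin with the letter F.",
    "Starting at 100, subtract 7, then keep subtracting 7 from each "
    "answer. Give the first five results.",
    "Repeat these five words: face, velvet, church, daisy, red.",
]

def administer(item: str, model: str = "gpt-4o") -> str:
    """Send one test item to the model and return its reply."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": item}],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    for item in ITEMS:
        print("Q:", item)
        print("A:", administer(item), "\n")
        # In the study, replies were scored by hand against the standard
        # MoCA rubric: 30 points total, with 26 or above considered normal.
```

As in the study itself, grading the replies would still fall to a clinician applying the official MoCA rubric.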

To Dayan's surprise, none of the models obtained the full score of 30 points. Most scored between 18 and 25, a range indicating the mild cognitive impairment associated with early dementia. Every model outperformed the average person on attention and memory-related tasks but faltered on visuospatial tasks, such as those that asked them to copy a drawing or orient themselves in space and time.

Researchers also showed the chatbots the Boston Diagnostic Aphasia Examination's "cookie theft" picture, which depicts a boy standing on a tipping stool to steal cookies while his mother washes dishes. Patients describe the scene while clinicians assess their speech and language function. All the models correctly interpreted parts of the image, the study found, but none expressed concern that the boy was about to fall.

This lack of empathy is commonly associated with frontotemporal dementia, the study authors said. Notably, older models performed worse on the MoCA than newer versions, a pattern the authors likened to cognitive decline in aging human brains.

The December 2024 study was carried out for the Christmas edition of The BMJ, one of the world's most rigorous medical journals, which subjects all articles to a thorough peer review process. The journal holds its festive edition to the same standards but admits more creative, "light-hearted fare."

Hadassah Medical Center in Jerusalem, Israel, where Dr. Roy Dayan and Dr. Benjamin Uliel work as senior neurologists. Moshe Einhorn, Getty Images

Dayan said the study was written somewhat "tongue-in-cheek": methodologically, LLMs should not be examined with tools meant for people. Still, he hopes the results spark conversations about the differences between AI and human doctors, and the important roles both can play.

Visuospatial awareness is important when making a diagnosis, especially in specialties like neurology, where answers may be hidden beneath the surface, Dayan said. He uses patients' body language and intonation to inform his diagnoses. AI can respond to what a patient says, but how they say it is equally important.

Empathy is also critical. Research has shown its positive effects on health and recovery. A 2024 study found that for chronic pain patients, physician empathy was more strongly associated with favorable outcomes than opioid therapy, lumbar spine surgery and nonpharmacological treatments.

Amid the churn of articles showing how ChatGPT can outperform doctors on board exams, "people immediately said, 'OK, so doctors are obsolete,'" Dayan said. "We tried to show that still, sometimes you need a person-to-person interaction."

Views on AI and Empathy

The study has elicited a range of reactions from physicians and health care executives. Dr. Robert Pearl, the former CEO of The Permanente Medical Group, currently a clinical professor of plastic surgery at Stanford University School of Medicine and a faculty member at Stanford Graduate School of Business, drew a different conclusion from the study: the LLMs' shortcomings reminded him of cognitive development in children, not cognitive decline in the elderly.

AI has made significant improvements in a short period of time, Pearl told Newsweek. ChatGPT was released just over two years ago, and if it's this smart at age two, it's likely to be a prodigy by five. Pearl treats AI as a medical student who is still learning: he would never trust a student to make a definitive diagnosis and prescribe treatment, but he trusts it as a research aide, always double-checking its work.

In fact, Pearl wrote his book, ChatGPT, MD: How AI-Empowered Patients & Doctors Can Take Back Control of American Medicine, published in April 2024, by collaborating with ChatGPT as he would a medical student. Ninety-eight percent of the information ChatGPT provided was "superb," but it hallucinated the other two percent, Pearl said.

Still, he believes this technology is only growing more powerful and will eventually save many lives each year. "One of my great concerns is that we ignore, as a society, so many failures of medicine today," Pearl said. "Four hundred thousand people die every year from misdiagnosis. I want us to ask a question: How can this technology reduce that number?"

AI could make care more affordable for patients and save time for doctors, according to Dr. Robert Pearl. Getty Images

AI could also reduce burnout among doctors, shifting their daily duties and letting them lean into the human side of their practice.

"Patients value very much your expertise," Pearl said, "but for the most part, they also want to have the empathy of the doctor, the face-to-face relationship, the metaphorical holding of the hands."

Dr. Thomas Thesen, an associate professor of neuroscience at Dartmouth's Geisel School of Medicine and in the college's Department of Computer Science, drew similar conclusions from the study.

"Asking those models to do these multimodal tests of how we actually test humans is a little bit like asking your calculator to do pushups," Thesen told Newsweek. "It can't do it, but it can do other things well—what it's been trained to do or constructed to do."

However, the study raises important questions that medical faculty at Dartmouth have been mulling, Thesen said. The school's curriculum teaches medical students how to responsibly use digital health and AI tools. In some cases, AI has even been helpful in building empathy, Thesen said.

He uses an AI model to train students by simulating patient interactions. The AI gives feedback on the student's bedside manner, prompting them to acknowledge the patient's pain or ask more open-ended questions. But there is a level of empathy that robots will never be able to emulate, according to Thesen.

"The idea that 'there's somebody who cares for me' has a big influence on people's behaviors, patients' compliance and their general outlook on the therapeutic relationship," he said. "My feeling is that we will lose this effect if we only outsource this to AI."

Dr. Roshini Pinto-Powell, associate dean of admissions at Dartmouth's Geisel School of Medicine, elaborated on Thesen's concerns. Studies have shown that patients often rate AI's responses to their inquiries as more empathetic than doctors' replies. But there's a vital difference between human and technological expressions of empathy, per Pinto-Powell.

'Critical' Factor

Cognitive empathy is an understanding of a person's distress, while affective empathy allows you to feel their distress, she said. Clinical empathy takes affective empathy a step further—it motivates a doctor to do something about a person's distress.

AI will never be able to grasp affective or clinical empathy, Pinto-Powell said, "And I think clinical empathy is critical." She therefore agrees with The BMJ study's conclusions that AI is not coming for her job anytime soon.

When doctors see ChatGPT outperform them, they tend to worry. But when Pinto-Powell pores over medical school applications, she is not looking for high MCAT scores; she's looking for effort, service, clinical work, coachability—and AI does not stand a chance against the applicants who care deeply about people.

"You take a brilliant student who thinks they know it all...I don't want them," she said. "That's the deadliest kind of student to have."


About the writer

Alexis Kayser is Newsweek's Healthcare Editor based in Chicago. Her focus is reporting on the operations and priorities of U.S. hospitals and health systems. She has extensively covered value-based care models, artificial intelligence, clinician burnout and Americans' trust in the health care industry. Alexis joined Newsweek in 2024 from Becker's Hospital Review. She is a graduate of Saint Louis University. You can get in touch with Alexis by emailing a.kayser@newsweek.com or by connecting with her on LinkedIn.

