Ask the Doctor: AI Chatbots Failed to Diagnose 80% of Diseases

SadaNews - As people around the world increasingly rely on AI-powered chatbots, a new study suggests that "there's no escaping the doctor": the chatbots made mistakes in more than 80% of medical cases at the early stage, when little patient information is available.

The study, published yesterday in JAMA Network Open, highlighted the risks of relying on these chatbots as digital doctors. It showed that they struggle to suggest a range of possible diagnoses when patient data is limited, often narrowing their options too quickly and settling on a single answer.

The results also indicated that chatbots can identify the likely condition when a case is fully and clearly described, but their reliability declines in early or more ambiguous stages.

Risks of Relying on Technology

The results underscored the risks associated with relying solely on technology to identify health problems, particularly in cases where the data entered by users is vague or fragmented.

Arya Rao, the lead author of the study and a researcher at the Mass General Brigham health care system based in Massachusetts, said that these models are excellent at naming the final diagnosis when the data is complete, but that they struggle at the outset, when little information is available, according to the Financial Times.

The study tested AI models using 29 hypothetical clinical cases based on a medical reference. In the experiment, data was disclosed progressively, step by step: the history of the present illness, then clinical examination results, then laboratory test results. Researchers asked the models diagnostic questions and measured failure rates, defined as the percentage of questions that were not answered correctly and completely.
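The failure rate described above is a simple percentage, and a minimal sketch may make the definition concrete. The function name `failure_rate` and the sample gradings below are illustrative assumptions, not the study's code or data; the sketch only assumes that each question is graded as either fully correct or not.

```python
def failure_rate(results):
    """Return the percentage of questions NOT answered correctly and completely.

    `results` is a list of booleans, one per question:
    True means the answer was both correct and complete.
    """
    if not results:
        raise ValueError("no questions were graded")
    failures = sum(1 for ok in results if not ok)
    return 100.0 * failures / len(results)


# Hypothetical grading of five questions (illustrative only, not study data):
# one fully correct answer and four failures.
example = [True, False, False, False, False]
print(failure_rate(example))  # 80.0
```

Under this definition, a partially correct answer counts as a failure, which is one reason the reported rates can be high even when a model names some of the right conditions.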

In total, the researchers evaluated 21 chatbot models, including leading models developed by OpenAI, Anthropic, Google, xAI, and DeepSeek.

They found that diagnostic failure rates exceeded 80% across all models when they were asked to perform what is known as differential diagnosis, that is, listing the possible conditions while complete patient information is not yet available.

Failure rates dropped below 40% at the final-diagnosis stage, when more complete data was available, and the best models surpassed 90% accuracy.

Anthropic previously confirmed that its "Claude" model is trained to direct individuals posing medical questions to specialists.

For its part, Google explained that its "Gemini" model is designed to do the same, with built-in reminders in the application to encourage users to double-check information.

Furthermore, OpenAI's usage policy states that its services should not be used to provide medical advice requiring professional licensing without involving appropriately qualified specialists.