Ask the Doctor: AI Chatbots Failed in Over 80% of Early-Stage Diagnoses
SadaNews - As people around the world increasingly rely on AI-powered chatbots, a new study suggests that "there's no escaping the doctor": the chatbots gave wrong or incomplete answers in more than 80% of medical cases at the early stage of diagnosis.
The study, published yesterday in JAMA Network Open, highlighted the risks of relying on these chatbots as digital doctors. It showed that they struggle to suggest a range of possible diagnoses when patient data is limited, and often narrow their options too quickly to a single answer.
The results also indicated that chatbots can identify the likely condition when a case is fully and clearly specified, but their reliability declines in early or more ambiguous stages.
Risks of Relying on Technology
The results underscored the risks associated with relying solely on technology to identify health problems, particularly in cases where the data entered by users is vague or fragmented.
Aria Rao, the lead author of the study and a researcher at the Massachusetts-based Mass General Brigham health care system, told the Financial Times that these models are excellent at naming the final diagnosis when the data is complete, but struggle at the outset, when little information is available.
The study tested AI models on 29 hypothetical clinical cases drawn from a medical reference. Data for each case was disclosed progressively, step by step: the current medical history, then clinical examination results, then laboratory test results. Researchers asked the models diagnostic questions and measured failure rates, defined as the percentage of questions that were not answered correctly and completely.
In total, the researchers evaluated 21 chatbot models, including leading models developed by OpenAI, Anthropic, Google, xAI, and DeepSeek.
They found that diagnostic failure rates exceeded 80% across all models when they were asked to perform what is known as differential diagnosis, that is, proposing a range of possible conditions before complete patient information is available.
Failure rates dropped below 40% at the final-diagnosis stage, when more complete data was available, with the best models surpassing 90% accuracy.
Anthropic previously confirmed that its "Claude" model is trained to direct individuals posing medical questions to specialists.
For its part, Google explained that its "Gemini" model is designed to do the same, with built-in reminders in the application to encourage users to double-check information.
Furthermore, OpenAI's usage policy states that its services should not be used to provide medical advice requiring professional licensing without involving appropriately qualified specialists.