
AI Chatbots Support Doctors in Complex Treatment Decisions, Study Finds
Artificial intelligence-powered chatbots are not only improving disease diagnosis but are also proving effective in guiding complex treatment decisions, according to new research that highlights their growing role in clinical care.
The findings are significant for healthcare systems globally, including India, where clinicians routinely navigate nuanced decisions beyond diagnosis. These include medication adjustments, surgical timelines, and patient-specific treatment pathways. The research suggests that AI tools, when used alongside physicians, can enhance decision-making in such grey areas of medicine.
The study, published in Nature Medicine, examined how large language model-based chatbots perform in what researchers describe as “clinical management reasoning”. This involves decisions that do not have a single correct answer and depend heavily on physician judgment and patient context.
Led by Jonathan H. Chen, the research team evaluated three groups: a chatbot working independently, 46 doctors supported by a chatbot, and 46 doctors relying only on internet searches and medical references. Participants were given five de-identified patient cases and asked to outline their clinical decisions along with the reasoning behind them.
Each response was assessed against a rubric developed by board-certified physicians. The results showed that the chatbot alone outperformed doctors who relied solely on traditional resources. However, doctors who used chatbot support performed on par with the chatbot, underscoring the value of human-AI collaboration.
An earlier study published in JAMA Network Open had already demonstrated that chatbots could surpass doctors in diagnostic accuracy. The current research shifts focus to post-diagnosis care, an area often shaped by multiple variables such as patient preferences, clinical history, and healthcare system constraints.
To explain this complexity, co-lead author Ethan Goh likened diagnosis to identifying a destination on a map, while treatment decisions resemble choosing the best route to get there. In real-world scenarios, these decisions may involve whether to proceed with invasive procedures, delay interventions, or gather more diagnostic information.
The study highlights that contextual factors, such as a patient’s willingness to undergo procedures or their likelihood of adhering to follow-ups, play a critical role. These are areas where physician judgment remains indispensable.
Chen emphasised that combining human expertise with computational tools yields better outcomes than either working alone. He noted that the findings challenge existing assumptions about how AI should be integrated into healthcare workflows and call for a reassessment of task allocation between humans and machines.
A follow-up study published in Nature Digital Medicine explored this integration further. In a randomised controlled trial involving 70 doctors, researchers examined how the sequence of AI and physician input affects outcomes.
The study found that when AI reviewed cases after doctors had already formed opinions, it tended to align with the physician’s assessment, despite being programmed for independent reasoning. The most effective approach was parallel evaluation, where both doctor and AI assessed cases simultaneously, followed by an AI-generated synthesis of both perspectives. Lead author Selin Everett said the goal was to move from using AI as a tool to treating it as a collaborative clinical partner.
Despite these promising results, the exact mechanism behind improved performance in physician-chatbot partnerships remains unclear. Researchers are exploring whether AI prompts more deliberate thinking among doctors or contributes novel insights that might otherwise be overlooked.
The findings also address a recurring concern about whether AI could replace doctors. While the performance of chatbots is notable, the researchers caution against bypassing medical professionals.
Chen stressed that while AI can provide valuable information, it also carries the risk of inaccuracies. The ability to distinguish credible information from unreliable outputs is becoming increasingly important in clinical practice.
The research involved contributions from institutions including Harvard University, University of Minnesota, University of Virginia, Microsoft, and Kaiser, among others.
The studies indicate that AI chatbots are evolving from diagnostic aids to partners in clinical decision-making. While they are unlikely to replace physicians, their integration into healthcare workflows, particularly through collaborative models, could reshape how complex treatment decisions are made in the years ahead.



