Artificial intelligence beats doctors in accurately assessing eye problems
We could realistically deploy AI in triaging patients with eye issues to decide which cases are emergencies.
—Arun Thirunavukarasu
A study has found that the AI model GPT-4 significantly exceeds the ability of non-specialist doctors to assess eye problems and provide advice.
The clinical knowledge and reasoning skills of GPT-4 are approaching the level of specialist eye doctors, a study led by the University of Cambridge has found.
GPT-4 - a ‘large language model’ - was tested against doctors at different stages in their careers, including unspecialised junior doctors, and trainee and expert eye doctors. Each was presented with a series of 87 patient scenarios, every one involving a specific eye problem, and asked to give a diagnosis or advise on treatment by selecting from four options.
GPT-4 scored significantly better in the test than unspecialised junior doctors, who are comparable to general practitioners in their level of specialist eye knowledge.
GPT-4 gained similar scores to trainee and expert eye doctors - although the top-performing doctors scored higher.
The researchers say that large language models aren’t likely to replace healthcare professionals, but have the potential to improve healthcare as part of the clinical workflow.
They say state-of-the-art large language models like GPT-4 could be useful for providing eye-related advice, diagnosis, and management suggestions in well-controlled contexts, like triaging patients, or where access to specialist healthcare professionals is limited.
“We could realistically deploy AI in triaging patients with eye issues to decide which cases are emergencies that need to be seen by a specialist immediately, which can be seen by a GP, and which don’t need treatment,” said Dr Arun Thirunavukarasu, lead author of the study, which he carried out while a student at the University of Cambridge’s School of Clinical Medicine.
He added: “The models could follow clear algorithms already in use, and we’ve found that GPT-4 is as good as expert clinicians at processing eye symptoms and signs to answer more complicated questions.
“With further development, large language models could also advise GPs who are struggling to get prompt advice from eye doctors. People in the UK are waiting longer than ever for eye care.”
Large volumes of clinical text are needed to help fine-tune and develop these models, and work is ongoing around the world to facilitate this.
The researchers say that their study is superior to similar, previous studies because they compared the abilities of AI with those of practising doctors, rather than with sets of examination results.
“Doctors aren’t revising for exams for their whole career. We wanted to see how AI fared when pitted against the on-the-spot knowledge and abilities of practising doctors, to provide a fair comparison,” said Thirunavukarasu, who is now an Academic Foundation Doctor at Oxford University Hospitals NHS Foundation Trust.
He added: “We also need to characterise the capabilities and limitations of commercially available models, as patients may already be using them - rather than the internet - for advice.”
The test included questions about a huge range of eye problems, including extreme light sensitivity, decreased vision, lesions, and itchy and painful eyes. The questions were taken from a textbook used to test trainee eye doctors. This textbook is not freely available on the internet, making it unlikely that its content was included in GPT-4’s training datasets.
The results are published today in the journal PLOS Digital Health.
“Even taking the future use of AI into account, I think doctors will continue to be in charge of patient care. The most important thing is to empower patients to decide whether they want computer systems to be involved or not. That will be an individual decision for each patient to make,” said Thirunavukarasu.
GPT-4 and GPT-3.5 - or ‘Generative Pre-trained Transformers’ - are trained on datasets containing hundreds of billions of words from articles, books, and other internet sources. These are two examples of large language models; others in wide use include Pathways Language Model 2 (PaLM 2) and Large Language Model Meta AI 2 (LLaMA 2).
The study also tested GPT-3.5, PaLM 2, and LLaMA with the same set of questions. GPT-4 gave more accurate responses than all of them.
GPT-4 powers the online chatbot ChatGPT to provide bespoke responses to human queries. In recent months, ChatGPT has attracted significant attention in medicine for attaining passing-level performance in medical school examinations, and for providing more accurate and empathetic messages than human doctors in response to patient queries.
The field of artificially intelligent large language models is moving very rapidly. Since the study was conducted, more advanced models have been released - which may be even closer to the level of expert eye doctors.