Neural Regeneration Research ›› 2025, Vol. 20 ›› Issue (1): 234-241. doi: 10.4103/1673-5374.393103

• Original Article: Brain Injury Repair, Protection and Regeneration •

Early identification of stroke using a multimodal deep learning model based on speech and movement

  



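  • Publication date: 2025-01-15
  • Funding:

    National Key Research and Development Program of China (2020AAA0109605), Meizhou Major Scientific and Technological Innovation Platform, Guangdong Provincial Science and Technology Plan Project (2019A0102005)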

Early identification of stroke through deep learning with multi-modal human speech and movement data

Zijun Ou1, #, Haitao Wang1, #, Bin Zhang2, #, Haobang Liang1, Bei Hu3, Longlong Ren3, Yanjuan Liu3, Yuhu Zhang2, Chengbo Dai2, Hejun Wu1, *, Weifeng Li3, *, Xin Li3, *   

  1 School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, Guangdong Province, China; 2 Department of Neurology, Guangdong Neuroscience Institute, Guangdong Provincial People’s Hospital (Guangdong Academy of Medical Sciences), Southern Medical University, Guangzhou, Guangdong Province, China; 3 Department of Emergency Medicine, Guangdong Provincial People’s Hospital (Guangdong Academy of Medical Sciences), Southern Medical University, Guangzhou, Guangdong Province, China
  • Online: 2025-01-15
  • Contact: Hejun Wu, PhD, wuhejun@mail.sysu.edu.cn; Weifeng Li, MD, liweifeng2736@gdph.org.cn; Xin Li, MD, sylixin@scut.edu.cn.
  • Supported by:
    This study was supported by the Ministry of Science and Technology of China, No. 2020AAA0109605 (to XL) and Meizhou Major Scientific and Technological Innovation Platforms and Projects of Guangdong Provincial Science & Technology Plan Projects, No. 2019A0102005 (to HW).

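Abstract (Chinese-language version):

Early identification and treatment of stroke can significantly improve patient prognosis and quality of life. In pre-hospital examinations, emergency personnel often use simple tools such as the Cincinnati Pre-hospital Stroke Scale and the Face, Arm, Speech, Time scale for initial assessment, but these methods may miss mild or atypical motor or speech impairments, so more accurate and sensitive methods of stroke identification are needed. In this study, an advanced multimodal deep learning model was established that combines analysis of facial, limb movement, and speech features and introduces contrastive learning of action features, in order to assess suspected stroke patients presenting to emergency medical services with symptoms such as limb weakness, facial paresis, and speech disorders. A dataset was collected comprising video and audio recordings of emergency room patients performing designated limb movements, facial expressions, and speech tests. On this dataset, the constructed model was compared with six currently popular action feature analysis networks: I3D, SlowFast, X3D, TPN, TimeSformer, and MViT. The results showed that the constructed model had higher predictive validity than the other models. In addition, the multimodal model outperformed the single-modality models, highlighting the advantage of using multiple movement and speech features of patients. These results indicate that a multimodal deep learning model combined with analysis of facial and arm movements can significantly improve the accuracy and sensitivity of early stroke identification, providing a practical and powerful tool for emergency medical services for stroke.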

https://orcid.org/0000-0001-9758-5698 (Hejun Wu); https://orcid.org/0009-0009-3401-0163 (Weifeng Li); https://orcid.org/0000-0003-0469-5121 (Xin Li)

Key words (Chinese-language version): stroke, FAST, deep learning, early detection, artificial intelligence, diagnosis, screening

Abstract: Early identification and treatment of stroke can greatly improve patient outcomes and quality of life. Although clinical tests such as the Cincinnati Pre-hospital Stroke Scale (CPSS) and the Face Arm Speech Test (FAST) are commonly used for stroke screening, accurate administration depends on specialized training. In this study, we proposed a novel multi-modal deep learning approach, based on the FAST, for assessing suspected stroke patients exhibiting symptoms such as limb weakness, facial paresis, and speech disorders in acute settings. We collected a dataset comprising video and audio recordings of emergency room patients performing designated limb movements, facial expressions, and speech tests based on the FAST. We compared the constructed deep learning model, which was designed to process multi-modal datasets, with six prior models that achieved good action classification performance: I3D, SlowFast, X3D, TPN, TimeSformer, and MViT. We found that the predictions of our deep learning model had higher clinical value than those of the other approaches. Moreover, the multi-modal model outperformed its single-modality variants, highlighting the benefit of utilizing multiple types of patient data, such as action videos and speech audio. These results indicate that a multi-modal deep learning model combined with the FAST could greatly improve the accuracy and sensitivity of early stroke identification, thus providing a practical and powerful tool for assessing stroke patients in an emergency clinical setting.

Key words: artificial intelligence, deep learning, diagnosis, early detection, FAST, screening, stroke