Construction of a Traditional Chinese Medicine Question-Answering Large Language Model Based on Retrieval-Augmented Generation Technology

  • Abstract:
    OBJECTIVE To construct a traditional Chinese medicine (TCM) question-answering large language model based on retrieval-augmented generation (RAG).
    METHODS A TCM corpus was assembled from the TCM classic Treatise on Cold Damage (《伤寒论》), TCM textbooks, classical prescriptions of renowned veteran TCM physicians, and other manually annotated TCM datasets, and a TCM knowledge vector library was constructed. RAG was combined with the P-Tuning v2 fine-tuning method and the large language model ChatGLM2-6B to build the TCM question-answering large language model.
    RESULTS Precision, recall, and F1 score were used as evaluation metrics for the knowledge question-answering task. The model achieved over 90% accuracy on simple TCM questions, with composition-type questions answered most accurately (F1 score of 0.928). On medium- and high-difficulty questions, accuracy ranged from 75.8% to 87.7%, and every F1 score was at least 0.766. For the TCM question generation task, experts scored the output on diversity and accuracy, and the proposed model scored 9.5 points higher than the base model.
    CONCLUSION The proposed model shows good semantic understanding and high reliability. It effectively alleviates model hallucination and helps patients clarify the intent of their questions. It is significant for advancing research on TCM knowledge and for providing user-friendly interactive answers, and it offers an innovative approach to the inheritance and popularization of TCM experience and to the intelligent development of TCM diagnosis and treatment.
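
    The retrieval step that the METHODS summary refers to can be illustrated with a minimal sketch. It is not the authors' implementation: the passage list, the bag-of-words `embed` function, and the prompt template are all hypothetical stand-ins, assuming only that RAG ranks stored passages by similarity to the question and prepends the best matches to the prompt. A real system would presumably use a learned embedding model, a vector store, and the fine-tuned ChatGLM2-6B model in place of these toy parts.

```python
import math
from collections import Counter

# Hypothetical mini knowledge base; illustrative placeholders only,
# not passages taken from the authors' corpus.
KNOWLEDGE_BASE = [
    "Guizhi Tang contains Guizhi, Shaoyao, Shengjiang, Dazao and Gancao.",
    "Mahuang Tang contains Mahuang, Guizhi, Xingren and Gancao.",
    "Treatise on Cold Damage is a classical TCM text.",
]


def embed(text):
    """Toy bag-of-words vector, standing in for a learned embedding model."""
    tokens = [t.strip(".,?!").lower() for t in text.split()]
    return Counter(t for t in tokens if t)


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, passages, top_k=2):
    """Return the top_k passages most similar to the question."""
    q = embed(question)
    ranked = sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:top_k]


def build_prompt(question, passages, top_k=2):
    """Prepend retrieved passages to the question (hypothetical template)."""
    context = "\n".join(f"- {p}" for p in retrieve(question, passages, top_k))
    return (
        "Answer using only the reference passages.\n"
        f"References:\n{context}\n"
        f"Question: {question}\n"
        "Answer:"
    )


if __name__ == "__main__":
    # The resulting prompt would then be passed to the language model.
    print(build_prompt("What does Mahuang Tang contain?", KNOWLEDGE_BASE))
```

    Grounding the prompt in retrieved passages is the mechanism usually credited with reducing hallucination in RAG systems, since the model is asked to answer from supplied reference text rather than from its parameters alone.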

     
