淋巴瘤医案不同聚类分析方法比较研究

Comparative Study on Different Cluster Analysis Methods of Lymphoma Medical Records

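  • Abstract (translated from the Chinese):
      Objective  Using clinical medical records of lymphoma as example data, to compare the mining results of different cluster analysis methods, and thereby to analyze optimization schemes for, and differences in the results of, drug clustering methods applied to traditional Chinese medicine (TCM) medical records.
      Methods  The lymphoma medical records were uniformly preprocessed and standardized. Fast clustering (a decentralized clustering method) and hierarchical clustering (a structural clustering method) were used for mining and analysis, and the methods were compared comprehensively along three dimensions: algorithm characteristics, terminal value bias, and clinical fit.
      Results  The study covered 138 patients (person-times), 354 visits, and 451 drugs. In decentralized clustering, drug clustering yielded 26 clusters, with cluster numbers ranging from 1 to 29; prescription clustering yielded 22 clusters, with cluster numbers ranging from 3 to 19 and dot values ranging from 3 to 14. In structural clustering, agglomerative hierarchical clustering of drugs gave cluster numbers ranging from 12 to 25 at F10, from 8 to 21 at F20, and from 5 to 15 at F30.
      Conclusion  For data mining of TCM clinical medical records for a single disease, the choice of method depends mainly on the total sample size and the overall drug frequency. When the data set is small, structural clustering is preferable, and the drug structural clustering design should use a higher drug frequency range; the terminal value bias is then lower and the clinical fit of the results is better. When the data set is large, decentralized clustering is preferable, and the design should use prescription decentralized clustering; the terminal value bias is then lower and the clinical fit of the results is better.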

     

    Abstract:
      OBJECTIVE  To compare the mining results of different cluster analysis methods, using lymphoma clinical medical records as sample data, and to analyze optimization schemes for, and differences in the results of, drug clustering methods applied to traditional Chinese medicine (TCM) medical records.
      METHODS  The lymphoma medical records were uniformly preprocessed and standardized. Fast clustering (a decentralized clustering method) and hierarchical clustering (a structural clustering method) were used for mining and analysis, and the methods were compared comprehensively along three dimensions: algorithm characteristics, terminal value bias, and clinical fit.
      RESULTS  The study covered 138 patients, 354 visits, and 451 drugs. In decentralized clustering, drug clustering yielded 26 clusters, with cluster numbers ranging from 1 to 29; prescription clustering yielded 22 clusters, with cluster numbers ranging from 3 to 19 and dot values ranging from 3 to 14. In structural clustering, agglomerative hierarchical clustering of drugs gave cluster numbers ranging from 12 to 25 at F10, from 8 to 21 at F20, and from 5 to 15 at F30.
      CONCLUSION  In data mining of TCM clinical medical records for a single disease, different clustering methods have been used to study the drug combinations or core prescriptions applied in clinical practice. The choice of method depends mainly on the total sample size and the overall drug frequency. When the data set is small, structural clustering is preferable, and the drug structural clustering design should use a higher drug frequency range; this gives lower terminal value bias and better clinical fit. When the data set is large, decentralized clustering is preferable, and the design should use prescription decentralized clustering, which likewise gives lower terminal value bias and better clinical fit.
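
As a rough illustration of the structural (agglomerative hierarchical) approach described above, the following Python sketch filters drugs by a minimum frequency and then merges them bottom-up. Several things here are assumptions rather than details from the study: the abstract does not define F10, F20, and F30, so the sketch treats an F-threshold as a minimum number of visits in which a drug appears; the Jaccard distance and average linkage are illustrative choices; and the `visits` data, the `herb_*` names, and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy prescriptions, one set of drugs per visit (not study data).
visits = [
    {"herb_A", "herb_B", "herb_C"},
    {"herb_A", "herb_B"},
    {"herb_A", "herb_B", "herb_D"},
    {"herb_C", "herb_D", "herb_E"},
    {"herb_D", "herb_E"},
    {"herb_D", "herb_E", "herb_A"},
]


def frequent_herbs(visits, min_freq):
    """Keep drugs that occur in at least min_freq visits.

    Assumption: this is how an F-threshold such as F10 is interpreted here.
    """
    counts = Counter(h for v in visits for h in v)
    return sorted(h for h, c in counts.items() if c >= min_freq)


def jaccard_distance(a, b, visits):
    """1 - (visits containing both drugs) / (visits containing either drug)."""
    both = sum(1 for v in visits if a in v and b in v)
    either = sum(1 for v in visits if a in v or b in v)
    return 1.0 - both / either if either else 1.0


def agglomerate(herbs, visits, n_clusters):
    """Average-linkage agglomerative clustering, stopped at n_clusters groups."""
    clusters = [[h] for h in herbs]
    dist = {
        frozenset((a, b)): jaccard_distance(a, b, visits)
        for a, b in combinations(herbs, 2)
    }

    def linkage(c1, c2):
        # Mean pairwise distance between the members of two clusters.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find and merge the closest pair of clusters.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return sorted(clusters)


herbs = frequent_herbs(visits, min_freq=3)
groups = agglomerate(herbs, visits, n_clusters=2)
```

Raising `min_freq` shrinks the set of drugs that enter the clustering, which fits the pattern in the results above, where the reported cluster numbers fall from F10 to F30.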

     
