OBJECTIVE To compare the mining results of different cluster analysis methods, using lymphoma clinical medical records as sample data, and to analyze how drug cluster mining of traditional Chinese medicine medical records can be optimized and how the results of the methods differ.
METHODS After unified preprocessing and standardization of the lymphoma medical records, the data were mined with fast clustering (a form of decentralized clustering) and with hierarchical clustering (a form of structural clustering). The methods were then compared comprehensively along three dimensions: algorithm characteristics, terminal value bias, and clinical fitting.
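To make the fast-clustering step concrete, here is a minimal k-means sketch in Python. It assumes that "fast clustering" corresponds to k-means (the method SPSS runs under its QUICK CLUSTER command) and that each visit is encoded as a binary vector of prescribed drugs. The drug labels, toy visits, first-k initialisation, and function names are all hypothetical illustrations, not the study's data or settings.

```python
# Minimal k-means ("fast clustering") sketch on binary prescription vectors.
# Assumption: each visit is a list of drug names; all data below is hypothetical.

def encode(visits, vocab):
    """One-hot encode each visit: 1.0 if the drug was prescribed, else 0.0."""
    return [[1.0 if drug in prescribed else 0.0 for drug in vocab]
            for prescribed in map(set, visits)]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means; returns one cluster label per point."""
    # Deterministic initialisation (hypothetical choice): the first k points.
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments unchanged, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy data: six visits drawn from two disjoint drug groups.
VOCAB = ["A", "B", "C", "D", "E", "F"]
VISITS = [["A", "B", "C"], ["A", "B"], ["B", "C", "A"],
          ["D", "E", "F"], ["D", "E"], ["E", "F", "D"]]

print(kmeans(encode(VISITS, VOCAB), k=2))  # [0, 0, 0, 1, 1, 1]
```

Clustering visits (prescription clustering) uses the rows of the encoded matrix as shown; clustering drugs instead would use its columns. The first-k initialisation keeps the sketch deterministic, whereas practical k-means implementations usually use random or k-means++ seeding.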
RESULTS The study included 138 patients, 354 visits, and 451 different medicines. In decentralized clustering, drug clustering produced 26 clusters, the largest containing 29 members and the smallest 1; prescription clustering produced 22 clusters, the largest containing 19 members and the smallest 3, with dot values ranging from 3 to 14. In structural (hierarchical) clustering of drugs, the largest and smallest clusters contained 25 and 12 members under F10, 21 and 8 members under F20, and 15 and 5 members under F30.
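The F10, F20, and F30 labels are read here, as an assumption, as minimum drug-frequency thresholds (a drug is kept only if it appears in at least 10, 20, or 30 visits). The sketch below filters drugs this way, groups them by average-linkage agglomerative clustering on the Jaccard distance between the sets of visits they appear in, and reports the number of clusters and the largest and smallest cluster sizes, mirroring how the results are stated. The distance, linkage, cut height, and toy data are hypothetical choices, not the study's settings.

```python
# Frequency-threshold filtering plus average-linkage hierarchical clustering
# of drugs. F10/F20/F30 are ASSUMED to mean "drug appears in >= 10/20/30 visits".
from collections import Counter
from itertools import combinations


def frequent_drugs(visits, min_freq):
    """Drugs prescribed in at least `min_freq` visits, sorted by name."""
    counts = Counter(drug for visit in visits for drug in set(visit))
    return sorted(drug for drug, n in counts.items() if n >= min_freq)


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b| for two sets of visit indices."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def average_linkage(visits, drugs, cut):
    """Merge the closest pair of drug clusters until the closest pair is
    farther apart than `cut` (a hypothetical cut height)."""
    # For each drug, the indices of the visits in which it was prescribed.
    occurs = {d: {i for i, v in enumerate(visits) if d in v} for d in drugs}
    clusters = [[d] for d in drugs]

    def linkage(c1, c2):
        # Average pairwise Jaccard distance between members of two clusters.
        dists = [jaccard_distance(occurs[a], occurs[b]) for a in c1 for b in c2]
        return sum(dists) / len(dists)

    while len(clusters) > 1:
        (i, j), dist = min(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda pair: pair[1],
        )
        if dist > cut:
            break
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return clusters


def summarise(clusters):
    """Cluster count and largest/smallest cluster size, as in the results."""
    sizes = [len(c) for c in clusters]
    if not sizes:  # no drug passed the frequency threshold
        return {"n_clusters": 0, "largest": 0, "smallest": 0}
    return {"n_clusters": len(clusters), "largest": max(sizes), "smallest": min(sizes)}


# Hypothetical toy visits; thresholds are scaled down to 2 and 3 for this tiny set.
VISITS = [["A", "B", "C"], ["A", "B"], ["A", "B", "C"],
          ["D", "E"], ["D", "E", "F"], ["D", "E"], ["G"]]

for min_freq in (2, 3):
    clusters = average_linkage(VISITS, frequent_drugs(VISITS, min_freq), cut=0.5)
    print(min_freq, summarise(clusters))
```

Raising the threshold drops rarely used drugs before clustering, which shrinks the clusters; this is the same direction as the F10 to F30 trend reported above, although the toy numbers are illustrative only.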
CONCLUSION In data mining of traditional Chinese medicine (TCM) clinical medical records for a single disease, different clustering methods have been used to study the drug combinations or core prescriptions used in clinical practice. The choice of method depends mainly on the total number of samples and the overall drug frequency. When the data set is small, structural clustering should be chosen, and a higher drug-frequency range should be used when designing drug structural clustering, so as to obtain a lower final mining bias and a better clinical fit. When the data set is large, decentralized clustering should be chosen, and prescription decentralized clustering should be adopted in its design, again to obtain a lower final mining bias and a better clinical fit.