YANG Lei, WANG Tianshu, YANG Tao, HU Kongfa. Study on TCM Constitution Identification Based on Multi-Level Feature Fusion of TCM Tongue Images[J]. Journal of Nanjing University of Traditional Chinese Medicine, 2026, 42(4): 627-636. DOI: 10.14148/j.issn.1672-0482.2026.0627

Study on TCM Constitution Identification Based on Multi-Level Feature Fusion of TCM Tongue Images

  • OBJECTIVE To integrate multimodal features from tongue images and textual descriptions and to construct a hierarchically fused deep learning model for Traditional Chinese Medicine (TCM) constitution identification.
    METHODS Tongue diagnosis texts corresponding to the tongue images were generated with a large pre-trained language model, yielding a multimodal dataset of 945 samples. The proposed TCM-DFM model employed ResNet50 to extract image features and BERT to encode text semantics. A gating mechanism in the low-dimensional feature space adaptively weighted visual and semantic features, and cross-modal attention in the high-dimensional semantic space established associations between pathological features across the two modalities. A dynamic decision fusion mechanism then integrated the predictions of the unimodal and multimodal models. On a dataset with six TCM constitution labels, the model was compared with baseline methods such as early fusion and late fusion, and performance was evaluated by accuracy, precision, recall, F1 score, and the confusion matrix.
    RESULTS The TCM-DFM model achieved an accuracy of 84.52%, a precision of 82.54%, a recall of 84.52%, and an F1 score of 83.39%, outperforming all baseline models. Among the multimodal fusion methods compared, GCAF reached 83.33% accuracy, 23.81 percentage points higher than the best unimodal model. Ablation experiments verified the synergistic effect of the gating and attention mechanisms. Visualization showed that the model focused on clinically important tongue regions, consistent with the TCM principle of “inspecting tongue shape”.
    CONCLUSION The proposed model effectively integrates information from tongue images and textual descriptions, overcoming the limitations of unimodal analysis and conventional fusion methods. It significantly improves the accuracy of constitution classification and underscores the essential role of tongue diagnosis in TCM constitution identification.
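
The gating step in the low-dimensional feature space can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the TCM-DFM implementation: the image and text features are assumed to be already projected from the ResNet50 and BERT outputs into a shared d-dimensional space, and the gate form (a sigmoid over the concatenated features, giving one weight per dimension) and the names `gated_fusion`, `W_g`, `b_g` are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(v, t, W_g, b_g):
    """Adaptively weight a visual feature v against a semantic feature t.

    Assumption: v and t have shape (batch, d) and already live in the same
    low-dimensional space. W_g has shape (2*d, d), b_g has shape (d,).
    """
    # The gate sees both modalities and outputs per-dimension weights in (0, 1)
    g = sigmoid(np.concatenate([v, t], axis=-1) @ W_g + b_g)
    # Convex combination: g weights the visual feature, 1 - g the semantic one
    return g * v + (1.0 - g) * t

rng = np.random.default_rng(0)
d = 8
v = rng.normal(size=(4, d))  # hypothetical projected image features
t = rng.normal(size=(4, d))  # hypothetical projected text features
W_g = rng.normal(scale=0.1, size=(2 * d, d))
b_g = np.zeros(d)
fused = gated_fusion(v, t, W_g, b_g)
print(fused.shape)  # (4, 8)
```

With all-zero gate parameters, g = 0.5 everywhere and the two modalities are simply averaged; a trained gate can shift weight toward whichever modality is more informative in each dimension.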
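
Cross-modal attention in the high-dimensional semantic space can be illustrated with standard scaled dot-product attention. In this hypothetical sketch, text tokens act as queries and image regions as keys and values. The abstract does not specify the direction of attention, the number of heads, or the feature sizes, so the 49 regions (for example a 7x7 grid, which ResNet50's last convolutional stage produces for a 224x224 input) and all other dimensions are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(text_tokens, image_regions, W_q, W_k, W_v):
    """Text tokens (queries) attend over image regions (keys/values).

    text_tokens: (n_t, d_t); image_regions: (n_r, d_i)
    W_q: (d_t, d); W_k: (d_i, d); W_v: (d_i, d)
    Returns attended features (n_t, d) and attention weights (n_t, n_r).
    """
    Q = text_tokens @ W_q
    K = image_regions @ W_k
    V = image_regions @ W_v
    scores = Q @ K.T / np.sqrt(Q.shape[-1])  # scaled dot-product similarity
    A = softmax(scores, axis=-1)             # each row: distribution over regions
    return A @ V, A

rng = np.random.default_rng(1)
n_t, n_r, d_t, d_i, d = 5, 49, 16, 32, 8   # all sizes are illustrative
text = rng.normal(size=(n_t, d_t))
regions = rng.normal(size=(n_r, d_i))
out, A = cross_modal_attention(text, regions,
                               rng.normal(size=(d_t, d)),
                               rng.normal(size=(d_i, d)),
                               rng.normal(size=(d_i, d)))
print(out.shape, A.shape)  # (5, 8) (5, 49)
```

The weight matrix A holds one distribution over image regions per text token, which is one possible basis for a region-level map; the abstract does not state how its visualization was produced.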
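
The dynamic decision fusion step combines the predictions of the unimodal and multimodal models. The abstract does not give the weighting rule, so the sketch below assumes one plausible choice: per-sample weights computed as a softmax over negative predictive entropy, so that a model that is more confident on a given sample contributes more to that sample's final prediction. The function name, the `temperature` parameter, and the example probabilities are hypothetical.

```python
import numpy as np

def entropy(p, eps=1e-12):
    return -(p * np.log(p + eps)).sum(axis=-1)

def dynamic_decision_fusion(prob_list, temperature=1.0):
    """Combine per-model class probabilities with per-sample weights.

    prob_list: list of arrays, each (batch, n_classes), rows summing to 1.
    Lower entropy (higher confidence) on a sample gives a larger weight.
    """
    P = np.stack(prob_list)                  # (n_models, batch, n_classes)
    conf = -entropy(P) / temperature         # (n_models, batch)
    conf = conf - conf.max(axis=0, keepdims=True)
    w = np.exp(conf)
    w = w / w.sum(axis=0, keepdims=True)     # softmax over models, per sample
    return (w[..., None] * P).sum(axis=0), w

# Six constitution classes; three hypothetical models: image-only, text-only, multimodal
p_img = np.array([[0.30, 0.20, 0.20, 0.10, 0.10, 0.10]])
p_txt = np.array([[0.10, 0.50, 0.10, 0.10, 0.10, 0.10]])
p_mm = np.array([[0.05, 0.80, 0.05, 0.04, 0.03, 0.03]])
fused, w = dynamic_decision_fusion([p_img, p_txt, p_mm])
print(fused.argmax(axis=-1), w.ravel().round(3))
```

Because the weights are recomputed for every sample, a modality is down-weighted only where it is uncertain, unlike a fixed late-fusion average.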
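
The evaluation metrics can all be derived from the confusion matrix. The sketch below assumes support-weighted averaging across classes. The abstract does not say which average was used, but weighted recall always equals accuracy, which matches the identical reported values (84.52%). Weighted F1 averages the per-class F1 scores, so it need not equal the harmonic mean of the averaged precision and recall. The 3-class matrix is a toy example, not data from the study.

```python
import numpy as np

def metrics_from_confusion(cm):
    """cm[i, j] = number of samples with true class i predicted as class j."""
    cm = np.asarray(cm, dtype=float)
    support = cm.sum(axis=1)    # true samples per class
    predicted = cm.sum(axis=0)  # predicted samples per class
    tp = np.diag(cm)
    accuracy = tp.sum() / cm.sum()
    # Guard against classes that are never predicted or have no support
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, support, out=np.zeros_like(tp), where=support > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    w = support / support.sum()  # support-weighted averaging
    return {
        "accuracy": accuracy,
        "precision": (w * precision).sum(),
        "recall": (w * recall).sum(),
        "f1": (w * f1).sum(),
    }

# Toy 3-class matrix for illustration only (not the paper's data)
cm = [[8, 1, 1],
      [2, 6, 2],
      [0, 1, 9]]
m = metrics_from_confusion(cm)
print({k: round(v, 4) for k, v in m.items()})
```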