Discovering Symptom-herb Relationship by Exploiting SHT Topic Model

[DOI: 10.2197/ipsjtbio.10.16]

Original Paper

Lidong Wang 1,a), Keyong Hu 1, Xiaodong Xu 2

1 Qianjiang College, Hangzhou Normal University, Hangzhou, China
2 Zhejiang Chinese Medical University, Zhejiang, China
a) violet wld@163.com

Received: July 7, 2017, Accepted: August 29, 2017

© 2017 Information Processing Society of Japan

Abstract: Traditional Chinese medicine (TCM) has been widely studied with various computer science methods over the past decades, but no prior work digs into the large volume of clinical cases to discover meaningful treatment patterns between symptoms and herbs. To meet this challenge, we explore the unstructured and intricate experiential data in clinical cases and propose a method that discovers treatment patterns by introducing a novel topic model named SHT (Symptom-Herb Topic model). Combinational rules are incorporated into the learning process. We evaluate our method on 3,765 TCM clinical cases. The experiments validate the effectiveness of our method compared with the LDA model and the LinkLDA model.

Keywords: Traditional Chinese Medicine, topic model, SHT model, combinational rules

1. Introduction

Traditional Chinese medicine (TCM) has been attracting more and more attention because its therapeutic effects complement those of Western medicine. TCM involves multiple types of entities, such as herb, prescription (a composition that consists of certain herbs), symptom, and syndrome ("Zheng" in Mandarin Chinese, a complex pattern of symptoms that is used as a holistic summary of a patient's status). Multiple types of relations can exist between these heterogeneous entities, such as composition relations between herbs and prescriptions, and treatment relations between symptoms and herbs. TCM clinical cases describe how doctors diagnose and cure a disease. The unstructured TCM clinical cases contain the symptoms of a patient, the corresponding herbs, the initial visit information and the return visit information. How to dig into the enormous number of clinical cases to mine the valuable relations between symptoms and herbs remains a challenging task.

Data mining approaches play critical roles in TCM-related topics, such as new drug discovery [1], syndrome differentiation [2], [13], herbal combinational rule mining [4], [5], intelligent diagnosis [7], and patient classification [14]. Related works on relation extraction from the TCM literature are scarce. The work of Wu et al. [6] was one of the pioneering efforts on this subject. The authors used a bootstrapping method to extract syndrome-disease associations from a corpus of data. In a recent work, Wang et al. [4] created a herbal network based on attribute similarity calculation, and employed random-walk-based community detection to discover the latent combinational relations between two herbs. Chen et al. [7] designed a data mining approach to examine the relationship among symptoms, syndromes and herbs. This tripartite information network derived more accurate information than linking symptoms and herbs alone. Wan et al. [10] used a heterogeneous factor graph model (HFGM) to infer multiple types of relations (e.g., herb-syndrome, herb-disease) from the entire corpus of TCM literature. Zhao et al. [13] found that a novel machine learning algorithm, minimum reference set-based multiple instance learning, was superior to other machine learning algorithms for TCM syndrome differentiation.

Recently, more and more researchers have adopted topic models to discover the relations between TCM objects. Lin et al. [3] proposed a symptom-herb-therapies-diagnosis topic model to diagnose the disease and administer appropriate drugs and treatments given a patient's symptoms. Zhang et al. [8] proposed a hierarchical topic model (HSHT) to automatically extract the hierarchical latent topic structures with both symptoms and their corresponding herbs in TCM clinical data. Yao et al. [11] employed Labeled-LDA (Labeled Latent Dirichlet Allocation) to mine treatment patterns in TCM clinical cases, but it only discovered the treatment patterns between herbs and diseases, and it did so with a supervised model that requires labeled training data.

The main goal of our paper is close to that of Zhang et al. [8]. However, our work differs from theirs in two respects: 1) we model symptoms and herbs separately; 2) combinational rules between herbs are incorporated into the process of topic modeling, which is more consistent with TCM theory; that is, when two herbs are used together, their interaction should display their superiority over a single herb in the treatment of diseases.

In TCM, a syndrome can be inferred from symptoms. The process of treatment is to determine syndromes by observing a patient's symptoms and then to determine appropriate herbs. Thus, we consider that the symptoms of a patient and the corresponding Chinese herbs share the same latent topic, which is the syndrome. Based on this, we propose a topic model named SHT to automatically discover treatment patterns between symptoms and herbs from TCM clinical cases. After topic modeling, we obtain the probability distribution of symptoms and the corresponding list of herbs in each topic (syndrome).

The mining results provide valuable auxiliary information for TCM clinical diagnosis. Specifically, TCM doctors can use these associations to assist clinical treatment, since the mining results show the treatment patterns between symptoms and herbs. For example, to cure a patient with dyspnea with cough, the doctor can browse the results and find the corresponding herbs for reference (see Table 2). In addition, the extracted relations may promote the understanding of TCM in Western countries.

2. SHT Topic Model

2.1 Topic Modeling

In our topic-model-based method, a clinical case is considered a document. A clinical case contains the symptoms of a patient and the corresponding Chinese herbs, so herbs and symptoms are treated as the words of the document. TCM doctors have to select a set of herbs to cure a syndrome, which is reflected by a pattern of symptoms. In this way, a clinical case is a mixture of topics, and syndromes are the topics of the clinical case ("document"). A corpus is a collection of clinical cases. Let $C = \{c_1, c_2, \ldots, c_D\}$ be the set of clinical cases, $Z = \{z_1, z_2, \ldots, z_K\}$ the set of syndromes, $H = \{h_1, h_2, \ldots, h_M\}$ the set of herbs, and $S = \{s_1, s_2, \ldots, s_N\}$ the set of symptoms. The generative process of clinical cases is shown in Fig. 1. This process is analogous to the generative process of a probabilistic topic model [15]. Topic models, like Latent Dirichlet Allocation (LDA) [15], model each document as a mixture of underlying topics. The traditional LDA model generates a single word from one topic. Here, we generate a single symptom and a single herb from one syndrome. The generative processes for symptoms and herbs are very similar.
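The document view described above can be made concrete with a small data-structure sketch. This is only an illustration under assumptions: the names `EncodedCase`, `build_vocab` and `encode_cases` are hypothetical and do not come from the paper, and the two toy cases are invented for the example (they are not real clinical data, and the symptom-herb pairings in them carry no medical meaning).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EncodedCase:
    """One clinical case ("document") as two bags of word indices."""
    symptoms: List[int]  # indices into the symptom vocabulary S
    herbs: List[int]     # indices into the herb vocabulary H


def build_vocab(term_lists):
    """Map each distinct term to an integer index in first-seen order."""
    vocab = {}
    for terms in term_lists:
        for term in terms:
            vocab.setdefault(term, len(vocab))
    return vocab


def encode_cases(raw_cases):
    """raw_cases: list of (symptom_terms, herb_terms) pairs, one per case."""
    symptom_vocab = build_vocab(s for s, _ in raw_cases)
    herb_vocab = build_vocab(h for _, h in raw_cases)
    encoded = [
        EncodedCase(
            symptoms=[symptom_vocab[t] for t in s],
            herbs=[herb_vocab[t] for t in h],
        )
        for s, h in raw_cases
    ]
    return encoded, symptom_vocab, herb_vocab


# Toy input, invented for illustration only (not real clinical data).
raw = [
    (["cough", "excessive phlegm"], ["Platycodonis Radix"]),
    (["cough"], ["Platycodonis Radix", "Saposhnikoviae Radix"]),
]
cases, S, H = encode_cases(raw)
```

Keeping symptoms and herbs in two separate index lists, each with its own vocabulary, mirrors the separate modeling of the two word types: a sampler can then maintain one count table over symptoms and another over herbs.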
Topics for symptoms are probability distributions over the symptom set, and topics for herbs are probability distributions over the herb set. Syndromes can therefore be considered the semantic bridge between symptoms and herbs. However, the efficacy of a single herb is usually limited in TCM. When two herbs are used together, their interaction should display their superiority over a single herb in the treatment of diseases; we then say that these two herbs have a compatibility rule. Thus, it is more meaningful to analyze paired herbs than a single herb. Based on the above, we propose a novel topic model named SHT to discover the treatment patterns between symptoms and herbs, and we incorporate compatibility rules into the model. We introduce a variable $x_i$ to indicate whether herb $h_i$ has a compatibility rule with herb $h_j$. If $x_i = 1$, then $h_i$ and $h_j$ are paired herbs; otherwise, they are generated from the distributions associated with their corresponding syndromes.

The graphical model of SHT is shown in Fig. 2. In Fig. 2, plates represent replications, shaded circles represent observed variables, and unshaded circles represent hidden variables. The outer plate represents clinical cases, while the inner plates represent the repeated choice of topics (syndromes) and words (symptoms and herbs) within a clinical case. $h_i$ and $h_j$ are herbs, and $s$ denotes symptoms. $z_s$ denotes the topic assigned to symptoms, and $z_h$ denotes the topic assigned to herbs. $D$ is the number of clinical cases, $K$ and $L$ are the numbers of topics for symptoms and herbs, and $M$ and $N$ are the numbers of unique herbs and unique symptoms. $\gamma$ is the prior parameter for the variable $x_i$. Dirichlet priors $\alpha$ and $\beta$ are set over the clinical case and topic distributions, respectively.

SHT generates a collection of clinical cases by the process below:

(i) For each clinical case $c_i$, $i \in [1 \ldots D]$, in the collection, draw $\theta_i$ from a Dirichlet distribution with parameter $\alpha$. Each $\theta_i$ represents the probability of each topic (syndrome) in clinical case $c_i$.
(ii) For symptoms in each clinical case, draw $\delta_k$ from a Dirichlet distribution with parameter $\beta$. Each $\delta_k$ represents the probability of seeing each symptom given topic $k$, $k \in [1 \ldots K]$.
(iii) For herbs in each clinical case, draw $\phi_l$ from a Dirichlet distribution with parameter $\beta$. Each $\phi_l$ represents the probability of seeing each herb given topic $l$, $l \in [1 \ldots L]$.
(iv) For each symptom index $s \in [1 \ldots N]$ in clinical case $c_i$:
    (a) draw a topic $z_s$ from $\theta_i$, $z_s \in [1 \ldots K]$;
    (b) draw a symptom $s$ from $\delta_{z_s}$.
(v) For each herb $h_p$, $p \in [1 \ldots M]$, in clinical case $c_i$:
    (a) generate $x_p$ from a Bernoulli distribution with parameter $\gamma$;
    (b) draw a topic $z_h$ from $\theta_i$, $z_h \in [1 \ldots L]$;
    (c) if $x_p = 0$, draw a herb $h_p$ from $\phi_{z_h}$; if $x_p = 1$, draw a herb pair $(h_p, h_q)$ from $\phi_{z_h}$.

Fig. 1 The generative process of clinical cases.

Fig. 2 Bayesian network of SHT model.

According to TCM theory, the topic number for symptoms $K$ should be numerically equal to the topic number for herbs $L$. We set $K = L$ during the generative process. In step (iii), we construct the distributions of herbs per topic analogously to the construction of the distributions of symptoms per topic.

2.2 Learning SHT Parameters

We employ Gibbs sampling to learn the parameters [15]. The Gibbs sampling procedure considers each symptom or each herb in the clinical case in turn, and estimates the probability of assigning the current symptom or herb to each topic, conditioned on the topic assignments of all other words. For all symptoms in each clinical case, we assign the topic $z_s$ to term $s$ based on $p(z_s \mid s)$. Similarly, we assign the topic $z_h$ to term $h_i$ based on $p(z_h \mid h_i)$. The approximate computation of $p(z_s \mid s)$ is described in Eq. (1). We use $p(z_s = k \mid \mathbf{z}_{\neg s}, s, \neg s)$ to simulate $p(z_s \mid s)$; it estimates the probability of assigning the current symptom to each topic ($p(z_s = k)$), conditioned on the topic assignments of all other symptoms ($\mathbf{z}_{\neg s}$), not including the current symptom ($\neg s$). During Gibbs sampling, we draw the topic assignments $z_s$ and $z_{h_i}$ according to Eq. (1) and Eq. (2).

\[
p(z_s = k \mid \mathbf{z}_{\neg s}, s, \neg s) \propto \frac{n^{k,\neg s}_{s} + \beta}{n^{k,\neg s} + N\beta} \cdot \frac{n^{c_i}_{k,\neg s} + \alpha}{n^{c_i,\neg s} + K\alpha} \tag{1}
\]

where $z_s = k$ means assigning the current symptom $s$ to topic (syndrome) $k$, and $\mathbf{z}_{\neg s}$ denotes the topic assignments of all symptoms except symptom $s$. The meanings of $n^{k,\neg s}_{s}$, $n^{k,\neg s}$, $n^{c_i}_{k,\neg s}$ and $n^{c_i,\neg s}$ follow the corresponding components in Table 1, but exclude the current assignment instance $s$ (represented by the token $\neg s$).

Table 1 The meaning of components in Eq. (1) to Eq. (6).

If $x_i = 0$:

\[
p(z_{h_i} = k \mid \mathbf{z}_{\neg h_i}, h_i, \neg h_i) \propto \frac{n^{k,\neg h_i}_{h_i} + \beta}{n^{k,\neg h_i} + M\beta} \cdot \frac{n^{c_i}_{k,\neg h_i} + \alpha}{n^{c_i,\neg h_i} + L\alpha} \tag{2}
\]

If $x_i = 1$, $h_i$ and $h_j$ are regarded as a whole, and we assign the topic to the unit $(h_i, h_j)$.

After the Gibbs sampling iterations, we estimate the syndrome-herb distribution $\phi$, the syndrome-symptom distribution $\delta$ and the document-syndrome distribution $\theta$ as follows.

If $x_i = 0$:

\[
\phi_k(h_i) = \frac{n^{k}_{h_i} + \beta}{n^{k} + M\beta} \tag{3}
\]

If $x_i = 1$:

\[
\phi_k(h_i, h_j) = \frac{n^{k}_{h_i} + n^{k}_{h_j} + \beta}{n^{k} + M\beta} \tag{4}
\]

\[
\delta_k(s) = \frac{n^{k}_{s} + \beta}{n^{k} + N\beta} \tag{5}
\]

\[
\theta_i(k) = \frac{n^{c_i}_{k} + \alpha}{n^{c_i} + L\alpha} \tag{6}
\]

3. Results and Discussion

3.1 Setup

We collected 3,765 clinical cases from the Professional Knowledge Service System for Chinese Herbal Medicine *1. The symptoms and the herbs are extracted by text matching according to the Traditional Chinese Medical Subject Headings (TCM MeSH) [12] and the Chinese Pharmacopoeia (2000 edition). We designed three experiments to validate our method: an LDA-based method, a LinkLDA-based method [9] and the SHT-based method. LinkLDA was used in previous work [9] to model the content of documents and their citations simultaneously. We employed it to extract the latent topic structures that involve the symptoms and their corresponding herbs.

To evaluate the performance of our topic model, we used two metrics: the perplexity, and the accuracy of the top 5 words discovered for latent topics. The former can be thought of as the effective number of equally likely words (symptoms or herbs) according to the model. It is a common way to evaluate the effectiveness of topic models. We computed the perplexity of the test sets with parameters learned from the corresponding training sets. Let $C$ be the set of clinical cases; perplexity is defined as follows:

\[
\mathrm{perplexity}(C_{test}) = \exp\left\{ -\frac{\sum_{i=1}^{D_{test}} \ln p(\mathbf{w}_i \mid C_{train})}{\sum_{i=1}^{D_{test}} N_i} \right\} \tag{7}
\]

where $C_{test}$ is the test data set, $\mathbf{w}_i$ is the vector of words in clinical case $c_i$ of the test set, and $C_{train}$ is the training set. $N_i$ denotes the total number of symptoms and herbs in clinical case $c_i$. $p(\mathbf{w}_i \mid C_{train})$ denotes the probability of the words $\mathbf{w}_i$ in a test clinical case $c_i$ under the parameters learned from the training set. Note that lower values denote better performance.
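As a rough illustration of Eq. (7), the following Python sketch computes perplexity from per-case log-likelihoods. It assumes that the quantities $\ln p(\mathbf{w}_i \mid C_{train})$ have already been obtained from a trained model; the function name `perplexity` and its argument names are hypothetical, and the uniform model used in the sanity check is invented for the example.

```python
import math


def perplexity(log_likelihoods, lengths):
    """Eq. (7): exp(-sum_i ln p(w_i | C_train) / sum_i N_i).

    log_likelihoods[i] is ln p(w_i | C_train) for test case i (natural log);
    lengths[i] is N_i, the number of symptoms plus herbs in that case.
    """
    if len(log_likelihoods) != len(lengths):
        raise ValueError("need one log-likelihood and one length per test case")
    total_words = sum(lengths)
    if total_words == 0:
        raise ValueError("test set contains no words")
    return math.exp(-sum(log_likelihoods) / total_words)


# Sanity check: a model that spreads probability uniformly over V words
# gives ln p(w_i) = N_i * ln(1/V), so its perplexity should be close to V.
V = 50
lengths = [4, 7, 3]
uniform_ll = [n * math.log(1.0 / V) for n in lengths]
print(perplexity(uniform_ll, lengths))
```

The uniform case matches the reading of perplexity as the effective number of equally likely words: a model that cannot distinguish among 50 words scores about 50, and any model that concentrates probability on the words actually observed scores lower.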
The latter evaluation is computed as follows:

\[
\mathrm{Accuracy} = \frac{\text{the correct number of returned words}}{\text{the total number of returned words}} \tag{8}
\]

The correct number of returned words (symptoms and herbs) is determined by an expert's manual judgement. For each syndrome (topic), a symptom is counted as correct if it reflects the syndrome, and a herb is counted as correct if it has therapeutic effects on the syndrome. We randomly selected 20% of the clinical cases as the test set. In the topic modeling process, we set the hyper-parameters of the models as follows: $\alpha = 50/K$, $\beta = 0.1$, and the iteration number $l = 100$.

3.2 Overall Performance and Discussions

In ancient TCM books, 917 paired herbs have been discovered by TCM experts *2. The data set of paired herbs was incorporated into the SHT topic modeling process. We conducted treatment pattern mining with the LDA, LinkLDA and SHT models, and calculated perplexity for different numbers of topics, varying from 10 to 100.

*1 http://zcy.ckcest.cn/medicalrecord/browse
*2 http://pan.baidu.com/s/1jifnae6

Table 2 Probability distributions of 5 topics in SHT model.

Table 2 shows the probability distributions of 5 discovered topics in the SHT model. In Table 2, the top 5 symptoms and top 5 herbs are returned for each topic. For example, several symptoms are returned in the topic "Dyspnea with cough", such as cough, excessive phlegm, deep and adynamic pulse, thick yellow sputum, and upward adverseness of gas to the chest. These words are typical symptoms of the syndrome of dyspnea with cough. The efficacy of the corresponding 5 herbs is relieving cough. These results show that the treatment patterns between symptoms and herbs can be discovered by our model. Meanwhile, a pair of herbs tends to indicate a stronger relation with the corresponding topic than a single herb, such as Caulis Perillae and Platycodonis Radix in topic 1, and Saposhnikoviae Radix and Schizonepetae Herba in topic 5. Therapeutic effects can be promoted by the coordination of two herbs. The results are valuable for TCM practitioners who conduct research on automatic diagnosis and new prescription discovery. LDA, however, cannot discover combinations of effectively interacting herbs.

Fig. 3 Perplexity of different models.

Figure 3 shows the perplexity scores for different numbers of topics for LDA, LinkLDA and SHT. We can see that SHT and LinkLDA outperform LDA regardless of the topic number, which demonstrates the effectiveness of separate modeling for symptoms and herbs. Separate modeling of symptoms and herbs improves the topic structure: if symptoms and herbs are regarded as a whole, the probability distributions for symptoms and herbs are mixed together. SHT performs better than LinkLDA when K ≤ 60, which means that taking combinational rules into account can improve the modeling performance. However, the performance of SHT is close to that of LinkLDA when K > 60, possibly because a larger number of topics decreases the number of paired herbs.

Fig. 4 Accuracy of discovered words.

Figure 4 shows the accuracy of discovered words. The accuracy has to be calculated manually under an expert's instruction, so we set the topic number between 10 and 40 to limit the workload. Our results show that SHT is more effective at extracting symptom-herb relationships from clinical cases than the basic LDA model (increasing accuracy by 18%, 17%, 19% and 23% for the different numbers of topics) and LinkLDA (increasing accuracy by 5%, 12%, 1% and 2% for the different numbers of topics).

However, some of our results can be improved upon, and our approach can be extended in the future. Firstly, some other important types of TCM entities, such as prescriptions and diseases, are not incorporated into our model. If we can bring such entities into our unified model in the future, then more types of relations can be extracted. Secondly, most of the symptom names are manually extracted because there is no standard or unified terminology glossary for TCM symptoms, so entity recognition techniques are needed to detect symptom entities in clinical cases.

4. Conclusions

This paper has presented a method for mining clinical records based on a probabilistic topic model. We propose a novel topic model named SHT to discover the treatment patterns between symptoms and herbs. The combinational rules are incorporated into the SHT modeling process. Each discovered topic involves a list of symptoms and its corresponding list of herbs. The performance shows that our approach is superior to the compared topic models in extracting symptom-herb relations from TCM clinical cases. The results can provide valuable information for TCM automatic diagnosis or poly-pharmacology research.

The dosage of herbs in a prescription plays a key role in clinical treatment. The efficacy of a composition of herbs changes when the dosage of the herbs is adjusted. In the future, we plan to incorporate dosage information into the topic modeling process. Besides, we intend to use the mining results to construct a calculation model for automatic diagnosis. Specifically, when the doctor provides the symptoms of a patient, our model may automatically return the corresponding combination of herbs to cure the disease.

Acknowledgments This study was funded by Zhejiang Provincial Natural Science Foundation of China under Grant No. LQ14F020008 and No. LY17E070004, and by National Natural Science Foundation of China under Grant No. 61602402.

References

[1] Yang, H., Chen, J., Tang, S., Li, Z., Zhen, Y., Huang, L. and Yi, J.: New drug R&D of traditional Chinese medicine: Role of data mining approaches, Journal of Biological Systems, Vol.17, No.03, pp.329-347 (2009).
[2] Liu, X.L., Hong, W.X., et al.: Using Formal Concept Analysis to Visualize Relationships of Syndromes in Traditional Chinese Medicine, Medical Biometrics, Vol.6165, pp.315-324 (2010).
[3] Yang, T., Wu, C., Xu, Z. and Ding, Y.: The syndrome differentiation model and program of traditional Chinese medicine based on the fuzzy recognition, Proc. BIBM, pp.285-287, Shanghai, China, IEEE (2013).
[4] Wang, L., Zhang, Y. and Xu, X.: A Novel Group Detection Method for Finding Related Chinese Herbs, Journal of Information Science and Engineering, Vol.31, No.4, pp.1387-1411 (2015).
[5] Qiao, S.J. and Tang, C.J.: Mining the compatibility rule of multidimensional medicines based on dependence model sets, Journal of Sichuan University (Engineering and Science Edition), Vol.39, No.4, pp.134-138 (2007).
[6] Wu, Z., Zhou, X., Liu, B., et al.: Text mining for finding functional community of related genes using TCM knowledge, Proc. PKDD '04, pp.454-470, Seattle, WA, USA, ACM (2004).
[7] Chen, J., Poon, J., Poon, S.K., Xu, L. and Daniel M.Y.: Mining Symptom-Herb Patterns from Patient Records Using Tripartite Graph, Evidence-Based Complementary and Alternative Medicine, Vol.2015, 435085 (2015).
[8] Zhang, X., Zhou, X., Huang, H., Chen, S. and Liu, B.: A hierarchical symptom-herb topic model for analyzing traditional Chinese medicine clinical diabetic data, Proc. BMEI, pp.2246-2249, Yantai, China, IEEE (2010).
[9] Lin, Y. and Mizil, A.N.: Topic-Link LDA: Joint Models of Topic and Author Community, Proc. ICML, pp.665-672, Montreal, Canada, IEEE (2009).
[10] Wan, H., Moens, M., Luyten, W., Zhou, X., Mei, Q., Liu, L. and Tang, J.: Extracting relations from traditional Chinese medicine literature via heterogeneous entity networks, Journal of the American Medical Informatics Association, Vol.23, No.2, pp.356-365 (2016).
[11] Yao, L., Zhang, Y., Wei, B., Wang, W., Zhang, Y., Ren, X. and Bian, Y.: Discovering treatment pattern in Traditional Chinese Medicine clinical cases by exploiting supervised topic model and domain knowledge, Journal of Biomedical Informatics, Vol.58, pp.260-267 (2015).
[12] Wu, L.: Chinese Traditional Medicine and Material Medical Subject Headings, Chinese Medical Ancient Books Publishing, Beijing (1996).
[13] Zhao, Y., He, L., Xie, Q., Li, G., Liu, B. and Wang, J.: A Novel Classification Method for Syndrome Differentiation of Patients with AIDS, Evidence-Based Complementary and Alternative Medicine, Vol.2015, 936290 (2015).
[14] Zhao, C., Li, G., Wang, C. and Niu, J.: Advances in Patient Classification for Traditional Chinese Medicine: A Machine Learning Perspective, Evidence-Based Complementary and Alternative Medicine, Vol.2015, 936290 (2015).
[15] Steyvers, M. and Griffiths, T.: Probabilistic topic models, Latent Semantic Analysis: A Road to Meaning, Landauer, T., et al. (Eds.), Lawrence Erlbaum (2006).

Lidong Wang was born on December 4, 1982. She received her M.S. degree in Computer Science from Ningbo University and her Ph.D. degree from the College of Computer Science and Technology, Zhejiang University. She is currently an Associate Professor at Hangzhou Normal University. Her current research interests include image processing, machine learning and text mining.

Keyong Hu is currently a teacher of Electronic Information Engineering at Qianjiang College of Hangzhou Normal University. He received his Ph.D. degree in Mechatronic Engineering in 2016 from Zhejiang University of Technology, Hangzhou, China. His research interests include artificial intelligence and new energy technology.

Xiaodong Xu is currently a Professor at Zhejiang Chinese Medical University. He has (co)authored over 30 publications on drug exploitation in Traditional Chinese Medicine. His research interests include prescriptions, drug exploitation and combinational rule analysis.

(Communicated by Tatsuya Akutsu)