A TCM acupoints ranking approach towards post-stroke dysphagia based on an improved MCTS decision method

Su, Chong; Han, Peijin; Jiang, Bingxu; Liu, Cunzhi; Chen, Jie

doi:10.3233/THC-199034

A TCM acupoints ranking approach towards post-stroke dysphagia based on an improved MCTS decision method

Issue title: The 7th International Conference on Biomedical Engineering and Biotechnology

Guest editors: Carlos Gómez and Severin P. Schwarzacher

Article type: Research Article

Authors: Su, Chong^{a; 1; *} | Han, Peijin^b | Jiang, Bingxu^a | Liu, Cunzhi^{c; *} | Chen, Jie^{d; 1}

Affiliations: [a] College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China | [b] School of Medicine, Johns Hopkins University, Baltimore, MA 21224, USA | [c] Dongfang Hospital, Beijing University of Chinese Medicine, Beijing 100073, China | [d] Department of Acupuncture and Massage, Beijing Zhongguancun Hospital, Beijing 100190, China

Correspondence: [*] Corresponding authors: Chong Su, College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, 100029, China. E-mail: [email protected]; Cunzhi Liu, Dongfang Hospital, Beijing University of Chinese Medicine, Beijing 100073, China. E-mail: [email protected].

Note: [1] Co-first Authors.

Keywords: Hybrid knowledge, Monte Carlo Tree Search (MCTS) algorithm, sequential decision-making, multiple acupoints

DOI: 10.3233/THC-199034

Journal: Technology and Health Care, vol. 27, no. S1, pp. 367-381, 2019

Published: 18 June 2019

Get PDF

Abstract

Traditional Chinese Medicine (TCM) multiple-acupoints stimulation is widely used to improve dysphagia among post-stroke patients. However, prior research in evidence-based acupuncture mostly focused on the treatment effects of single acupoint’s on dysphagia, while the evidence of optimal sequence of multiple-acupoints stimulation remains limited. In this paper, we developed an evaluation method of hybrid knowledge (deterministic knowledge and the experiential group decision knowledge) sequences based on segmentation mechanism of sub-sequence fragments, and then, we proposed a Monte Carlo Tree Search (MCTS) sequential decision-making method under the hybrid knowledge. Thereafter, we applied this proposed sequential decision-making approach to optimizing sequential decision-making schema of multiple-acupoints stimulation, to treat dysphagia among post-stroke patients. Finally, we verified the validity and the feasibility of this method by comparing it to other sequential decision-making search methods.

1.Introduction

Traditional sequential decision-making methods mainly focus on solving the problems of time series prediction and sequential decision-making in technological processes. In all kinds of assembly sequence production, sequential decision-making problems mainly exists in the process of workpiece processing [1, 2, 3], investment decision-making, etc. [4, 5]. Sequential decision-making methods can also be applied to medical research, such as medical resource scheduling [6] and medical personnel scheduling [7].

In Traditional Chinese Medicine (TCM), the decision of multiple-acupoints stimulation sequence can affect the efficacy of acupuncture [8]. Therefore, sequential decision-making of acupoints sequence is a typical sequential decision-making problem, which is worth studying for its clinical significance. In mathematics, compared to the traditional “optimization of set division [9, 10, 11]” problems, “sequencial decision-making problems in multi-acupoints” can be seen as “optimization problem of treatment sequence division”. Prior research in evidence-based acupuncture mostly focused on evaluating single acupoint’s treatment effects on symptom improvement [12, 13] using statistical tests. Quantitative research on the treatment effects of multiple-acupoints stimulation sequences is limited. For example, no consensus currently exists on the optimal multiple-acupoints sequence for treating post-stroke dysphagia [14, 15, 16]. Given the decisions of multi-acupoint stimulating sequences are made mainly based on the experts’ experience, it is difficult to determine the optimal sequence, and further to establish the standardized the treatment protocols and to train junior Traditional Chinese Physicians.

At present, difficulties in the complex sequential decision-making problems (e.g. the generation process of complex medical treatment schemes) remain to be solved, such as comprehensively understanding the decision-making mechanisms, the non-linear relationship between each decision-making node, and the final decision-making results. Compared to deterministic prior knowledge, experts’ group discussion [17, 18] can help extract the experts’ group knowledge to build the hybrid knowledge model to constrain the treatment decision sequence generation in increasingly complex decision-making problems.

Besides, traditional sequential decision-making search methods often require explicit structured objective functions as search drivers. Monte Carlo Tree Search (MCTS) provides a method to select random samples in decision space, which can construct a search tree based on the results of periodic decision and find the best decision sequence in a given domain. The great success of the application of MCTS to Computer Go [19, 20, 21, 22, 23, 24] indicates its significant impact on the field of artificial intelligence. Therefore, the randomness of the search process and the arbitrariness of the termination time of the MCTS method lay a solid theoretical foundation for the construction of sequence searching given incomplete prior knowledge.

Inspired by the discussions above, we proposed a MCTS sequential decision-making method based on hybrid knowledge, aiming at showing the generation process of feasible decision-making solutions in sequential decision-making problems, with ambiguous objective functions and the participation of experts. We further applied this method to generating ranking scheme of multiple-acupoint, in treating dysphagia among post-stroke patients.

The rest of the paper was organized as follows. Section 2 introduces the evaluation method of decision sequence based on hybrid knowledge; Section 3 presents the MCTS sequence decision method based on mixed knowledge; Section 4 provides the validation of the proposed sequential decision method by applying it on the generation process of the acupuncture sequential scheme for improving dysphagia among post-stroke patients. In addition, the advantages of the proposed method are verified by comparing it to the other traditional search methods. Finally, Section 5 offers conclusions and prospects for future researches.

2.Decision sequence priority evaluation method based on hybrid knowledge

We introduced the acquisition method of the hybrid knowledge based on the combination of deterministic knowledge and experts’ group knowledge. Then, we proposed an evaluation method of sequential decision based on hybrid knowledge.

2.1Decision sequence evaluation based on hybrid knowledge

In this section, a conceptual framework of complete decision sequence evaluation based on subsequence fragment priority is described. The process of evaluating a complete decision sequence was shown in Fig. 1.

Figure 1.

Evaluation flow chart of complete decision sequence.

As shown in Fig. 1, the evaluation process consists of two steps: 1) obtain hybrid prior knowledge based on existing deterministic knowledge and group decision-making empirical knowledge, 2) calculate the evaluation value of a complete sequence according to sequence segmentation results (sub-sequence fragmentation state) and sub-sequence’s (or sub-node’s) position.

The detailed implementation method was presented as follows.

2.2Hybrid knowledge extraction

If the experts’ knowledge is predominant in sequential decision-making problems, it is efficient to construct hybrid knowledge by combining their empirical knowledge and a small amount of deterministic knowledge.

2.2.1Empirical knowledge based on group decision making

In order to discuss the evaluation methods of the sub-sequence fragments of the complete decision sequence in this paper, the definition of the sub-sequence fragments is first given.

Definition 1 (sub-sequence fragments): In a complete sequence evaluation, a sub-sequence consisting of one or more nodes is called a subsequence fragment. The sub-sequence consisting of one node is called one-dimensional sub-sequence, while the sequence consisting of n nodes is n-dimensional sub-sequence.

Therefore, in order to give the ranking of one-dimensional sub-sequence and multi-dimensional sub-sequence, we obtain the priority of each subsequence fragment by the classical group decision-making method based on fuzzy complementary judgment matrix approach [17, 18] (this approach focuses on the group experts’ preference among options).

2.2.2Integration of deterministic and empirical knowledge

In knowledge modeling, the existing experience or confirmed deterministic knowledge is assigned priority values based on their levels of certainty. The priority values can be classified into n levels (level 1 demonstrates highest certainty while level n demonstrates the lowest certainty). Detailed quantification of hierarchical partition is shown in Eq. (1). The evaluation value of subsequence fragments is stored in set E⁢{} Eq. (2):

(1)

value_point={1÷n×1level ⁢n⋮⋮(m+1)÷n×1level ⁢n-m⋮⋮n÷n×1level ⁢1

(2)

E⁢{}={value_point1,value_point2,⋯,value_pointn}

Where, value_point denotes the final evaluation value of the sub-sequence fragments, derived by level of evidence; n denotes the number of the levels; E⁢{} denotes the space of empirical and deterministic hybrid knowledge.

2.3Decision sequence evaluation based on hybrid knowledge

In this section, we give the evaluation value of a complete sequence based on the hybrid prior knowledge. Because of the vector feature of the decision sequence discussed in this paper, the orders of sub-sequence fragments (including one-dimensional decision node or multi-dimensional sub-sequence fragments) in the sequence can affect the results. Therefore, we propose a sequence segmentation method to get all the evaluation values of the complete sequences and return the maximum value as the evaluation value.

2.3.1Segmentation mechanism of sub-sequence fragments

We hypothesize that in hybrid prior knowledge, we only obtain the evaluation values of part of sub-sequence fragments in the whole decision sequence. Meanwhile, for one long decision sequence, there may be multiple evaluation values because its segmentation mode cannot be determined. Therefore, in this case, we give the definition of decision sequence segmentation.

Definition 2 (Decision Sequence Segmentation): When a long sequence is input, it is necessary to discuss all of the combinations of subsequence fragments, which is called Decision Sequence Segmentation.

We cannot compare the multiple segmentation patterns without prior knowledge. Therefore, based on the hybrid knowledge, we calculate the evaluation values for all the segmentation patterns in a given n length sequence, and then return the maximum value as its final evaluation value.

2.3.2Priority evaluation of complete sequences

The priority of a given complete sequence is determined by its final evaluation value. We propose a method to calculate the evaluation values for all the segmentation patterns in a given complete sequence. First, we give a definition of the evaluation preference.

Definition 3 (Evaluation preference of Decision Sequence Segmentation): In a given complete se- quence, higher evaluation values are assigned to the anterior sub-sequence fragments while lower evaluation values are assigned to the posterior sub-sequence fragments. We set 0.1 as the modifier value. Specific sequence fragment assignment formula is designed as follows:

(3)

𝑣𝑎𝑙i=0.1n×(n-k)+si

Where, n denotes the length of the sequence; k denotes the location of the sub-sequence fragment (calculated from the first decision-making stage of sequence fragments); si denotes the baseline value of the kth sub-sequence fragment based on hybrid knowledge database; 𝑣𝑎𝑙i denotes the final evaluation value of the kth sub-sequence fragment. i denotes the number of the evaluated complete sequence.

Then, according to different sequence segmentation patterns, we multiply the evaluation value of each sub-sequence fragment under each segmentation mode, and select the largest evaluation value as the final evaluation value of the complete decision sequence, shown as Eq. (4):

(4)

𝑣𝑎𝑙𝑢𝑒𝑙𝑖𝑠𝑡=πi=1n⁢𝑣𝑎𝑙i

Where m denotes the length of the given complete sequence, where we can obtain the total of 2 (n1) different sequence evaluation values. The maximum of them can be returned as the final_value, shown as Eq. (5):

(5)

final_value=max⁡{value_list}

3.MCTS sequential decision making method based on hybrid knowledge

In the real sequential decision-making process, the selection of sub-nodes is based on the current state feedback, thus MCTS algorithm is more suitable for solving this kind of sequential decision-making with “opponent model”. Therefore, based on the evaluation method of complete decision sequences proposed in the previous chapters, a MCTS sequential decision method based on hybrid knowledge is presented in this section.

3.1Improved MCTS-tree strategy optimization method

3.1.1Traditional MCTS method

Assuming that each child node in MCTS has the same attributes, the attributes of the child node in the Monte Carlo tree can be described as follows:

(6)

si=f⁢(d,n,v)

Where, si denotes the ith decision node, s0 denotes the original root decision node; d is the depth of the nodes; n is the visiting number of the node; v is designed as the evaluation value to the node. The traditional MCTS flow chart is shown as follows.

The traditional MCTS algorithm is completed by four steps: selection, expansion, simulation and backpropagation.

3.1.2The improved MCTS method

Traditional MCTS with complete random searching is not very effective in practical sequential decision-making. Therefore, incorporating the complete sequence evaluation method based on subse- quence fragments introduced in Section 2.3, we propose a novel UCT-max algorithm which considers not only the nodes’ evaluation value of the complete sequence, but also the local extremum of the next tree searching step. The novel UCT-max calculation approach is shown as follows:

(7)

𝑈𝐶𝑇-max=Cq×max⁢()+(1-Cq)×vl¯+2⁢C⁢p⁢2⁢ln⁡nnj

Where, Max () is the node with the largest evaluation value corresponding to the greedy maximum value, where the better value can be found by sequential searching. Cq is the proportion coefficient between the maximum evaluation value and the average evaluation value, which can help the searching avoid falling into local optimum value by a “greedy” direction on the basis of the overall situation.

3.2Sequential decision-making process based on hybrid knowledge and UCT-max algorithm

As shown in Fig. 2, this sequential searching algorithm is different from the traditional MCTS method. The steps of this algorithm are shown in detail as follows.

Figure 2.

UCT-max sequential decision flow chart based on hybrid knowledge.

Step 1: Set the empty node as root node, establish searching tree and give the width and depth of expansion.
Step 2: When expanding the searching tree, we need judge whether the node meets the selection requirements. If the node meets the selection requirements, we do “selection” operation, otherwise, we do “expansion” operation.
Step 3: When the searching tree reaches the target depth, it begins to simulate game decision process and randomly adds sub-nodes until the end of the sequence.
Step 4: Input the simulated sequence into the decision sequence evaluation equations, and obtain the evaluation value of the sequence according to the evaluation method in Fig. 1.
Step 5: Update the evaluation value and visiting times of the parent node in reverse order, and calculate the UCT-max value of the nodes according to Eq. (7).
Step 6: Judge whether the search is achieved. If the process is completed, output the sequence with the largest current evaluation value, otherwise, return to step 2.

4.Case studies

In this section, towards the multi-acupoints treatment sequencing scheme in dysphagia after stroke, the MCTS-max sequence decision-making method based on hybrid knowledge is applied and validated. Compared to the traditional sequential decision-making method, the advantages of the proposed algorithm are discussed.

4.1Acupoint sequences ranking problem in treating post-stroke dysphagia

We built a hybrid knowledge model from two aspects: the deterministic knowledge based on evidence-based medicine [25, 26] and the experience knowledge based on expert group decision strategy.

4.1.1Construction of hybrid priori knowledge

1) Deterministic knowledge acquisition based on evidence-based medicine

We constructed a deterministic knowledge database based on literatures [27, 28] regarding to the selection of acupuncture points for post-stroke dysphagia treatment. The scope of literature search [28] presented in this paper including: China Knowledge Network (CNKI), Hong Kong University database, Ovid Lww database, PubMed, and Cochrane Library online database. The retrieval time was up to February 29, 2016. Search Formula: (1) Chinese database was searched with the keywords relating to stroke, dysphagia, cerebral hemorrhage, cerebral infarction “AND” other stroke related terms (dysphagia, dysphagia, pseudobulbar palsy, true bulbar palsy, etc.) “AND” related terms of acupuncture (electro-acupuncture, rehabilitation training, swallowing training, etc.). (2) The English database was searched by using keywords as “dysphagia and stroke” and “acupuncture and swallow training”. Four hundred and two articles were retrieved and included in the literature on acupuncture and moxibustion treatment of post-stroke dysphagia. Four hundred and two prescriptions of acupuncture and moxibustion were extracted, among which 95 acupoints (62 points of fourteen meridians and 33 points of Waiqi meridians) were stimulated with the total frequency of 1981. We assigned each acupoint as an unique ID. The frequency of each acupoint is partially shown in Appendix 2.

2) Empirical knowledge acquisition based on group decision making

Four senior acupuncturists from Beijing Dongfang Hospital were invited to discuss the combinations of acupuncture sub-subsequence fragments with different dimensions, where one dimension corresponded to a single acupoint. Then, the experts’ decision was processed by group decision-making method (Appendix 1), where the evaluation values of one-dimensional, two-dimensional and three-dimensional sub-sequence fragments were obtained. The acupoints combinations and the evaluation values of sub-sequence fragments derived from group decision-making were presented in Appendix 3.

3) The efficiency of hybrid knowledge

The hybrid knowledge module constructed in the second chapter of this paper provides an effective way to analyze deterministic knowledge and empirical knowledge. Based on the deterministic knowledge of acupuncture extracted from literatures (Appendix 1), the number of acupoints involved in evidence-based dysphagia treatment in post-stroke patients has been reduced from more than 300 acupoints of the human body to 95 commonly used acupoints. In addition, from the empirical knowledge generated by group decision-making, 95 acupoints has been reduced to the five most commonly used acupoints in clinical practice. Hence, the application of hybrid model reduced the decision sequence searching space complexity from to, reduced the search time, and improved the search performance.

4.1.2Segmentation of acupuncture and moxibustion treatment sequence

Based on the previous discussion (Eqs (3)–(5) in Section 2.3), assuming that the acupoint ranking problem with depth of 5 (defining five different acupoints as five decision node variables: a1, a2, a3, a4, a5), and then, there are 16 ways of sequence segmentation.

For a given sequence of five acupoints, we calculated the evaluation values for all the segmentation patterns, and then returned the maximum value as its final evaluation value.

4.2Acupuncture ranking based on integrating hybrid knowledge and MCTS

4.2.1Sequential decision making process

According to the clinical workflow of treating dysphagia (the time of each acupuncture treatment for a single patient is generally less than 30 minutes, and the clinicians generally select at least five acupoints for acupuncture), we initialized the sequential decision-making problem as follows: five decision-making nodes (five acupoints of acupuncture and moxibustion), tree search depth of 2, width of 50. Based on the Fig. 2 in Section 3.2, we gave the detailed sequence searching process as follows:

Step 1: Select an acupoint to establish a root node randomly, expand downward into a search tree with depth of 5.
Step 2: Judge the depth of nodes and the number of sub-nodes, and when the selection conditions are satisfied, the selection process begins.
Step 3: When the depth reaches 2, the simulation begins until the end of the sequence, and the sequence is input into the evaluation module.
Step 4: Update the value and the visiting number. When the one-time MCTS is completed, a sequence string will be obtained.
Step 5: Adjust the parameters of node evaluation value, and after several executions, a series of sequences can be obtained for experts’ selection and exclusion.

4.2.2Experimental results and analysis

Because of the complexity of tree searching, we only take one search process here and briefly introduce its search process. The nodes’ ranking result of this search is: “1, 2, 52, 4, 9”, corresponding to the acupoints Lianquan, Fengchi, Throat Wall, Jinjin, Renying.

Then we set a 3-tuple (a, (b, c)) to describe the node’s searching information. Where, a is the acupoint ID, b is the evaluation value of the node and c is the visiting times of the MCTS algorithm to this node.

Table 1 lists the evaluation value and visiting times of the first layer nodes in the search process, along with the evaluation value and visiting times of the remaining nodes.

Table 1

Evaluation value and visiting times of first layer nodes

(1, (0.88, 12550))	(2, (0.83, 10050))	(3, (0.6, 50))	(5, (0.48, 50))	(7, (0.45, 50))
(10, (0.61, 2550))	(11, (0.46, 50))	(12, (0.31, 50))	(13, (0.17, 50))	(15, (0.36, 50))
(19, (0.31, 50))	(21, (0.14, 50))	(23, (0.28, 50))	(24, (0.17, 50))	(26, (0.3, 50))
(27, (0.11, 50))	(28, (0.06, 50))	(29, (0.06, 50))	(30, (0.07, 50))	(31, (0.07, 50))
(32, (0.06, 50))	(33, (0.2, 50))	(34, (0.24, 50))	(35, (0.09, 50))	(38, (0.1, 50))
(39, (0.24, 50))	(40, (0.14, 50))	(42, (0.1, 50))	(43, (0.13, 50))	(45, (0.04, 50))
(46, (0.05, 50))	(49, (0.1, 50))	(50, (0.13, 50))	(52, (0.19, 50))	(53, (0.36, 50))
(54, (0.15, 50))	(55, (0.21, 50))	(57, (0.16, 50))	(58, (0.06, 50))	(59, (0.09, 50))
(62, (0.45, 50))	(64, (0.13, 50))	(65, (0.23, 50))	(66, (0.48, 50))	(68, (0.13, 50))
(69, (0.13, 50))	(70, (0.15, 50))	(71, (0.07, 50))	(74, (0.09, 50))	(77, (0.12, 50))

The convergence of MCTS itself has been proved in references [29, 30]. In 30 experiments, the average value of the final evaluation value was roughly 0.2. In addition, based on 30 experiments, we obtained the optimal node sequence of global convergence [10, 9, 52, 2, 1], corresponding to the acupoints Panlianquan, Renying, Throat Wall, Jinjin, Renying.

4.2.3Clinical validation

We hypothesize the proposed acupuncture sequence is superior to the standard treatment. In order to compare the therapeutic effect of two treatments, we invited one acupuncturist and one rehabilitation physician from Beijing Zhongguancun Hospital to conduct a randomized controlled clinical trial. Patients were enrolled in the study if they were diagnosed with stroke, presented “Oral phase” dysphagia symptom lasting for one month to three months, and aged from 20 to 85 years old. Patients were excluded if they had severe dementia, mental disorders, sensory aphasia, or severe needle fainting. This study was approved by the Hospital IRB, and all patients signed the inform consent.

From December 2017 to December 2018, 84 patients were recruited and randomly assigned to treatment group (42 cases) and control group (42 controls). Dysphagia was assessed using the University of Southern Manchester clinical bedside assessment scale and Watian drinking experiment bedside assessment scale [14, 15, 16]. The decrease of the score indicates the improvement of dysphagia. There was no significant differences in social demographics, clinical pathophysiology characteristics and swallowing scores between the two groups before treatment.

For treatment group, the physician stimulated five acupoints in following order: Bian Lianquan, Renying, larynx wall, Jinjin and Lianquan. For the control group, the physician stimulated five acupoints in random order: Lianquan, Yamen, Fengchi, Fengfu and Fenglong, according to sixth edition of <Acupuncture and Moxibustion> Textbook [31]. The swallowing function of patients was assessed on the first day of treatment, after one month and after three months treatment based on two kinds of clinical bedside assessment scale, which are shown in Tables 2 and 3.

Table 2

The key data in the comparision based on Southern Manchester clinical bedside assessment scale

	Before VS after one	Before VS after three	After one VS after three
	month treatment	months treatment	months treatment
P value	4.66E-07	8.26E-08	0.041585

Table 3

The key data in the comparision based on Watian drinking experiment bedside assessment scale

	Before VS after one	Before VS after three	After one VS after three
	month treatment	months treatment	months treatment
P value	1.46E-07	6.54E-08	0.026678

Student T-test was used to compare the P value difference between the treatment group and the control group. There was no significant difference in the clinical bedside assessment score between the treatment group and the control group before treatment (P> 0.05). After one month and three months of treatment, the two clinical bedside assessment scale scores of the treatment group was significantly different from that of the control group (P< 0.05), and the therapeutic effect of the treatment group was better than that of the control group.

4.2.4Comparison and discussion

To verify the superiority and feasibility of the proposed sequential decision-making method, we compared UCT-max MCTS algorithm to three other searching algorithms (genetic algorithm (GA), greedy algorithm and traditional MCTS algorithm) on the convergence, complexity and evaluation rate respectively.

First, we introduce the initialization of GA and greedy algorithm based on hybrid knowledge evaluation module.

• Genetic algorithm initialization: The fitness function of genetic algorithm is defined according to Eqs (4) and (5). We then set the initial population to 300, the mutation rate to 0.001, the crossover rate to 0.8, the number of iterations to 400, and the selection rate to 0.3.
• Greedy algorithm initialization: In the current hybrid knowledge module, select the node with the largest evaluation value, and iterate five times.

1) Convergence

• The genetic algorithm is more dependent on the initial values of the parameters. Based on 50 experiments, we found that if the initial value is not optimal, it is easy to fall into local extremum, resulting in poor convergence.
• Because the greedy rules are fixed and the hybrid knowledge module is fixed, the final result of the sequential decision algorithm based on greedy algorithm is always [1, 2, 52, 3, 4]. The corresponding acupoints are Lianquan, Fengchi, Throat Wall, Qifeng, Jinjin. The convergence is always achieved.

2) Algorithms’ complexity

• The complexity of traditional and the proposed UCT-max MCTS algorithms are related to tree width, search depth and final sequence length. The initial values of parameters are presented as follows: tree search width is 50, tree search depth is 2, and the final sequence length is 5. Thus, the time complexity is 50×2 5 = 12500 × O (1), where O (1) is an operation constant.
• The complexity of GA is related to the size of population and the number of iterations. The initial values of parameters are presented as follows: the population is 300, the number of iterations is 400. Thus, the time complexity is 300 × 400 = 120000 × O (1), where O (1) is an operation constant.
• The complexity of greedy algorithm is related to the number of acupoints and the number of cycles. The initial values of parameters are presented as follows: the number of acupoints in this paper is 95, the number of cycles is 5. Thus, the time complexity is 500 × O (1), in which O (1) is an operation constant.

3) Evaluation rate

Definition 4 (Evaluation Rate): Evaluation Rate is the degree of similarity that the algorithm converges to the global optimal solution.

The evaluation rate formula is given as follows:

(8)

Evaluate_rate=(algorithm_val×𝑡𝑖𝑚𝑒𝑠)/(max⁡_⁢𝑣𝑎𝑙×𝑡𝑖𝑚𝑒𝑠)

Where, for a given complete sequence, algorithm_val is the sequence evaluation value obtained by the algorithm each time; max_val is the maximum of all the sequence evaluation values. For this problem, [10, 9, 52, 2, 1] is the sequence with the largest known evaluation value, which is 0.2514; times is the number of experiments.

Based on the above discussion, Table 4 gives a searching performance of four different sequential decision-making methods.

Table 4

The searching performance of four different sequential decision-making methods

Searching method	Algorithmic		Scores of	Evaluation	Scores of	Evaluation
	complexity		30 times	rate	100 times	rate
			experiments	(30 times)	experiments	(100 times)
UCT-max MCTS method based on the hybrid	12500	× O (1)	6.4128	85.25%	21.6229	86.01%
knowledge
Traditional MCTS method	12500	× O (1)	5.1140	67.81%	17.1304	68.14%
Genetic algorithm based on the hybrid knowledge	120000	× O (1)	3.4345	45.54%	11.4665	48.51%
Greedy algorithm based on the hybrid knowledge	500	× O (1)	6.3748	84.52%	21.249	84.52%

4) Discussion

From the Table 3, we can see the performances of the proposed UCT-max MCTS algorithm, traditional MCTS algorithm, GA and greedy algorithm in the acupoints’ ranking sequential problem, which are shown as follows:

• The traditional MCTS method, due to the lack of prior knowledge for search guidance, has a low “evaluation rate” score for obtaining the optimal solution, which is not suitable for solving the optimal value in real-time.
• The GA is based on hybrid knowledge, however, its convergence is limited to the initial values of the parameters and is unstable. Moreover, genetic algorithm is prone to fall into local extreme value.
• The greedy algorithm works well, however, it is limited to returning the only result. For the combination nodes of two or three, the greedy algorithm cannot achieve good results (evaluation rate is less than the propose method).

Moreover, genetic algorithm and greedy algorithm cannot be well applied in practical interaction. If the complexity of “feedback model” is considered later, it is difficult to be compatible. The UCT-max algorithm based on hybrid knowledge proposed in this paper effectively avoids the shortcomings of the above three algorithms. In addition, without increasing the complexity, the proposed UCT-max algorithm can obtain a high evaluation rate, which means that the method can cover more possible decision sequences effectively, along with avoid falling into local extreme value early. However, the proposed UCT-max method needs a clear sequential segment evaluation and a feedback mechanism of searching results. As far as the concerned work of this paper, experts are required to participate in the interactive evaluation to give the hybrid knowledge. Therefore, in the future work, we will focus on how to learn the evaluation preferences of experts, so as to enhance the intelligence level of the overall decision-making process.

Finally, compared to the standard treatment scheme, the superiority of acupuncture treatment scheme based on the proposed algorithm in this paper was validated by medical experts. This work can help establish the standardized the post-stroke dysphagia treatment protocols, and educate the junior Traditional Chinese Physicians.

5.Conclusions

To solve the problem of sequential decision-making based on professional knowledge and expertise, we first proposed an integration approach for acquiring hybrid knowledge based on deterministic and experiential group decision knowledge. In order to solve the problem of decision sequence evaluation, we then gave the definition of sub-sequence fragments and decision sequence segmentation, and introduced an evaluation method of complete sequence based on the priority of sub-sequence fragments.

Additionally, based on the above complete sequence evaluation methods, we proposed an UCT-max MCTS sequential decision-making method with hybrid knowledge with detailed steps. This method can guide the search direction, and enable the computer to automatically solve the experts engaged sequential decision-making problem with unclear objective function. Finally, this work was applied to determine the optimal multiple acupoints sequence in treating dysphagia among post-stroke patients, which demonstrates its clinical significance in establishing the standard treatment protocols and educating junior Traditional Chinese Physicians.

To verify validity of the method proposed in this paper, a case study of acupoints sequences ranking problem for dysphagia treatment in post-stroke patients was discussed. Different from the traditional set division optimization of alliance structure, “treatment sequence of multiple acupoints” was defined as a vector containing the priority gain of sub-nodes generated by a set of knowledge-constrained nodes. Then, the UCT-max MCTS sequence decision-making method based on hybrid knowledge is used to obtain the optimal treatment sequence. Compared to the traditional MCTS method, the genetic algorithm and the greedy algorithm, the advantages of the proposed method are illustrated in terms of algorithm complexity and convergence features.

Nonetheless, this work has some limitations. First, the sequential decision-making proposed in this paper is based on off-line hybrid knowledge construction. Further research is needed to study a dynamic sequential decision-making method for on-line updating knowledge and learning the evaluation preferences of experts. Second, the evaluation of treatment effects of acupuncture in this paper is mainly based on the simple mathematical description of doctor’s evaluation and patient’s subjective feelings, and the scientificity of curative effect evaluation needs to be improved. Third, the group knowledge derived from experts’ group discussion only included up to three dimensional sub-sequence fragments. Further research is needed to construct more comprehensive hybrid knowledge. Last, the method proposed is limited to solving single intervention problem. More work is needed for searching and evaluation of complex therapeutic sequences with multi-intervention (e.g., physical therapy, massage, ice stimulation, etc.). Depend on how we conduct the case study, we may need to address the limitation in clinical study design where we only use the “Oral phase” cases. In the future, we may introduce the “laryngeal phase” and “esophageal phase” cases to verify our methods.

Acknowledgments

This study was supported by the National Natural Science Foundation of China (61603023) and “2017 Beijing Excellent Talents Training Fund – The study on the Assisted Diagnostic System of Swallowing Disorder among Post-Stroke”.

Conflict of interest

The authors have no conflict of interest to report.

References

[1]	Arnold T, Wooders MH. Dynamic club formation with coordination. Journal of Dynamics & Games, (2017) ; 2: (3/4): 341-361.
[2]	Zhang L. Artificial Neural Network Model Design and Topology Analysis for FPGA Implementation of Lorenz Chaotic Generator. Electrical and Computer Engineering, (2017) ; 1-4.
[3]	Chen M, Zhou R, Zhang R. Application of Artificial Neural Network to Failure Diagnosis on Process Industry Equipments. In: International Conference on Natural Computation, (2010) ; 3: pp. 1190-1193.
[4]	Kullawan K, Bratvold RB. Sequential geosteering decisions for optimization of real-time well placement. Journal of Petroleum Science and Engineering, (2018) ; 165: : 90-104.
[5]	Tan S, Mavrovouniotis ML. Reducing data dimensionality through optimizing neural network inputs. Aiche Journal, (1995) ; 41: (6): 1471-1480.
[6]	Huang WT, Chen PS, Liu JJ, Chen YR, Chen YH. Dynamic configuration scheduling problem for stochastic medical resources. Journal of Biomedical Informatics, (2018) ; 4: (80): 96-105.
[7]	Chen PS, Lin YJ, Peng NC. A two-stage method to determine the allocation and scheduling of medical staff in uncertain environments. Pergamon Press, Inc, (2016) ; 99: (C): 174-188.
[8]	Hai H. Acupuncture: Theories and evidence. World Scientific. (2013) .
[9]	Ueda S, Iwasaki A, Conitzer V, et al. Coalition structure generation in cooperative games with compact representations. Autonomous Agents and Multi-Agent Systems, (2018) ; 32: (4): 503-533.
[10]	Zhang G, Su Z, Li M, et al. A Task-Oriented Heuristic for Repairing Infeasible Solutions to Overlapping Coalition Structure Generation. IEEE Transactions on Systems, Man, and Cybernetics: Systems, (2017) ; 1-17.
[11]	Michalak T, Rahwan T, Elkind E, Wooldridge M, Jennings NR. A hybrid exact algorithm for complete set partitioning. Artificial Intelligence, (2016) ; 230: (C): 14-50.
[12]	Wilson RD, Howe EC. A cost-effectiveness analysis of screening methods for dysphagia after stroke. PM&R, (2012) ; 4: (4): 273-282.
[13]	Huang W. The study of literature and clinical research on acupuncture treatment combined with rehabilitation training of dysphagia after stroke. Guangzhou University of Chinese Medicine, (2016) ; 4: : 1-86.
[14]	Smithard DG, O’Neill PA, Park C, Morris J, Wyatt R, et al. Complications and outcome after acute stroke does dysphagia matter? Stroke, (1996) ; 27: (7): 1200-1203.
[15]	Smithard DG, O’Neill PA, England RE, Park CL, Wyatt R, et al. The natural history of dysphagia following a stroke. Dysphagia, (1997) ; 12: (4): 188-193.
[16]	Smithard DG, O’Neill PA, Martin DF, England R. Aspiration following stroke: is it related to the side of the stroke? Clinical Rehabilitation, (1997) ; 11: (1): 73-76.
[17]	Ohmoto Y, Kataoka M, Nishida T. Extended methods to dynamically estimate emphasizing points for group decision-making and their evaluation. Procedia – Social and Behavioral Sciences, (2013) ; 97: : 147-155.
[18]	Meng F, Zhang Q. Induced continuous Choquet integral operators and their application to group decision making. Pergamon Press, Inc, (2014) ; 68: (1): 42-53.
[19]	Bosansky B, Lisy V, Lanctot M, Cermak J, Winands MHM. Algorithms for computing strategies in two player simultaneous move games. Artificial Intelligence, (2016) ; 237: : 1-40.
[20]	Rylan T, Conway A, Evan W, Sangaline B. A monte carlo simulation approach for quantitatively evaluating keyboard layouts for gesture input. International Journal of Human-Computer Studies, (2016) ; 99: : 1-22.
[21]	Kato H, Takaya M, Yamamura A. Analysis of a monte carlo tree search in knight-amazons. Procedia Computer Science, (2015) ; 62: : 31-38.
[22]	Powley EJ, Cowling PI, Whitehouse D. Information capture and reuse strategies in Monte Carlo Tree Search, with applications to games of hidden information. Artificial Intelligence, (2014) ; 217: (C): 92-116.
[23]	Gelly S, Wang Y. Exploration exploitation in go. UCT for Monte-Carlo go, (2006) ; 1-8.
[24]	Guillaume M, Chaslot JB, Winands MHM, Van Den Herik HJ, Uiterwijk JWHM, Bouzy B. Progressive strategies for Monte-Carlo tree search. New Mathematics and Natural Computation, (2008) ; 4: (3): 343-357.
[25]	Meng JC, Wang XH. Huang Di Nei Jing Su Wen. Shanghai Scientific & Technical Publishers, (2009) .
[26]	Wang M, Sun M, Mao J. Huang Di Nei Jing: Acupuncture treatment holistic view. Journal of Basic Chinese Medicine, (2010) ; 5: : 417-419.
[27]	Wilson RD, Howe EC. A cost-effectiveness analysis of screening methods for dysphagia after stroke. PM&R, (2012) ; 4: (4): 273-282.
[28]	Huang W. The study of literature and clinical research on acupuncture treatment combined with rehabilitation training of dysphagia after stroke. Guangzhou University of Chinese Medicine, (2016) ; 4: : 1-86.
[29]	Browne CB, Powley E, Whitehouse D, Lucas SM, Cowling PI, Samothrakis S. A survey of monte carlo tree search methods. IEEE Transactions on Computational Intelligence and AI in Games, (2012) ; 4: (1): 1-43.
[30]	Graf T, Platzner M. Adaptive playouts for online learning of policies during Monte Carlo Tree Search. Elsevier Science Publishers Ltd, (2016) ; 644: : 53-62.
[31]	Sun G. Acupuncture and Moxibustion Science. Shanghai Science and Technology Publisher, (1998) .

Appendices

Appendix 1: Group decision making method based on fuzzy complementary matrix

Step1: Gather group’s judgment preferences

Step2: Calculate the alternative’s superior degree index

Step3: Calculate the group consistency deviation index

Step4: Yield the ranking result of the ith alternative

Appendix 2: Effective acupoint information for evidence-based medicine in the treatment of dysphagia

Frequency of acupoints stimulated in post-stroke dysphagia treatment (partial).
Acupoint ID		Acupoint name	Frequency of the acupoint stimilated	Percentage (%)
	a1	Lianquan	221	11.16
	a2	Fengchi	219	11.06
	a3	Yifeng	131	6.61
	a4	Jinjin	112	5.65
	a5	Yuye	111	5.60
	a6	Wangu	82	4.14
	a7	Neiguan	76	3.84
	a8	Fengfu	71	3.58
	a9	Renzhong	61	3.08
	a10	Renying	58	2.93
	a11	Sanyinjiao	53	2.68
	a12	Hegu	53	2.68
	a13	Panglianquan	49	2.47
	…	…	…	…
	a95	Shenque	1	0.0005

Appendix 3: Priority of “multi-dimensional” acupoint sequence fragments given by experts based on group decision-making method

Acupoint	Two dimensional	Three-dimensional	Four-dimensional	Normalization	Maxinmum
sequence	evaluation value	evaluation value	evaluation value
a1a10	0.7771			0.419620897	0.41632662
a10a1	0.7508			0.405419341
a1a9	0.7245			0.391217784
ala2	0.6958			0.375720268
a2a1	0.6742			0.364056632
a2a10	0.6616			0.357252845
a10a9	0.6089			0.328795733
a10a2	0.6068			0.327661769
a1a11	0.5666			0.305954446
a10a11	0.5403			0.29175289
a9a1	0.5139			0.277497335
a9a10	0.4876			0.263295778
a2a9	0.4597			0.248230249
a2a11	0.4382			0.236620611

Acupoint	Two dimensional	Three-dimensional	Four-dimensional	Normalization	Maxinmum
sequence	evaluation value	evaluation value	evaluation value
a9a2	0.4087			0.220691109
a11a10	0.3824			0.206489552
a11a1	0.3561			0.192287996
a11a2	0.3274			0.17679048
a9a11	0.2911			0.157189092
a11a9	0.2968			0.160266996
a1a10a9		0.7713		0.188802	0.18880223
a1a10a2		0.7584		0.185644285
a2a1a10		0.7158		0.175216481
a10a1a9		0.7103		0.173870168
a9a1a10		0.6584		0.161165872
a1a9a2		0.6476		0.158522203
a10a9a2		0.6358		0.15563375
a9a10a1		0.5842		0.143002889
a9a1a2		0.57		0.139526954
a2a10a1		0.5482		0.13419066
a1a2a10		0.5139		0.125794565
a1a9a10		0.4876		0.119356742
a2a1a9		0.4613		0.112918919
a10a9a1		0.4318		0.105697788
a2a9a1		0.3911		0.095735074
a10a2a9		0.4079		0.099847447
a1a10a11		0.3613		0.088440506
a10a1a11		0.3297		0.080705328
a9a1a11		0.3034		0.074267505
a9a11a1		0.2771		0.067829683
a1a10a9a2			0.7776	0.148610784	0.14830501
a9a2a1a10			0.7321	0.139915065
a2a1a10a9			0.6867	0.131238458
a9a1a10a2			0.6368	0.121701835
a2a9a1a10			0.5998	0.114630591
a1a10a9a11			0.5708	0.109088265
a2a1a10a11			0.4853	0.092747959
a1a10a2a11			0.485	0.092690625
a2a11a1a10			0.4139	0.07910237
a1a9a2a11			0.3635	0.06947019
a2a9a1a11			0.3333	0.063698526
a9a10a2a11			0.2776	0.053053438