Parkinsonian Hand or Clinician’s Eye? Finger Tap Bradykinesia Interrater Reliability for 21 Movement Disorder Experts
Abstract
Background:
Bradykinesia is considered the fundamental motor feature of Parkinson’s disease (PD). It is central to diagnosis, monitoring, and research outcomes. However, as a clinical sign determined purely by visual judgement, the reliability of humans to detect and measure bradykinesia remains unclear.
Objective:
To establish interrater reliability for expert neurologists assessing bradykinesia during the finger tapping test, without cues from additional examination or history.
Methods:
21 movement disorder neurologists rated finger tapping bradykinesia, using the Movement Disorder Society Unified Parkinson’s Disease Rating Scale (MDS-UPDRS) and the Modified Bradykinesia Rating Scale (MBRS), in 133 videos of hands: 73 from 39 people with idiopathic PD, 60 from 30 healthy controls. Each neurologist rated 30 randomly-selected videos. 19 neurologists were also asked to judge whether the hand was PD or control. We calculated intraclass correlation coefficients (ICC) for absolute agreement and consistency of MDS-UPDRS ratings, using standard linear and cumulative linked mixed models.
Results:
There was only moderate agreement for finger tapping MDS-UPDRS between neurologists, ICC 0.53 (standard linear model) and 0.65 (cumulative linked mixed model). Among control videos, 53% were rated > 0 by MDS-UPDRS, and 24% were rated as bradykinesia by MBRS subscore combination. Neurologists correctly identified PD/control status in 70% of videos, without strictly following bradykinesia presence/absence.
Conclusion:
Even experts show considerable disagreement about the level of bradykinesia on finger tapping, and frequently see bradykinesia in the hands of those without neurological disease. Bradykinesia is to some extent a phenomenon in the eye of the clinician rather than simply the hand of the person with PD.
INTRODUCTION
Parkinson’s disease (PD) is a clinical diagnosis, and at the centre of this is the presence of bradykinesia: “slowness of movement AND decrement in amplitude or speed (or progressive hesitations/halts) as movements are continued” [1].
The International Parkinson and Movement Disorder Society (MDS) criteria for PD diagnosis begin with a requirement for ‘parkinsonism’, defined as bradykinesia, in combination with either rest tremor, rigidity, or both [1]. Thus, bradykinesia is the sine qua non of PD. In addition, assessment of bradykinesia severity is central to measuring disease progression, response to treatment, and research outcomes. Despite this fundamental importance, the gold standard test for bradykinesia is a visual judgement made through the eye of an expert clinician [1, 2].
One of the most common methods to ascertain the presence and severity of bradykinesia in clinical practice is finger tapping, whereby an expert observes the patient repeatedly tapping their index finger against thumb “as quickly and as big as possible” [2]. This finger tapping test is part of the standard clinical rating scale: the (1987) Unified Parkinson’s Disease Rating Scale (UPDRS) [3, 4], and its (2008) Movement Disorder Society revision (MDS-UPDRS) [2]. In that scale, three elements of finger tapping bradykinesia (speed, amplitude, and rhythm) are combined into a composite score between 0 and 4 (Table 1). An MDS-UPDRS finger tapping score above 0 does not necessarily mean bradykinesia is present, since any single element of bradykinesia in isolation will raise the MDS-UPDRS score above 0, without meeting the definition of bradykinesia. The MDS-UPDRS finger tapping score thus measures severity, but not necessarily presence, of bradykinesia. In contrast, an alternative rating scale, the 2007 Modified Bradykinesia Rating Scale (MBRS), rates each bradykinesia component separately, and includes a finger tapping item (Table 1) [5, 6]. Thus, subscores from the MBRS can indicate the presence of bradykinesia, in addition to the severity.
Table 1
MDS-UPDRS Item 3.4 Finger Tapping: | |||
Score | Criteria | ||
0: Normal | No problems | ||
1: Slight | Any of the following: a) the regular rhythm is broken with one or two interruptions or hesitations of the tapping movement; b) slight slowing; c) the amplitude decrements near the end of the 10 taps. | ||
2: Mild | Any of the following: a) 3 to 5 interruptions during tapping; b) mild slowing; c) the amplitude decrements midway in the 10-tap sequence. | ||
3: Moderate | Any of the following: a) more than 5 interruptions during tapping or at least one longer arrest (freeze) in ongoing movement; b) moderate slowing; c) the amplitude decrements starting after the 1st tap. | ||
4: Severe | Cannot or can only barely perform the task because of slowing, interruptions or decrements. | ||
Modified Bradykinesia Rating Scale (MBRS): | |||
Score | Speed | Amplitude | Rhythm |
0 | Normal | Normal | Regular, no arrests or pauses in ongoing movement |
1 | Mild slowing | Mild reduction in amplitude in later performance, most movements close to normal | Mild impairment, up to two brief arrests / 10 seconds, none lasting > 1 second |
2 | Moderate slowing | Moderate reduction in amplitude visible early in performance but continues to maintain 50% amplitude through most of the task | Moderate, 3 to 4 arrests / 10 seconds; or 1 or 2 lasting > 1 second |
3 | Severe slowing | Severe, less than 50% amplitude through most of the task | Severe, 5 or more arrests / 10 seconds; or more than 2 lasting > 1 second |
4 | Can barely perform the task | Can barely perform the task | Can barely perform the task |
The upper half of the table shows Item 3.4, finger tapping, from the Movement Disorder Society-sponsored revision of the Unified Parkinson’s Disease Rating Scale (MDS-UPDRS) [2]. The lower half of the table shows the Modified Bradykinesia Rating Scale (MBRS) [5, 6]. Each hand is tested separately. The patient is instructed to tap the index finger on the thumb 10 times (MDS-UPDRS) or for 10 seconds (MBRS) “as quickly AND as big as possible”.
Visual judgement as the gold standard evaluation for bradykinesia is problematic. Human assessment of movement is imprecise, with frequent disagreement among observers [7]. Bradykinesia is a complex, heterogeneous clinical sign that is difficult to gauge accurately. This means that subtle changes of parkinsonism are difficult to measure, blunting the accuracy of clinical decisions, both for diagnosis and monitoring, and research outcomes.
It is noteworthy that a robust estimate of interrater reliability for finger tapping bradykinesia has not been published. There are several reasons for this. First, almost all studies have used very few (between 2 and 5) raters [4–6, 8–10], which is likely to be too few to assess the range of variability in clinician judgements. Second, most studies involved clinical raters applying the entire UPDRS motor examination to each participant [4, 8–12], thus providing additional clinical information that influences the rater’s judgement for any specific aspect of the examination. Henderson et al. [11] previously demonstrated this effect, showing that there was greater variation in rater scores when finger tapping was assessed in isolation (Kendall’s W 0.5–0.6), rather than alongside other clinical assessments (Kendall’s W > 0.8). Third, most studies involved only people with PD, without any healthy control participants [4–6, 10–13]. This artificially avoids the difficult but important distinction between subtle bradykinesia and normal older age movement. Fourth, in some studies, PD medications were withheld prior to rating, thus exaggerating bradykinesia and making differences larger and therefore easier to detect [5, 6, 10, 11]. Fifth, only one [13] interrater reliability study has used the current MDS-UPDRS and only two have used MBRS [5, 6]. All other previous studies of interrater reliability for bradykinesia have used the older (now obsolete) version of UPDRS [4, 5, 8–12, 14], which has substantial differences in how the grades of bradykinesia severity are defined.
These methodological problems mean that we still do not know how well neurologists agree on such a central clinical sign, and the published figures for interrater reliability of finger tapping bradykinesia vary widely: Cohen’s κ of –0.07 (poor agreement or no agreement) [9, 15], κ of 0.47 (moderate agreement) [4, 15], and Kendall’s W of 0.87 (almost perfect agreement) [12, 15] have all been reported. We aim to address this by comparing 21 expert neurologists’ bradykinesia ratings for finger tapping when no other information is given, in people with PD and also in people without a neurological diagnosis, with a statistical method appropriate for ordinal rating data.
MATERIALS AND METHODS
The study was approved by the North of Scotland Research Ethics Committee, United Kingdom Health Research Authority (IRAS project ID 256116). Informed, written consent was obtained from all study participants.
Finger tapping video
Informed written consent for participation was obtained from 39 people with idiopathic PD and 30 controls without a neurological diagnosis. All PD participants had previously been diagnosed by a movement disorder specialist neurologist at Leeds Teaching Hospitals NHS Trust, United Kingdom, according to Movement Disorder Society clinical diagnostic criteria [1]. PD participants were subjectively and objectively in the ‘on’ state at the time of participation (no medications were withheld). One investigator, SW, graded Hoehn and Yahr stage for each participant, and also later scored the presence/absence of visible tremor in each video (but did not score any video for bradykinesia). Healthy controls were recruited from the companions of patients and hospital/university staff. They had no history of PD or other neurological diagnosis and were not taking any medication that could induce parkinsonism. None of the control participants had any visible tremor.
Participants rested their elbow on a chair arm with the forearm lifted at 45° (this helped to keep the tapping hand within the view of the camera). In accordance with MDS-UPDRS instructions, each participant was instructed to tap their index finger and thumb together “as quickly and as big as possible” with each hand examined separately. The participants tapped for just over 10 seconds, because the MDS-UPDRS specifies 10 taps while the MBRS specifies 10 seconds [2, 6].
We recorded videos of each hand during the task using a standard smartphone (iPhone SE) placed on a tripod (60 frames per second, 1920x1080 px) under ambient lighting. Only the hand and part of the forearm were within the video frame. The distance from camera to hand was approximately 1 m, and digits 1 and 2 were closest to the camera.
One video was discarded because the hand moved outside the video frame, making 137 videos: 77 Parkinson’s disease hands and 60 control hands. Each video was edited to contain 1 second prior to tapping onset and 10 seconds of finger tapping.
Clinical rating
We invited 21 consultant neurologists that specialise in movement disorders, from a range of clinics in the United Kingdom, to each rate 30 videos of finger tapping. Python [16] was used to select 30 random videos for each neurologist from the total set of 137 videos — for each clinician the list of all videos was shuffled randomly and the first 30 used. Each video was rated according to the MDS-UPDRS Item 3.4 Finger Tapping [2] (first ten taps) and the MBRS [6] (the full 10 seconds of tapping) (Table 1). The neurologists undertook the task independently, at separate locations, on their own computer screen, and were blinded to both PD/control status and to each other’s scores.
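The allocation procedure described above (shuffle the full list for each clinician, keep the first 30) can be sketched as follows. This is a minimal illustration, not the study's actual script: the function names, the integer video identifiers, and the fixed seed are all assumptions made here for reproducibility.

```python
import random

def allocate_videos(n_videos=137, n_raters=21, per_rater=30, seed=0):
    """For each rater, shuffle the full video list and keep the first `per_rater`."""
    rng = random.Random(seed)  # hypothetical seed, used only to make the sketch repeatable
    video_ids = list(range(1, n_videos + 1))  # hypothetical identifiers 1..n_videos
    allocation = {}
    for rater in range(1, n_raters + 1):
        shuffled = video_ids[:]      # copy so every rater starts from the same full list
        rng.shuffle(shuffled)
        allocation[rater] = shuffled[:per_rater]
    return allocation

def raters_per_video(allocation, n_videos=137):
    """Count how many raters were given each video (zero is possible)."""
    counts = {v: 0 for v in range(1, n_videos + 1)}
    for videos in allocation.values():
        for v in videos:
            counts[v] += 1
    return counts

allocation = allocate_videos()
counts = raters_per_video(allocation)
unrated = [v for v, c in counts.items() if c == 0]
print(len(unrated), "videos were not allocated to any rater")
```

Because each rater's list is drawn independently, the number of raters per video varies and a few videos may be drawn by nobody; the exact count depends on the random draw.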
Inspired by informal comments made by the first two raters, we added an additional question for the subsequent 19 neurologists, asking them to judge whether the hand was most likely to be from a control or PD participant. This was in recognition that an experienced clinician may form an overall, subjective impression about whether the tapping appears parkinsonian or not, that is not necessarily strictly based on bradykinesia criteria.
Outcomes
The primary outcome is the interrater reliability for MDS-UPDRS finger tapping scores, reported as the intraclass correlation coefficient (ICC), which was the basis of the statistical power calculations. The secondary outcomes are: correlation coefficients describing the relationship between MDS-UPDRS score and each of the three MBRS score components, the proportions of healthy controls rated as bradykinesia by MBRS sub-score combination, and the accuracy of clinicians in judging PD from controls.
Statistical analysis
Interrater reliability reflects the variation between more than one rater measuring the same group of participants [17]. We report ICCs for both absolute agreement and consistency. Absolute agreement concerns the degree to which one rater’s score (x) is exactly equal to another’s (y), whereas consistency concerns the degree to which x can be equated to y plus a systematic error c (x = y + c).
For each ICC, we calculate scores using a standard linear model, which assumes an underlying normal distribution of MDS-UPDRS scores, and also a more sophisticated novel approach based upon cumulative linked mixed models (CLMMs), which is more appropriate for dealing with ordinal data. The normal distribution assumption of the first model is clearly incorrect but allows direct comparison to previous research. Both approaches are two-way random effects models, where each item is assessed by the same set of raters randomly selected from a larger population of raters. Note that, typically, a two-way random effects model would have all raters viewing all videos, which is impractical for our scenario. Our approach is equivalent to taking a random sample of this ‘ideal’ complete dataset, which gives unbiased estimates but enlarged confidence intervals.
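The random effects models consist of a random effect for video number (capturing the tendency of a video to be scored higher or lower than expected), a random effect for rater number (to capture the tendency of a rater to under-/over-rate videos), a fixed effect for whether the video is of a patient or control participant to give a baseline score in each case, and an intercept term. If $\sigma_v^2$ is the variance of the video random effect, $\sigma_r^2$ is the variance of the rater random effect, and $\sigma_\varepsilon^2$ is the residual variance, then the absolute agreement ICC is calculated as follows.
$$\mathrm{ICC}_{\mathrm{agreement}} = \frac{\sigma_v^2}{\sigma_v^2 + \sigma_r^2 + \sigma_\varepsilon^2} \tag{1}$$
Meanwhile the consistency ICC is calculated as follows.
$$\mathrm{ICC}_{\mathrm{consistency}} = \frac{\sigma_v^2}{\sigma_v^2 + \sigma_\varepsilon^2} \tag{2}$$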
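As a hedged illustration of how the two ICCs follow from the fitted variance components, the sketch below assumes the conventional two-way random-effects definitions: the agreement ICC divides the video variance by the sum of video, rater, and residual variances, while the consistency ICC omits the rater variance from the denominator. The function name and the example variances are hypothetical and are not values from this study.

```python
def icc_from_variances(var_video, var_rater, var_residual):
    """Return (agreement ICC, consistency ICC) from variance components.

    Assumes the conventional two-way random-effects definitions;
    all three inputs are variances and must be non-negative.
    """
    if min(var_video, var_rater, var_residual) < 0:
        raise ValueError("variance components must be non-negative")
    agreement = var_video / (var_video + var_rater + var_residual)
    consistency = var_video / (var_video + var_residual)
    return agreement, consistency

# Hypothetical variance components, chosen only to show the arithmetic.
agreement, consistency = icc_from_variances(var_video=1.0, var_rater=0.25, var_residual=0.5)
print(f"agreement={agreement:.3f}, consistency={consistency:.3f}")
```

Because the rater variance appears only in the agreement denominator, the consistency ICC can never be lower than the agreement ICC under these definitions.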
We fit two models to the data for calculating the ICC. The first uses a normal approximation to the ordinal score as in previous work. Our second model keeps the dependent variable ordinal using a cumulative linked mixed model (CLMM), essentially fitting a latent normal model with the addition of “cut-points” which split the latent normal distribution into segments corresponding to the dependent ordinal variable [18].
While this latter CLMM readily gives the variance of the random effects for video numbers and raters, it is not initially clear how to define the residuals, which are required to calculate the ICC. In effect we need to define the optimal value in the latent space for each level that the ordinal variable can take. We took the following approach: after fitting the latent normal distribution and cut-points the optimal points were defined as the median of each segment of the normal distribution (calculated using Monte Carlo). With these points defined, the residual can be calculated using the latent value of the fitted model on each data point and the corresponding optimal values.
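The "median of each segment" step can be sketched as below. This is an illustrative reading of the description, assuming a standard normal latent variable; the cut-point values are hypothetical, not fitted values from this study. A Monte Carlo estimate is shown alongside a closed-form check using the normal quantile function.

```python
import random
from statistics import NormalDist, median

def segment_medians_mc(cut_points, n_draws=200_000, seed=0):
    """Monte Carlo median of each segment of a standard normal split at `cut_points`."""
    rng = random.Random(seed)  # hypothetical seed for repeatability
    segments = [[] for _ in range(len(cut_points) + 1)]
    for _ in range(n_draws):
        z = rng.gauss(0.0, 1.0)
        idx = sum(z > c for c in cut_points)  # number of cut-points below the draw
        segments[idx].append(z)
    return [median(s) for s in segments]

def segment_medians_exact(cut_points):
    """Closed-form check: the median sits midway in probability between the segment edges."""
    nd = NormalDist()
    edges = [0.0] + [nd.cdf(c) for c in cut_points] + [1.0]
    return [nd.inv_cdf((lo + hi) / 2) for lo, hi in zip(edges, edges[1:])]

# Hypothetical cut-points giving five ordinal levels (scores 0 to 4).
cuts = [-0.5, 0.5, 1.5, 2.5]
mc = segment_medians_mc(cuts)
exact = segment_medians_exact(cuts)
for level, (m, e) in enumerate(zip(mc, exact)):
    print(level, round(m, 2), round(e, 2))
```

With these representative latent values in hand, a residual for each rating could then be formed from the fitted latent value and the value assigned to the observed score, as the text describes.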
The study power calculation was done via simulation using the normal approximation to the ordinal variable, based on pilot data with two raters. Based on recruiting 20 raters and covering a variety of different strength ICC values, we determined that giving 30 random videos to each rater allows us to calculate the ICC to within 0.05 in 95% of trials and to within 0.03 in 80% of trials. Models were fitted using the R functions ‘glmer’ and ‘clmm’ (from the ‘lme4’ and ‘ordinal’ packages, respectively), while power calculations were done using the Python library ‘statsmodels’.
Secondary analysis consisted of calculating three Spearman correlation coefficients, relating the median MDS-UPDRS score across all raters to each of the median MBRS speed, amplitude, and rhythm scores.
We also calculated the proportions of PD and control videos rated as bradykinesia, defined by MBRS subscore (i.e., > 0 score for both speed and at least one of amplitude or rhythm). Finally, for the clinician judgement of whether a hand showed PD or control, we undertook a post hoc analysis of age and disease duration in the correctly and incorrectly classified groups (t test).
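The MBRS-based definition above reduces to a simple rule on the three subscores. The sketch below illustrates it; the function names and the example ratings are hypothetical and are not data from this study.

```python
def is_bradykinesia(speed, amplitude, rhythm):
    """MBRS rule from the text: speed > 0 AND (amplitude > 0 OR rhythm > 0)."""
    return speed > 0 and (amplitude > 0 or rhythm > 0)

def proportion_bradykinesia(ratings):
    """Share of (speed, amplitude, rhythm) ratings that meet the rule."""
    flags = [is_bradykinesia(*r) for r in ratings]
    return sum(flags) / len(flags)

# Hypothetical ratings as (speed, amplitude, rhythm) tuples.
example = [
    (1, 1, 0),  # slow with reduced amplitude: bradykinesia
    (0, 1, 1),  # normal speed: not bradykinesia despite other deficits
    (2, 0, 0),  # slowness alone: not bradykinesia
    (1, 0, 1),  # slow with impaired rhythm: bradykinesia
]
print(proportion_bradykinesia(example))
```

Note that slowness alone, or amplitude and rhythm deficits without slowness, does not meet the definition under this rule.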
RESULTS
Expert neurologists’ rating of finger tap bradykinesia in people with PD and controls
The age, gender and Hoehn and Yahr scores for the participants are given in Table 2. The median number of raters per video was 5 (range 1 to 12, interquartile range 3 to 7). In the random selection of 30 videos per rater, 4 videos from the total of 137 were not allocated to any rater, so that the total number of unique hand videos rated was 133. A total of 630 video ratings were made (21 raters, 30 videos each): 325 of these were ratings of PD videos, and 305 ratings of healthy control videos.
Table 2
People with PD | Healthy control participants | |
Age (Std. Dev.) y | 68 (9.6) | 59 (19.4) |
Male/Female | 47/26 | 22/38 |
Median years since diagnosis | 4 | n/a |
Median H&Y [IQR] | 2 [1, 3] | n/a |
H&Y = 1 | 32 | |
H&Y = 1.5 | 2 | |
H&Y = 2 | 12 | |
H&Y = 2.5 | 4 | |
H&Y = 3 | 19 | |
H&Y = 4 | 4 | |
H&Y = 5 | 0 | |
People with PD | Healthy control participants | |
Impaired speed | 77% | 43% |
Impaired rhythm | 72% | 35% |
Impaired amplitude | 70% | 30% |
Impaired speed and rhythm | 62% | 19% |
Impaired speed and amplitude | 61% | 19% |
Bradykinesia (Impaired speed + impaired rhythm and/or impaired amplitude) | 64% | 24% |
Hand video characteristics are split by Parkinson’s disease hands (n = 73) and control hands (n = 60). H&Y: modified Hoehn and Yahr scale. [19] IQR: Interquartile Range.
The distributions of MDS-UPDRS finger tapping scores for PD and control videos are shown in Fig. 1. 53% of control participant videos were given an MDS-UPDRS finger tapping score greater than 0. The distributions of MBRS scores for finger tapping speed, amplitude and rhythm are shown in Fig. 2. Across both rating scales, scores of grade 1 (‘slight’ impairment by MDS-UPDRS, ‘mild’ impairment by MBRS) were similarly common in both control videos and PD videos. The proportion of videos scored grade 1 by MDS-UPDRS was 26% in PD and 34% in healthy controls, while the proportions of videos scored grade 1 for MBRS speed, amplitude, and rhythm were 40%, 22%, and 31% respectively in PD, compared with 31%, 21%, and 27% respectively in controls.
Fig. 1
Fig. 2
Bradykinesia is defined as slowness of movement AND decrement in amplitude or speed (or progressive hesitations/halts) as movements are continued [1]. Therefore, the MBRS subscores for finger tapping can be used to classify tapping as bradykinesia if a rater scores a video > 0 for speed and also > 0 for amplitude and/or rhythm. Table 2 shows the proportions of videos in PD and controls (respectively) with impaired speed, rhythm, and amplitude, as well as combinations of those deficits, and the specific combination that meets the definition of bradykinesia. Among PD videos, 77% were rated as slow, and 64% were rated as bradykinesia by MBRS (> 0 for speed and > 0 for one or more of amplitude or rhythm). Among videos of control participants, 43% were rated as slow, and 24% were rated as bradykinesia by MBRS (> 0 for speed and > 0 for one or more of amplitude or rhythm). Thus, one in four control participant hand videos were rated as bradykinesia by MBRS.
Interrater reliability for finger tapping bradykinesia
The intraclass correlation coefficient (ICC) for MDS-UPDRS rating of finger tapping bradykinesia for exact agreement was 0.53 using the normal model (‘fair’ [20] or ‘moderate’ [17]) and 0.65 using the cumulative linked mixed model (‘good’ [20] or ‘moderate’ [17]). The ICC for consistency (ratings related to each other with a systematic error) was 0.58 using the normal model (‘fair’ [20] or ‘moderate’ [17]), and 0.73 using the cumulative linked mixed model (‘good’ [20] or ‘moderate’ [17]).
To assess model discrimination for the CLMM, we compared the predicted values with the original ratings. The CLMM predicts the correct MDS-UPDRS score with 70% accuracy and is accurate to within one point on the five-point MDS-UPDRS finger tapping scale 98% of the time.
Figure 3 shows the variation in clinical ratings. Each point is an individual clinical rating of a video: the x-axis orders the videos by CLMM random effect size, and the y-axis is the clinical MDS-UPDRS rating. The values are jittered in the y-axis for visual clarity. It demonstrates the considerable variation in movement disorder specialist judgement of individual videos, with disagreement common.
Fig. 3
Correlations between finger tapping MDS-UPDRS and individual MBRS elements
The Spearman correlation coefficients for MDS-UPDRS finger tapping scores and each of the MBRS subcomponent scores were R = 0.77 for speed, R = 0.78 for amplitude, and R = 0.68 for rhythm (Supplementary Table 1).
Neurologists’ judgement of whether finger tapping video shows a person with PD or control
The movement disorder specialists correctly judged PD or control status in 70% (400 of 570) videos. The median number of correct judgements was 20/30 (67%), with a range from 17/30 to 27/30, interquartile range 18.75 to 23.5 (out of 30).
In post-hoc analysis, the mean age of control hands misclassified was 63, while for those correctly classified it was 56 (p < 0.05). The mean age of PD hands misclassified was 69, compared with 68 for those correctly classified (p = NS). The disease duration was 3.9 years for PD hands misclassified as controls, compared with 5.3 years for PD hands correctly classified (p < 0.005).
Of those videos judged by clinicians to show a PD hand, only 77% were formally rated as showing bradykinesia by the relevant MBRS subscore combination. Of the videos that clinicians correctly identified as PD, 84% were scored as bradykinesia. This lower than 100% concordance is not explained by visible tremor. Of 36 PD video ratings not rated as bradykinesia but judged to show a PD hand, only 8 had visible tremor in the video (while 9/69 PD videos rated no bradykinesia and judged to show a control had visible tremor). Among videos judged to show a control hand, 5% were formally rated as showing bradykinesia. Of the correct control judgements, 3% were scored as bradykinesia.
DISCUSSION
Our results demonstrate that even expert neurologists frequently disagree about the level of bradykinesia on finger tapping, despite clinical examination representing the gold standard for determining the presence and degree of bradykinesia [1, 2]. The 21 movement disorder specialists showed only ‘moderate’ agreement [17] for MDS-UPDRS finger tapping ratings (ICC = 0.53, CLMM-ICC = 0.65). Furthermore, the same movement disorder specialists classified one in four healthy control participants as showing bradykinesia on finger tapping (using MBRS sub-scores to match the definition of bradykinesia), and the proportions of participants showing slight or mild abnormalities on MDS-UPDRS and MBRS were similar in PD and control videos. This suggests that finger-tapping bradykinesia is a non-specific sign and overlaps with changes in movement associated with normal ageing, at least when mild. It is perhaps unsurprising that bradykinesia is difficult to judge. It is a heterogeneous clinical sign, and human vision cannot accurately measure and compare movement speed, amplitude, and rhythm in isolation, much less in simultaneous combination.
Our findings are particularly robust because they are based on a larger number of raters (21) and unique videos (133) than previous studies. Each neurologist rated 30 videos and the median number of raters per video was 5, but these numbers were based on statistical power calculations, and the random distribution of videos to raters means that variation among the whole group is well characterised. Another strength of this study is the use of a cumulative linked mixed model, respecting the ordinal nature of MDS-UPDRS scores, a consideration that has been neglected in previous research. Furthermore, we not only reported MDS-UPDRS finger tap ratings, but also MBRS ratings, which separately score each of tap speed, amplitude, and rhythm. In contrast to a 2011 study, which found that clinicians weighted amplitude and rhythm more than speed in UPDRS bradykinesia scores [6], we found strong correlations for all MBRS subscores with MDS-UPDRS (0.68–0.78), with rhythm the weakest of the three, suggesting that clinicians do not favour any particular subcomponent of bradykinesia in finger tapping judgements. We also reported consistency ICCs, which were a little higher than agreement results (ICC = 0.58, CLMM-ICC = 0.73), but in a five-point scale, consistent inter-rater variation (a consistent difference between raters) is of little clinical relevance compared with absolute rater agreement. For example, two raters who disagreed by 2 or 3 points on every video would nevertheless show very high consistency ICC if that 2 or 3 point difference in ratings was consistently present (and in the same direction) across all the videos.
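The constant-offset example can be made concrete with a small worked calculation. The sketch below is an illustration only: it uses the classical ANOVA mean-squares formulas for single-rater ICCs in a fully crossed design (every rater scores every video), which is a simplification of the mixed models fitted in this study, and the scores are invented.

```python
def two_way_mean_squares(scores):
    """Mean squares for a fully crossed videos-by-raters table (rows = videos)."""
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_err = sum(
        (scores[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_err = ss_err / ((n - 1) * (k - 1))
    return n, k, ms_rows, ms_cols, ms_err

def icc_single(scores):
    """Return (agreement ICC, consistency ICC) for a single rater."""
    n, k, msr, msc, mse = two_way_mean_squares(scores)
    consistency = (msr - mse) / (msr + (k - 1) * mse)
    agreement = (msr - mse) / (msr + (k - 1) * mse + (k / n) * (msc - mse))
    return agreement, consistency

# Hypothetical scores: rater B is always exactly 2 points above rater A.
rater_a = [0, 1, 2, 0, 1, 2, 1, 0]
rater_b = [a + 2 for a in rater_a]
scores = [list(pair) for pair in zip(rater_a, rater_b)]
agreement, consistency = icc_single(scores)
print(f"agreement={agreement:.2f}, consistency={consistency:.2f}")
```

In this invented case the consistency ICC is 1.0 while the agreement ICC is roughly 0.26, even though the two raters never give the same score to any video.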
A previous study of a UPDRS ‘teaching tape’ supports the idea that finger tapping bradykinesia is difficult to judge [7]. 226 raters were tested in their UPDRS motor scores for 4 people with PD (using video recordings). A ‘pass’ in this test was defined as a score within the 95% confidence interval of 3 international PD experts for each case. Only 54.6% of raters ‘passed’ the 4 cases, and of those that ‘failed’ first time, 70.6% failed finger tapping rating.
Previous studies of finger tapping interrater reliability by UPDRS grading have reported Kendall’s W 0.84 and 0.87 [12], weighted κ of 0.53 to 0.71 [8], 0.72 to 0.86 [10], κ of 0.47, 0.44, –0.07 [4, 9], and Kendall’s τ of 0.88 and 0.84 [13], while MBRS raters showed Pearson correlations of 0.51, 0.77, and 0.69 respectively [6]. It is difficult to draw conclusions from those results because of methodological limitations that include low numbers of raters and/or people with PD [4–6, 8–10, 13] (including non-overlapping subsets of raters) [12], the absence of ‘healthy control’ participants [4–6, 10–13], participants examined ‘off’ their usual medication [5, 6, 10, 11, 13], statistical methods inappropriate for ordinal data [6], measures of simple correlation rather than agreement [13], and raters gaining additional information from the entire UPDRS or UPDRS motor exam [4, 8–12].
It could perhaps be argued that the influence of a broader UPDRS assessment upon finger tapping scores is appropriate, reflecting clinical practice, in which finger tapping would never be tested in isolation. However, busy routine clinics do not involve enough time for the complete UPDRS (a “vast instrument” [11]). Furthermore, limb bradykinesia must be documented to establish a PD diagnosis, although bradykinesia also occurs in the face, voice, and axial/gait domains [1]. In addition, UPDRS bradykinesia items are commonly analysed as a standalone ‘bradykinesia’ endpoint in trials [6], or used as a gold standard for demonstrating that technological devices ‘quantify’ bradykinesia [6, 21–55]. Most fundamentally, finger tapping bradykinesia is presented in the literature as a measure of a specific phenomenon with a specific definition. It is not defined as a surrogate for an overall impression. If the latter is to some extent true, then it becomes less clear exactly what bradykinesia actually is [56, 57], and less clear that movement disorder specialists are able to define and measure this “cardinal manifestation” [1] of PD.
In our results, one in four control videos were rated as showing finger tapping bradykinesia (using MBRS subscores). This is consistent with a previous study in which three trained nurses and one movement disorder specialist rated older people with no clinical PD, using a modified UPDRS motor score [8]. They gave 74 out of 75 participants a score greater than 0 (mean score 13.4 out of 127). Of course, the MDS diagnostic criteria for PD are not based on bradykinesia alone, and instead require a combination of clinical features to be present or absent to diagnose PD [1]. However, to some extent this only amplifies the challenge for clinician reliability, because other clinical features such as tremor are also non-specific, and there is considerable evidence that the overall diagnostic assessment of PD is difficult, with less-than-ideal sensitivity and specificity. This includes misdiagnosis rates of PD versus essential tremor of one in three [58], as well as high false positive (17.4–26.1%) and false negative (6.7–20%) rates for the diagnosis of PD based on video examinations of people with tremor [59]. One meta-analysis suggests that diagnostic accuracy for PD is only around 80% for movement disorder experts (at the first assessment) [60].
We asked the clinicians to judge whether the hand in the video was most likely to be that of a person with PD or a control. Of those videos guessed to show the tapping of a person with PD, only 77% were also judged to show bradykinesia by the appropriate combination of MBRS subscores. In other words, the movement disorders specialists’ overall perception of PD or control was not strictly related to the presence or absence of bradykinesia by MBRS subscore combination. This suggests the possibility that clinicians are forming an overall impression of finger tapping that does not purely follow the formal definition of bradykinesia: a gestalt perception or intuitive pattern recognition of finger tapping normality/abnormality beyond the presence or absence of bradykinesia as defined by formal criteria [61–63]. In support of this idea, a clinicopathological study found that experienced movement disorder specialists showed a higher accuracy than claimed for most clinical diagnostic criteria, for the diagnostic distinction of different forms of parkinsonism. The authors state that these experts, “may be using a method of pattern recognition for diagnosis that goes beyond any formal set of diagnostic criteria” [64].
The study has some limitations. The mean age of control participants (59) was younger than that of PD participants (68). However, we do not consider this to be a major weakness in our study, because we would not expect a younger age control group to show higher levels of bradykinesia or parkinsonian appearance. Impaired movement in controls is greater at older ages [65]. We have not reported non-neurological comorbidities in control participants that might affect movement, such as osteoarthritis. However, while such conditions could potentially cause slowness of movement, we would not expect them to cause bradykinesia specifically.
It is possible that some of the control participants could have been in incipient stages of a neurodegenerative condition, such as prodromal PD. Our protocol did not specifically assess for that. However, it is unlikely that incipient disease represents a major confounder. We found 24% of controls rated as bradykinesia by MBRS subscore combination, in a group with a mean age of 59. Given that the lifetime risk for PD diagnosis in the UK is 2.7%, our control results cannot be explained by incipient or prodromal disease [66].
The majority of PD participants were Hoehn and Yahr stage 1, with unilateral symptoms only. Thus, many of the PD hands will be from an unaffected side, which would make it difficult to correctly judge those hands as PD. Our results for clinician judgement of hands as PD or non-PD are limited by this. However, we would not expect this to affect our main finding of interrater reliability. If bradykinesia can be reliably seen, then unaffected hands should receive similar (presumably low) MDS-UPDRS ratings across raters. The sample size is not large enough to allow a stratified analysis of rater agreement at different Hoehn and Yahr scores.
The PD participants were ‘ON’ their usual medication (no medication was withheld), and it could be argued that this would make bradykinesia harder to see than at the original diagnostic assessment, before medication. However, in clinical practice, patients are assessed either early in the disease course, before medication has been started, or later in the disease course, ‘ON’ the medication that they usually take for PD. The ‘OFF’ state (medication withheld) is thus a very unusual state, almost never encountered in clinical practice. The progressive nature of PD means that withholding medication cannot be assumed to be equivalent to early disease before medication is started; ‘OFF’ is likely to involve more obvious, developed parkinsonism. We therefore chose not to study a state of more obvious parkinsonism that is rarely encountered in clinical practice.
It is possible that additional rater training would have improved interrater reliability. However, this would imply that the detection and assessment of PD is a fragile and difficult process, which is not how it is usually described in the literature [1]. Furthermore, extra training would be unlikely to produce excellent reliability, because it is inherently difficult to judge movement accurately by eye. Bradykinesia is a complex and heterogeneous clinical sign, requiring simultaneous judgement and integration of speed, amplitude, and hesitations/halts, and this places fundamental limits on what training can achieve.
It could be argued that the MDS-UPDRS is currently the best available approach to evaluating parkinsonian signs, including bradykinesia, despite the limitations we have demonstrated. However, our results suggest a need to develop new, more reliable measures of the movement impairment caused by PD. Perhaps the principles of those measures could be based upon new patterns derived from machine learning, rather than the current definition of bradykinesia. This might allow closer approximation to any intuitive pattern recognition method that expert clinicians currently employ.
In conclusion, a classic sign of a cardinal clinical feature of a common neurological disease—finger tapping bradykinesia—is not easy to reliably see, even for expert eyes. Our findings suggest that bradykinesia is to some extent a phenomenon present in the eye of the clinician rather than simply in the hand of the person with PD.
ACKNOWLEDGMENTS
We would like to thank the following UK movement disorder neurologists for providing clinical rating of videos: Dr. Sundus Alusi, Dr. Michael Bonello, Dr. Stephen Butterworth, Dr. Philip Buttery, Dr. Camille Carroll, Dr. Adam Cassidy, Dr. Jeremy Cosgrove, Dr. Richard Ellis, Dr. Jonathan Evans, Dr. Paul Goldsmith, Dr. Donald Grosset, Dr. Christopher Kobylecki, Dr. Alistair Lansbury, Dr. Peter Moore, Dr. Rachel Newby, Dr. Edward Newman, Dr. Gillian Sare, Dr. Monty Silverdale, Dr. Naomi Warren, Dr. Louise Wiblin, Dr. Caroline Williams-Gray.
CONFLICT OF INTEREST
The authors have no conflict of interest to report.
DATA AVAILABILITY
The data that support the findings of this study are available from the corresponding author, upon reasonable request.
SUPPLEMENTARY MATERIAL
The supplementary material is available in the electronic version of this article: https://dx.doi.org/10.3233/JPD-223256.
REFERENCES
[1] Postuma RB, Berg D, Stern M, Poewe W, Olanow CW, Oertel W, Obeso J, Marek K, Litvan I, Lang AE, Halliday G, Goetz CG, Gasser T, Dubois B, Chan P, Bloem BR, Adler CH, Deuschl G (2015) MDS clinical diagnostic criteria for Parkinson’s disease. Mov Disord 30, 1591–1601.
[2] Goetz CG, Nyenhuis D, Poewe W, Stebbins GT, Tilley BC, Lees A, Dubois B, Stern MB, Martinez-Martin P, Lang AE, Shaftman SR, Kulisevsky J, Dodel R, Sampaio C, Rascol O, Fahn S, Schrag A, van Hilten JJ, Holloway R, LaPelle N, Leurgans S, Olanow CW, Teresi JA, Jankovic J, LeWitt PA (2008) Movement Disorder Society-sponsored revision of the Unified Parkinson’s Disease Rating Scale (MDS-UPDRS): Scale presentation and clinimetric testing results. Mov Disord 23, 2129–2170.
[3] Fahn S, Elton R (1987) Unified Parkinson’s Disease Rating Scale. In Recent Developments in Parkinson’s Disease, vol. 2, Fahn S, Marsden C, Calne D, Goldstein M, eds. Macmillan Health Care Information, Florham Park, NJ, pp. 153–164.
[4] Martinez-Martin P, Gil-Nagel A, Gracia LM, Balseiro Gomez J, Martinez-Sarries J, Bermejo F, Jimenez-Rojas MC, Maranon E, Grau Veciana JM, Roig Arnall C, Bruna O, Junque Plaja C, Gimenez-Roldan S, Burguera JA (1994) Unified Parkinson’s disease rating scale characteristics and structure. Mov Disord 9, 76–83.
[5] Kishore A, Espay AJ, Marras C, Al-Khairalla T, Arenovich T, Asante A, Miyasaki J, Lang AE (2007) Unilateral versus bilateral tasks in early asymmetric Parkinson’s disease: Differential effects on bradykinesia. Mov Disord 22, 328–333.
[6] Heldman DA, Giuffrida JP, Chen R, Payne M, Mazzella F, Duker AP, Sahay A, Kim SJ, Revilla FJ, Espay AJ (2011) The modified bradykinesia rating scale for Parkinson’s disease: Reliability and comparison with kinematic measures. Mov Disord 26, 1859–1863.
[7] Goetz CG, Stebbins GT (2004) Assuring interrater reliability for the UPDRS motor section: Utility of the UPDRS teaching tape. Mov Disord 19, 1453–1456.
[8] Bennett DA, Shannon KM, Beckett LA, Goetz CG, Wilson RS (1997) Metric properties of nurses’ ratings of parkinsonian signs with a modified Unified Parkinson’s Disease Rating Scale. Neurology 49, 1580–1587.
[9] Camicioli R, Grossmann SJ, Spencer PS, Hudnell K, Kent Anger W (2001) Discriminating mild parkinsonism: Methods for epidemiological research. Mov Disord 16, 33–40.
[10] Post B, Merkus MP, de Bie RMA, de Haan RJ, Speelman JD (2005) Unified Parkinson’s Disease Rating Scale motor examination: Are ratings of nurses, residents in neurology, and movement disorders specialists interchangeable? Mov Disord 20, 1577–1584.
[11] Henderson L, Kennard C, Crawford TJ, Day S, Everitt BS, Goodrich S, Jones F, Park DM (1991) Scales for rating motor impairment in Parkinson’s disease: Studies of reliability and convergent validity. J Neurol Neurosurg Psychiatry 54, 18–24.
[12] Rabey J, Bass H, Bonuccelli U, Brooks D, Klotz P, Korczyn A, Kraus P, Martinez-Martin P, Morrish P, Van Sauten W, Van Hilten B (1997) Evaluation of the Short Parkinson’s Evaluation Scale: A new friendly scale for the evaluation of Parkinson’s disease in clinical drug trials. Clin Neuropharmacol 20, 322–337.
[13] Luiz LMD, Marques IA, Folador JP, Andrade AO (2021) Intra and inter-rater remote assessment of bradykinesia in Parkinson’s disease. Neurología (Engl Ed), doi: 10.1016/j.nrl.2021.08.005.
[14] Palmer JL, Coats MA, Roe CM, Hanko SM, Xiong C, Morris JC (2010) Unified Parkinson’s Disease Rating Scale-Motor Exam: Inter-rater reliability of advanced practice nurse and neurologist assessments. J Adv Nurs 66, 1382–1387.
[15] Landis JR, Koch GG (1977) The measurement of observer agreement for categorical data. Biometrics 33, 159.
[16] van Rossum G, Drake FL (2009) Python 3 Reference Manual, CreateSpace, Scotts Valley, CA.
[17] Koo TK, Li MY (2016) A guideline of selecting and reporting intraclass correlation coefficients for reliability research. J Chiropr Med 15, 155–163.
[18] Liddell TM, Kruschke JK (2018) Analyzing ordinal data with metric models: What could possibly go wrong? J Exp Soc Psychol 79, 328–348.
[19] Goetz CG, Poewe W, Rascol O, Sampaio C, Stebbins GT, Counsell C, Giladi N, Holloway RG, Moore CG, Wenning GK, Yahr MD, Seidl L; Movement Disorder Society Task Force on Rating Scales for Parkinson’s Disease (2004) Movement Disorder Society Task Force report on the Hoehn and Yahr staging scale: Status and recommendations. Mov Disord 19, 1020–1028.
[20] Cicchetti DV (1994) Guidelines, criteria, and rules of thumb for evaluating normed and standardized assessment instruments in psychology. Psychol Assess 6, 284–290.
[21] Giovannoni G, van Schalkwyk J, Fritz VU, Lees AJ (1999) Bradykinesia akinesia inco-ordination test (BRAIN TEST): An objective computerised assessment of upper limb motor function. J Neurol Neurosurg Psychiatry 67, 624–629.
[22] Homann CN, Suppan K, Wenzel K, Giovannoni G, Ivanic G, Horner S, Ott E, Hartung HP (2000) The Bradykinesia Akinesia Incoordination Test (BRAIN TEST), an objective and user-friendly means to evaluate patients with parkinsonism. Mov Disord 15, 641–647.
[23] Pal PK, Lee CS, Samii A, Schulzer M, Stoessl AJ, Mak EK, Wudel J, Dobko T, Tsui JKC (2001) Alternating two finger tapping with contralateral activation is an objective measure of clinical severity in Parkinson’s disease and correlates with PET [18F]-DOPA Ki. Parkinsonism Relat Disord 7, 305–309.
[24] Tavares ALT, Jefferis GSXE, Koop M, Hill BC, Hastie T, Heit G, Bronte-Stewart HM (2005) Quantitative measurements of alternating finger tapping in Parkinson’s disease correlate with UPDRS motor disability and reveal the improvement in fine motor control from medication and deep brain stimulation. Mov Disord 20, 1286–1298.
[25] Papapetropoulos S, Jagid JR, Sengun C, Singer C, Gallo BV (2008) Objective monitoring of tremor and bradykinesia during DBS surgery for Parkinson disease. Neurology 70, 1244–1249.
[26] Papapetropoulos S, Katzen HL, Scanlon BK, Guevara A, Singer C, Levin BE (2010) Objective quantification of neuromotor symptoms in Parkinson’s disease: Implementation of a portable, computerized measurement tool. Parkinsons Dis 2010, 760196.
[27] Maetzler W, Ellerbrock M, Heger T, Sass C, Berg D, Reilmann R (2015) Digitomotography in Parkinson’s disease: A cross-sectional and longitudinal study. PLoS One 10, e0123914.
[28] Lee CY, Kang SJ, Hong S-K, Ma H-I, Lee U, Kim YJ (2016) A validation study of a smartphone-based finger tapping application for quantitative assessment of bradykinesia in Parkinson’s disease. PLoS One 11, e0158852.
[29] Kassavetis P, Saifee TA, Roussos G, Drougkas L, Kojovic M, Rothwell JC, Edwards MJ, Bhatia KP (2016) Developing a tool for remote digital assessment of Parkinson’s disease. Mov Disord Clin Pract 3, 59–64.
[30] Mitsi G, Mendoza EU, Wissel BD, Barbopoulou E, Dwivedi AK, Tsoulos I, Stavrakoudis A, Espay AJ, Papapetropoulos S (2017) Biometric digital health technology for measuring motor function in Parkinson’s disease: Results from a Feasibility and Patient satisfaction study. Front Neurol 8, 273.
[31] Lalvay L, Lara M, Mora A, Alarcón F, Fraga M, Pancorbo J, Marina JL, Mena MÁ, Lopez Sendón JL, García de Yébenes J (2017) Quantitative measurement of akinesia in Parkinson’s disease. Mov Disord Clin Pract 4, 316–322.
[32] Prince J, Arora S, de Vos M (2018) Big data in Parkinson’s disease: Using smartphones to remotely detect longitudinal disease phenotypes. Physiol Meas 39, 044005.
[33] Roalf DR, Rupert P, Mechanic-Hamilton D, Brennan L, Duda JE, Weintraub D, Trojanowski JQ, Wolk D, Moberg PJ (2018) Quantitative assessment of finger tapping characteristics in mild cognitive impairment, Alzheimer’s disease, and Parkinson’s disease. J Neurol 265, 1365–1375.
[34] Yokoe M, Okuno R, Hamasaki T, Kurachi Y, Akazawa K, Sakoda S (2009) Opening velocity, a novel parameter, for finger tapping test in patients with Parkinson’s disease. Parkinsonism Relat Disord 15, 440–444.
[35] Costa J, González HA, Valldeoriola F, Gaig C, Tolosa E, Valls-Solé J (2010) Nonlinear dynamic analysis of oscillatory repetitive movements in Parkinson’s disease and essential tremor. Mov Disord 25, 2577–2586.
[36] Kim J-W, Lee J-H, Kwon Y, Kim C-S, Eom G-M, Koh S-B, Kwon D-Y, Park K-W (2011) Quantification of bradykinesia during clinical finger taps using a gyrosensor in patients with Parkinson’s disease. Med Biol Eng Comput 49, 365–371.
[37] Stamatakis J, Ambroise J, Cremers J, Sharei H, Delvaux V, Macq B, Garraux G (2013) Finger tapping clinimetric score prediction in Parkinson’s disease using low-cost accelerometers. Comput Intell Neurosci 2013, 717853.
[38] Heldman DA, Espay AJ, LeWitt PA, Giuffrida JP (2014) Clinician versus machine: Reliability and responsiveness of motor endpoints in Parkinson’s disease. Parkinsonism Relat Disord 20, 590–595.
[39] Lee MJ, Kim SL, Lyoo CH, Lee MS (2014) Kinematic analysis in patients with Parkinson’s disease and SWEDD. J Parkinsons Dis 4, 421–430.
[40] Kim J-W, Kwon Y, Yun J-S, Heo J-H, Eom G-M, Tack G-R, Lim T-H, Koh S-B (2015) Regression models for the quantification of Parkinsonian bradykinesia. Biomed Mater Eng 26(Suppl 1), S2249–58.
[41] Martinez-Manzanera O, Roosma E, Beudel M, Borgemeester RWK, Van Laar T, Maurits NM (2016) A method for automatic and objective scoring of bradykinesia using orientation sensors and classification algorithms. IEEE Trans Biomed Eng 63, 1016–1024.
[42] Heldman DA, Urrea-Mendoza E, Lovera LC, Schmerler DA, Garcia X, Mohammad ME, McFarlane MCU, Giuffrida JP, Espay AJ, Fernandez HH (2017) App-based bradykinesia tasks for clinic and home assessment in Parkinson’s disease: Reliability and responsiveness. J Parkinsons Dis 7, 741–747.
[43] Agostino R, Curra A, Giovannelli M, Modugno N, Manfredi M, Berardelli A (2003) Impairment of individual finger movements in Parkinson’s disease. Mov Disord 18, 560–565.
[44] Ling H, Massey LA, Lees AJ, Brown P, Day BL (2012) Hypokinesia without decrement distinguishes progressive supranuclear palsy from Parkinson’s disease. Brain 135, 1141–1153.
[45] Lainscsek C, Rowat P, Schettino L, Lee D, Song D, Letellier C, Poizner H (2012) Finger tapping movements of Parkinson’s disease patients automatically rated using nonlinear delay differential equations. Chaos 22, 013119.
[46] Krupicka R, Szabo Z, Viteckova S, Ruzicka E (2014) Motion capture system for finger movement measurement in Parkinson disease. Radioengineering 23, 659–664.
[47] Ruzicka E, Krupicka R, Zarubova K, Rusz J, Jech R, Szabo Z (2016) Tests of manual dexterity and speed in Parkinson’s disease: Not all measure the same. Parkinsonism Relat Disord 28, 118–123.
[48] Bologna M, Leodori G, Stirpe P, Paparella G, Colella D, Belvisi D, Fasano A, Fabbrini G, Berardelli A (2016) Bradykinesia in early and advanced Parkinson’s disease. J Neurol Sci 369, 286–291.
[49] Bank PJM, Marinus J, Meskers CGM, de Groot JH, van Hilten JJ (2017) Optical hand tracking: A novel technique for the assessment of bradykinesia in Parkinson’s disease. Mov Disord Clin Pract 4, 875–883.
[50] Kandori A, Yokoe M, Sakoda S, Abe K, Miyashita T, Oe H, Naritomi H, Ogata K, Tsukada K (2004) Quantitative magnetic detection of finger movements in patients with Parkinson’s disease. Neurosci Res 49, 253–260.
[51] Shima K, Tsuji T, Kandori A, Yokoe M, Sakoda S (2009) Measurement and evaluation of finger tapping movements using log-linearized gaussian mixture networks. Sensors 9, 2187–2201.
[52] Sano Y, Kandori A, Shima K, Yamaguchi Y, Tsuji T, Noda M, Higashikawa F, Yokoe M, Sakoda S (2016) Quantifying Parkinson’s disease finger-tapping severity by extracting and synthesizing finger motion properties. Med Biol Eng Comput 54, 953–965.
[53] Gao C, Smith S, Lones M, Jamieson S, Alty J, Cosgrove J, Zhang P, Liu J, Chen Y, Du J, Cui S, Zhou H, Chen S (2018) Objective assessment of bradykinesia in Parkinson’s disease using evolutionary algorithms: Clinical validation. Transl Neurodegener 7, 18.
[54] Teo WP, Rodrigues JP, Mastaglia FL, Thickbroom GW (2013) Comparing kinematic changes between a finger-tapping task and unconstrained finger flexion-extension task in patients with Parkinson’s disease. Exp Brain Res 227, 323–331.
[55] di Biase L, Summa S, Tosi J, Taffoni F, Marano M, Rizzo AC, Vecchio F, Formica D, Di Lazzaro V, Di Pino G, Tombini M (2018) Quantitative analysis of bradykinesia and rigidity in Parkinson’s disease. Front Neurol 9, 121.
[56] Schilder JCM, Overmars SS, Marinus J, van Hilten JJ, Koehler PJ (2017) The terminology of akinesia, bradykinesia and hypokinesia: Past, present and future. Parkinsonism Relat Disord 37, 27–35.
[57] Bologna M, Paparella G, Fasano A, Hallett M, Berardelli A (2020) Evolving concepts on bradykinesia. Brain 143, 727–750.
[58] Jain S, Lo SE, Louis ED (2006) Common misdiagnosis of a common neurological disorder. Arch Neurol 63, 1100.
[59] Bajaj NPS, Gontu V, Birchall J, Patterson J, Grosset DG, Lees AJ (2010) Accuracy of clinical diagnosis in tremulous parkinsonian patients: A blinded video study. J Neurol Neurosurg Psychiatry 81, 1223–1228.
[60] Rizzo G, Copetti M, Arcuti S, Fontana A, Logroscino G (2016) Accuracy of clinical diagnosis of Parkinson’s disease: A systematic review and meta-analysis. Neurology 87, 237–238.
[61] Cervellin G, Borghi L, Lippi G (2014) Do clinicians decide relying primarily on Bayesians principles or on Gestalt perception? Some pearls and pitfalls of Gestalt perception in medicine. Intern Emerg Med 9, 513–519.
[62] Vancheri F (2015) Bayesian principles or Gestalt perception for clinical judgment. Intern Emerg Med 10, 253.
[63] Kahneman D (2011) Thinking, fast and slow, Macmillan.
[64] Hughes AJ, Daniel SE, Ben-Shlomo Y, Lees AJ (2002) The accuracy of diagnosis of parkinsonian syndromes in a specialist movement disorder service. Brain 125, 861–870.
[65] Homann CN, Quehenberger F, Petrovic K, Hartung HP, Ruzicka E, Homann B, Suppan K, Wenzel K, Ivanic G, Ott E (2003) Influence of age, gender, education and dexterity on upper limb motor performance in Parkinsonian patients and healthy controls. J Neural Transm 110, 885–897.
[66] Parkinson’s UK (2018) The incidence and prevalence of Parkinson’s in the UK. https://www.parkinsons.org.uk/sites/default/files/2018-01/CS2960%20Incidence%20and%20prevalence%20report%20branding%20summary%20report.pdf