Modeling joint survival probabilities of runs scored and balls faced in limited overs cricket using copulas

Abstract

In limited overs cricket, the goal of a batsman is to score as many runs as possible within a limited number of balls. Therefore, the number of runs scored and the number of balls faced are the two key statistics used to evaluate the performance of a batsman. Because batsmen play in pairs, long partnerships are also key to building a strong innings, and a steady opening partnership is especially important. In this study, we show a way to evaluate the performance of opening partnerships in One Day International (ODI) cricket and the performance of individual batsmen in Twenty20 (T20) cricket by modeling the joint distribution of runs scored and balls faced using copula functions. The joint survival probabilities derived from this approach are then used to evaluate the batting performance of opening partnerships and individual batsmen at different stages of the innings. The results suggest that cricket managers and team officials can use the proposed method to select appropriate partnership pairs and individual batsmen for specific match situations.

1 Introduction

Cricket is one of the most popular sports in the world, particularly among Commonwealth countries. It is a field game played between two teams of 11 players and has three main formats, categorized by the length of the game: Test cricket, One Day International (ODI) cricket, and Twenty20 (T20) cricket. In this study, we focus on ODI (in which each team gets 50 overs to bat) and T20 (in which each team gets 20 overs to bat). A typical ODI match lasts about nine hours, while a T20 match lasts about three hours. Owing to its shorter duration, the T20 format is becoming popular among cricket fans around the world.

In cricket, batsmen play in pairs; hence, long partnerships are a key factor in building the momentum of the game. The goal of the batsmen is to score runs while consuming a low number of balls and protecting the wicket. Therefore, one can use two variables, the number of balls faced and the runs scored, to model the performance of batsmen. This approach can be used to model the performance of individual batsmen as well as the performance of partnerships. While a partnership at any stage is important, a strong opening partnership is especially important as a team plans to accumulate a competitive total (runs). Accordingly, the focus of this paper is divided into two cases: the first is to model opening partnership scores and the second is to model the runs scored by individual players. We used ODI data for opening partnership modeling and T20 data for individual batting modeling.

Several studies in the cricket literature have discussed partnership performance. Scarf et al. (2011) used the negative binomial distribution to fit partnership scores and innings scores in Test cricket. Valero & Swartz (2012) investigated the importance of partnerships in Test cricket and ODI cricket by comparing the performance of opening batsmen with their “synergistic” partners to the performance of these batsmen with alternative partners. Talukdar (2020) applied a logistic regression model to investigate whether the performance of the opening partnership, along with several other explanatory variables, influences the outcome of T20 matches. Bhattacharjee et al. (2018) discussed a measure to quantify the batting performance of partnerships based on 2016 T20 World Cup data. Furthermore, Swartz et al. (2009) showed how ODI cricket scores can be simulated using historical ODI data. Moreover, Das (2011) proposed a generalized class of geometric distributions as a model for the runs scored by individual batsmen in cricket.

It appears that determining the joint distribution of the performance variables of batsmen is an insightful way to understand their overall performance. Copula functions have been widely used in the statistical literature to model multivariate distributions. In sports statistics, McHale & Scarf (2007) applied copula functions and Poisson-related marginal distributions to fit soccer data from the English Premier League. With this approach, they highlighted the negative dependence between the discrete pair of shots-for and shots-against. Tavassolipour et al. (2013) developed a method to detect and summarize events such as goal, corner, foul, offside, and non-highlights in soccer video data using a Bayesian network and the Farlie-Gumbel-Morgenstern family of copulas. To detect abnormal variation in the batting averages and earned run averages of Major League Baseball (MLB) from the 1998 to 2016 seasons, Kim et al. (2019) applied control charts and explored the directional dependence and tail dependence of average run lengths using Markov statistical process control and copula functions. In addition, Boshnakov et al. (2017) proposed a forecasting model using a Weibull inter-arrival time count process and a copula to obtain the bivariate distribution of the number of goals scored at home and away in soccer.

However, to the best of our knowledge, copula functions have not yet been applied in cricket. In cricket, runs scored and balls faced are generally positively correlated. In this study, we used copula functions to obtain the bivariate distribution of runs scored and balls faced by individual batsmen as well as by opening partnerships. These bivariate distributions were then used to evaluate the performance of those batsmen or partnerships. For example, using the bivariate distribution, we were able to evaluate the probability of scoring more than 50 runs and staying at the wicket for more than 50 balls. These estimated probabilities were then used to rank partnerships and individual batsmen: the higher the probability for a given case, the better the partnership or batsman.

The rest of this paper is organized as follows. In Section 2, we provide an introduction to copula functions and their properties. In the same section, we discuss the types of copula functions that we considered in this study. The proposed method for obtaining the bivariate distribution and its parameter estimation procedure are explained in Section 3. In Section 4, we apply the proposed method to cricket batting data for partnerships and individual batsmen. Section 5 is dedicated to discussion and conclusions.

1.1 Data sets

1.1.1 Partnership data

To evaluate the joint distribution of runs scored and balls faced, we selected 17 ODI opening partnerships across different teams. In selecting the partnerships, the following restrictions were applied:

  • the batting average of the partnership is above 35;

  • the total number of runs scored is above 1000; and

  • neither of the two players had retired from ODI cricket before the year 2010.

For each of these partnerships, total number of runs scored during the partnership, number of balls faced, number of runs scored by the individual players at the end of the partnership, and the player who was first out were recorded. The data were retrieved in July 2020.

Table 1 shows the summary statistics of the 17 ODI partnerships we selected following the above criteria. For each partnership, the total number of innings played, the total number of runs scored, the batting average for the partnership, and the number of not-outs were recorded. In addition, the number of outs for each player and their respective individual batting averages (within the partnership) are also shown. As can be seen, Sharma and Dhawan played the highest number of partnership innings, with 107 innings and 4808 total runs. Dhawan and Rahane recorded the highest partnership average of 67.88 in 17 innings. Furthermore, Rahane has the highest individual batting average within a partnership, 33.06. The second highest partnership average was recorded by Bairstow and Roy, who have played 41 innings together with an average of 58.63.

Table 1

Descriptive summary of partnerships

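Partner combined statistics: Innings, Total, Avg., Notouts. Individual player statistics: Outs and Avg. for Player 1 and Player 2.

| Partners (Player 1 and Player 2) | Innings | Total | Avg. | Notouts | Outs: Player 1 | Outs: Player 2 | Avg. Player 1 | Avg. Player 2 |
|---|---|---|---|---|---|---|---|---|
| Amla H M and de Kock Q | 93 | 4199 | 45.15 | 2 | 45 | 46 | 20.38 | 22.72 |
| Bairstow J M and Roy J J | 41 | 2404 | 58.63 | 0 | 15 | 26 | 25.49 | 30.85 |
| Cook A N and Bell I R | 38 | 1584 | 41.68 | 2 | 19 | 17 | 18.76 | 19.89 |
| Dhawan S and Rahane A M | 17 | 1154 | 67.88 | 0 | 11 | 6 | 31.71 | 33.06 |
| Dilshan T M and Perera M D K | 38 | 1418 | 37.32 | 1 | 8 | 29 | 15.84 | 19.45 |
| Fakhar Z and Imam-ul-haq | 34 | 1710 | 50.29 | 0 | 20 | 14 | 25.74 | 21.00 |
| Haddin B J and Watson S R | 28 | 1282 | 45.79 | 0 | 16 | 12 | 18.07 | 25.54 |
| Jayawardene D P M and Dilshan T M | 24 | 1123 | 46.79 | 1 | 14 | 9 | 21.67 | 21.67 |
| McCullum B B and Ryder J D | 22 | 1069 | 48.59 | 1 | 10 | 11 | 22.59 | 23.64 |
| McCullum and Guptill M J | 47 | 1904 | 40.51 | 2 | 33 | 12 | 21.68 | 15.83 |
| Sharma R G and Dhawan S | 107 | 4808 | 44.93 | 1 | 52 | 54 | 19.95 | 22.51 |
| Smith G C and Amla H M | 48 | 1946 | 40.54 | 0 | 33 | 15 | 17.10 | 21.56 |
| Sarkar S and Iqbal T | 35 | 1440 | 41.14 | 0 | 20 | 15 | 21.20 | 17.00 |
| Tendulkar S R and Sehwag V | 93 | 3912 | 42.06 | 0 | 32 | 61 | 17.01 | 21.59 |
| Tharanga W U and Dilshan T M | 70 | 2881 | 41.16 | 0 | 37 | 33 | 17.11 | 20.90 |
| Warner D A and Finch A J | 68 | 3342 | 49.15 | 0 | 32 | 35 | 21.70 | 21.33 |
| Shewag V and Gambhir G | 38 | 1870 | 49.21 | 1 | 20 | 17 | 25.58 | 19.26 |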

1.1.2 Individual batting data

For the individual batting performance modeling, the top twenty batsmen in the ICC T20 ranking as of July 2020 were used. Descriptive statistics of the data set are given in Table 2. As can be seen, batting averages for the top twenty T20 batsmen ranged from 16.64 to 38.71. Babar Azam had the highest average of 38.71 and also the highest average number of balls faced per innings, 30.21. Furthermore, Lokesh Rahul (38.45) and Hazratullah (38.00) recorded the second and third highest batting averages, respectively. Note that in this study, the averages were calculated over all innings played, regardless of whether the player was out or not out. Strike rates for these 20 players ranged from 98.11 to 160.00. Glen Maxwell recorded the highest strike rate of 160.00, and Colin Munro (156.44) and Aaron Finch (155.88) had the second and third highest strike rates, respectively.

Table 2

Descriptive summary of individual batting performance

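| Player (Rank) | Max Runs | Max Balls | Average Runs | Average Balls | Strike Rate | Innings Played |
|---|---|---|---|---|---|---|
| Babar Azam (1) | 97 | 58 | 38.71 | 30.21 | 128.14 | 38 |
| Lokesh Rahul (2) | 110 | 56 | 38.45 | 26.32 | 146.10 | 38 |
| Aaron Finch (3) | 172 | 76 | 32.61 | 20.92 | 155.88 | 61 |
| Colin Munro (4) | 109 | 58 | 28.26 | 18.07 | 156.44 | 61 |
| Dawid Malan (5) | 50 | 37 | 16.64 | 16.96 | 98.11 | 28 |
| Glen Maxwell (6) | 145 | 65 | 29.19 | 18.24 | 160.00 | 54 |
| Eoin Morgan (7) | 91 | 51 | 24.86 | 18.08 | 137.49 | 86 |
| Hazratullah (8) | 162 | 62 | 38.00 | 24.47 | 155.31 | 15 |
| Evin Lewis (9) | 125 | 62 | 30.13 | 19.39 | 155.41 | 31 |
| Virat Kohli (10) | 94 | 61 | 35.93 | 26.59 | 135.13 | 76 |
| Rohit Sharma (11) | 118 | 66 | 28.01 | 20.18 | 138.79 | 99 |
| Martin Guptil (12) | 105 | 69 | 29.84 | 22.16 | 134.61 | 85 |
| Jason Roy (13) | 78 | 45 | 24.57 | 16.66 | 147.51 | 35 |
| Quinton de Kock (14) | 79 | 52 | 28.51 | 20.95 | 136.07 | 43 |
| D’Arcy Short (15) | 76 | 53 | 29.60 | 24.50 | 120.82 | 20 |
| Kane Williamson (16) | 95 | 60 | 28.71 | 22.93 | 125.19 | 58 |
| George Munsey (17) | 127 | 56 | 27.42 | 17.78 | 154.22 | 36 |
| David Warner (18) | 100 | 62 | 27.94 | 19.89 | 140.48 | 79 |
| Reeza Hendricks (19) | 74 | 52 | 26.39 | 21.91 | 120.44 | 23 |
| Paul Stirling (20) | 95 | 58 | 27.95 | 20.07 | 139.28 | 76 |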
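
The text does not state how the strike rate in Table 2 is computed. A minimal sketch, assuming it is the usual runs per 100 balls and can therefore be recovered as 100 × (average runs) / (average balls), is given below; the function name `strike_rate` and the dictionary `table2_rows` are illustrative only. Because the averages in Table 2 are rounded to two decimals, agreement is expected only to within a few hundredths.

```python
# Values copied from Table 2: (average runs, average balls, reported strike rate).
table2_rows = {
    "Babar Azam": (38.71, 30.21, 128.14),
    "Aaron Finch": (32.61, 20.92, 155.88),
    "Dawid Malan": (16.64, 16.96, 98.11),
}


def strike_rate(avg_runs: float, avg_balls: float) -> float:
    """Runs scored per 100 balls faced (assumed definition)."""
    return 100.0 * avg_runs / avg_balls


for player, (runs, balls, reported) in table2_rows.items():
    computed = strike_rate(runs, balls)
    print(f"{player}: computed {computed:.2f}, reported {reported:.2f}")
```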

2 Copula functions

Copula functions are used to link one-dimensional marginal distributions to a multivariate distribution (see, for example, Nelsen 1999, Balakrishnan & Lai 2009). In copula functions, the marginal distributions are uniform on [0,1]. Since we aim to obtain the joint distribution of runs scored and balls faced by a batsman or a partnership of batsmen, we focus on two-dimensional copulas, which allow us to obtain the bivariate joint distribution of two random variables.

Suppose u, v ∈ [0, 1]. A two-dimensional copula is a function C : [0, 1]^2 → [0, 1] with the following properties:

  • 1. For every u, v ∈ [0, 1],

    (1)
    $$C(u, 0) = 0 = C(0, v), \quad C(u, 1) = u, \quad C(1, v) = v;$$

  • 2. If 0 ≤ u1 ≤ u2 ≤ 1 and 0 ≤ v1 ≤ v2 ≤ 1, then

    (2)
    $$C(u_2, v_2) - C(u_2, v_1) - C(u_1, v_2) + C(u_1, v_1) \geq 0.$$

Sklar’s Theorem is one of the fundamental theorems in the copula literature; it provides the basis for obtaining the joint distribution of random variables from their marginal cumulative distribution functions (CDFs).

Theorem 1. [Sklar’s Theorem] Let H be a joint distribution function with marginal distribution functions F and G. Then, there exists a copula C (. , .) such that for all x, y ∈ (-∞, ∞) (Sklar 1959, Nelsen 1999)

(3)
$$H(x, y) = \Pr(X \leq x, Y \leq y) = C(F(x), G(y)).$$

C (. , .) is unique if F and G are continuous; otherwise, C (. , .) is uniquely determined on (Range of F) × (Range of G). Conversely, if C is a copula, and F and G are marginal distribution functions, then the function H defined by Equation (3) is a joint distribution function with marginals F and G.

The joint probability density function (PDF) of X and Y, h, can be derived from the joint distribution H as

(4)
$$h(x, y) = c(F(x), G(y))\, f(x)\, g(y),$$
where f (x) and g (y) are the marginal PDFs of the random variables X and Y, respectively, and $c(u, v) = \partial^2 C(u, v) / \partial u\, \partial v$ is the bivariate copula density function.

2.1 Archimedean copulas

Copula functions can be constructed using different methods, such as the inversion method, the geometric method, and the algebraic method. Based on the construction method, there are several families of copulas: the Gaussian copula, Student’s t copula, Archimedean copulas, and extreme value copulas. Among these families, the Archimedean copulas have gained more attention in many applications because they can model a multivariate joint distribution with one or a few parameters. Moreover, Archimedean copulas are easy to construct and flexible to use. The Frank copula, which belongs to the Archimedean family, has been used extensively in the literature because of its symmetry properties and because the joint distribution can be estimated with a single parameter. Therefore, in this study, we used the Frank copula to model the joint distribution of runs scored and balls faced. The Frank copula (Frank 1979) function is given by

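(5)
$$C(u, v) = -\frac{1}{\theta_F} \ln\left(1 + \frac{(e^{-\theta_F u} - 1)(e^{-\theta_F v} - 1)}{e^{-\theta_F} - 1}\right),$$
where $\theta_F \in \mathbb{R} \setminus \{0\}$ is the Frank copula parameter and u, v ∈ [0, 1]. Kendall’s tau correlation coefficient is related to θF as $\tau = 1 + 4 (D_1(\theta_F) - 1)/\theta_F$, where $D_1(\theta_F) = \frac{1}{\theta_F} \int_0^{\theta_F} \frac{t}{e^t - 1}\, dt$ is the Debye function of the first kind.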
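
As an illustration of Equation (5) and the Kendall’s tau relation, the following minimal sketch evaluates the Frank copula and approximates the Debye function numerically. It is not the authors’ code: the function names, the midpoint-rule integration, and the number of integration steps are choices made here for illustration, and only the Python standard library is used.

```python
import math


def frank_copula(u: float, v: float, theta: float) -> float:
    """Frank copula C(u, v) of Equation (5); theta must be non-zero."""
    if theta == 0:
        raise ValueError("theta must be non-zero")
    # expm1(x) = e^x - 1 and log1p(x) = ln(1 + x) limit rounding error.
    num = math.expm1(-theta * u) * math.expm1(-theta * v)
    return -math.log1p(num / math.expm1(-theta)) / theta


def debye1(theta: float, steps: int = 2000) -> float:
    """First-kind Debye function D1(theta), approximated by the midpoint rule."""
    h = theta / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h          # midpoints avoid the removable point t = 0
        total += t / math.expm1(t)
    return total * h / theta


def kendall_tau(theta: float) -> float:
    """Kendall's tau implied by the Frank parameter theta."""
    return 1.0 + 4.0 * (debye1(theta) - 1.0) / theta


print(frank_copula(0.3, 1.0, 5.0))  # close to 0.3, since C(u, 1) = u
print(kendall_tau(5.736))           # close to 0.5
```

Positive values of θF correspond to positive dependence (τ > 0) and negative values to negative dependence, which is why large positive estimates are plausible for runs scored and balls faced.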

3 Method

As mentioned in the previous sections, in this study, we aim to obtain the joint distributions of runs scored and balls faced by a batsman or within a partnership of two batsmen. Suppose X and Y are random variables that correspond to runs scored and balls faced, respectively. Let FX (x ; θx) and fX (x ; θx) denote the CDF and PDF of X, and similarly, let FY (y ; θy) and fY (y ; θy) denote the CDF and PDF of Y. Here θx and θy are the vectors of parameters of the distributions of X and Y, respectively. Moreover, suppose we consider a Frank copula function C (FX (x ; θx) , FY (y ; θy)) with parameter ξ (the Frank copula parameter denoted θF in Section 2.1). Let (xi, yi),  i = 1, 2, …, m, be sample bivariate data (i.e., runs scored and balls faced) of a batsman or a partnership for m innings. Then, by using Equation (4), we can use the maximum likelihood estimation method to estimate the model parameters θx, θy, and ξ as follows:

(6)
$$\log L(\theta_x, \theta_y, \xi) = \sum_{i=1}^{m} \left[ \log\{c(F_X(x_i; \theta_x), F_Y(y_i; \theta_y); \xi)\} + \log\{f_X(x_i; \theta_x)\} + \log\{f_Y(y_i; \theta_y)\} \right],$$
where c (. , .) is the Frank copula density function. By maximizing the log-likelihood function in Equation (6) with respect to θx, θy, and ξ, the maximum likelihood estimates (MLEs) of θx, θy, and ξ can be evaluated. To maximize the log-likelihood function, we use the nlm function in R, which uses a Newton-type algorithm (R Core Team 2018); since nlm performs minimization, this amounts to minimizing the negative log-likelihood.
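
For reference, the Frank copula density entering the log-likelihood can be written in closed form. The expression below is obtained here by differentiating the Frank copula of Section 2.1 once with respect to each argument, with the copula parameter written as ξ as in this section; it is a worked derivation added for illustration, not a formula quoted from the original text.

```latex
% Write g = e^{-\xi} - 1, A = e^{-\xi u} - 1, B = e^{-\xi v} - 1,
% so that C(u, v) = -\frac{1}{\xi} \ln\frac{g + AB}{g}.
\frac{\partial C}{\partial u} = \frac{e^{-\xi u}\, B}{g + AB},
\qquad
c(u, v; \xi) = \frac{\partial^2 C}{\partial u\, \partial v}
  = \frac{\xi \,(1 - e^{-\xi})\, e^{-\xi (u + v)}}
         {\left[(1 - e^{-\xi}) - (1 - e^{-\xi u})(1 - e^{-\xi v})\right]^{2}} .
```

As ξ approaches 0, the density tends to 1, the independence case, so the copula term of the log-likelihood vanishes and only the two marginal terms remain.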

If the number of parameters to be estimated is large, the inference functions for margins (IFM) method proposed by Joe & Xu (1996) can be used to obtain the parameter estimates. However, in this study, we optimize the likelihood function directly without using the IFM method because we consider two-parameter marginal distributions; thus, there are only five parameters to be estimated.

Suppose the parameter estimates for θx, θy, and ξ are $\hat{\theta}_x$, $\hat{\theta}_y$, and $\hat{\xi}$, respectively. Using these parameter estimates, the estimated joint distribution based on the Frank copula function for x′ runs and y′ balls is given by

(7)
$$\widehat{\Pr}(X < x', Y < y') = \hat{C}(\hat{F}_X(x'; \hat{\theta}_x), \hat{F}_Y(y'; \hat{\theta}_y); \hat{\xi}).$$
For the purpose of ranking partnerships and individual players in cricket, one can use the joint distribution of scoring more than x′ runs by facing more than y′ balls (i.e., Pr(X > x′, Y > y′)). Following Nelsen (1999, Theorem 2.4.4), we can derive the corresponding joint survival distribution as

(8)
$$\widehat{\Pr}(X > x', Y > y') = \hat{S}_X(x'; \hat{\theta}_x) + \hat{S}_Y(y'; \hat{\theta}_y) - 1 + \hat{C}(\hat{F}_X(x'; \hat{\theta}_x), \hat{F}_Y(y'; \hat{\theta}_y); \hat{\xi}),$$
where $\hat{S}_X(x'; \hat{\theta}_x) = 1 - \hat{F}_X(x'; \hat{\theta}_x)$ and $\hat{S}_Y(y'; \hat{\theta}_y) = 1 - \hat{F}_Y(y'; \hat{\theta}_y)$, respectively.
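
The joint survival formula is an inclusion-exclusion identity: Pr(X > x′, Y > y′) = 1 − FX(x′) − FY(y′) + Pr(X ≤ x′, Y ≤ y′). The sketch below applies it to one entry of Table 3, taking the marginal survival probabilities printed there for Dhawan S and Rahane A M in Case 2 and the Frank parameter estimate reported in Section 4.1. The function names are illustrative, and inputs rounded to a few decimals mean that only approximate agreement with the table should be expected.

```python
import math


def frank_copula(u: float, v: float, theta: float) -> float:
    """Frank copula C(u, v); theta must be non-zero."""
    num = math.expm1(-theta * u) * math.expm1(-theta * v)
    return -math.log1p(num / math.expm1(-theta)) / theta


def joint_survival(s_x: float, s_y: float, theta: float) -> float:
    """Pr(X > x', Y > y') from the marginal survival probabilities s_x and s_y."""
    return s_x + s_y - 1.0 + frank_copula(1.0 - s_x, 1.0 - s_y, theta)


# Dhawan S and Rahane A M, Case 2 (more than 50 runs and more than 60 balls):
# P(R > 50) = 0.5090 and P(B > 60) = 0.5144 from Table 3, Frank parameter 34.40.
p = joint_survival(0.5090, 0.5144, 34.40)
print(round(p, 4))  # approximately 0.4914, the Case 2 joint entry in Table 3
```

With such a large dependence parameter the joint survival probability sits close to the smaller of the two marginal survival probabilities, which is the pattern visible throughout Tables 3 to 5.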

In general, the distributions of runs scored and balls faced are right-skewed. After performing the Kolmogorov-Smirnov goodness-of-fit test for runs scored and balls faced, for both the partnership data and the individual batting data, with candidate distributions such as gamma, Weibull, and exponential, we found the gamma distribution to be the most suitable overall. Therefore, in this study, we assume that the marginal distributions of both runs scored and balls faced are gamma distributions (specifically, X ∼ Gamma (αx,  βx) and Y ∼ Gamma (αy,  βy)). Thus, we estimated the parameters {αx, βx, αy, βy, ξ} by maximum likelihood using gamma marginals and the Frank copula. Since the gamma distribution has support (0, ∞), we replaced observations of zero runs or zero balls faced with 0.01. It is important to note that the copula approach allows any continuous distribution to be chosen for the marginals, depending on the goodness-of-fit to the data.
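
To make the gamma marginals concrete, the sketch below computes marginal survival probabilities from a shape and a second parameter. Two points are assumptions made here rather than statements from the text: β is treated as a scale parameter (so the mean is αβ), and the CDF is evaluated with a plain power series for the regularized incomplete gamma function instead of a statistical library. The estimates for Dhawan S and Rahane A M are those quoted in Section 4.1, which are rounded to two decimals, so the results are expected to agree with Table 3 only to about two decimal places.

```python
import math


def gamma_cdf(x: float, shape: float, scale: float, terms: int = 500) -> float:
    """Gamma CDF via the power series of the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    z = x / scale
    term = 1.0
    total = 1.0
    for n in range(1, terms):
        term *= z / (shape + n)       # z^n / ((shape+1)...(shape+n))
        total += term
        if term < 1e-16 * total:
            break
    # Prefactor z^shape * exp(-z) / Gamma(shape + 1), computed in log space.
    log_prefactor = shape * math.log(z) - z - math.lgamma(shape + 1.0)
    return min(1.0, math.exp(log_prefactor) * total)


def gamma_survival(x: float, shape: float, scale: float) -> float:
    """Pr(X > x) for a gamma variable with the given shape and (assumed) scale."""
    return 1.0 - gamma_cdf(x, shape, scale)


# Dhawan S and Rahane A M: runs (0.88, 90.28), balls (1.18, 71.66).
print(gamma_survival(45, 0.88, 90.28))  # compare with P(R > 45) = 0.5415 in Table 3
print(gamma_survival(30, 1.18, 71.66))  # compare with P(B > 30) = 0.7369 in Table 3
```

The series converges for any positive argument but slowly for large x / scale; for the thresholds used in this study (up to about 100 runs or 90 balls) a few dozen terms suffice.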

4 Results

In this section, we demonstrate and discuss the results obtained for the ODI opening batting partnership performance and for the T20 individual player batting performance based on the copula approach described in Section 3.

4.1 Opening partnerships modeling

As described in the previous sections, for the 17 ODI opening partnerships shown in Table 1, the joint survival distribution of runs scored and balls faced is modeled using gamma marginals and the Frank copula. The parameter estimates of this model are derived by maximizing the likelihood function in Equation (6). For example, the MLEs for the bivariate joint distribution for the opening partnership pair Dhawan S and Rahane A M are {αˆx=0.88,βˆx=90.28,αˆy=1.18,βˆy=71.66,ξˆ=34.40} . The parameter estimates for all 17 partnerships are given in Table 6 of Appendix 6.2. Moreover, for the same partnership pair, a contour plot of the joint survival distribution of runs scored and balls faced is shown in Fig. 1.

Fig. 1

Contour plot for the joint survival probability distribution obtained using gamma marginals and Frank copula for the partnership between Dhawan S and Rahane A M.

To explore the performance of partnerships, we define five different stages of the game that we believe are crucial for evaluating partnerships. For each case, we have estimated the joint survival probability of scoring more than a specific number of runs and facing more than a specific number of balls.

  • Case 1: Survive scoring more than 45 runs and facing more than 30 balls (Table 3)

  • Case 2: Survive scoring more than 50 runs and facing more than 60 balls (Table 3)

  • Case 3: Survive scoring more than 60 runs and facing more than 60 balls (Table 4)

  • Case 4: Survive scoring more than 75 runs and facing more than 60 balls (Table 4)

  • Case 5: Survive scoring more than 100 runs and facing more than 90 balls (Table 5)

Table 3

Marginal and joint survival probabilities for runs scored and balls faced of partnerships case 1 and case 2

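Columns 2 to 5 refer to Case 1 and columns 6 to 9 refer to Case 2.

| Partnership | P(R > 45) | P(B > 30) | P(R > 45, B > 30) | Rank (Case 1) | P(R > 50) | P(B > 60) | P(R > 50, B > 60) | Rank (Case 2) |
|---|---|---|---|---|---|---|---|---|
| Amla H M and de Kock Q | 0.4117 | 0.5983 | 0.4114 | 5 | 0.3777 | 0.3526 | 0.3359 | 5 |
| Bairstow J M and Roy J J | 0.4522 | 0.5811 | 0.4508 | 3 | 0.4278 | 0.3656 | 0.3584 | 3 |
| Cook A N and Bell I R | 0.3630 | 0.5749 | 0.3626 | 11 | 0.3300 | 0.3286 | 0.2988 | 8 |
| Dhawan S and Rahane A M | 0.5415 | 0.7369 | 0.5415 | 1 | 0.5090 | 0.5144 | 0.4914 | 1 |
| Dilshan T M and Perera M D K | 0.3153 | 0.4234 | 0.3094 | 17 | 0.2851 | 0.1857 | 0.1790 | 17 |
| Fakhar Z and Imam-ul-haq | 0.4629 | 0.6700 | 0.4627 | 2 | 0.4351 | 0.4544 | 0.4180 | 2 |
| Haddin B J and Watson S R | 0.3724 | 0.5778 | 0.3713 | 9 | 0.3364 | 0.2904 | 0.2716 | 10 |
| Jayawardene D P M and Dilshan T M | 0.4389 | 0.6305 | 0.4364 | 4 | 0.3974 | 0.3265 | 0.3102 | 6 |
| McCullum B B and Ryder J D | 0.3800 | 0.5433 | 0.3768 | 8 | 0.3515 | 0.2766 | 0.2630 | 12 |
| McCullum and Guptill M J | 0.3297 | 0.4332 | 0.3203 | 16 | 0.3001 | 0.1890 | 0.1809 | 16 |
| Sharma R G and Dhawan S | 0.4115 | 0.6220 | 0.4110 | 6 | 0.3784 | 0.3594 | 0.3363 | 4 |
| Smith G C and Amla H M | 0.3433 | 0.5407 | 0.3429 | 14 | 0.3082 | 0.2927 | 0.2705 | 11 |
| Sarkar S and Iqbal T | 0.3324 | 0.5209 | 0.3316 | 15 | 0.3000 | 0.2684 | 0.2499 | 14 |
| Tendulkar S R and Sehwag V | 0.3731 | 0.5075 | 0.3705 | 10 | 0.3413 | 0.2587 | 0.2512 | 13 |
| Tharanga W U and Dilshan T M | 0.3600 | 0.5207 | 0.3586 | 13 | 0.3312 | 0.2926 | 0.2761 | 9 |
| Warner D A and Finch A J | 0.3995 | 0.5618 | 0.3988 | 7 | 0.3673 | 0.3187 | 0.3082 | 7 |
| Shewag V and Gambhir G | 0.3634 | 0.5305 | 0.3620 | 12 | 0.3282 | 0.2496 | 0.2413 | 15 |
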
Table 4

Marginal and joint survival probabilities for runs scored and balls faced of partnerships case 3 and case 4

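Columns 2 to 5 refer to Case 3 and columns 6 to 9 refer to Case 4.

| Partnership | P(R > 60) | P(B > 60) | P(R > 60, B > 60) | Rank (Case 3) | P(R > 75) | P(B > 60) | P(R > 75, B > 60) | Rank (Case 4) |
|---|---|---|---|---|---|---|---|---|
| Amla H M and de Kock Q | 0.3185 | 0.3526 | 0.3047 | 4 | 0.2477 | 0.3526 | 0.2450 | 5 |
| Bairstow J M and Roy J J | 0.3846 | 0.3656 | 0.3470 | 3 | 0.3304 | 0.3656 | 0.3172 | 3 |
| Cook A N and Bell I R | 0.2736 | 0.3286 | 0.2625 | 8 | 0.2076 | 0.3286 | 0.2049 | 10 |
| Dhawan S and Rahane A M | 0.4502 | 0.5144 | 0.4471 | 1 | 0.3753 | 0.5144 | 0.3750 | 1 |
| Dilshan T M and Perera M D K | 0.2340 | 0.1857 | 0.1692 | 17 | 0.1753 | 0.1857 | 0.1455 | 17 |
| Fakhar Z and Imam-ul-haq | 0.3857 | 0.4544 | 0.3805 | 2 | 0.3242 | 0.4544 | 0.3231 | 2 |
| Haddin B J and Watson S R | 0.2749 | 0.2904 | 0.2452 | 11 | 0.2037 | 0.2904 | 0.1942 | 12 |
| Jayawardene D P M and Dilshan T M | 0.3252 | 0.3265 | 0.2840 | 7 | 0.2400 | 0.3265 | 0.2272 | 7 |
| McCullum B B and Ryder J D | 0.3020 | 0.2766 | 0.2484 | 10 | 0.2424 | 0.2766 | 0.2176 | 8 |
| McCullum and Guptill M J | 0.2496 | 0.1890 | 0.1715 | 16 | 0.1909 | 0.1890 | 0.1502 | 16 |
| Sharma R G and Dhawan S | 0.3209 | 0.3594 | 0.3045 | 5 | 0.2517 | 0.3594 | 0.2476 | 4 |
| Smith G C and Amla H M | 0.2488 | 0.2927 | 0.2361 | 13 | 0.1811 | 0.2927 | 0.1783 | 14 |
| Sarkar S and Iqbal T | 0.2452 | 0.2684 | 0.2238 | 15 | 0.1822 | 0.2684 | 0.1759 | 15 |
| Tendulkar S R and Sehwag V | 0.2863 | 0.2587 | 0.2380 | 12 | 0.2214 | 0.2587 | 0.2040 | 11 |
| Tharanga W U and Dilshan T M | 0.2815 | 0.2926 | 0.2549 | 9 | 0.2223 | 0.2926 | 0.2133 | 9 |
| Warner D A and Finch A J | 0.3114 | 0.3187 | 0.2870 | 6 | 0.2444 | 0.3187 | 0.2385 | 6 |
| Shewag V and Gambhir G | 0.2682 | 0.2496 | 0.2251 | 14 | 0.1988 | 0.2496 | 0.1849 | 13 |
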
Table 5

Marginal and joint survival probabilities for runs scored and balls faced of partnerships case 5

| Partnership | P(R > 100) | P(B > 90) | P(R > 100, B > 90) | Rank (Case 5) |
|---|---|---|---|---|
| Amla H M and de Kock Q | 0.1640 | 0.2071 | 0.1528 | 4 |
| Bairstow J M and Roy J J | 0.2604 | 0.2340 | 0.2181 | 3 |
| Cook A N and Bell I R | 0.1324 | 0.1876 | 0.1218 | 8 |
| Dhawan S and Rahane A M | 0.2783 | 0.3533 | 0.2762 | 1 |
| Dilshan T M and Perera M D K | 0.1099 | 0.0820 | 0.0625 | 17 |
| Fakhar Z and Imam-ul-haq | 0.2456 | 0.3091 | 0.2396 | 2 |
| Haddin B J and Watson S R | 0.1241 | 0.1404 | 0.0968 | 13 |
| Jayawardene D P M and Dilshan T M | 0.1437 | 0.1589 | 0.1114 | 10 |
| McCullum B B and Ryder J D | 0.1705 | 0.1385 | 0.1145 | 9 |
| McCullum and Guptill M J | 0.1239 | 0.0826 | 0.0638 | 16 |
| Sharma R G and Dhawan S | 0.1693 | 0.2035 | 0.1520 | 5 |
| Smith G C and Amla H M | 0.1074 | 0.1585 | 0.0971 | 12 |
| Sarkar S and Iqbal T | 0.1124 | 0.1380 | 0.0932 | 14 |
| Tendulkar S R and Sehwag V | 0.1458 | 0.1320 | 0.1070 | 11 |
| Tharanga W U and Dilshan T M | 0.1522 | 0.1672 | 0.1279 | 7 |
| Warner D A and Finch A J | 0.1648 | 0.1812 | 0.1446 | 6 |
| Shewag V and Gambhir G | 0.1215 | 0.1138 | 0.0865 | 15 |

Case 1 represents surviving in the innings to score more than 45 runs while facing more than 30 balls (5 overs), which can be considered the minimum reasonable performance by an opening pair. Results for this case are shown in Table 3. The partnership pair Dhawan S and Rahane A M ranks at the top of the list in terms of the joint survival probability (0.54).

Cases 2, 3, and 4 can be considered well-founded partnerships in ODI cricket, as they last more than 60 balls (10 overs) with decent scores of 50 runs, 60 runs, and 75 runs, respectively. Moreover, the strength of these partnerships increases from case 2 through case 4. In ODI cricket, it is a known strategy for the bowling team to put all their efforts into dismissing at least one partner (if not both) of the opening pair during the first 10 overs (powerplay). Failure to accomplish that may be considered a weakness of the bowling team, which consequently lowers the bowlers’ confidence. Given that, case 4 gives more confidence to the rest of the top order and to the middle order batsmen as they continue to build the innings toward a higher total. Case 5 represents scoring more than 100 runs and batting for more than 15 overs (90 balls), which gives the batting team an impressive foundation for its innings.

It is interesting to notice that Dhawan S & Rahane A M have the highest joint survival probability in all five cases. Also note that the second and the third highest joint survival probabilities in all five cases are recorded by the same partnership pairs: Fakhar Z & Imam-ul-haq, and Bairstow J M & Roy J J, respectively. These results indicate that these pairs are consistently better than the other partnership pairs considered in this study. A graphical representation of the above cases for all the partnerships considered in this study is shown in Fig. 2. Note that the probabilities decrease steadily from case 1 to case 5, because the run and ball thresholds become more demanding and the chance of an opening pair surviving beyond them goes down.
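
The rank columns in Tables 3 to 5 follow from sorting the joint survival probabilities in decreasing order. A minimal sketch, using the Case 5 joint probabilities copied from Table 5 (the variable names are illustrative), is:

```python
# Joint survival probabilities P(R > 100, B > 90), copied from Table 5.
case5 = {
    "Amla H M and de Kock Q": 0.1528,
    "Bairstow J M and Roy J J": 0.2181,
    "Cook A N and Bell I R": 0.1218,
    "Dhawan S and Rahane A M": 0.2762,
    "Dilshan T M and Perera M D K": 0.0625,
    "Fakhar Z and Imam-ul-haq": 0.2396,
    "Haddin B J and Watson S R": 0.0968,
    "Jayawardene D P M and Dilshan T M": 0.1114,
    "McCullum B B and Ryder J D": 0.1145,
    "McCullum and Guptill M J": 0.0638,
    "Sharma R G and Dhawan S": 0.1520,
    "Smith G C and Amla H M": 0.0971,
    "Sarkar S and Iqbal T": 0.0932,
    "Tendulkar S R and Sehwag V": 0.1070,
    "Tharanga W U and Dilshan T M": 0.1279,
    "Warner D A and Finch A J": 0.1446,
    "Shewag V and Gambhir G": 0.0865,
}

# Rank 1 is the pair with the highest joint survival probability.
ranked = sorted(case5.items(), key=lambda item: item[1], reverse=True)
ranks = {name: position + 1 for position, (name, _) in enumerate(ranked)}

for name, prob in ranked[:3]:
    print(ranks[name], name, prob)
```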

Fig. 2

Joint probabilities of partnerships for different cases.

4.2 Individual T20 batting performance

Similar to the partnership analysis, the runs scored and balls faced by T20 batsmen in Table 2 were modeled using gamma marginals and Frank copula. The parameters of the model were estimated by maximizing the likelihood function in Equation (6). For example, the MLEs for the Pakistan top order batsman, Babar Azam were {αˆx=1.06,βˆx=41.33,αˆy=1.63,βˆy=20.32,ξˆ=22.16} . Parameter estimates for all T20 batsmen considered in this study are given in Table 7 of Appendix 6.2. Moreover, for Babar Azam, a contour plot of the joint survival distribution of runs scored and balls faced is shown in Fig. 3.

Fig. 3

Contour plot for the joint survival probability distribution obtained using gamma marginals and Frank copula for T20 batsman Babar Azam.

To evaluate the individual batting performance using the joint survival distribution, we considered 13 different cases based on different stages of the innings. We used the notation (r, b) to represent the case scoring more than r runs by surviving more than b balls. For example, the notation (20,30) indicates the case of scoring more than 20 runs while surviving more than 30 balls. The thirteen different cases for individual batting performance and their respective joint probabilities are shown in Fig. 4.

Fig. 4

Individual player performance.

Pakistan top order batsman Babar Azam had the highest joint survival probability in all thirteen cases. Interestingly, according to the ICC T20 batsmen ranking as of July 2020, Babar Azam was also the top-ranked T20 batsman. Another interesting observation is that Virat Kohli, who was ranked 10th by the ICC, occupies rank 2 in 9 cases and rank 3 in 4 cases in the joint survival probability list for the thirteen cases. Lokesh Rahul, who was ranked second by the ICC in our data set, occupies rank 2 in 4 cases, rank 3 in 6 cases, and rank 4 in 3 cases in our joint survival probability ranking. In this study, we focused on a batsman’s ability to survive in different conditions in the game rather than on the aggressiveness of run scoring. Nevertheless, the results of the study match the ICC T20 batsmen ranking fairly well, which indicates that the ICC batsmen ranking system is an effective tool for evaluating the performance of batsmen in cricket.

5 Conclusions

Copula applications are common in fields such as finance and reliability engineering; in the sports literature, however, copula applications are limited. In this article, we have shown the effectiveness of copula methods in an application to the game of cricket. As the main contribution of this study, we were able to model the bivariate joint distribution of the number of runs scored and the number of balls faced using copula functions for opening partnerships in ODI cricket and individual batsmen in T20 cricket. Furthermore, these bivariate distributions were used to rank the batting performance of the opening partnerships as well as the batting performance of individual T20 batsmen at different stages of the game.

Based on the joint survival probability estimates, the partnership pair Dhawan S & Rahane A M was ranked at the top in all five partnership cases considered in this study. The second and the third highest joint survival probabilities in all five cases were recorded by the partnership pairs Fakhar Z & Imam-ul-haq and Bairstow J M & Roy J J, respectively. These results indicate that the above three partnership pairs were consistently better than the other partnership pairs considered in this study.

Babar Azam, who is number one in the ICC T20 batsmen ranking, has been ranked first based on the joint survival probability approach. Furthermore, Virat Kohli and Lokesh Rahul were ranked as the next two highest ranking T20 batsmen with respect to the thirteen joint survival probability cases we considered. It is important to note that the probability based ranking method proposed in this study closely complies with the ICC batting ranking. Nevertheless, there are some noticeable deviations between the two ranking approaches, for example, Virat Kohli is in the tenth place in the ICC ranking; however, he has been ranked as the second or the third from joint survival probability approach.

We believe that the results of this paper would be useful for team managers. In particular, they can use the proposed method to select the best opening pair or individual batsmen from a pool of players for a given match. Furthermore, in this study, copula applications are shown to be a useful tool to evaluate player performance in the game of cricket, and thus, this article may bring attention to the area and open up opportunities for future research. As a future extension, we are expecting to apply a similar approach to evaluate the bowling performance in cricket. In addition, another interesting future research would be to expand for multivariate approach considering other performance indicators such as the number of fours, number of sixes, and number of dot balls.

References

1 

Balakrishnan, N., & Lai, C. D. (2009). Continuous Bivariate Distributions, Springer Science & Business Media, New York, NY.

2

Bhattacharjee, D., Lemmer, H. H., Saikia, H., & Mukherjee, D. (2018). Measuring performance of batting partners in limited overs cricket, South African Journal for Research in Sport, Physical Education and Recreation 40: 1–12.

3

Boshnakov, G., Kharrat, T., & McHale, I. G. (2017). A bivariate Weibull count model for forecasting association football scores, International Journal of Forecasting 33: 458–466.

4

Clayton, D. G. (1978). A model for association in bivariate life tables and its application in epidemiological studies of familial tendency in chronic disease incidence, Biometrika 65: 141–151.

5

Das, S. (2011). On generalized geometric distributions: Application to modeling scores in cricket and improved estimation of batting average in light of notout innings, IIM Bangalore Research Paper.

6

Frank, M. J. (1979). On the simultaneous associativity of F(x, y) and x + y - F(x, y), Aequationes Mathematicae 19: 194–226.

7

Gumbel, E. J. (1960). Distributions des valeurs extrêmes en plusieurs dimensions, Publ Inst Statist Univ Paris 9: 171–173.

8

Joe, H., & Xu, J. J. (1996). The estimation method of inference functions for margins for multivariate models, Technical report, University of British Columbia, Department of Statistics.

9

Kim, J.-M., Baik, J., & Reller, M. (2019). Control charts of mean and variance using copula Markov SPC and conditional distribution by copula, Communications in Statistics-Simulation and Computation, pp. 1–18.

10

McHale, I., & Scarf, P. (2007). Modelling soccer matches using bivariate discrete distributions with general dependence structure, Statistica Neerlandica 61: 432–445.

11

Nelsen, R. B. (1999). An Introduction to Copulas, Springer-Verlag New York, Inc., New York, NY.

12

R Core Team (2018). R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria. URL: https://www.R-project.org/

13

Scarf, P., Shi, X., & Akhtar, S. (2011). On the distribution of runs scored and batting strategy in test cricket, Journal of the Royal Statistical Society: Series A (Statistics in Society) 174: 471–497.

14

Sklar, M. (1959). Fonctions de répartition à n dimensions et leurs marges, Publ Inst Statist Univ Paris 8: 229–231.

15

Swartz, T. B., Gill, P. S., & Muthukumarana, S. (2009). Modelling and simulation for one-day cricket, Canadian Journal of Statistics 37: 143–160.

16

Talukdar, P. (2020). Investigating the role of opening partners while chasing on the outcome of twenty20 cricket matches, Management and Labour Studies 45: 222–232.

17

Tavassolipour, M., Karimian, M., & Kasaei, S. (2013). Event detection and summarization in soccer videos using Bayesian network and copula, IEEE Transactions on Circuits and Systems for Video Technology 24: 291–304.

18

Valero, J., & Swartz, T. B. (2012). An investigation of synergy between batsmen in opening partnerships, Sri Lankan Journal of Applied Statistics 13: 87–98.

Appendices

6 Appendix

6.1 Examples of other Archimedean copula types

6.1.1 Clayton copula

The Clayton copula (Clayton 1978) function is expressed as

(9)
$$C(u,v)=\max\left(\left[u^{-\theta_C}+v^{-\theta_C}-1\right]^{-1/\theta_C},\;0\right),$$
where $\theta_C \in [-1, \infty) \setminus \{0\}$ is the Clayton copula parameter. The parameter $\theta_C$ is related to Kendall's tau correlation coefficient by $\tau = \theta_C/(\theta_C + 2)$.
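As a quick numerical check of Equation (9) and the Kendall's tau relation, the Clayton copula can be sketched in a few lines of Python. This is only an illustration using the standard library; the function names `clayton_copula` and `clayton_kendall_tau` are hypothetical and are not taken from the study's own R code.

```python
def clayton_copula(u: float, v: float, theta: float) -> float:
    """Evaluate the Clayton copula C(u, v) of Equation (9).

    theta must lie in [-1, inf) and be non-zero; u and v are in [0, 1].
    """
    if theta < -1.0 or theta == 0.0:
        raise ValueError("theta must lie in [-1, inf) and be non-zero")
    if u <= 0.0 or v <= 0.0:
        return 0.0  # boundary condition: C(u, 0) = C(0, v) = 0
    base = u ** (-theta) + v ** (-theta) - 1.0
    if base <= 0.0:
        return 0.0  # the max(., 0) branch, reachable only when theta < 0
    return base ** (-1.0 / theta)


def clayton_kendall_tau(theta: float) -> float:
    """Kendall's tau implied by the Clayton parameter: tau = theta / (theta + 2)."""
    return theta / (theta + 2.0)


if __name__ == "__main__":
    # Illustrative values only; theta = 2 corresponds to tau = 0.5.
    print(clayton_copula(0.5, 0.5, 2.0))
    print(clayton_kendall_tau(2.0))
```

The explicit guard on `base` keeps the function real-valued for negative `theta`, where the bracketed term in Equation (9) can become negative.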

6.1.2 Gumbel copula

The Gumbel copula (Gumbel 1960) function can be obtained as

(10)
$$C(u,v)=\exp\left(-\left[(-\ln u)^{\theta_G}+(-\ln v)^{\theta_G}\right]^{1/\theta_G}\right),$$
where $\theta_G \in [1, \infty)$ is the Gumbel copula parameter. The parameter $\theta_G$ is related to Kendall's tau correlation coefficient by $\tau = (\theta_G - 1)/\theta_G$.
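Equation (10) and its Kendall's tau relation can be checked numerically in the same way. The following is a minimal standard-library sketch; the names `gumbel_copula` and `gumbel_kendall_tau` are hypothetical and chosen for illustration.

```python
import math


def gumbel_copula(u: float, v: float, theta: float) -> float:
    """Evaluate the Gumbel copula C(u, v) of Equation (10).

    theta must be >= 1; theta = 1 gives the independence copula u * v.
    """
    if theta < 1.0:
        raise ValueError("theta must be at least 1")
    if u <= 0.0 or v <= 0.0:
        return 0.0  # boundary condition: C(u, 0) = C(0, v) = 0
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-(s ** (1.0 / theta)))


def gumbel_kendall_tau(theta: float) -> float:
    """Kendall's tau implied by the Gumbel parameter: tau = (theta - 1) / theta."""
    return (theta - 1.0) / theta


if __name__ == "__main__":
    # Illustrative values only; theta = 2 corresponds to tau = 0.5.
    print(gumbel_copula(0.3, 0.7, 2.0))
    print(gumbel_kendall_tau(2.0))
```

Setting `theta = 1.0` reduces the function to the product `u * v`, which is a convenient sanity check on the implementation.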

6.2 Parameter estimates for partnerships and individual batsmen

Tables 6 and 7 show the parameter estimates for the joint survival distribution obtained using gamma marginals with the Frank copula for ODI opening partnerships and T20 individual batsmen, respectively.

Table 6

Parameter estimates for the joint survival distribution obtained from gamma marginals and the Frank copula for ODI opening partnerships

| Partnerships | αx | βx | αy | βy | ξ |
|---|---|---|---|---|---|
| Amla H M and de Kock Q | 0.81 | 66.56 | 1.03 | 55.44 | 25.43 |
| Bairstow J M and Roy J J | 0.51 | 154.93 | 0.83 | 74.08 | 25.74 |
| Cook A N and Bell I R | 0.76 | 62.00 | 1.01 | 53.21 | 22.67 |
| Dhawan S and Rahane A M | 0.87 | 90.28 | 1.18 | 71.66 | 34.40 |
| Dilshan T M and Perera M D K | 0.66 | 62.30 | 0.95 | 37.43 | 19.53 |
| Fakhar Z and Imam-ul-haq | 0.62 | 116.66 | 0.97 | 79.48 | 27.20 |
| Haddin B J and Watson S R | 0.88 | 53.03 | 1.31 | 36.83 | 18.68 |
| Jayawardene D P M and Dilshan T M | 1.14 | 46.19 | 1.50 | 34.78 | 16.51 |
| McCullum B B and Ryder J D | 0.60 | 89.26 | 1.13 | 41.19 | 17.49 |
| McCullum and Guptill M J | 0.64 | 68.87 | 0.99 | 36.37 | 16.91 |
| Sharma R G and Dhawan S | 0.77 | 70.72 | 1.18 | 48.69 | 21.90 |
| Smith G C and Amla H M | 0.85 | 50.77 | 1.00 | 48.96 | 23.68 |
| Sarkar S and Iqbal T | 0.74 | 57.82 | 1.02 | 44.68 | 21.91 |
| Tendulkar S R and Sehwag V | 0.72 | 68.62 | 0.99 | 44.71 | 21.27 |
| Tharanga W U and Dilshan T M | 0.61 | 81.54 | 0.86 | 57.38 | 21.73 |
| Warner D A and Finch A J | 0.74 | 72.61 | 0.98 | 53.65 | 24.84 |
| Sehwag V and Gambhir G | 0.85 | 54.07 | 1.24 | 35.11 | 21.02 |

Table 7

Parameter estimates for the joint survival distribution obtained from gamma marginals and the Frank copula for T20 individual batsmen

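| Batsman | αx | βx | αy | βy | ξ |
|---|---|---|---|---|---|
| Babar Azam | 1.06 | 41.33 | 1.63 | 20.32 | 22.16 |
| Virat Kohli | 1.04 | 39.77 | 1.59 | 18.11 | 20.27 |
| D’Arcy Short | 0.83 | 42.43 | 1.36 | 20.27 | 18.87 |
| Lokesh Rahul | 1.03 | 40.87 | 1.72 | 16.07 | 21.26 |
| Aaron Finch | 0.69 | 55.49 | 1.16 | 20.27 | 24.61 |
| Hazratullah | 0.72 | 54.61 | 1.32 | 19.42 | 13.37 |
| Reeza Hendricks | 0.84 | 39.86 | 1.39 | 18.76 | 20.13 |
| Rohit Sharma | 0.75 | 42.51 | 1.11 | 20.54 | 21.39 |
| Kane Williamson | 0.97 | 33.16 | 1.52 | 16.07 | 16.90 |
| Martin Guptill | 0.87 | 37.87 | 1.55 | 15.10 | 17.11 |
| Paul Stirling | 0.65 | 48.81 | 1.33 | 15.68 | 20.43 |
| David Warner | 0.78 | 41.26 | 1.31 | 16.13 | 14.96 |
| George Munsey | 0.70 | 45.88 | 1.18 | 16.34 | 18.57 |
| Evin Lewis | 0.56 | 59.58 | 1.30 | 15.26 | 16.51 |
| Quinton de Kock | 0.83 | 35.27 | 1.27 | 16.09 | 15.89 |
| Glenn Maxwell | 0.76 | 42.79 | 1.29 | 14.74 | 18.34 |
| Colin Munro | 0.62 | 46.68 | 1.05 | 16.80 | 15.97 |
| Jason Roy | 0.66 | 40.34 | 1.24 | 13.40 | 17.77 |
| Eoin Morgan | 0.83 | 31.97 | 1.63 | 11.45 | 15.05 |
| Dawid Malan | 1.68 | 9.54 | 2.23 | 7.25 | 10.75 |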
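To illustrate how the tabulated estimates might be turned into a joint survival probability, the following Python sketch combines gamma marginals with a Frank copula. Several points are assumptions and are not confirmed by the tables themselves: that x denotes runs and y denotes balls faced, that α is a gamma shape and β a gamma scale parameter, that the copula couples the marginal distribution functions so that S(x, y) = 1 - F_X(x) - F_Y(y) + C(F_X(x), F_Y(y)), and that the Frank copula uses its standard parametrization. All function names are hypothetical, and the gamma distribution function is computed with a simple series so that only the standard library is needed.

```python
import math


def gamma_cdf(x: float, shape: float, scale: float) -> float:
    """Gamma CDF via the series for the regularized lower incomplete gamma."""
    if x <= 0.0:
        return 0.0
    z = x / scale
    term = 1.0 / shape
    total = term
    for n in range(1, 10000):
        term *= z / (shape + n)
        total += term
        if term < 1e-16 * total:
            break
    log_prefactor = -z + shape * math.log(z) - math.lgamma(shape)
    return min(1.0, total * math.exp(log_prefactor))


def frank_copula(u: float, v: float, xi: float) -> float:
    """Frank copula in its standard form (assumed parametrization)."""
    if xi == 0.0:
        raise ValueError("xi must be non-zero")
    num = math.expm1(-xi * u) * math.expm1(-xi * v)
    return -math.log1p(num / math.expm1(-xi)) / xi


def joint_survival(x, y, ax, bx, ay, by, xi):
    """P(X > x, Y > y) under the assumptions stated in the lead-in."""
    u = gamma_cdf(x, ax, bx)  # marginal CDF of X (assumed: runs)
    v = gamma_cdf(y, ay, by)  # marginal CDF of Y (assumed: balls faced)
    return 1.0 - u - v + frank_copula(u, v, xi)


if __name__ == "__main__":
    # Babar Azam's row of Table 7; the thresholds 30 and 20 are arbitrary.
    babar = dict(ax=1.06, bx=41.33, ay=1.63, by=20.32, xi=22.16)
    print(joint_survival(30.0, 20.0, **babar))
```

If SciPy is available, the hand-written `gamma_cdf` could be replaced by `scipy.stats.gamma.cdf(x, a=shape, scale=scale)`; the series is used here only to keep the sketch dependency-free.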