
A multi-criteria approach for evaluating major league baseball batting performance

Abstract

The evaluation of player performance typically involves a number of criteria representing various aspects of performance that are of interest. Pareto optimality and weighted aggregation are useful tools to simultaneously evaluate players with respect to the multiple criteria. In particular, the Pareto approach allows trade-offs among the criteria to be compared, does not require specifications of weighting schemes, and is not sensitive to the scaling of the criteria. The Pareto optimal players can be scored according to their ranks or according to their distance from the global optimum for informative comparisons of performance or for evaluating trade-offs among the criteria. These multi-criteria approaches are defined and illustrated for evaluating batting performance of Major League Baseball players.

1 Introduction

Albert (2010) defines sabermetrics as the science of learning about baseball through objective evidence. Grabiner (2014) indicates that the basic goal of sabermetrics is to evaluate past player performance and to predict future performance of player contributions to their teams. The information can be useful for determining who wins season awards and when determining the value of making a certain trade. The sabermetrician looks to contribute to this field through creating new statistics to better assess player performance (Albert, 2010). Often, these new statistics are aggregations or combinations of existing statistics.

This paper describes a multi-criteria approach in which the sabermetrician can evaluate and rank player performance using a simultaneous evaluation of multiple criteria. Two popular approaches are adopted from the multi-criteria optimization literature: (1) Pareto optimization discussed by Marler and Arora (2004) and (2) weighted aggregation discussed by Ngatchou et al. (2005). Pareto optimal solutions are those which are not dominated, that is, which cannot be bettered with respect to all of the criteria under consideration. Once Pareto solutions are determined, the sabermetrician can examine trade-offs between the criteria and identify the collection of players that cannot be beaten with respect to the specified criteria. Weighted aggregation develops linear combinations of the optimization criteria which can then be used to rank or score the players. Weighted aggregation can also be helpful for characterizing the performance of those players that are Pareto optimal. As a result, the proposed approach has the following advantages:

  • 1. Allows for simple and informative simultaneous comparisons of numerous players in terms of multiple performance criteria.

  • 2. Avoids having to combine multiple criteria into single metrics based upon complex specifications for weighting and scaling of the criteria.

  • 3. Allows the trade-offs among the criteria to be compared.

Koop (2002) recognizes that evaluation of baseball players is a difficult task since “baseball is fundamentally a multiple-output sport”. This author uses frontier models to create an aggregator of the multiple outputs pertaining to hitting in Major League Baseball (MLB). A Bayesian modeling approach is then implemented to estimate player efficiencies. The proposed approach in this paper does not rely on complicated modeling of aggregated outputs. Rather, the multiple hitting criteria are evaluated directly with observed or predicted data using Pareto optimality. Weighted aggregations, in the form of ranks, are then used to characterize the Pareto optimal players. Efficiencies are also calculated directly from the best possible performance with respect to each of the criteria. The proposed approach would be useful to sabermetricians, general managers, and fantasy baseball players for assessing player performance with respect to multiple criteria.

2Multi-Criteria Optimization

Suppose there is interest in evaluating player performance according to c criteria for a particular collection of players ℵ. Let fi(x) denote a criterion value for i = 1, 2, …, c involving a player x ∈ ℵ. Furthermore, suppose each fi(x) is to be maximized; a criterion gi(x) that is instead to be minimized can be included by setting fi(x) ≡ −gi(x). As in Ardakani and Wulff (2013), the multi-criteria setting can be stated as the optimization problem

(1)
$$\text{Maximize } f(x) = \left(f_1(x), f_2(x), \ldots, f_c(x)\right) \quad \text{subject to } x \in \aleph,$$
where f(x) denotes the criterion vector evaluated at x. The utopia point corresponds to the criterion vector associated with the player who simultaneously maximizes all criterion values. That is, xu is a utopia point provided fi(xu) ≥ fi(x) for all x∈ ℵ, for all i = 1, 2, …, c. For c ≥ 2, the utopia point rarely exists due to conflicts of simultaneously maximizing all the criteria. However, it is hoped that players can be identified that are ‘close’ to the utopia point. For an application to baseball hitters, the utopia point could be better referred to as Batman since such a player would be a mythical batting superhero.
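The utopia point can be illustrated with a minimal sketch. The three players and their two criterion values below are hypothetical, and the function name `utopia_point` is introduced here only for illustration. The sketch takes the component-wise maximum and then checks whether any single player attains it.

```python
def utopia_point(criteria):
    """Component-wise maximum of the criterion vectors (the 'Batman' vector)."""
    return tuple(max(column) for column in zip(*criteria.values()))


# Hypothetical criterion vectors (f1, f2) for three players.
criteria = {"A": (0.30, 0.10), "B": (0.25, 0.20), "C": (0.20, 0.15)}

batman = utopia_point(criteria)
# Players whose own criterion vector equals the utopia point (often none).
attained_by = [name for name, f in criteria.items() if f == batman]
```

In this toy example no player attains the utopia point, since the best values of the two criteria belong to different players.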

2.1 Pareto optimality

Consider the problem of finding a solution (player) according to (1). While Batman may not exist, a collection of players may be identified that cannot be bettered, or dominated, according to the c criteria. A player x*∈ ℵ is said to dominate another player x provided the following two conditions are met:

(2)
$$\begin{aligned} &1.\ f_i(x^*) \ge f_i(x) \text{ for all } i \in \{1, 2, \ldots, c\}, \\ &2.\ f_i(x^*) > f_i(x) \text{ for at least one } i \in \{1, 2, \ldots, c\}. \end{aligned}$$

A player is said to be Pareto optimal provided they are not dominated by any other player in ℵ. The set of Pareto optimal players constitutes the Pareto optimal set (POS). The corresponding criteria vectors f(x) comprise the Pareto front (PF). An overview of Pareto-related concepts is available in Coello Coello et al. (2007).

By definition, the Pareto optimal set consists of those players that are not Pareto dominated in the sense of (2). Thus, the POS can be generated by first forming its complement POSc, which is the set of dominated players. This is done in the following steps:

(3)
1. Evaluate f(x) for all x ∈ ℵ.
2. Initialize POSc to be ∅ and perform the following for each z ∈ ℵ:
   a. check if condition (2) is satisfied for x = z by some x* ∈ ℵ such that x* ≠ z;
   b. if (2) is satisfied, add z to POSc; otherwise, do not add z to POSc.

The steps in (3) identify the set POSc, so that POS=(POSc)c is the set of non-dominated solutions or the Pareto optimal set. The Pareto front consists of f(x) for all x ∈ POS.
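The steps in (3) can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the study; the player labels and criterion values are hypothetical, and the function names are chosen here for readability.

```python
def dominates(fa, fb):
    """True if criterion vector fa dominates fb in the sense of (2)."""
    return (all(a >= b for a, b in zip(fa, fb))
            and any(a > b for a, b in zip(fa, fb)))


def pareto_optimal_set(criteria):
    """Follow (3): collect the dominated players, then take the complement."""
    dominated = set()
    for z, fz in criteria.items():
        # Step 2a: is z dominated by some other player x*?
        if any(dominates(fx, fz) for x, fx in criteria.items() if x != z):
            dominated.add(z)  # Step 2b
    return [name for name in criteria if name not in dominated]


# Hypothetical criterion vectors (f1, f2); both criteria are maximized.
criteria = {
    "A": (0.30, 0.10),
    "B": (0.25, 0.20),
    "C": (0.20, 0.15),  # dominated by B
    "D": (0.10, 0.25),
}
pos = pareto_optimal_set(criteria)
pareto_front = [criteria[name] for name in pos]
```

The pairwise comparison costs on the order of n² dominance checks, which is unproblematic for a few hundred hitters.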

Figure 1 shows two performance criteria values f1 and f2 so that c = 2. Batman corresponds to that player who simultaneously maximizes both criteria. The Pareto optimal set consists of the players lying on the boundary of the criterion space closest to Batman. This boundary forms the Pareto front. There are two points which lie on an axis. These players maximize either f1 or f2. Note that none of the Pareto optimal players can be bettered in one criterion without deteriorating in the other criterion. If the two criteria are highly positively correlated, then the Pareto front will likely consist of only a few players. Otherwise, the criteria conflict and the Pareto front will then likely consist of numerous players.

Fig. 1

Illustration of the Pareto front and utopia point (Batman) in a two-criteria maximization problem.


2.2 Scoring

A popular approach for finding an optimal solution to (1) is weighted aggregation as identified by Ngatchou et al. (2005) or the weighting method as identified by Ardakani and Wulff (2013). In this approach, all c criteria are combined to form a single objective function. The optimization problem is thus reduced to one function from which a batter can be scored and an optimal batter can be identified from this score. In particular, the formulation for weighted aggregation can be expressed as

(4)
$$\text{Maximize } h(x) = \sum_{i=1}^{c} w_i \left|k_i(x)\right| \quad \text{subject to } x \in \aleph,$$
where ki(x) is a function of the criterion value and the collection {wi} are weights for the contribution of criterion i. Often, the weights satisfy wi ≥ 0 and ∑ wi = 1, with the sum taken over i = 1, …, c. There are many approaches for obtaining a single objective function (Ardakani and Wulff, 2013). The expression in (4) can be extended using the weighted p-norm method with powers p greater than or equal to 1, in which |ki(x)| is replaced by |ki(x)|^p (Marler and Arora, 2004). Using (4), players are then ranked by the values of h, and the best player is the one that maximizes h. A disadvantage of weighted aggregation is that it depends upon the selected weights {wi}, and the selection of the weights depends upon the scaling of the criteria. The task of selecting the weights, and interpreting h, is made easier when the criteria are placed on comparable scales. However, specifying a meaningful scale can be difficult. These disadvantages are not present with the Pareto approach since POS and PF do not depend upon weights or the scaling of the criteria.

A naive approach is to let ki(x) = fi(x). The sabermetrician can prespecify the weights according to their preferences, and rank players accordingly. However, the criteria could be on different scales, which leads to the scaling problems mentioned previously. One approach to deal with differences in scales, and which is consistent with player rankings, is to let ki(x) = −ri(x), where ri(x) denotes the rank of player x ∈ ℵ with respect to criterion i. Then weights can be assigned in relation to the importance that is to be placed upon the rankings for criterion i. However, the use of rankings can mask differences of magnitude within the criteria values between players. In this study, players with ties in their criterion values are assigned the maximum of the tied ranks.
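A rank-based score of this kind can be sketched as follows. The sketch assumes that rank 1 denotes the largest criterion value and that a smaller weighted sum of ranks indicates a better player, which matches how the scored rankings are reported later in the paper. The players, values, and equal weights are hypothetical.

```python
def max_ranks(values):
    """Rank 1 = largest value; tied values all receive the maximum rank of the tie."""
    return [sum(1 for u in values if u >= v) for v in values]


def scored_ranks(criteria, weights):
    """Weighted sum of per-criterion ranks for each player (smaller is better)."""
    names = list(criteria)
    rank_columns = [max_ranks(col) for col in zip(*criteria.values())]
    return {
        name: sum(w * ranks[j] for w, ranks in zip(weights, rank_columns))
        for j, name in enumerate(names)
    }


# Hypothetical criterion vectors (f1, f2); A and B tie on f1.
criteria = {"A": (0.30, 0.10), "B": (0.30, 0.20), "C": (0.20, 0.15)}
weights = (0.5, 0.5)  # hypothetical equal weights

scores = scored_ranks(criteria, weights)
best = min(scores, key=scores.get)
```

Players A and B tie for the top value of f1 and therefore both receive rank 2 on that criterion, the maximum of the tied ranks 1 and 2.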

It is also possible to use (4) to assess the distance a player is from Batman, or from the hypothetical hitter xu who maximizes each criterion value separately. Marler and Arora (2004) recommend that the criteria be on the same scale before measuring distance. In particular, these authors consider the two scalings:

(5)
$$k_i(x) = \frac{f_i(x)}{f_i(x_u)},$$
(6)
$$k_i(x) = \frac{f_i(x) - f_i(x_l)}{f_i(x_u) - f_i(x_l)}.$$

Equation (5) is the ratio between a player and Batman for criterion i. Equation (6) represents the desirability between a player and Batman for criterion i, where xl denotes the hypothetical player with the lowest value of each criterion. This is the opposite of Batman, or the Joker. Thus, (6) compares the difference of a player from Joker relative to the difference of Batman from Joker for criterion i. As previously mentioned, equation (4) can be generalized using a power p to represent a weighted p-norm metric. Equation (4) assumes p = 1, which corresponds to the 1-norm or sum norm. Thus, (5) can be interpreted as the sum norm distance a player is to 0 relative to Batman. Equation (6) can be interpreted as the sum norm distance a player is to Joker relative to Batman.
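The two scalings can be compared in a short sketch that applies (5) and (6) and then aggregates with (4) for p = 1. The players, criterion values, and equal weights are hypothetical; Batman and Joker are taken as the component-wise maximum and minimum over the hypothetical candidate set.

```python
def ratio_to_batman(f, f_u):
    """Scaling (5): each criterion value relative to Batman's value."""
    return [fi / ui for fi, ui in zip(f, f_u)]


def desirability(f, f_u, f_l):
    """Scaling (6): position between Joker (0) and Batman (1).

    Assumes f_u[i] != f_l[i] for every criterion, otherwise this divides by zero.
    """
    return [(fi - li) / (ui - li) for fi, ui, li in zip(f, f_u, f_l)]


def weighted_sum(k, weights):
    """Aggregation (4) with p = 1."""
    return sum(w * abs(ki) for w, ki in zip(weights, k))


# Hypothetical criterion vectors (f1, f2) and equal weights.
criteria = {"A": (0.30, 0.10), "B": (0.25, 0.20), "C": (0.20, 0.15)}
weights = [0.5, 0.5]

columns = list(zip(*criteria.values()))
f_u = [max(col) for col in columns]  # Batman
f_l = [min(col) for col in columns]  # Joker

score_ratio = {p: weighted_sum(ratio_to_batman(f, f_u), weights)
               for p, f in criteria.items()}
score_desirability = {p: weighted_sum(desirability(f, f_u, f_l), weights)
                      for p, f in criteria.items()}
```

Both scalings place every criterion on a unit-free scale, so the weights can be read as relative importance. Scaling (6) spreads the players over the full interval from 0 to 1, while (5) compresses them toward 1 when the criterion values are far from zero.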

As previously mentioned, a disadvantage of weighted aggregation is that it depends upon the selected weights {wi}. An experienced sabermetrician would pre-specify weights according to the specific objectives of the player performance evaluation. If there is no justified a priori rationale for specifying the weights, then equal weights could be used with wi = 1/c for i = 1, …, c. Equal weighting amounts to finding the average of the criteria.

In this study, there was no rationale for pre-specifying the weights. Thus, the weights were determined objectively using exploratory factor analysis (EFA). EFA hypothesizes a model in which the criteria are linear combinations of unobserved factors; the coefficients in this model, or loadings, are estimated to approximately reproduce the covariance matrix among the criteria (Rencher, 2012, pp. 435–441). A single factor model is hypothesized in this study, where that factor represents player performance. The loadings are estimated using maximum likelihood (Rencher, 2012, p. 452), and then standardized to obtain the weights {wi} used in (4). Each loading equals the covariance between the corresponding criterion and the factor representing player performance (Rencher, 2012, p. 440). Thus, the higher the loading, the higher the weight for that criterion, since that criterion is more strongly linearly related to player performance. The weights from the EFA approach are used for both the scoring in (4) and the scored ratio to Batman in (5).
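For concreteness, the single factor model can be written as below, where F denotes the common performance factor with unit variance, λi the loading for criterion i, and εi a criterion-specific error uncorrelated with F. The last expression is only one plausible reading of "standardized": the text does not state the exact standardization, so dividing each loading by the sum of the loadings is an assumption made here so that the weights sum to one.

$$f_i(x) = \mu_i + \lambda_i F(x) + \varepsilon_i(x), \qquad \operatorname{Cov}\left(f_i, F\right) = \lambda_i, \qquad w_i = \frac{\lambda_i}{\sum_{j=1}^{c} \lambda_j} \ \text{(assumed)}.$$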

3 Primary Pareto Optimal Set

MLB hitting data for 2016 are taken from baseballguru.com, which provides annual hitting data in Excel files under the player forecast section. Abbreviations for various performance variables are given in Table 1 for convenience. As in Koop (2002), pitchers and hitters with few AB are removed from the dataset. The removal of pitchers results in 620 hitters. Hitters with fewer than 60 AB are also removed to focus on full-time hitters and to eliminate possible anomalies. The cutoff of 60 AB is close to the first quartile of AB for the 620 hitters (65 AB). The final data set consists of n = 473 hitters. The hitting measures considered here are given by the six performance variables (y) in Table 1. These are the traditional well-known statistics to assess offensive performance, and are listed on popular baseball websites such as mlb.com. These statistics measure various aspects of offensive performance, including the ability of a hitter to get on base, generate runs, and hit for power. These criteria are also used in specific aggregations to formulate several sabermetrics. To avoid concerns with using the counting statistics (R, H, HR, RBI), these values are scaled by AB as recommended by Grabiner (2014).

Table 1

Baseball abbreviations for variable names (n), performance variables (y), sabermetric variables (s), predictor variables (x)

Variable | Description
n1 = AB | At Bat
n2 = BB | Base on Balls
n3 = CS | Caught Stealing
n4 = H | Hits
n5 = HR | Home Runs
n6 = MVP | Most Valuable Player
n7 = PA | Plate Appearance
n8 = R | Runs Scored
n9 = RBI | Runs Batted In
n10 = ROY | Rookie of the Year
n11 = SB | Stolen Base
n12 = SF | Sacrifices
n13 = SS | Silver Slugger
n14 = TB | Total Bases
y1 = Rr = R/AB | Runs per At Bat
y2 = HRr = HR/AB | Home Runs per At Bat
y3 = RBIr = RBI/AB | Runs Batted In per At Bat
y4 = AVG = H/AB | Batting Average
y5 = OBP = (H + BB + HBP)/(AB + BB + HBP + SF) | On Base Percentage
y6 = SLG = TB/AB | Slugging Percentage
s1 = wOBA | Weighted On-base Average
s2 = wRC+ | Weighted Runs Created Plus
s3 = WAR | Wins Above Replacement
x1 = G | Games Played In
x2 = AB | At Bats
x3 = AGE | Player Age
x4,j4 = TEAM (reference = COL) | Team of Player
x5,j5 = FP1 (reference = 1B) | Player Fielding Position
x6,j6 = BATS (reference = R) | Batting Side of Plate

Table 2 shows the correlation matrix for the 2016 hitting performance measures for players in the candidate set ℵ. All correlations are positive. SLG is moderately correlated with all the hitting measures, including OBP. HRr is correlated with RBIr as expected. The highest correlation is 0.78 which is observed between AVG and OBP as well as HRr and SLG. These correlations are moderate and would result in only mild concerns about multicollinearity according to Kutner et al. (2004, pp. 406–410). It is expected that statistics that are highly correlated will be coherent or produce similar rankings of player performance. Thus, the presence of these correlations is not an impediment to this multi-criteria approach.

Table 2

Correlation matrix for the six hitting performance measures

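 | Rr | HRr | RBIr | AVG | OBP | SLG
Rr | 1 | 0.3963 | 0.3485 | 0.4920 | 0.5918 | 0.6037
HRr | 0.3963 | 1 | 0.7609 | 0.1384 | 0.2919 | 0.7769
RBIr | 0.3485 | 0.7609 | 1 | 0.3567 | 0.4227 | 0.7595
AVG | 0.4920 | 0.1384 | 0.3567 | 1 | 0.7791 | 0.6899
OBP | 0.5918 | 0.2919 | 0.4227 | 0.7791 | 1 | 0.6593
SLG | 0.6037 | 0.7769 | 0.7595 | 0.6899 | 0.6593 | 1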

The Pareto optimal set of 2016 MLB hitters with respect to these six performance criteria is shown in Table 3. These 19 hitters form the primary Pareto optimal set (POS1): they are non-dominated according to (2) and as such cannot be bettered by any other hitter in the candidate set with respect to these criteria. Within the Pareto optimal set, players are ranked according to their scored rankings across the six performance criteria using the weights obtained from EFA. The weights determined from EFA for the 2016 data are 0.14 for Rr, 0.17 for HRr, 0.17 for RBIr, 0.15 for AVG, 0.15 for OBP, and 0.22 for SLG. The mythical Batman consists of a mixture of Sanchez (HRr, SLG), Trout (Rr, OBP), Ortiz (RBIr), and LeMahieu (AVG). For players in POS1, the scored ratio to Batman, computed using (5) and the EFA weights, ranges from 0.69 (LeMahieu) to 0.90 (Sanchez).

Table 3

Pareto optimal hitters, Pareto Front shown as ranks, scored rankings, scored ratios to Batman, and comments related to the evaluation of player performance

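name | team | pos | bats | age | games | ab | Rr.R | HRr.R | RBIr.R | AVG.R | OBP.R | SLG.R | s.rank | r.rank | s.Batman | comment
Sanchez | NYA | C | R | 23 | 53 | 201 | 35 | [1] | 4 | 41 | 26 | [1] | 16.02 | [1] | 0.9018 | AL rookie 2
Arenado | COL | 3B | R | 25 | 160 | 618 | 10 | 21 | 2 | 53 | 44 | 4 | 20.74 | 2 | 0.8266 | NL MVP 5, SS
Trout | LAA | OF | R | 24 | 159 | 549 | [1] | 73 | 20 | 16 | [1] | 13 | 21.36 | 3 | 0.8311 | AL MVP, SS
Ortiz | BOS | 1B | L | 40 | 151 | 537 | 125 | 12 | [1] | 16 | 6 | 2 | 23.45 | 4 | 0.8645 | Best Hitter, AL MVP 6, SS
Votto | CIN | 1B | L | 32 | 158 | 556 | 15 | 77 | 35 | 6 | 2 | 13 | 25.2 | 5 | 0.8017 | NL MVP 7
Cabrera | DET | 1B | R | 33 | 158 | 595 | 85 | 30 | 24 | 12 | 10 | 8 | 26.14 | 6 | 0.7963 | AL MVP 9, SS
Bryant | CHN | 3B | R | 24 | 155 | 603 | 3 | 28 | 47 | 58 | 19 | 10 | 26.92 | 7 | 0.7985 | NL MVP
Murphy | WAS | 2B | L | 31 | 142 | 531 | 46 | 101 | 7 | 2 | 13 | 3 | 27.71 | 8 | 0.8069 | NL MVP 2, SS
Donaldson | TOR | 3B | R | 30 | 155 | 577 | 2 | 29 | 39 | 85 | 8 | 14 | 28.87 | 9 | 0.8076 | AL MVP 4
Freeman | ATL | 1B | L | 26 | 158 | 589 | 30 | 47 | 92 | 34 | 6 | 5 | 34.93 | 10 | 0.7769 | NL MVP 6
Braun | MIL | OF | R | 32 | 135 | 511 | 78 | 42 | 32 | 27 | 37 | 17 | 36.84 | 12 | 0.7631 | NL MVP 24
Rizzo | CHN | 1B | L | 26 | 155 | 583 | 68 | 61 | 13 | 58 | 18 | 16 | 37.02 | 13 | 0.7676 | NL MVP 4, SS
Blackmon | COL | OF | L | 29 | 143 | 578 | 6 | 83 | 140 | 8 | 22 | 11 | 45.67 | 15 | 0.7605 | NL MVP 26, SS
Story | COL | SS | R | 23 | 97 | 372 | 19 | 9 | 9 | 131 | 128 | 7 | 46.11 | 16 | 0.7978 | NL Rookie 4
Encarnacion | TOR | 1B | R | 33 | 160 | 601 | 49 | 13 | 3 | 183 | 82 | 26 | 55.05 | 20 | 0.7844 | AL MVP 14
Altuve | HOU | 2B | R | 26 | 161 | 640 | 37 | 174 | 107 | 4 | 10 | 24 | 60.33 | 24 | 0.7362 | AL MVP 3
Turner | WAS | OF | R | 23 | 73 | 307 | 32 | 130 | 197 | 3 | 46 | 7 | 68.96 | 29 | 0.7345 | NL Rookie 2
Joyce | PIT | OF | L | 31 | 140 | 231 | 5 | 52 | 23 | 289 | 7 | 109 | 81.83 | 36 | 0.7449 | Free Agent
LeMahieu | COL | 2B | R | 27 | 146 | 552 | 9 | 329 | 239 | [1] | 4 | 66 | 113.09 | 58 | 0.6928 | NL MVP 15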
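The scored rankings in Table 3 can be checked by hand as a weighted sum of the six criterion ranks. The sketch below uses the rounded EFA weights reported in the text and the criterion ranks read from the Arenado and Trout rows of Table 3; treating the published two-decimal weights as exact is an assumption.

```python
# Rounded EFA weights for the 2016 data, as reported in the text.
weights = {"Rr": 0.14, "HRr": 0.17, "RBIr": 0.17,
           "AVG": 0.15, "OBP": 0.15, "SLG": 0.22}

# Criterion ranks read from Table 3.
ranks = {
    "Arenado": {"Rr": 10, "HRr": 21, "RBIr": 2, "AVG": 53, "OBP": 44, "SLG": 4},
    "Trout":   {"Rr": 1, "HRr": 73, "RBIr": 20, "AVG": 16, "OBP": 1, "SLG": 13},
}


def scored_rank(player_ranks):
    """Weighted sum of criterion ranks (smaller is better)."""
    return sum(weights[c] * player_ranks[c] for c in weights)


s_rank = {name: round(scored_rank(r), 2) for name, r in ranks.items()}
```

Under these assumptions the sketch reproduces the s.rank entries 20.74 for Arenado and 21.36 for Trout.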

POS1 identifies many well-known hitters in MLB for 2016. In fact, POS1 identifies the 2016 hitting award winners including the AL MVP, NL MVP, Silver Sluggers, vote-getters for MVP, and vote-getters for ROY. As previously mentioned, this is one of the objectives of sabermetrics. Sanchez had an incredible rookie season and received the lowest scored rank of all hitters given his production in HRr, RBIr, and SLG. He is also the closest to Batman. Ortiz, a veteran hitter who had a fantastic year, received the Best Hitter award, which is well deserved given that he is the second closest to Batman among the POS1 hitters (Table 3) and had more AB than Sanchez. Four COL players are in POS1 (Arenado, Blackmon, Story, LeMahieu), where they each rank highly in a couple of the performance measures. While these are good hitters, increased performance might be expected at a hitter-friendly park such as Coors Field. Joyce, a free agent who plays in OAK for 2017, might be an unexpected hitter to be included in POS1. However, he ranks fifth in Rr. Of those who rank higher, Trout has lower HRr, Donaldson has lower RBIr, Bryant has lower OBP, and DeShields (Table 4) has lower values in all other categories. By the definition in (2), Joyce is non-dominated and so is included in POS1.

Table 4

Secondary Pareto optimal hitters, Secondary Pareto Front shown as ranks, scored rankings, scored ratios to Batman, and comments related to the evaluation of player performance

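name | team | pos | bats | age | games | ab | Rr.R | HRr.R | RBIr.R | AVG.R | OBP.R | SLG.R | s.rank | r.rank | s.Batman | comment
Cruz | SEA | OF | R | 35 | 155 | 589 | 62 | 8 | 30 | 71 | 60 | [9] | 36.77 | [11] | 0.7859 | AL MVP 15
Betts | BOS | OF | R | 23 | 158 | 672 | 16 | 109 | 50 | 11 | 65 | 19 | 44.85 | [14] | 0.7503 | AL MVP 2, SS
Beltre | TEX | 3B | R | 37 | 153 | 583 | 98 | 61 | 29 | 39 | 65 | 31 | 51.44 | [17] | 0.7425 | AL MVP 7
Cano | SEA | 2B | L | 33 | 161 | 655 | 57 | 41 | 83 | 44 | 88 | 23 | 53.92 | [18] | 0.7428 | AL MVP 8
Cespedes | NYN | OF | R | 30 | 132 | 479 | 105 | 27 | 27 | 101 | 67 | 25 | 54.58 | [19] | 0.7527 | NL MVP 8, SS
Diaz | STL | SS | R | 25 | 111 | 404 | 24 | 134 | 75 | 39 | 33 | 37 | 57.83 | 21 | 0.7231 | NL ROY 5
Ramirez | BOS | 1B | R | 32 | 147 | 549 | 123 | 63 | [5] | 75 | 57 | 46 | 58.70 | 22 | 0.7453 |
Goldschmidt | ARI | 1B | R | 28 | 158 | 579 | 13 | 137 | 64 | 45 | [3] | 71 | 58.81 | 23 | 0.7359 | AL MVP 11
Rodriguez | PIT | 1B | R | 31 | 140 | 300 | 58 | 39 | 14 | 141 | 107 | 37 | 62.47 | 25 | 0.7436 |
Machado | BAL | 3B | R | 23 | 157 | 640 | 52 | 46 | 107 | 53 | 112 | 23 | 63.10 | 26 | 0.7316 | AL MVP 5
Martinez | DET | OF | R | 28 | 120 | 460 | 108 | 96 | 118 | 24 | 37 | 18 | 64.61 | 27 | 0.7185 |
Dozier | MIN | 2B | R | 29 | 155 | 615 | 36 | 15 | 74 | 152 | 128 | 15 | 65.47 | 28 | 0.7514 | AL MVP 13
Seager | SEA | 3B | L | 28 | 158 | 597 | 113 | 82 | 60 | 107 | 49 | 57 | 75.90 | 31 | 0.7081 | AL MVP 12, SS
Naquin | CLE | OF | L | 25 | 116 | 321 | 66 | 123 | 179 | 47 | 28 | 32 | 78.87 | 32 | 0.6981 | AL ROY 3
Toles | LAN | OF | L | 24 | 48 | 105 | 17 | 253 | 97 | 17 | 37 | 46 | 80.10 | 33 | 0.7012 |
Carpenter | STL | 3B | L | 30 | 129 | 473 | 33 | 118 | 137 | 134 | 20 | 46 | 81.19 | 34 | 0.7011 |
Ho Kang | PIT | 3B | R | 29 | 103 | 318 | 153 | 22 | 8 | 224 | 96 | 34 | 82.00 | 37 | 0.7418 |
Seager | LAN | SS | L | 22 | 157 | 627 | 42 | 136 | 261 | 21 | 46 | 35 | 91.12 | 41 | 0.6856 | NL ROY
Yelich | MIA | OF | L | 24 | 155 | 578 | 185 | 178 | 44 | 44 | 27 | 78 | 91.45 | 42 | 0.6851 | NL MVP 19, SS
Santana | CLE | 1B | B | 30 | 158 | 582 | 96 | 44 | 109 | 203 | 57 | 60 | 91.65 | 43 | 0.7032 |
Schimpf | SDN | 2B | L | 28 | 89 | 276 | 28 | 10 | 16 | 395 | 140 | 23 | 93.65 | 45 | 0.7513 |
Bradley | BOS | OF | L | 26 | 156 | 558 | 38 | 107 | 88 | 158 | 104 | 74 | 94.05 | 46 | 0.6928 |
Trumbo | BAL | OF | R | 30 | 159 | 613 | 92 | [2] | 33 | 218 | 253 | 23 | 94.54 | 47 | 0.7479 | AL SS
Kinsler | DET | 2B | R | 34 | 153 | 618 | 7 | 113 | 177 | 68 | 121 | 76 | 95.35 | 48 | 0.6945 |
Healy | OAK | 3B | R | 24 | 72 | 269 | 193 | 93 | 156 | 27 | 149 | 29 | 102.13 | 51 | 0.6850 |
Pearce | BAL | 1B | R | 33 | 85 | 264 | 201 | 86 | 187 | 68 | 31 | 69 | 104.58 | 53 | 0.6772 |
Bour | MIA | 1B | L | 28 | 90 | 280 | 262 | 70 | 21 | 177 | 55 | 92 | 107.19 | 54 | 0.6957 |
Davis | OAK | OF | R | 28 | 150 | 555 | 94 | 3 | 18 | 265 | 303 | 29 | 108.31 | 55 | 0.7414 |
Harper | WAS | OF | L | 23 | 147 | 506 | 44 | 100 | 43 | 283 | 16 | 155 | 109.42 | 56 | 0.6904 |
Napoli | CLE | 1B | R | 34 | 150 | 557 | 48 | 34 | 25 | 306 | 161 | 105 | 109.90 | 57 | 0.7089 |
Dahl | COL | OF | L | 22 | 63 | 222 | 8 | 228 | 298 | 16 | 82 | 53 | 116.90 | 67 | 0.6727 |
Rosales | SDN | 3B | R | 33 | 105 | 214 | 31 | 35 | 65 | 351 | 216 | 66 | 120.91 | 72 | 0.7019 |
Bautista | TOR | OF | R | 35 | 116 | 423 | 70 | 79 | 68 | 333 | 55 | 132 | 122.03 | 74 | 0.6811 |
Carter | MIL | 1B | R | 29 | 160 | 549 | 95 | 6 | 41 | 374 | 225 | 57 | 123.68 | 78 | 0.7164 |
Ross | CHN | C | R | 39 | 67 | 166 | 137 | 37 | 10 | 351 | 88 | 148 | 125.58 | 80 | 0.6989 |
Segura | ARI | 2B | L | 26 | 153 | 637 | 72 | 230 | 326 | 9 | 49 | 57 | 125.84 | 81 | 0.6535 | NL MVP 13
Vargas | MIN | 1B | B | 25 | 47 | 152 | 21 | 23 | 191 | 346 | 161 | 53 | 127.03 | 83 | 0.6969 |
Davis | BAL | 1B | L | 30 | 157 | 566 | 26 | 19 | 115 | 378 | 169 | 115 | 133.77 | 95 | 0.6913 |
Grandal | LAN | C | B | 27 | 126 | 390 | 254 | 14 | 17 | 354 | 141 | 86 | 134.00 | 96 | 0.7012 | NL MVP 22
Pedroia | BOS | 2B | R | 32 | 154 | 633 | 45 | 293 | 248 | 11 | 37 | 138 | 135.83 | 102 | 0.6409 |
Swanson | ATL | SS | R | 22 | 38 | 129 | 83 | 297 | 190 | 34 | 24 | 154 | 136.99 | 103 | 0.6372 |
Fowler | CHN | OF | B | 30 | 125 | 456 | 12 | 254 | 308 | 115 | 16 | 142 | 148.11 | 117 | 0.6391 |
Hazelbaker | STL | OF | L | 28 | 114 | 200 | 25 | 39 | 146 | 331 | 333 | 83 | 152.81 | 126 | 0.6752 |
Rivera | NYN | 2B | R | 27 | 33 | 105 | 409 | 253 | 97 | [5] | 128 | 91 | 156.73 | 134 | 0.6358 |
Maybin | DET | OF | R | 29 | 94 | 349 | 11 | 399 | 224 | 16 | 24 | 209 | 159.43 | 138 | 0.6279 |
Zunino | SEA | C | R | 25 | 55 | 164 | 400 | 7 | 12 | 420 | 245 | 96 | 180.10 | 155 | 0.6745 |
Recker | ATL | C | R | 32 | 33 | 90 | 458 | 308 | 55 | 107 | 13 | 175 | 182.33 | 158 | 0.5976 |
Peraza | CIN | SS | R | 22 | 72 | 241 | 367 | 391 | 312 | 8 | 112 | 231 | 239.71 | 243 | 0.5553 |
DeShields | TEX | OF | R | 23 | 74 | 182 | [4] | 310 | 428 | 417 | 414 | 419 | 342.85 | 375 | 0.4991 |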

The multi-criteria approach is also helpful for learning about trade-offs among the criteria. Figure 2 shows plots involving just two criteria which can be compared to the idealized plot in Fig. 1. The criteria HRr and SLG have the second highest correlation (Table 2), which is evident from the linear relationship shown in Fig. 2 (b). Based upon just these two criteria, a reduced POS (POSr) would just consist of Sanchez, who is also the reduced Batman. Nevertheless, other POS1 hitters are rather scattered throughout these two criteria. The criteria OBP and SLG have the sixth highest correlation (Table 2) and the linear relationship can be seen in Fig. 2 (d). Now, the POSr from just these two criteria would consist of Sanchez, Ortiz, and Trout. The POS1 hitters are all located in the upper right of the plot. There are indications that SLG and OBP are redundant statistics in identifying these best hitters. The criteria Rr and RBIr have the third smallest correlation (Table 2), as is evident from the larger cloud of points in Fig. 2 (a). The POSr forms a longer front than that in Fig. 2 (d) and contains Ortiz, Arenado, and Trout. Even though the POS1 hitters are all located in the upper right of the plot, the criteria values are also more spread out than they are in Fig. 2 (d). The criteria HRr and AVG have the smallest correlation (Table 2), as might be expected given the differences between power hitters and base hitters. Due to this conflict, the PF shown in Fig. 2 (c) is the longest, and the POSr consists of the six hitters LeMahieu (more of a base hitter), Murphy, Votto, Cabrera, Ortiz, and Sanchez (more of a power hitter). The criteria HRr and AVG demonstrate the most conflict among any pair of these criteria.

Fig. 2

Plots of the Pareto front involving (a) Rr and RBIr, (b) HRr and SLG, (c) HRr and AVG, (d) OBP and SLG.


4 Secondary Pareto Optimal Set

Another tier of hitters can be identified in a secondary Pareto optimal set (POS2). The second-tier players are non-dominated according to (2) using a restricted candidate set in which the POS1 players are first removed, or ℵ2 = ℵ − POS1. The POS2 in this section is developed from the same 2016 MLB hitting data considered in Section 3, which consist of n = 473 hitters, but without the 19 POS1 hitters identified there.
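The construction of successive tiers can be sketched as a peeling procedure: find the Pareto optimal set, remove it from the candidate set, and repeat. The sketch below is illustrative only; the five players and their criterion values are hypothetical, and the function names are introduced here.

```python
def dominates(fa, fb):
    """True if criterion vector fa dominates fb in the sense of (2)."""
    return (all(a >= b for a, b in zip(fa, fb))
            and any(a > b for a, b in zip(fa, fb)))


def pareto_optimal_set(criteria):
    """Players not dominated by any other player in the candidate set."""
    return [
        z for z, fz in criteria.items()
        if not any(dominates(fx, fz) for x, fx in criteria.items() if x != z)
    ]


def pareto_tiers(criteria, n_tiers=2):
    """Return [POS1, POS2, ...] by repeatedly removing the current front."""
    remaining = dict(criteria)
    tiers = []
    for _ in range(n_tiers):
        if not remaining:
            break
        front = pareto_optimal_set(remaining)
        tiers.append(front)
        for name in front:
            del remaining[name]  # restricted candidate set for the next tier
    return tiers


# Hypothetical criterion vectors (f1, f2); both criteria are maximized.
criteria = {
    "A": (0.30, 0.10),
    "B": (0.25, 0.20),
    "C": (0.20, 0.15),
    "D": (0.10, 0.25),
    "E": (0.15, 0.12),
}
pos1, pos2 = pareto_tiers(criteria, n_tiers=2)
```

In this toy example player C is dominated only by B, so C becomes non-dominated once the first tier is removed, while E remains dominated by C and would fall to a third tier.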

POS2 is shown in Table 4 and consists of 49 well-known hitters whose scored ratio to Batman ranges from 0.79 (Cruz) down to 0.50 (DeShields). POS2 also identifies several award winners, which is consistent with one of the objectives of sabermetrics. POS2 contains the NL ROY (Seager), the AL ROY 3 (Naquin), and the NL ROY 5 (Diaz). It also contains 5 Silver Slugger award winners (Betts, Cespedes, Seager, Yelich, Trumbo) along with various other hitters who received MVP votes. POS2 notably also contains a number of well-known hitters including Beltre (2004, 2010, 2011, 2014 SS), Goldschmidt (2013, 2015 NL MVP 2 and SS), Cano (2005 AL ROY 2 and 2006, 2010, 2011, 2012, 2013 SS), Machado (2015 AL MVP 4 and 2016 AL MVP 5), Harper (2012 NL ROY, 2015 NL MVP), Bautista (2010, 2011, 2014 SS), and Pedroia (2007 AL ROY, 2008 AL MVP and SS). These batters had good hitting seasons in 2016, but may not have received the same accolades as they did in previous seasons.

A few players have scored ratio to Batman greater than 0.75 (Cruz, Betts), and a few more have scored rankings in the top 19 (Beltre, Cano, Cespedes). POS2 contains players who have Rr ranked 4 (DeShields), HRr ranked 2 (Trumbo), RBIr ranked 5 (Ramirez), AVG ranked 5 (Rivera), OBP ranked 3 (Goldschmidt), and SLG ranked 9 (Cruz). While POS2 contains good hitters according to some of these metrics, these players are dominated by players in POS1 in terms of the other metrics. For example, Goldschmidt performs well in OBP, but is dominated by Trout and Votto. Trumbo hit the most HR in 2016 (47) which produces the second highest HRr (0.0767), but Sanchez has higher HRr (0.0995), and Sanchez dominates Trumbo in all other categories.

5 Predicted Pareto Optimal Set

It is expected that particular hitters have an advantage in being Pareto optimal due to hitter-friendly parks, team affiliations, or regular playing time. Certain fielding positions are also reserved for the bigger and better hitters. A predicted Pareto optimal set (PPOS) can be constructed from multivariate predictions that are adjusted for these variables. The Multivariate Analysis of Variance (MANOVA) (Rencher and Christensen, 2012) is a tool that can be used to assess the predictor variable contributions and to obtain the predicted criteria values. Pareto optimality and weighted aggregation are then applied to these multivariate predicted values, and the resulting uncertainty is propagated through the analyses using repeated sampling, or the parametric bootstrap (Efron and Tibshirani, 1993, Section 6.5), from the predictive distribution.

The same MLB hitting data for 2016 are used here as in Sections 3 and 4. However, only qualifying hitters are included who meet the minimum plate appearance (PA) requirement of 502. For validation, MLB hitting data for 2017 are taken from baseballguru.com. There are 146 qualifying hitters for the 2016 season and 144 qualifying hitters for the 2017 season. Predictions are based upon the data from the 2016 season. The predictor variables (x variables) considered here are listed in Table 1. Factors, such as Team, Fielding Position, and Bats, include indicator variables for each level, except for the reference level (Kutner et al., 2004, pp. 313–324). The multiple criteria, or multiple response variables (y variables), are listed in Table 1 and can be denoted as the matrix Y. The partial Wilks’ Lambda test is conducted for the six predictors and for quadratic terms involving Age, G, and AB to identify statistically important predictors. The test results are shown in Table 5 for the reduced model containing the statistically important predictors. Based upon these statistical test results, the predictions are based upon TEAM, FP1, BATS, AB, G, and G^2, where the latter term denotes the quadratic trend in games played. Table 6 gives the estimated coefficients ($\hat{B}$) from which the predicted hitting performance values are obtained as $\hat{Y} = X\hat{B}$, where X is the design matrix containing the values of the predictors for all of the qualifying hitters.

Table 5

Partial MANOVA tests of the statistically important predictor variables using the Wilks’ Lambda test

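predictor | df | test stat | approx F | num Df | den Df | Pr(> F)
TEAM | 29 | 0.109 | 1.59 | 174 | 603 | < 0.0001
FP1 | 5 | 0.500 | 2.56 | 30 | 406 | < 0.0001
BATS | 2 | 0.661 | 3.87 | 12 | 202 | < 0.0001
AB | 1 | 0.536 | 14.59 | 6 | 101 | < 0.0001
G | 1 | 0.783 | 14.59 | 6 | 101 | 0.0003
G^2 | 1 | 0.766 | 5.15 | 6 | 101 | 0.0001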
Table 6

MANOVA regression coefficient estimates

predictor | Rr | HRr | RBIr | AVG | OBP | SLG
(Intercept) | 1.3121 | 0.4311 | 0.6549 | 0.4809 | 2.0755 | 0.4453
TEAM ARI | –0.0178 | 0.0003 | –0.0199 | –0.0277 | –0.0250 | –0.0146
TEAM ATL | –0.0392 | –0.0151 | –0.0371 | –0.0347 | –0.0423 | –0.0459
TEAM BAL | –0.0353 | 0.0041 | –0.0335 | –0.0580 | –0.0746 | –0.0399
TEAM BOS | –0.0163 | –0.0037 | –0.0052 | –0.0224 | –0.0238 | –0.0188
TEAM CHA | –0.0545 | –0.0148 | –0.0375 | –0.0451 | –0.0550 | –0.0548
TEAM CHN | –0.0098 | –0.0020 | –0.0100 | –0.0409 | –0.0154 | –0.0160
TEAM CIN | –0.0252 | –0.0040 | –0.0235 | –0.0343 | –0.0461 | –0.0545
TEAM CLE | –0.0193 | –0.0067 | –0.0253 | –0.0434 | –0.0420 | –0.0353
TEAM DET | –0.0284 | –0.0013 | –0.0276 | –0.0265 | –0.0348 | –0.0436
TEAM HOU | –0.0303 | –0.0118 | –0.0350 | –0.0341 | –0.0338 | –0.0455
TEAM KCA | –0.0585 | –0.0109 | –0.0449 | –0.0573 | –0.0803 | –0.0617
TEAM LAA | –0.0226 | –0.0141 | –0.0283 | –0.0189 | –0.0185 | –0.0552
TEAM LAN | –0.0275 | –0.0094 | –0.0383 | –0.0297 | –0.0376 | –0.0293
TEAM MIA | –0.0498 | –0.0203 | –0.0438 | –0.0221 | –0.0362 | –0.0557
TEAM MIL | –0.0250 | 0.0069 | –0.0172 | –0.0377 | –0.0351 | –0.0449
TEAM MIN | –0.0193 | 0.0012 | –0.0360 | –0.0559 | –0.0307 | –0.0365
TEAM NYA | –0.0431 | –0.0120 | –0.0446 | –0.0383 | –0.0492 | –0.0550
TEAM NYN | –0.0227 | 0.0193 | 0.0024 | –0.0374 | –0.0290 | –0.0204
TEAM OAK | –0.0382 | –0.0027 | –0.0330 | –0.0413 | –0.0659 | –0.0502
TEAM PHI | –0.0559 | –0.0152 | –0.0567 | –0.0406 | –0.0538 | –0.0576
TEAM PIT | –0.0346 | –0.0130 | –0.0261 | –0.0293 | –0.0322 | –0.0433
TEAM SDN | –0.0133 | –0.0139 | –0.0464 | –0.0621 | –0.0697 | –0.0384
TEAM SEA | –0.0322 | 0.0057 | –0.0184 | –0.0355 | –0.0432 | –0.0326
TEAM SFN | –0.0335 | –0.0157 | –0.0244 | –0.0379 | –0.0357 | –0.0413
TEAM STL | –0.0307 | –0.0114 | –0.0282 | –0.0252 | –0.0193 | 0.0059
TEAM TBA | –0.0497 | 0.0033 | –0.0254 | –0.0481 | –0.0660 | –0.0488
TEAM TEX | –0.0301 | 0.0014 | –0.0122 | –0.0232 | –0.0406 | –0.0333
TEAM TOR | –0.0268 | 0.0037 | –0.0108 | –0.0460 | –0.0397 | –0.0494
TEAM WAS | –0.0183 | 0.0042 | 0.0044 | –0.0302 | –0.0168 | –0.0126
FP1 2B | 0.0073 | –0.0158 | –0.0341 | –0.0010 | –0.0186 | –0.0352
FP1 3B | 0.0088 | –0.0062 | –0.0191 | –0.0061 | –0.0132 | –0.0131
FP1 C | 0.0010 | –0.0073 | –0.0180 | 0.0077 | –0.0010 | –0.0047
FP1 OF | 0.0091 | –0.0075 | –0.0260 | –0.0135 | –0.0199 | –0.0295
FP1 SS | –0.0085 | –0.0189 | –0.0394 | –0.0132 | –0.0324 | –0.0643
BATS B | –0.0012 | –0.0043 | –0.0130 | 0.0116 | 0.0105 | 0.0050
BATS L | 0.0014 | –0.0035 | –0.0095 | 0.0005 | 0.0087 | 0.0025
AB | 0.0001 | 0.0001 | 0.0000 | 0.0004 | 0.0001 | 0.0006
G | –0.0169 | –0.0059 | –0.0078 | –0.0039 | –0.0245 | –0.0288
G^2 | 0.000059 | 0.000022 | 0.000030 | 0.000009 | 0.000085 | 0.000098
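To show how a single predicted value follows from Table 6, the sketch below evaluates one row of the design matrix against the Rr column only. The covariate values (LAA, OF, bats R, G = 159, AB = 549) are those listed for Trout in Table 3. The coefficients are the rounded values printed in Table 6, so the result is only indicative; in particular, the AB and G^2 coefficients are rounded heavily relative to their size. Only the coefficients needed for this hitter are copied, and the dictionary layout and function name are introduced here for illustration.

```python
# Rr column of Table 6, restricted to the rows needed for this example.
# Reference levels (TEAM COL, FP1 1B, BATS R) have no coefficient.
coef_rr = {
    "(Intercept)": 1.3121,
    "TEAM LAA": -0.0226,
    "FP1 OF": 0.0091,
    "AB": 0.0001,
    "G": -0.0169,
    "G2": 0.000059,
}


def predict(coef, team, pos, bats, ab, g):
    """One row of X times one column of B-hat.

    Terms missing from `coef` contribute zero. That is correct for reference
    levels, but it would silently ignore any non-reference level that was
    not copied from Table 6.
    """
    x = {
        "(Intercept)": 1.0,
        f"TEAM {team}": 1.0,
        f"FP1 {pos}": 1.0,
        f"BATS {bats}": 1.0,
        "AB": ab,
        "G": g,
        "G2": g * g,
    }
    return sum(coef.get(name, 0.0) * value for name, value in x.items())


rr_hat = predict(coef_rr, team="LAA", pos="OF", bats="R", ab=549, g=159)
```

With these rounded coefficients the predicted runs per at bat comes out near 0.158. The large linear and quadratic terms in G nearly cancel, so small rounding errors in those coefficients shift the prediction noticeably.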

It is also necessary to account for uncertainty in the predictions of future performance. This is accomplished using a parametric bootstrap where the performance criteria are obtained using replicate draws (REP) for each player i, denoted $y_i^{REP}$, from the MANOVA predictive distribution

(7)
$$\text{Normal}\left(\hat{y}_i,\ \left[1 + x_i^{\top} (X^{\top} X)^{-1} x_i\right] S\right),$$
where $\hat{y}_i$ is the vector of predicted criteria for player i, $x_i$ contains the covariate values for player i, and S is the estimated covariance matrix from MANOVA (Rencher and Christensen, 2012, pp. 370–371). The predictive distribution in (7) accounts for the uncertainty in the predictions and the uncertainty in the responses (performance criteria). A total of 500 replicate draws are taken from (7). For each draw, players on the primary predicted Pareto optimal set (PPOS1) are identified and their scored ranking of the criteria is calculated. The weights for the scoring are based upon EFA for the 2016 data. Uncertainty is then assessed by examining the proportion of replicate draws in which a player is included in PPOS1 (pPPOS1), and a 90% percentile interval is calculated for the scored ranking of the replicate draws.
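The bootstrap loop can be sketched with the standard library alone. Everything numeric below is hypothetical: the predicted vectors, the leverage values standing in for $x_i^{\top}(X^{\top}X)^{-1}x_i$, and the covariance matrix S. The Cholesky-based sampler is a generic way to draw from a multivariate normal and is not taken from the paper. The sketch returns, for each player, the proportion of replicate draws in which that player is Pareto optimal (pPPOS1).

```python
import math
import random


def cholesky(a):
    """Lower-triangular L with L L^T = a (small symmetric positive-definite a)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def draw_normal(mean, cov, rng):
    """One draw from Normal(mean, cov) as mean + L z with z standard normal."""
    low = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in mean]
    return tuple(m + sum(low[i][k] * z[k] for k in range(i + 1))
                 for i, m in enumerate(mean))


def dominates(fa, fb):
    return (all(a >= b for a, b in zip(fa, fb))
            and any(a > b for a, b in zip(fa, fb)))


def pareto_optimal_set(criteria):
    return [z for z, fz in criteria.items()
            if not any(dominates(fx, fz) for x, fx in criteria.items() if x != z)]


def proportion_pareto(y_hat, leverage, s, n_rep=500, seed=0):
    """Proportion of replicate draws in which each player is Pareto optimal."""
    rng = random.Random(seed)
    counts = {name: 0 for name in y_hat}
    for _ in range(n_rep):
        rep = {}
        for name, mean in y_hat.items():
            scale = 1.0 + leverage[name]  # 1 + x_i'(X'X)^{-1} x_i
            cov = [[scale * v for v in row] for row in s]
            rep[name] = draw_normal(mean, cov, rng)
        for name in pareto_optimal_set(rep):
            counts[name] += 1
    return {name: count / n_rep for name, count in counts.items()}


# Hypothetical predicted criteria (two criteria), leverages, and covariance S.
y_hat = {"A": (0.30, 0.20), "B": (0.20, 0.10), "C": (0.25, 0.18)}
leverage = {"A": 0.10, "B": 0.10, "C": 0.10}
s = [[1e-4, 5e-5], [5e-5, 1e-4]]

p_ppos1 = proportion_pareto(y_hat, leverage, s)
```

Percentile intervals for the scored rankings could be obtained in the same loop by storing each player's weighted rank score per draw and taking empirical percentiles afterwards.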

Validation of the predictions is performed using the data from the 2017 season. Table 7 shows the 19 players who are actually Pareto optimal (POS1) for the 2017 season. Uncertainty in the predictions is shown by the proportion of replicate draws in which these players are predicted to be Pareto optimal (pPPOS1) and by the 5%, 50%, and 95% percentiles of the scored rankings from the replicate draws using the predictive distribution in (7). The criterion pPPOS1 provides a summary on a 0-1 scale, where higher values denote a higher probability of that player being in PPOS1. The ranking of the players in terms of pPPOS1 (pPPOS1r) is also shown in Table 7. The percentile intervals can be quite wide, demonstrating large variability in the scored rankings across all of the criteria.
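The percentile summaries and the pPPOS1 ranking can be computed with the Python standard library, as in the sketch below. The function names are hypothetical, the replicate rankings are made-up integers, and the interpolation rule is whatever `statistics.quantiles` uses by default, which may differ from the rule used for Table 7.

```python
import statistics


def percentile_summary(scored_ranks):
    """5th, 50th and 95th percentiles of a player's replicate scored rankings."""
    cuts = statistics.quantiles(scored_ranks, n=20)  # cut points at 5%, 10%, ..., 95%
    return cuts[0], cuts[9], cuts[18]


def rank_by_proportion(p_ppos1):
    """Rank players by pPPOS1, with rank 1 for the largest proportion."""
    order = sorted(p_ppos1, key=p_ppos1.get, reverse=True)
    return {name: position for position, name in enumerate(order, start=1)}


# Hypothetical replicate scored rankings for one player (500 draws).
replicate_ranks = list(range(1, 501))
summary = percentile_summary(replicate_ranks)
print(summary)

# pPPOS1 values quoted in the text for three players only.
ranks = rank_by_proportion({"Arenado": 0.798, "Trout": 0.156, "Votto": 0.182})
print(ranks)
```

The ranks returned here are relative to the three players passed in, so they are not the pPPOS1r values of Table 7, which are computed over all players with predictions.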

Table 7

Pareto optimal hitters along with covariate values, ranks of the performance criteria, and scored ratios to Batman for the 2017 season. Prediction information from the 2016 season is included using the proportion of time the player is predicted to be Pareto optimal (pPPOS1), player ranking in terms of pPPOS1 (pPPOS1r), and the 5%, 50%, 95% percentile values of the scored rankings from the parametric bootstrap. Comments are included to characterize the POS results.

nameLastTEAMFP1BATSAGEGABRr.RHRr.RRBIr.RAVG.ROBP.RSLG.Rs.rankr.ranks.BatmanpPPOS1pPPOS1r5%50%95%comment
TroutLAAOFR26114402242117828.7210.83570.1564310.0064.00122.00POS1
JudgeNYAOFR25155542[1]25469310.2420.8736NANANANANANA
VottoCIN1BL3416255911172277912.2530.77180.182315.0041.50113.95POS1
GoldschmidtARI1BR30155558516229121112.3340.78870.3702.0020.0082.95PPOS1, POS2
StantonMIAOFR281595976[1][1]5123[1]12.5050.8624NANANANANANA
BlackmonCOLOFL3115964442840217415.8560.76350.410[5]2.0026.5091.00POS1
FreemanATL1BL28117440918411511616.5170.74740.152487.0050.00120.95POS1
ArenadoCOL3BR26159606332331329616.6680.76360.798[1]1.004.0043.95PPOS1, POS1
ZimmermanWAS1BR331445242510622521019.5290.7638NANANANANANA
CruzSEAOFR37155556378440281320.28100.75620.158416.0554.00120.95POS1, PPOS2
OzunaMIAOFR271596136424711311523.83110.73290.02011537.05107.50143.00
MurphyWAS2BL321445342069245191725.95140.70150.0985910.0567.00130.00
RendonWAS3BR2714750847521023162127.59150.70860.302173.0028.0091.95PROS2
AltuveHOU2BR2715359010777911141632.19200.77830.222256.0040.00114.95POS1
GonzalezHOUOFB281344557347922282432.52220.70020.1145411.0561.00124.95
TurnerLAN3BR331304575262485102433.51240.67970.0747013.0072.00133.00
DavisOAKOFR301535664461311762633.56250.81580.0429220.0081.00136.95POS2
AlonsoSEA1BL301424514621618324142.15320.75550.03610023.0589.00140.00traded
FreesePIT3BR3413042614112610588[1]138102.721140.5888NANANANANANA

Four of the players in Table 7 (Judge, Stanton, Zimmerman, and Freese) were not qualifiers in the 2016 season and so do not have predictions. Goldschmidt, Blackmon, and Arenado are on the actual POS1 for 2017 and are in the top 10 in terms of pPPOS1 based upon the 2016 predictions. The percentile intervals for these players contain low scored rankings. In particular, Arenado is predicted to be POS1 in nearly 80% of the replicate draws, and 90% of his scored rankings are between 1 and 44. Arenado and Blackmon likely have high predictions since they play for COL. Goldschmidt and Rendon are POS1 for 2017 and also have pPPOS1 values greater than 0.3. These predictions are helpful since these players are not POS1 for 2016, as shown in Table 3. Cruz and Altuve are POS1 for 2017 and have relatively high pPPOS1 values of 0.158 and 0.222, respectively.

As would be expected, not all players on POS1 for 2017 are predicted well based upon just the 2016 performance data. Ozuna, Murphy, Turner, Gonzalez, Davis, and Alonso had surprising seasons in 2017, even though pPPOS1 for Gonzalez and Murphy is about 0.1. The scored rankings of Ozuna and Trout for 2017 do not fall within the 90% percentile interval based upon the predictions. The pPPOS1 values are 0.156 for Trout and 0.182 for Votto, which do not rank overly high even though both players are POS1 for 2017. The prediction model penalizes Trout in terms of R, HR, and SLG and penalizes Votto in terms of AVG, OBP, and SLG, so that these players perform better than expected according to the predictions. On the other hand, there are some players in the top 10 of pPPOS1 from the 2016 predictions who are not POS1 for 2017. Cabrera of DET (pPPOS1 = 0.372), Cano of SEA (pPPOS1 = 0.338), and Gonzalez of COL (pPPOS1 = 0.412) each spent time on the disabled list in 2017. Such injuries may have impacted their 2017 seasons, and some of these players did not perform as well as expected. Betts of BOS has a high pPPOS1 of 0.554, but had a disappointing 2017 season, particularly in terms of AVG and OBP. Encarnacion (CLE) and Rizzo (CHN) also have high pPPOS1 values of 0.468 and 0.386, respectively. While they do not appear on POS1 for 2017, they do appear on POS2 for 2017. Thus, their performance was good, even though it may not have been as good as predicted.

6 Pareto Optimal Set with Other Criteria

It is important to recognize that the proposed multi-criteria approach can be easily implemented with any collection of criteria that a manager believes best characterizes player performance. This is the first advantage of the proposed multi-criteria approach mentioned in section 1. However, selection of the criteria is critical in that it must represent the player performance characteristics that are of specific interest to the performance assessment. A few considerations are presented below when it comes to thinking about the multiple criteria and how they compare to some popular sabermetrics. The multi-criteria approach is also demonstrated in this section using wOBA, wRC+, and WAR.

The multiple criteria previously considered are the widely available traditional measures of batting performance (R, HR, RBI, H, OBP, SLG). There are concerns about these traditional hitting statistics. Even though Grabiner (2014) mildly endorses AVG, it has been criticized by some, such as Albert (2010), for not addressing other ways to get on base and for not distinguishing the type of hit. The statistics R and RBI have been criticized because they depend upon other factors that may not directly reflect player contribution, such as scoring off the hit of another batter or needing to have runners on base (Grabiner 2014). Albert (2010) presents strong arguments that better measures of hitting are OBP, SLG, and OPS. On the other hand, rather than focusing on arguments that a statistic is flawed, it can be informative to recognize that statistics measure different aspects of hitting that cannot be captured in a single sabermetric. As stated by Grabiner (2014), “Batting average does fairly well because it counts hits, but it ignores power and walks, which are also important.” That point does not necessarily mean AVG is flawed, but that it measures a different aspect of hitting than does OBP or SLG. An advantage of the multi-objective approach is its ability to work with a multitude of statistics that account for different aspects of player performance and to evaluate the trade-offs among these different aspects. This is the third advantage of the multi-criteria approach mentioned in Section 1.

Concerns about traditional hitting statistics could lead a sabermetrician to apply the proposed approach with a different set of hitting performance measures. Some popular sabermetric statistics can be formulated from (4). In particular, this includes the following:

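$$
\begin{aligned}
\text{On-Base Plus Slugging (OPS)} &= \text{OBP} + \text{SLG},\\
\text{Gross Production Average (GPA)} &= \frac{1.8 \times \text{OBP} + \text{SLG}}{4},\\
\text{Runs Produced (RP)} &= \text{R} + \text{RBI} - \text{HR},\\
\text{Isolated Power (ISO)} &= \text{SLG} - \text{AVG},\\
\text{Secondary Average (SecA)} &= \frac{\text{BB}}{\text{AB}} + \frac{\text{TB}}{\text{AB}} - \frac{\text{H}}{\text{AB}} + \frac{\text{SB}}{\text{AB}} - \frac{\text{CS}}{\text{AB}}.
\end{aligned}
\tag{8}
$$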
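As a small worked example, the statistics in (8) can be computed directly from a player's basic batting line. The function name and the stat line below are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def derived_stats(s):
    """Compute the statistics in (8) from a dict of basic batting totals."""
    ab = s["AB"]
    return {
        "OPS": s["OBP"] + s["SLG"],
        "GPA": (1.8 * s["OBP"] + s["SLG"]) / 4,
        "RP": s["R"] + s["RBI"] - s["HR"],
        "ISO": s["SLG"] - s["AVG"],
        "SecA": (s["BB"] + s["TB"] - s["H"] + s["SB"] - s["CS"]) / ab,
    }


# Hypothetical season totals for one batter (not taken from the paper).
line = {"AB": 500, "H": 150, "TB": 250, "BB": 60, "SB": 10, "CS": 5,
        "R": 90, "RBI": 95, "HR": 25, "AVG": 0.300, "OBP": 0.380, "SLG": 0.500}
stats = derived_stats(line)
print(stats)
```

For this stat line the results are OPS = 0.880, GPA = 0.296, RP = 160, ISO = 0.200, and SecA = 0.330, up to floating-point rounding.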

Another more complicated statistic, which is similar in form to SecA, is Weighted On-Base Average (wOBA), which is scaled by PA rather than AB. OPS in particular is touted by Albert (2010) as a modern sabermetric. Yet OPS is merely a specific weighted aggregation of OBP and SLG, and this may not be the best combination of OBP and SLG to represent hitting performance for a particular group of hitters. For example, Grabiner (2014) mentions that the linear combination should be 1.2×OBP + SLG, where the value 1.2 is usually ignored. The second advantage of the multi-criteria approach mentioned in Section 1 is that such combinations and weights do not have to be specified, as the performance in terms of OBP and SLG can be simultaneously considered and evaluated.

Some managers have become quite accustomed to particular sabermetrics for measuring player performance. As mentioned previously, the multi-criteria approach can be applied to any collection of criteria. As a demonstration, consider the collection wOBA, wRC+, and WAR defined by FanGraphs (2018b). Weighted On-Base Average (wOBA) combines different aspects of hitting and weights them according to their actual run value. Weighted Runs Created Plus (wRC+) attempts to credit a batter for the value of a hitting outcome while controlling for park, league, and year effects. Wins Above Replacement (WAR) is designed to measure overall player contribution, beyond just hitting, by estimating the team wins a player adds relative to a replacement player. The data for these three sabermetrics are taken from FanGraphs (2018a) for the 2016 season and contain 146 qualifying hitters with more than 502 plate appearances (PA).

For these three sabermetrics, Table 8 gives the Primary Pareto optimal set (POSS1) and the Secondary Pareto optimal set (POSS2). The weights for these three criteria using EFA for the 2016 data were determined to be 0.36 for wOBA, 0.36 for wRC+, and 0.28 for WAR. The players are ordered in Table 8 according to the scored rankings. Ortiz is included on POSS1 since he had the highest observed wOBA, which is slightly higher than that for Trout. If it were not for this value, Trout would be the Batman with respect to these three criteria; as it is, he is still very close (0.9991). POSS2 consists of Donaldson, Bryant, Murphy, Votto, and Betts. All these players are dominated by Trout and Ortiz, who both have higher wOBA and wRC+ than any player on the secondary front. Figure 3 shows the players in POSS1 and POSS2 in terms of just two criteria. The pairwise correlation between wOBA and wRC+ is evident, as expected, since wRC+ is similar to wOBA but controls for park, league, and year effects. Thus, player rankings for 2016 using wOBA are quite similar to those using wRC+. There is some conflict between wOBA and WAR since WAR measures additional player contributions beyond just hitting. The plot of wRC+ and WAR looks quite similar. The Pareto optimal sets POSS1 and POSS2 are much smaller than those presented in section 4 (POS1) and section 5 (POS2). This is due to the fact that only three sabermetrics are used as the criteria and that two of these are highly correlated (0.9837). Nevertheless, all the players identified in POSS1 and POSS2 are included in POS1, except for Betts, who is included in POS2. Betts is in POSS2 in large part due to his high value for WAR, which is the second highest.

Table 8

Primary and Secondary Pareto optimal hitters based upon the sabermetrics wOBA, wRC+, WAR listed in order of the scored ranking (s.rank)

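| name | Team | G | PA | wOBA | wRC+ | WAR | s.rank | r.rank | s.Batman | POS |
|---|---|---|---|---|---|---|---|---|---|---|
| Trout | Angels | 159 | 681 | 0.418 | 170 | 9.6 | 1.36 | 1 | 0.9991 | primary |
| Donaldson | Blue Jays | 155 | 700 | 0.403 | 155 | 7.4 | 4.72 | 2 | 0.8903 | secondary |
| Murphy | Nationals | 142 | 582 | 0.408 | 155 | 5.7 | 6.60 | 3 | 0.8450 | secondary |
| Bryant | Cubs | 155 | 699 | 0.396 | 148 | 7.8 | 6.96 | 4 | 0.8812 | secondary |
| Votto | Reds | 158 | 677 | 0.413 | 159 | 5.3 | 7.20 | 6 | 0.8461 | secondary |
| Ortiz | Red Sox | 151 | 626 | 0.419 | 164 | 4.5 | 11.16 | 8 | 0.8385 | primary |
| Betts | Red Sox | 158 | 730 | 0.379 | 137 | 8.3 | 12.08 | 9 | 0.8578 | secondary |

Fig. 3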

Plots of the players in terms of (a) wOBA and WAR, (b) wOBA and wRC+ with labels for those players on the Primary and Secondary front. In (a), players on the Primary Pareto front are connected with a solid line while players on the Secondary Pareto front are connected with a dotted line.


It should be noted that there are also concerns about the use of these sabermetrics since wOBA does not adjust for hitting-friendly parks, wRC+ does not differentiate positions, and WAR may not be developed enough to conduct player rankings (FanGraphs, 2018b). On the other hand, players have been ranked based upon WAR by Baseball-Reference (2018). Due to these types of concerns, FanGraphs (2018) recommends in its discussion of WAR that one “should always use more than one metric at a time when evaluating players”. The proposed multi-criteria approach conducts this very task conveniently and efficiently.
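The primary and secondary sets can be recovered from the values in Table 8 by repeatedly peeling off the non-dominated players. The sketch below uses hypothetical function names and only the seven listed players rather than all 146 qualifying hitters, so it illustrates the sorting step and does not reproduce the full analysis.

```python
def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_fronts(players):
    """Peel off successive non-dominated sets (larger values are better)."""
    remaining = dict(players)
    fronts = []
    while remaining:
        front = {name for name, vals in remaining.items()
                 if not any(dominates(other, vals)
                            for o_name, other in remaining.items() if o_name != name)}
        fronts.append(front)
        for name in front:
            del remaining[name]
    return fronts


# (wOBA, wRC+, WAR) as listed in Table 8.
table8 = {
    "Trout": (0.418, 170, 9.6),
    "Donaldson": (0.403, 155, 7.4),
    "Murphy": (0.408, 155, 5.7),
    "Bryant": (0.396, 148, 7.8),
    "Votto": (0.413, 159, 5.3),
    "Ortiz": (0.419, 164, 4.5),
    "Betts": (0.379, 137, 8.3),
}
fronts = pareto_fronts(table8)
print(fronts)
```

Among these seven players the first front contains Trout and Ortiz and the second contains the remaining five, which agrees with the primary and secondary labels in Table 8.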

7 Summary

A multi-objective approach is proposed in this paper that allows for informative comparisons of players in terms of multiple performance criteria, avoids complex combinations of the criteria into single metrics, and allows trade-offs among the criteria to be evaluated. The approach is demonstrated for evaluating baseball player batting performance through simultaneous consideration of multiple performance metrics. Traditional metrics, such as R, H, HR, RBI, AVG, OBP, SLG, are initially used for these evaluations. The primary Pareto optimal set (POS1) identifies those batters who are non-dominated or who cannot be beaten with respect to these criteria. The secondary Pareto optimal set (POS2) identifies a second group of batters who are non-dominated apart from those batters in POS1. Multivariate analysis of variance (MANOVA) is used to generate predictions while also addressing the uncertainty in the predictions and the uncertainty associated with the criteria. The uncertainty can then be propagated to the Pareto optimal sets and to the scored rankings for predicting performance results in an upcoming season. Weighted rankings or the relative distance to the utopia point (Batman) are also shown to be helpful when it comes to ordering players with respect to the multiple criteria.

As an implementation example, the website mlb.com/stats contains statistics from which selected players can be ranked. Figure 4 shows a default view from this website for the 2016 MLB season. A few differences can be observed between the rankings in Fig. 4 and the rankings presented here because the latter are restricted to players who are qualifying hitters. However, simple adjustments to Fig. 4 can be made so that it can accommodate multiple criteria. That is, the user could be allowed to select multiple items from among the hitting criteria such as R, HR, RBI, AVG, OBP, and SLG. The proposed calculations can then be quickly applied so that players who are in POS1 are highlighted and players are ordered according to their scored ranking across the criteria or according to their relative distance from Batman. As a result, the view in Fig. 4 would more closely match Table 3.
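The ordering step described above can be sketched as follows, assuming that the scored ranking is the weighted average of a player's per-criterion ranks. The function name is hypothetical, the weights are the EFA weights quoted for wOBA, wRC+, and WAR, and the ranks are computed among the seven players of Table 8 only.

```python
def scored_ranking(players, weights):
    """Weighted average of per-criterion ranks (rank 1 = largest value)."""
    scores = {}
    for name, vals in players.items():
        score = 0.0
        for c, w in enumerate(weights):
            # Competition rank: one plus the number of players strictly better.
            rank = 1 + sum(other[c] > vals[c] for other in players.values())
            score += w * rank
        scores[name] = score
    return scores


# (wOBA, wRC+, WAR) as listed in Table 8.
table8 = {
    "Trout": (0.418, 170, 9.6),
    "Donaldson": (0.403, 155, 7.4),
    "Murphy": (0.408, 155, 5.7),
    "Bryant": (0.396, 148, 7.8),
    "Votto": (0.413, 159, 5.3),
    "Ortiz": (0.419, 164, 4.5),
    "Betts": (0.379, 137, 8.3),
}
scores = scored_ranking(table8, weights=(0.36, 0.36, 0.28))
ordered = sorted(scores, key=scores.get)  # best (lowest) scored ranking first
print(ordered)
```

For Trout this gives 0.36 × 2 + 0.36 × 1 + 0.28 × 1 = 1.36, which is consistent with the s.rank reported for him in Table 8. The other scores differ from Table 8 because the published ranks are taken over all 146 qualifying hitters.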

Fig. 4

Screenshot of www.mlb.com/stats for 2016.


The proposed approach is useful to casual fans, fantasy baseball players, and general managers for identifying the top players according to multiple criteria and for evaluating trade-offs among the criteria. The approach can be implemented rather easily with any collection of criteria that is perceived to best represent the type of player performance that is of interest. The criteria can be traditional, modern, or some combination. In addition, it is often necessary to fill out baseball rosters by position, in which case it makes more sense to implement the multi-criteria selection procedures separately for each position and likely with different criteria according to the expectations pertaining to that position. This approach could even involve a combination of hitting and fielding criteria. The proposed approach is also not limited to baseball. Baseball is often regarded as the sabermetric sport due to the wide availability of data, but other sports are now collecting various types of performance-based data. These multi-objective techniques would be useful tools to enhance sabermetrics.

Acknowledgments

The authors would like to thank the two anonymous referees for their suggestions and corrections that led to substantial improvements in the manuscript.

References

1. Albert, J. (2010), Sabermetrics: The Past, the Present, and the Future. URL: https://ww2.amstat.org/mam/2010/essays/AlbertSabermetrics.pdf.

2. Ardakani, M. K. and Wulff, S. S. (2013), An overview of optimization formulations for multiresponse surface problems, Quality and Reliability Engineering International, 29, pp. 3–16.

3. Baseball-Reference (2018), URL: https://www.baseball-reference.com/leaders/.

4. Coello Coello, C. A., Lamont, G. B. and Van Veldhuizen, D. A. (2007), Evolutionary Algorithms for Solving Multi-Objective Problems, 2nd edn. New York: Springer.

5. Efron, B. and Tibshirani, R. J. (1993), An Introduction to the Bootstrap. New York: Chapman & Hall.

6. FanGraphs (2018a), FanGraphs Leaders. URL: https://www.fangraphs.com.

7. FanGraphs (2018b), FanGraphs Sabermetrics Library. URL: https://www.fangraphs.com/library/offense.

8. Grabiner, D. (2014), The sabermetric manifesto. The Baseball Archive. URL: https://seanlahman.com/baseball-archive/sabermetrics/sabermetric-manifesto/.

9. Koop, G. (2002), Comparing the performance of baseball players: A multiple-output approach, Journal of the American Statistical Association, 97, pp. 710–720.

10. Kutner, M. H., Nachtsheim, C. J. and Neter, J. (2004), Applied Linear Regression Models, 4th edn. Boston: McGraw Hill.

11. Marler, R. T. and Arora, J. S. (2004), Survey of multi-objective optimization methods for engineering, Structural and Multidisciplinary Optimization, 26, pp. 369–395.

12. Ngatchou, P., Zarei, A. and El-Sharkawi, M. A. (2005), Pareto multi objective optimization, Proceedings of IEEE International Conference on Intelligent Systems Application to Power Systems; Washington, DC, pp. 84–91.

13. Rencher, A. C. and Christensen, W. F. (2012), Methods of Multivariate Analysis, 3rd edn. Hoboken, NJ: Wiley.