You are viewing a javascript disabled version of the site. Please enable Javascript for this site to function properly.

# AAD and least-square Monte Carlo: Fast Bermudan-style options and XVA Greeks

#### Abstract

We show how Adjoint Algorithmic Differentiation (AAD) can be used to calculate price sensitivities in regression-based Monte Carlo methods reliably and orders of magnitude faster than with standard finite-difference approaches. We present the AAD version of the celebrated least-square algorithms of Tsitsiklis and Van Roy (2001) and Longstaff and Schwartz (2001). By discussing in detail examples of practical relevance, we demonstrate how accounting for the contributions associated with the regression functions is crucial to obtain accurate estimates of the Greeks, especially in XVA applications.

## 1Introduction

The efficient calculation of the risk factor sensitivities of financial derivatives, also known as the “Greeks”, is an essential component of modern risk management practices. Indeed, the aftermath of the recent financial crisis has seen remarkable changes in the market practice whereby financial institutions need to quantify (and risk-manage) counterparty, funding and capital exposures, collectively known as XVA, in large portfolios, see e.g., Crépey et al. (2014).

The traditional approach for the calculation of the Greeks is the so-called bump and reval or bumping technique. This comes with a significant computational cost as it generally requires repeating the calculation of the P&L of a portfolio under hundreds of market scenarios in order to form finite-difference estimators. As a result, in many cases, even after deploying vast amounts of computer power, these calculations cannot be completed in a practical amount of time.

Conversely, Adjoint Algorithmic Differentiation (AAD), a numerical technique recently introduced to financial engineering (see e.g., Capriotti (2011); Capriotti et al. (2011); Capriotti and Giles (2012, 2010); Henrard (2014)), has proven to be effective for speeding up the calculation of risk factor sensitivities, both for Monte Carlo (MC) and deterministic numerical methods, see Capriotti and Lee (2014); Savickas et al. (2014); Capriotti et al. (2015); Xu et al. (2016); Geeraert et al. (2017).

The main ideas underlying AAD can be illustrated by considering a computer implemented function of the form

##### (1)
Y=FUNCTION(X)
mapping a vector Xn to a vector Ym through a sequence of intermediate steps
##### (2)
XUVY.

Here, the real-valued vectors U and V represent intermediate variables utilised in the calculation. Each step may be a distinct high-level function or even a specific instruction.

AAD, sometimes also simply known as adjoint mode of Algorithmic Differentiation (AD), results from “propagating” the derivatives of the final output with respect to all the intermediate variables – the so called adjoints – until the derivatives with respect to the independent variables are formed. Using the standard AD notation, the adjoint of any intermediate variable Vk is defined by

##### (3)
V¯k=j=1mY¯jYjVk,
where Y¯ is a vector in m . For each of the intermediate variables Ui, by applying the chain rule, we get
U¯i=j=1mY¯jYjUi=j=1mY¯jkYjVkVkUi,
which corresponds to the adjoint mode equation for the intermediate step represented by the function V = V (U). We thus have a function of the form U¯=V¯(U,V¯) where
U¯i=kV¯kVkUi.

Starting from the adjoint of the outputs Y¯ , we may apply this rule to each step in the calculation, working from the right to the left,

##### (4)
X¯U¯V¯Y¯
until we obtain X¯ , namely, the linear combination of the rows of the Jacobian of the function X → Y:
##### (5)
X¯i=j=1mY¯jYjXi
for i = 1, …, n.

One particularly important theoretical result is that given a computer program performing some high-level function (1), the execution time of its adjoint counterpart

##### (6)
X¯=FUNCTION_b(X,Y¯)
(with suffix _b for “backward” or “bar”) that computes the linear combination (5), is bounded by three to four times the cost of execution of the original one. That is,
##### (7)
Cost[FUNCTION_b]Cost[FUNCTION]ωA
where ωA ∈ [3, 4], see Griewank and Walther (2008).

In this paper we present the application of AAD to regression-based MC approaches (also known as least-square MC) such as those that are widely utilised for Bermudan-style options, see Carriere (1996); Tsitsiklis and Van Roy (2001); Longstaff and Schwartz (2001), or for XVA applications, see Cesari et al. (2009) and Joshi and Kwon (2016). We develop the AAD implementation of the well-known least-square algorithm for the computation of conditional expectations, and we investigate numerically the impact on the Greeks arising from the sensitivities of the regression functions, a component that is generally ignored for Bermudan-style options by invoking arguments of quasi-optimality of the exercise boundary.

The paper is organised as it follows: In the next section, the regression-based MC algorithm for both Bermudan-style options and XVA is presented. In Section 3 we discusses the AAD algorithms for the regression-based MC method. We give two numerical examples, the best of two stocks Bermudan-style call and its corresponding XVA in Section 4. Here we show how smoothening out discontinuities associated with suboptimal exercise boundaries improves the accuracy of the Greeks of Bermudan-style options, and why the contribution to the sensitivities arising from the regression boundaries is essential for an accurate computation of XVA sensitivities. The efficiency and accuracy of AAD is also compared with bump and reval approaches in the same section.

## 2Valuation of Bermudan-style options and XVA by regression-based Monte Carlo

### 2.1Bermudan-style options

While European-style options can be exercised only at final maturity, Bermudan-style options can be exercised on multiple dates up to trade expiry.

We denote by T1, …, TM the exercise dates of the option and define D(t)={Tmt} . We denote by η (t) the smallest integer such that Tη(t)+1 > t. An exercise policy is represented mathematically by a stopping time taking values in D(t) . We denote by T(t) the set of stopping times taking values in D(t) , see e.g., Andersen and Piterbarg (2010).

A rational investor will exercise the option that he holds in such a way as to maximise its economic value. As a result, the value of a Bermudan-style option is the supremum of the option value over all possible exercise policies. With the notation introduced above, the value of a Bermudan-style option at time t can thus be expressed by

##### (8)
V(t)N(t)=supτT(t)𝔼t[E(τ)N(τ)],
where E (t) is the exercise value of the option, and N (t) is the chosen numéraire1, Andersen and Piterbarg (2010). In this equation, V (t) is to be interpreted as the value of the option conditional on exercise not having taken place strictly before time t. Here and below we indicate with 𝔼t the expectation, under the risk-neutral measure, conditional on the filtration of the random process driving the option price, up to time t. For notations and standard mathematical assumptions of this conventional setting see, e.g., Wilmott et al. (1995).

A useful concept is the hold value of the Bermudan-style option: We denote by H (t) the value of the Bermudan-style option when the exercise dates are restricted to D(Tη(t)+1) , that is

##### (9)
H(t)N(t)=𝔼t[V(Tη(t)+1)N(Tη(t)+1)],
where we have assumed, for simplicity of exposition, no cashflow between t and Tη(t)+1.

The option holder, following an optimal exercise policy, will exercise his option if the exercise value is larger than the hold value, i.e.,

##### (10)
V(Tη(t))=max(E(Tη(t)),H(Tη(t))).

This, when combined with Equation (9), leads to the so-called dynamic programming formulation:

##### (11)
H(t)N(t)=𝔼t[max(E(Tη(t)+1)N(Tη(t)+1),H(Tη(t)+1)N(Tη(t)+1))],
for Tη ≤ t < Tη+1, and η = 1, …, M - 1. Starting from the terminal condition H (TM) ≡0, Equation (11) defines a backward iteration in time for H (t). By definition, this is also equal to V (t) if t is not an exercise date, i.e., if Tη(t) < t < Tη(t)+1. Conversely, if t is an exercise date, t = Tη(t), then V (Tη(t)) = max(E (Tη(t)) , H (Tη(t))).

The dynamic programming formulation above implies that the stopping time, defining the optional exercise date as seen at time t, is given by

##### (12)
τ=inf[Tmt:E(Tm)H(Tm)].

The optimal exercise strategy defined by Equation (12) requires the computation of the hold value H (t), m = η (t) + 1, …, M - 1. In a setting in which the underlying risk factor process {X (t) } 0≤tT is a generic k-dimensional Markov process, the hold value H (t) is a function of the state vector at time t. That is,

##### (13)
Ht(x):=𝔼[N(X(t))N(X(Tm+1))V(X(Tm+1))|X(t)=x].

When the dimension of the Markov process k is small enough, the conditional expectation value in Equation (13) can be computed in a straightforward way by discretising the risk-factor process and performing standard backward induction on a tree or a grid, or by discretising an associated Partial Differential Equation (PDE). Here we refer to, e.g., Wilmott et al. (1995). However, the complexity of grid-based calculations is exponential in the dimension of the Markov process and numerical implementations become infeasible when k ≥ 4.

As we will review in Section 2.3, regression-based MC techniques provide an effective way of computing conditional expectation values of the form (13).

### 2.2XVA

We next consider the computation of the Credit Valuation Adjustment (CVA) and the Debt Valuation Adjustment (DVA) as the main measures of a dealer’s counterparty credit risk, see e.g., Cr'epey et al. (2014). For a given portfolio of trades with the same investor or institution, the CVA (resp. DVA) aims to capture the expected loss (resp. gain) associated with the counterparty (resp. dealer) defaulting in a situation in which the position, netted for any collateral posted, has a positive mark-to-market for the dealer (resp. counterparty).

This can be evaluated at time T0 = 0 by

##### (14)
XVA=-𝔼[𝕀(τcT)LcN(τc)(V(τc))++𝕀(τdT)LdN(τd)(V(τd))-],
where τc (resp. τd) is the default time of the counterparty (resp. the dealer), V (t) is the net present value of the portfolio or netting set at time t from the dealer’s point of view (the so-called conditional future exposure), Lc (resp. Ld) is the loss given default of the counterparty (resp. the dealer), and 𝕀(τcT) (resp. 𝕀(τdT) ) is the indicator that the counterparty’s (resp. dealer’s) default happens before the longest deal maturity T in the portfolio. Here, for simplicity of notation, we consider the unilateral CVA and DVA, the generalization to the bilateral formulation, see e.g., Crépey et al. (2014), being straightforward.

Equation (14) is typically computed on a discrete time grid of “horizon dates” 0 = T0 < T1 < … < TM = T. For instance, we may have

##### (15)
XVA-m=1M𝔼[Lc(SPc(Tm-1)-SPc(Tm))(V(Tm))+N(Tm)+Ld(SPd(Tm-1)-SPd(Tm))(V(Tm))-N(Tm)],
where SPc (t) (resp. SPd (t)) is the survival probability of the counterparty (resp. the dealer) up to time t, e.g., conditional on a realization of the default intensity (or hazard rate) process in a Cox framework, see Lando (1998). Here we assume that the default times τc and τd are independent of the portfolio values Vτc and Vτd, respectively. In general, the right hand side of Equation (15) depends on several correlated random market factors, including interest rate, recovery amounts, and all the market factors the net conditional future exposure of the portfolio, V (t), depends on. As such, its calculation typically requires a MC simulation.

In the k-dimensional Markov setting introduced above the conditional future exposure V (Tm) is a function V (X (Tm)) of the state vector at time Tm. However, only for vanilla securities and simple models for the evolution of the risk factors, such conditional future exposures can be expressed in closed form, and regression based Monte Carlo is commonly employed to produce approximate estimators see, e.g., Cesari et al. (2009).

### 2.3Conditional expectation values and Bermudan-style options by regression

Regression methods are based on the observation, see e.g., Friedman et al. (2001), that given a real-valued random input vector Xd and Y a real valued random output, the conditional expectation 𝔼[Y|X] is the function of X that best approximates in the least-square sense the output Y. That is,

##### (16)
𝔼[Y|X]=argminc𝔼[(Y-c)2].

In particular, assuming that the conditional expectation is a linear function of some unknown vector of parameters β,

##### (17)
𝔼[Y|X]=βTX,

Equation (16) reduces to the well-known linear regression conditions, giving for the optimal vector of parameters

##### (18)
β=𝔼[XXT]-1𝔼[XY].

In the context of the valuation of Bermudan-style options, the hold value (13) on an exercise date Tm is assumed to be of the form

##### (19)
Hˆm(x):=HˆTm(x)=βmTψ(x)
where ψ (x) = (ψ1 (x) , …, ψd (x)) T is a vector of d basis functions and βm = (β1m, …, βdmT is the vector of coefficients to be determined by regressing N (X (Tm))/N (X (Tm+1)) V (X (Tm+1)) versus ψ (X (Tm+1)). This gives
##### (20)
βm=Ψm-1Ωm,
where we define the d × d matrix
##### (21)
Ψm=𝔼[ψ(X(Tm))ψT(X(Tm))]
and the d × 1 vector
##### (22)
Ωm=𝔼[N(X(Tm))V(X(Tm+1))N(X(Tm+1))ψ(X(Tm))].

These equations provide a straightforward recipe to compute the regression coefficients βm by substituting Ψm and Ωm with their sample average over NMC MC replications, namely:

• (R1) Simulate NMC independent MC paths Xm(n) of X (Tm) by the recursion

##### (23)
Xm+1(n)=F(Tm,Xm(n),θ)
for m = 0, …, M - 1, and n = 1, …, NMC. Here F is a function based on the chosen models for the risk factors, and θNθ is a vector of Nθ model parameters.

• (R2) For n = 1, …, NMC, compute the terminal payoff of the contract by setting

##### (24)
VM(n)=EM(n),
where EM:=E(XM(n)) is the final exercise value of the option.

• (R3) Apply the following backward induction steps for m = M - 1, …, 1:

• (a) Compute the MC sample average of Ψm and Ωm2 by

##### (25)
Ψm=1NMCn=1NMCψm(n)(ψm(n))T,
##### (26)
Ωm=1NMCn=1NMCψm(n)Nm(n)Vm+1(n)Nm+1(n),
where ψm(n):=ψ(Xm(n)) and Nm(n):=N(Xm(n)) .

• (b) Compute the regression coefficients βm by matrix inversion and multiplication:

##### (27)
βm=Ψm-1Ωm.

• (c) For the estimate of the hold value Hm(n):=Hm(Xm(n)) , set

##### (28)
Hm(n)=βmTψm(n),
for n = 1, …, NMC.

• (d) For the estimate of the Bermudan-style option value at time Tm, set

##### (29)
Vm(n)=max(Em(n),Hm(n)),
where Em(n):=E(Xm(n)) is the exercise value at time Tm, for n = 1, …, NMC.

• (R4) Compute the MC estimate of the Bermudan-style option at time T0 by

##### (30)
V0=1NMCn=1NMCV1(n)N1(n).

This approach was introduced by Tsitsiklis and Van Roy (2001) and they showed that the estimator V0 converges for n→ ∞ to the true value V (0) provided that the representation (19) holds exactly.

A modification of this algorithm was proposed by Longstaff and Schwartz (2001) and it entails replacing Equation (29) in Step R3 (d) with

##### (31)
Vm(n)={Em(n)ifEm(n)>Hm(n),Nm(n)Vm+1(n)/Nm+1(n)otherwise,
which, in the examples considered, was shown to lead to more accurate results. In the following, however, for simplicity of exposition, we will consider the estimator in Equation (29).

### 2.4Lower bound algorithm for Bermudan-style options

The hold value obtained by regression as described in the previous section defines an exercise policy whereby on each exercise date Tm the option is exercised if

##### (32)
E(X(Tm))>βmTψ(X(Tm)).

Such policy, being an approximation of the solution of the dynamic programming Equation (11), will in general correspond to a suboptimal stopping time. As a result, when utilised in a second, independent, MC simulation, the exercise policy obtained by regression, will result in a lower-bound estimator for the Bermudan-style option value. The corresponding algorithm can be schematically described as it follows.

For each MC replication indexed by n = 1, …, NMC perform steps (L1) to (L4) below:

• (L1) Simulate the path Xm(n) of the risk factor vector X (Tm) as in (R1).

• (L2) For m = 1, 2, . . . , M - 1, compute the approximate hold value of the option at time Tm using the associated regression vector βm, and regression functions ψ, by

##### (33)
Hm(n)=βmTψm(n)
with the hold value at expiry TM set to zero.

• (L3) Compute the path-wise estimator for the discounted cash-flows of the option

##### (34)
P(n)=m=1M[1(n)(t1,tm)1(Em(n)>Hm(n))Em(n)Nm(n)],
where
##### (35)
1(n)(t1,tm)=(i=1m-11(Hi(n)>Ei(n))),
for m > 1, and the convention 1(n) (t1, t1) =1.

• (L4) Compute the MC estimate of the Bermudan-style option at time T0 = 0 by

##### (36)
V0=1NMCn=1NMCP(n).

### 2.5XVA by regression

As described in Section 2.2, the calculation of the XVA in Equation (15) requires the conditional future exposure V (t) on a set of dates determined by a discretisation time grid T1 … TM. The regression algorithm described in the previous section can be easily adapted to compute such quantity. Indeed, the conditional value of each of the options contained in the netting set can be obtained using the same least-square procedure.

Once the regression algorithm is completed, we can use the regression functions to compute the hold value of each option in the portfolio on the discretisation time grid by Hm=βmTψm . If the discretisation time Tm is not an exercise opportunity for the option under consideration, then this is also its conditional future exposure. Conversely, the conditional future exposure is obtained by comparing the hold value to the exercise value as in Equations (29) and (31). These observations translate in the following algorithm.

For each MC replication indexed by n = 1, …, NMC perform steps (X1) to (X3) below:

• (X1) Simulate the path Xm(n) of the risk factor vector by the recursion:

##### (37)
Xm+1(n)=F(Tm,Xm(n),θ),
for m = 0, …, M - 1. Simulate the path of the counterparty’s and the dealer’s default intensity, λmd,c=λd,c(Tm) by the recursions
##### (38)
λm+1c,(n)=Gc(Tm,λmc,(n),θ),
##### (39)
λm+1d,(n)=Gd(Tm,λmd,(n),θ),
for m = 0, …, M - 1, where Gc (resp. Gd) is the function describing the dynamics of the counterparty’s (resp. the dealer’s) hazard rate.

• (X2) Compute the (discretised) path-wise survival probabilities for the counterparty and the dealer by

##### (40)
SPmc,(n)=exp[-j=0m-1λjc,(n)(Tj+1-Tj)],
##### (41)
SPmd,(n)=exp[-j=0m-1λjd,(n)(Tj+1-Tj)],
for m = 1, 2, . . . , M.

• (X3) For m = 1, 2, . . . , M - 1, approximate the hold value of the p-th option in the portfolio at time Tm using the associated regression vector βp,m and regression functions ψp by

##### (42)
Hp,m(n)=βp,mTψp,m,
with p = 1, …, P. The hold value at the expiry date TM is set to zero. The conditional expectation value of the portfolio is given by Vm(n) = p=1PVp,m(n) , where
##### (43)
Vp,m(n)={max{Hp,m(n),Ep,m(n)},ifTmisanexercisedateforthep-thoptionHp,m(n),otherwise,
for m = 1, 2, . . . , M, where Ep,m(n) is the exercise value of the p-th option at time Tm on the n-th path.

• (X4) Compute the path-wise XVA by

##### (44)
XVA(n)=-m=1M[Lc(SPm-1c,(n)-SPmc,(n))(Vm(n))+Nm(n)+Ld(SPm-1d,(n)-SPmd,(n))(Vm(n))-Nm(n)].

• (X5) Form the MC estimator

##### (45)
XVA=1NMCn=1NMCXVA(n).

## 3The AAD algorithm for regression-based Monte Carlo

### 3.1Function regularizations

In the context of MC methods, Capriotti and Giles (2012) show that AAD allows to calculate the sensitivities by differentiating the relevant estimator on a path by path basis. As a path-wise method, the MC estimators must satisfy specific regularity conditions, see Glasserman (2003). For instance, all the functions appearing in each step leading to the computation of the payout estimator must be Lipschitz continuous. A practical way of addressing non-Lipschitz estimators is to smoothen out the singularities they contain. This can be achieved by observing that in most cases the singularities in the payout functions, although not necessarily implemented as such, can be expressed in terms of Heaviside functions. For instance, the payoff function of a digital option (for a one-dimensional underlying asset) is,

##### (46)
P(X(T))=1(X(T)>K),
while the payoff of a knock-out, path-dependent option with barrier monitored at the time T1, …, TM is of the form
##### (47)
P(X(T1),,X(TM))=m=1M1(X(Tm)>Bm)
where Bm is the the barrier level at time Tm.

The singularities in such payoff functions can be regularized by replacing the indicator function with one of its smoothened counterparts. A very common choice, for instance, is to approximate the step function with a “call spread” payoff functions,

##### (48)
1(x>K)Hδ(x-K)=(min(x-(K-δ)2δ,1))+,
where 0 < δ ⪡ K. This is a standard choice for digital options, because it has a useful interpretation in terms of the hedging portfolio of a long and a short position in two calls with strike price K - δ and K + δ. This regularization gives rise to functions that are Lipschitz continuous with respect to x and can be differentiated in a straightforward manner. In particular, the adjoint regularized Heaviside function (48) is given by
##### (49)
H¯δ(x-K,x¯)={x¯/2δifK-δxK+δ0otherwise
where x¯ is the adjoint of the input variable. When δ → 0, (48) gives the correct derivative of the Heaviside function in the distributional sense, i.e., the Dirac delta function. However, while approaching this limit, its derivative is zero or vanishingly small, apart from a very small portion of the sample space where instead it is very large. This leads to exceedingly large variances in the MC sampling of the estimators expressed in terms of such functions, signalling the breakdown of the Lipschitz continuity condition. Hence, the choice of the smoothening parameter δ is necessarily a tradeoff between the bias, vanishing for δ → 0, and the statistical errors of the MC sampling, diverging in the same limit.

In general, the payoff estimator for Bermudan-style options (34) is not differentiable with respect to the pathwise value of the approximate exercise boundary Hm(n) , and it requires the regularization described above. A common approximation among practitioners, see e.g., Leclerc et al. (2009), is to assume that that exercise boundary implied by the rule (32) is close to optimality so that the value of the contract is approximatively continuous across the exercise boundary and no regularization is required. Under this assumption, no contribution to the sensitivities is associated with the perturbations of the exercise boundary, and one can therefore keep the regression coefficients fixed while calculating the sensitivities, see Andersen and Piterbarg (2010). As discussed in Section 4, depending on the accuracy of the basis functions in representing the exercise boundary, this may or may not be an accurate approximation.

### 3.2AAD for the lower bound algorithm for Bermudan-style options

The AAD implementation of the lower bound algorithm for Bermudan-style options described in Section 2.4, producing the MC estimators for the sensitivities of the estimator (36) with respect to a set of model parameters θk, k = 1, …, Nθ, given by

##### (50)
θ¯k=V(θ)θk,
consists of the adjoint of steps (L1) to (L4) executed backwards for each MC replication n = 1, …, NMC. After setting the adjoint of the option value V¯0=1 , the adjoint of the model parameters θ¯=0 , and the adjoint of the regression coefficients β¯m=0 , the steps to perform are:
• ( L4¯ ) Set

##### (51)
P¯(n)=V¯01NMC.

• ( L3¯ ) Assuming the indicator functions in the estimator in (34) have been regularized as discussed in Section 3.1, compute

##### (52)
H¯m(n)=H¯δ(Hm(n)-Em(n),P¯(n))(Jm(n)-Qm(n)),E¯m(n)=P¯(n)Rm(n)+H¯δ(Em(n)-Hm(n),P¯(n))(Jm(n)-Qm(n)),
N¯m(n)=-P¯(n)Qm(n)Hδ(Em(n)-Hm(n))Nm(n)
for m = M, …, 1 and where
Jm(n)=k=m+1MEk(n)Nk(n)1(Ek(n)>Hk(n))j=2,jmk-11(Hj(n)>Ej(n)),Qm(n)=Em(n)Nm(n)1(n)(t1,tm),Rm(n)=1Nm(n)1(Em(n)>Hm(n))1(n)(t1,tm).

Here we also adopt the conventions j=2,jm1=1 , M+1M=0 . Initialise the adjoints of the risk factors so that

##### (53)
X¯m(n)=E¯m(n)Em(n)Xm(n)+N¯m(n)Nm(n)Xm(n).

• ( L2¯ ) For m = M, …, 1, update the adjoint of the regression coefficients βm to give

##### (54)
β¯m+=ψm(n)H¯m(n),
as well as the adjoints of the basis functions ψm(n) ,
##### (55)
ψ¯m(n)=βmH¯m(n),
and update the adjoints of the state vector
##### (56)
X¯m(n)+=(ψ¯m(n))Tψm(n)Xm(n),
where we use the standard notation “+ =” for the addition assignment operator.

• ( L1¯ ) For m = M, …, 0 compute the adjoint of the risk factor evolution such that

##### (57)
X¯m(n)+=X¯m+1(n)FXm(n)(Tm,Xm(n);θ),θ¯+=X¯m+1(n)Fθ(Tm,Xm(n),θ),
where the gradients are computed by applying the rules of adjoint differentiation following the instructions that implement the function F. Finally, the adjoint X¯0 is utilised to populate the component of θ¯ corresponding to the adjoint of the model parameter X0.

### 3.3AAD for XVA by regression

The AAD implementation of the algorithm for the calculat ion of XVA described in Section 2.5, producing the MC estimators for the sensitivities of the estimator (45) with respect to a set of model parameters θk, k = 1, …, Nθ, given by

##### (58)
θ¯k=XVA(θ)θk,
consists of the adjoint of steps (X1)-(X5) executed backwards for each MC replication n = 1, …, NMC:
• ( X5¯ ) Set the adjoint of the XVA value, XVA¯=1 , the adjoint of the model parameters θ¯=0 and

##### (59)
XVA¯(n)=XVA¯1NMC.

• ( X4¯ ) For m = M, …, 1 compute:

##### (60)
V¯m(n)=-XVA¯(n)Nm(n)[Lc(SPm-1c,(n)-SPmc,(n))1(Vm(n)>0)+Ld(SPmd,(n)-SPmd,(n))1(Vm(n)<0)],
##### (61)
N¯m(n)=XVA¯(n)(Nm(n))2[Lc(SPm-1c,(n)-SPmc,(i))(Vm(n))++Ld(SPm-1d,(n)-SPmd,(n))(Vm(n))-],
##### (62)
SP¯mc,(n)=XVA¯(n)[Vm(n)Nm(n)(1-δm,0)-Vm+1(n)Nm+1(n)]1(Vm(n)>0),
##### (63)
SP¯md,(n)=XVA¯(n)[Vm(n)Nm(n)(1-δm,0)-Vm+1(n)Nm+1(n)]1(Vm(n)<0),
where δm,n is the Kronecker symbol, and VM+1(n)/NM+1(n)=0 .

• ( X3¯ ) For m = M, …, 1, set V¯p,m(n)=V¯m(n) and compute the adjoint of Equation (43) by

##### (64)
H¯p,m(n)=V¯p,m(n)1(Hp,m(n)>Ep,m(n)),E¯p,m(n)=V¯p,m(n)1(Hp,m(n)<Ep,m(n)),
if Tm is an exercise date for the p-th option, and
##### (65)
H¯p,m(n)=V¯p,m(n),E¯p,m(n)=0,
otherwise. Initialise the adjoints of the risk factors,
##### (66)
X¯m(n)=E¯p,m(n)Ep,m(n)Xm(n)+N¯m(n)Nm(n)Xm(n).
Initialise the adjoint of the regression coefficients βp,m, and update them together with the adjoints of the basis functions ψp,m, and the adjoints of the state vector as in Equations (54) to (56).

• ( X2¯ ) Update the adjoint of the simulated hazard rates according to

##### (67)
λ¯mc,(n)+=-(Tm+1-Tm)j=m+1MSP¯mc,(n)SPmc,(n),
##### (68)
λ¯md,(n)+=-(Tm+1-Tm)j=m+1MSP¯md,(n)SPmd,(n).

• ( X1¯ ) For m = M, …, 0 compute the adjoint of the hazard rate evolution

##### (69)
λ¯mc,(n)+=λ¯m+1c,(n)Gcλmc,(n)(Tm,λmc,(n);θ),λ¯md,(n)+=λ¯m+1d,(n)Gdλmd,(n)(Tm,λmd,(n);θ),
##### (70)
θ¯+=λ¯m+1c,(n)Gcθ(Tm,λmc,(n);θ),θ¯+=λ¯m+1d,(n)Gdθ(Tm,λmd,(n);θ),
where the gradients can be computed through the AAD implementation of the function G and of the risk factor evolution as in step ( L1¯ ).

### 3.4AAD for the regression algorithm

In the AAD implementations presented in Sections 3.2 and 3.3 the adjoint of the regression coefficients in Equation (54) do not contribute to the calculation of sensitivities. As mentioned in Section 3.1, and illustrated with numerical examples in Section 4, this can be justified as a reasonable approximation in the case of Bermudan-style options when the exercise boundary approximated by regression is close to the optimal one. However, such arguments are generally approximations for Bermudan-style options, and cannot be invoked at all when regression is utilised for XVA. In this case, the contributions associated with the sensitivities of the regression coefficients must be taken into account in order to obtain accurate estimates of the model parameters sensitivities. In this section, we discuss how these contributions can be computed by the AAD implementation of the least-square MC algorithm described in Section 2.3.

• ( R4¯ ) Skip this step if the regression algorithm is utilised in conjunction with a second independent simulation for Bermudan-style options or in the context of XVA. Initialise the adjoint of the option value V¯0=1 , the adjoint of the model parameters θ¯=0 , the adjoint of the regression coefficients β¯m=0 , m = 1, …, M - 1 and set

##### (71)
V¯1(n)=V¯0NMC1N1(n),N¯1(n)=-V¯0NMCV1(n)(N1(n))2,
for n = 1, …, NMC.

• ( R3¯ ) For m = 1 to M - 1, we have:

• ( d¯ ) For n = 1, …, NMC, compute

##### (72)
E¯m(n)=V¯m(n)1(Em(n)>Hm(n)),H¯m(n)=V¯m(n)1(Em(n)<Hm(n)),
and initialise the adjoint of the risk factor path value Xm(n) by
##### (73)
X¯m(n)=E¯m(n)Em(n)Xm(n).

• ( c¯ ) Set:

##### (74)
β¯m+=n=1NMCψm(n)H¯m(n),
and initialise the adjoint of the basis functions ψ as in Equation (55)

• ( b¯ ) Compute the adjoint of the two intermediate variables Ωm and Ψm in Equation (27) using the results in Giles (2008) by

##### (75)
Ω¯m=Ψm-Tβ¯m,Ψ¯m=-Ω¯mβmT.

• ( a¯ ) For n = 1, …, NMC compute the adjoint of Equation (26) by

##### (76)
ψ¯m(n)+=1NMCNm(n)Vm+1(n)Ω¯m,V¯m+1(n)=1NMCNm(n)(ψm(n))TΩ¯m,
N¯m(n)=1NMCVm+1(n)(ψm(n))TΩ¯m,
where Nm(n)=Nm(n)/Nm+1(n) , and compute the adjoint of Equation (25) by
##### (77)
ψ¯m(n)+=12NMC(Ψ¯m(n)+(Ψ¯m(n))T)ψm(n).

Then update the adjoint of the risk factor vectors by

##### (78)
X¯m(n)+=N¯m(n)Nm+1(n)Nm(n)Xm(n)+ψ¯Tψ¯Xm(n),X¯m+1(n)+=-N¯m(n)Nm(n)Nm+1(n)Nm+1(n)XTm+1(n).

• ( R2¯ ) Compute the adjoint of the risk factor vector at expiry by

##### (79)
X¯M(n)=V¯M(n)EM(n)XM(n),
for n = 1, …, NMC.

• ( R1¯ ) For m = M, …, 0, compute the adjoint of the risk factor evolution as in step ( L1¯ ).

## 4Numerical results

### 4.1Best of two assets Bermudan-style option

As a first example, we consider the classical case of a Bermudan-style option on the maximum of two assets under a standard lognormal model for the asset price processes {X1 (t)} and {X2 (t)}. The payoff function at exercise time t is given by

##### (80)
(max{X1(t),X2(t)}-K)+,
where K is the strike price. A similar example is also studied in Glasserman (2003); Broadie and Glasserman (1997) and Broadie and Glasserman (2004). Andersen and Broadie (2004) obtain nearly exact estimates for this example using twelve basis functions, including the value of a European-style option. We assume the option can be exercised every three months up to 3 years. We assume the underlying assets are independent geometric Brownian motion processes with the same initial values X (0), volatilities σ, and dividend rates d. In particular, we choose X1 (0) = X2 (0) =1, σ1 = σ2 = 0.2 and d1 = d2 = 0.1 and the risk-free rate r = 0.05. We compute the Bermudan-style option prices, the Deltas (i.e., the sensitivity with respect to the spot values, Xi (0)) and the Vegas (i.e., the sensitivity with respect to the volatilities σi) for K = 0.9, 1.0 and 1.1 with AAD and bumping. We also compare these with their “exact” value, which is obtained by a PDE approach (using finite differences to estimate sensitivities). The numerical results are found in Table 1 and in Fig. 1 demonstrating an excellent agreement between the AAD and PDE results. The regression basis functions we adopt are the terms of the third-order polynomial
##### (81)
{1,x1,x2,x12,x22,x1x2,x13,x12x2,x1x22,x23}
and the payoff functions up to power three, i.e.,

##### (82)
{(max(x1,x2)-K)+,((max(x1,x2)-K)+)2,((max(x1,x2)-K)+)3}.

##### Fig.1

Deltas (left panel) and Vegas (right panel) for the Bermudan-style option in (80) as a function of strike. The smoothening parameter in the call spread regularization (48) is δ = 0.005. The number of MC paths is 400,000. The results are obtained with the setting (81) and (82). The MC uncertainty (in parenthesis) is computed using the binning technique with 20 bins for each set of simulations.

##### Table 1

Prices, Deltas and Vegas for the Bermudan-style option in (80) with three different strike values. The smoothening parameter in the call spread regularization (48) is δ = 0.005. The number of MC paths employed in the AAD approach is 400,000. The results are obtained with the setting (81) and (82). The MC uncertainty (in parenthesis) is computed using the binning technique of Capriotti and Giles (2010) with 20 bins for each set of simulations

 K = 0.9 Price Delta Vega AAD 0.2012(2) 0.415(3) 0.456(2) PDE 0.20107 0.41423 0.45740 K = 1.0 AAD 0.1396(1) 0.337(2) 0.486(2) PDE 0.13959 0.33588 0.48440 K = 1.1 AAD 0.0941(2) 0.258(1) 0.463(2) PDE 0.09431 0.25635 0.46253

Here the smoothening parameter for the calculation of the Greeks, discussed in Section 3.1, δ = 0.005, was chosen as a reasonable compromise between variance and bias of the estimator. This is illustrated in Fig. 2, showing how for δ = 0.005 the bias introduced by the finite δ is of the same order of magnitude of the statistical uncertainty for the chosen computational budget, and is negligible for any practical application.

##### Fig.2

AAD Delta (left panel) and Vega (right panel) of the Bermudan-style option in (80) for K = 1 vs the smoothening parameter δ for the call spread regularization (48). The number of simulated paths is 3,000,000 for δ = 0.01 and is increased as δ is decreased in order to keep statistical uncertainties roughly constant. The results are obtained with the setting (81) and (82). The values in the graphs are fitted based on a quadratic polynomial function (green lines).

As discussed in Section 3.1, neglecting to smoothen out the exercise boundary, although common in the financial practice, introduces a bias in the computation of sensitivities because the exercise boundary is in general not optimal. This is illustrated in Fig. 3, in which we compare the Delta, with and without smoothening, for different choices of the basis functions. These results are obtained with the inputs (81) and (82). Here for Delta, the smoothened estimator turns out to be more accurate, especially when the exercise boundary obtained by regression is a less accurate approximation of the real one. However, it is difficult to establish a priori whether the unsmoothened estimator provides a smaller or a larger bias than the smoothened one. This is because the bias introduced by the lack of smoothening may be offset by the bias arising from the sub-optimality of the exercise boundary. This is illustrated in the right panel of Fig. 3 showing that for Vega the smoothened and unsmoothened estimators have a comparable accuracy. In any case, a consideration to keep in mind is that smoothening the exercise boundary is generally required to obtain stable second-order risk values.

##### Fig.3

AAD Delta (left panel) and Vega (right panel) of the Bermudan-style option in (80) for K = 1 as obtained with the unsmoothened and the smoothened estimators with the call-spread regularization (48) for five choices of the regression basis functions. The number of simulated paths is 1,000,000 with 20 bins and the smoothening parameter is δ = 0.005.

The quasi-optimality of the exercise boundary is also generally invoked among industry practitioners as a justification for neglecting the contributions to the sensitivities arising from the exercise boundary. Clearly, the quality of this approximation is dependent on the accuracy of the regression functions in reproducing the actual exercise boundary. This is illustrated in Fig. 4 where we plot Delta and Vega of the Bermudan-style option (80) for different choices of the regression basis functions. Here we compare the results obtained by the AAD algorithm, as described in Section 3.4, in the case where a) the exercise boundary is kept fixed, and b) when accounting instead for its contributions to the sensitivities. As expected, the difference between the two approaches vanishes as the regression functions become more accurate. However, as it is shown for the Vega, it can lead to a significant bias if a simple (e.g., linear or quadratic) representation of the exercise boundary is adopted.

##### Fig.4

AAD Delta (left panel) and Vega (right panel) of the Bermudan-style option in (80) for K = 1 as obtained by keeping the exercise boundary fixed (Fixed) and accounting instead for its contributions to the sensitivities (Flexible). The number of simulated paths is 1,000,000 with 20 bins and the call-spread smoothening parameter is δ = 0.005.

### 4.2XVA sensitivities

As another example, we compute the sensitivities of XVA (14) for the same option defined in Equation (80). Here, for simplicity, we assume that the hazard rate and the volatility are piecewise constants. The following results are obtained with the specifications (81) and (82). The XVA sensitivities with respect to some of the model parameters, namely the term structure of hazard rates of the counterparty and of volatilities of the underlying, obtained with the AAD algorithm described in Section 2.5 are compared with the ones obtained by standard bumping in Tables 2 and 3. As expected, the AAD sensitivities are in excellent agreement with those obtained by bumping with any discrepancies attributable to the bias of the finite-difference approach completely masked in this example by the MC uncertainties.

##### Table 2

The XVA sensitivities with respect to the piecewise volatility. The results are obtained with the setting (81) and (82). Both sets of the results are computed with 1,000,000 MC paths and 25 bins

 σ0 σ1 σ2 σ3 σ4 σ5 AAD 0.00298(1) 0.00297(2) 0.00294(1) 0.00283(1) 0.00271(1) 0.00255(1) Bumping 0.00298(1) 0.00297(2) 0.00294(1) 0.00283(1) 0.00271(1) 0.00255(1) σ6 σ7 σ8 σ9 σ10 σ11 AAD 0.00236(1) 0.00218(1) 0.00201(1) 0.00183(1) 0.00165(1) 0.001450(8) Bumping 0.00236(1) 0.00218(1) 0.00201(1) 0.00183(1) 0.00165(1) 0.001450(8)
##### Table 3

The XVA sensitivities with respect to the piecewise hazard rates. The results are obtained with the setting (81) and (82). Both sets of the results are computed with 1,000,000 MC paths and 25 bins

 λ0 λ1 λ2 λ3 λ4 λ5 AAD 0.03337(1) 0.03336(2) 0.03329(2) 0.03306(3) 0.03269(3) 0.03218(3) Bumping 0.03337(1) 0.03336(2) 0.03329(1) 0.03306(3) 0.03269(3) 0.03218(3) λ6 λ7 λ8 λ9 λ10 λ11 AAD 0.03151(3) 0.03074(4) 0.02989(4) 0.02893(4) 0.02784(4) 0.02669(4) Bumping 0.03151(3) 0.03074(3) 0.02989(4) 0.02893(4) 0.02784(4) 0.02669(4)

Similar to the case of Bermudan-style option Greeks, keeping the regression boundary fixed while computing the sensitivities, introduces a bias. However, for XVA, this issue is much more severe than in the case of the Bermudan-style option Greeks because no quasi-optimality argument can be invoked. As shown in Fig. 5, the volatilities sensitivities obtained by bumping and AAD keeping into account of the regression contributions are almost identical. Instead, the results of AAD keeping the regression coefficients fixed are remarkably different and, if utilised for risk management, would lead to significant mis-hedging. On the other hand, as expected, the XVA sensitivities with respect to the hazard rates are not affected by the contribution arising from the regression functions. This is because the hazard rates do not enter the computation of the portfolio value conditional on default, and hence do not appear in the regression parameters.

##### Fig.5

XVA sensitivities with respect to the piecewise volatility (left panel) and hazard rate (right panel), computed by AAD keeping into account of the contributions of the regression coefficients (Flexible), by AAD keeping the regression coefficients fixed (Fixed), and by finite-differences (Bumping). The number of MC paths is 1,000,000 and the number of bins is 25. The results are obtained with the setting (81) and (82).

Finally, the remarkable computational efficiency of the AAD approach is illustrated in Fig. 6. Here we plot the cost of computing the XVA sensitivities with respect to the term structure of the counterparty hazard rate and the underlying volatility, relative to the cost of performing a single valuation. The calculation of the sensitivities by means of AAD can be performed in about three times the cost of computing the XVA value, that is, well within the theoretical bound (7). In contrast, the cost of bumping, for one-sided finite-difference estimators, is in general (1 + N) times the cost of as single valuation, where N is the number of model parameters, i.e., in this case over 20 times the cost of computing the value of the XVA.

##### Fig.6

Ratio of the CPU time required for the calculation of the XVA, and its sensitivities, and the CPU time spent for the computation of the XVA alone.

## 5Conclusions

We have shown how Adjoint Algorithmic Differentiation (AAD) can be utilised to implement efficiently the computation of sensitivities in regression-based MC methods. We have derived the AAD version of the celebrated least-square algorithms of Tsitsiklis and Van Roy (2001) and Longstaff and Schwartz (2001) and, by discussing in detail examples of practical relevance, we have demonstrated how accounting for the sensitivities contributions associated with the regression functions is crucial to obtain accurate estimates of the Greeks in XVA applications and for Bermudan-style options, especially when the exercise boundary is not particularly accurate. We have also shown how smoothening out the discontinuities associated with suboptimal exercise boundaries can lead in some situations to more accurate estimates of the sensitivities.

From a computational stand point, similarly to what was previously found in other MC and PDE settings, see e.g., Capriotti and Giles (2012); Capriotti et al. (2015), the proposed method allows the computation of the complete first-order risk at a cost which is at most four times the cost of calculating the value of the portfolio itself. This typically results in orders of magnitude of savings in computational time with respect to standard finite-difference approaches, thus making AAD an ever so indispensable technique in the toolbox of modern financial and insurance engineering.

## Acknowledgments

It is a pleasure to thank Jacky Lee, for numerous useful discussions, and for a careful reading of the manuscript. We are also grateful to the anonymous referee for the feedback and comments. After completion of this manuscript we became aware of the work performed simultaneously by Antonov on a similar application of AAD to Callable Exotics, see Antonov (2016). The opinions and views expressed in this paper are uniquely those of the authors, and do not necessarily represent those of Credit Suisse Group.