# Andersen-Piterbarg Integration Limits for the Time Dependent Heston Model

The normalized characteristic function $\phi_t(z)$ of a piecewise constant time dependent Heston model

$\begin{array}{rcl} d \ln S_t&=& \left(r_t - q_t - \frac{\nu_t}{2}\right)dt + \sqrt\nu_t dW^{S}_t \nonumber \\ d\nu_t&=& \kappa_t\left(\theta_t-\nu_t \right ) dt + \sigma_t\sqrt\nu_t dW^{\nu}_t \nonumber \\ \rho_t dt &=& dW^{S}_tdW^{\nu}_t \end{array}$

with $n$ time intervals $[t_0=0, t_1], ... ,[t_{n-1}, t_n]$ and constant parameters within these intervals

$\kappa_t = \kappa_j \wedge \theta_t = \theta_j \wedge \sigma_t=\sigma_j \wedge \rho_t=\rho_j \forall j\in [1, n] \wedge t\in [t_{j-1}, t_j]$

is given by the recurrence relation

$\begin{array}{rcl} k_j &=& \kappa_j - iz\rho_j\sigma_j \nonumber \\ d_j &=& \sqrt{k_j^2 +\sigma_j^2(z^2+iz)} \nonumber \\ g_j &=&\displaystyle \frac{k_j- d_j}{k_j + d_j} \nonumber \\ \tilde{g_j} &=& \displaystyle \frac{k_j- d_j - D_{j+1}\sigma_j^2}{k_j + d_j - D_{j+1}\sigma_j^2} \nonumber \\ D_j &=& \displaystyle \frac{k_j + d_j}{\sigma_j^2}\frac{g_j-\tilde{g_j} e^{-d_j \tau_j}}{1-\tilde{g_j} e^{-d_j\tau_j}} \nonumber \\ C_j &=& \displaystyle \frac{\kappa_j\theta_j}{\sigma_j^2} \left( (k_j- d_j )\tau_j - 2\ln\left(\frac{1-\tilde{g_j}e^{-d_j\tau_j}}{1-\tilde{g_j}}\right)\right) + C_{j+1} \nonumber \\ \tau_j &=& t_j - t_{j-1} \nonumber \\ \phi_{t_n}(z) &=& \exp{\left(C_1(z)+D_1(z)\nu_0\right)}\end{array}$

and the initial condition

$C_{n+1} = D_{n+1} = 0$

It should be noted that the complex logarithm in this formulation can be restricted to the principal branch without introducing any discontinuities [1]. It is important to know the asymptotic behaviour of $C_1(z)+D_1(z)\nu_0$ in order to calculate the truncation point for the integral over the characteristic function when using the Andersen-Piterbarg approach with control variate [2]. A Mathematica script gives

$\begin{array}{rcl} \displaystyle \lim_{u\to\infty}\frac{D_1(u)}{u} &=& \displaystyle-\frac{i\rho_1 + \sqrt{1-\rho_1^2} }{\sigma_1} \nonumber \\ \displaystyle\lim_{u\to\infty}\frac{C_1(u)}{u} &=& \displaystyle -\sum_{j=1}^n \frac{\kappa_j\theta_j}{\sigma_j}\left(\sqrt{1-\rho_j^2} + i\rho_j\right)\tau_j\end{array}$

The implementation of the Andersen-Piterbarg method for the piecewise constant time dependent Heston model is part of the pull request #251.

[1] Afshani, S. (2010) Complex logarithms and the piecewise constant extension of the Heston model.

[2] Andersen, L. and Piterbarg, V. (2010) Interest Rate Modeling, Volume I: Foundations and Vanilla Models, (Atlantic Financial Press London).

# New Semi-Analytic Heston Pricing Algorithms

The pricing engines for the Heston model in QuantLib have aged a bit. Meanwhile newer and better algorithms have been developed and discussed in the literature. For a comprehensive review please see [1][2]. Time to refurbish the existing engines.

The Heston model is defined by the following stochastic differential equation of the log spot

$\begin{array}{rcl} d \ln S_t&=& \left(r_t - q_t - \frac{\nu_t}{2}\right)dt + \sqrt\nu_t dW^{S}_t \nonumber \\ d\nu_t&=& \kappa\left(\theta-\nu_t \right ) dt + \sigma\sqrt\nu_t dW^{\nu}_t \nonumber \\ \rho dt &=& dW^{S}_tdW^{\nu}_t \end{array}$

The normalized characteristic function in the Gatheral formulation is given by

$\phi_t(z) = \exp\left\{\frac{v_0}{\sigma^2}\frac{1-e^{-Dt}}{1-Ge^{-Dt}}\left(\kappa-i\rho\sigma z-D\right) + \frac{\kappa\theta}{\sigma^2}\left(\left(\kappa-i\rho\sigma z-D\right)t-2\ln\frac{1-Ge^{-Dt}}{1-G}\right) \right\}$

$\begin{array}{rcl} D&=&\sqrt{\left(\kappa - i\rho\sigma z\right)^2+\left(z^2+iz\right)\sigma^2} \nonumber \\ G&=&\displaystyle\frac{\kappa -i\rho\sigma z-D}{\kappa -i\rho\sigma z + D}\end{array}$

Andersen and Piterbarg [3] introduced a Black-Scholes control variate to improve the numerical stability of Lewis’s formula (2001) for the price of a vanilla call option

$\begin{array}{rcl} C(S_0, K, T)&=&BS\left(S_0, K, T, \sigma_{BS}\right) \nonumber \\ &+& \frac{\sqrt{Se^{(r-q)t}K}e^{-rt}}{\pi}\displaystyle\int\limits_{0}^{\infty}{\Re\left( \frac{\phi^{BS}_T\left(u-\frac{i}{2}\right) - \phi_T\left(u-\frac{i}{2}\right)}{u^2+\frac{1}{4}} e^{i u \left(\ln\frac{S}{K}+(r-q)T\right) } \right) \mathrm{d}u}\end{array}$

with the Black-Scholes price $BS\left(S_0, K, T, \sigma_{BS}\right)$ of a vanilla call option with volatility $\sigma_{BS}$ and the characteristic function

$\phi^{BS}_T(z)=e^{-\frac{1}{2}\sigma_{BS}^2 T (z^2 + iz)}$

Different proposals for the volatility of the vanilla option have been brought up in order to achieve an optimal control variate:

• $\sigma_{BS}^2 = \nu_0$
• $\sigma_{BS}^2 = \frac{1}{\kappa t}\left(1-e^{-\kappa t}\right)\left(\nu_0-\theta\right) + \theta$
• $\sigma_{BS}^2 = c_2$ with $c_2$ the second cumulant of $\ln \frac{F}{K}$ [1].

The diagram below shows the resulting integrand for the different control variate volatilities, the Heston model parameters

$r=7.5\%, q=5\%, S_0=100, \nu_0=0.08, \rho=-80\%,\sigma=0.5, \kappa=4, \theta=0.05$

and for a vanilla call option with strike $K=200$ and maturity $t=2.0$.

The best choice depends on the Heston- and option parameters but it seems that

$\sigma_{BS}^2 = \frac{1}{\kappa t}\left(1-e^{-\kappa t}\right)\left(\nu_0-\theta\right) + \theta$

is a good option for a large variety of parameters. A direct comparison of the integrand with and without control variate shows how effective the control variate is.

The central trick of Andersen-Piterbarg is now to truncate Lewis’s infinite integral at a finite $u_{max}$ such that the remaining part is smaller than a given threshold based on the following inequation

$\displaystyle\int_{u_{max}}^\infty \left\vert \frac{\phi_B(u-\frac{i}{2}) - \phi(u-\frac{i}{2})}{u^2+\frac{1}{4}} \right\vert du \le e^{-C_\infty u_{max}}\displaystyle\int_{u_{max}}^\infty \frac{1}{u^2} du$

Please see the original paper or [2] for further details on how to get from here to an algorithm for $u_{max}$, especially for short dated options.

As Andersen and Piterbarg have pointed out, the simple trapezoidal rule works surprisingly well to carry out the resulting integral if a control variate is used. Alan Lewis’s test cases with high precision Heston reference prices should serve as a test bed here, in particular the call with strike 100. The Gauss-Laguerre method is very thankful for this particular test case but the point here is that the trapezoidal rule converges much faster than the higher order Simpson rule or the even more complex adaptive Gauss-Lobatto method.

The diagram below compares the recently added COS engine with the Andersen-Piterbarg method using the trapezoid rule for this test-case. The Andersen-Piterbarg method is more accurate for a similar number of points.

The implementation is part of the pull request #251.

[2] F. Le Floc’h, Fourier Integration and Stochastic Volatility Calibration.

[3] L. Andersen, and V. Piterbarg, 2010,  Interest Rate Modeling, Volume I: Foundations and Vanilla Models,  Atlantic Financial Press London.

# QuantLib User Meeting 2016

Please find here my talk about the Collocating Local Volatility Model at this year’s QuantLib User Meeting in Düsseldorf, Germany. The source code is also available.

# The CLV Model with a Square Root Kernel Process

The kernel process of the Collocating Local Volatility (CLV) model [1] defines the forward skew dynamics of the model. Although the Ornstein-Uhlenbeck process has some nice analytical features it offers sometimes too little control over the forward skew dynamics. The square root process

$d\nu_t=\kappa(\theta-\nu_t) dt + \sigma \sqrt{\nu_t}dW_t$

is a promising kernel to get more control over the forward skew. First, exact sampling is pretty easy for the square root process. The probability density function of $\nu_t$ given $\nu_0$ is

$\nu_t = \frac{\sigma^2 (1- e^{-\kappa t})}{4\kappa} \chi_d^{'2}\left(\frac{4\kappa e ^{-\kappa t}}{\sigma^2(1-e^{-\kappa t})}\nu_0\right)$

where $\chi_d^{'2}$ denotes the noncentral chi-squared distribution with

$d=\frac{4\theta\kappa}{\sigma^2}$

degrees of freedom and the noncentrality parameter

$\lambda = \frac{4\kappa e^{-\kappa t}}{\sigma^2(1-e^{-\kappa t})}\nu_0$

The boost library provides an efficient and accurate implementation of the inverse of the cumulative noncentral chi-squared distribution function, which can be used for an exact Monte-Carlo sampling scheme.

The optimal collocation points are given by the Gaussian quadrature points, which are defined by the zeros of the corresponding orthogonal polynomials. The orthogonal polynomials are defined by a recurrence relation and the collocation points are given by the eigenvalues of a symmetric, tri-diagonal matrix with the diagonal $\{\alpha_i\}$ and the minor diagonal $\{\sqrt{\beta_i}\}$. Again these vectors are defined by a recurrence relation [2]

$\begin{array}{rcl} z_{k, i} &=& z_{k-1, i+1} + \alpha_k z_{k-1,i} - \beta_k z_{k-2,i} \\ \nonumber \alpha_{k+1} &=& \frac{z_{k-1,k}}{k-1,k-1}-\frac{z_{k,k+1}}{z_{k,k}} \\ \nonumber \beta_{k+1} &=& \frac{z_{k,k}}{z_{k-1,k-1}}\\ \nonumber z_{-1,i} &=& 0 \\ \nonumber z_{0,i} &=& \mu_i = \int_0^\infty x^i\chi_{d,\lambda}^{'2}(x) dx \\ \nonumber \alpha_1 &=& -\frac{\mu_1}{\mu_0} \\ \nonumber \beta_1 &=& \mu_0 = 1\end{array}$

The first n moments $\mu_i(d, \lambda), i=0,..,n$ of the noncentral chi-squared distribution can be calculated using Mathematica and exported as plain C code in order to be integrated into QuantLib.

m[n_] := CForm[ Expectation[X^n, X \[Distributed]
NoncentralChiSquareDistribution[d, lambda]] // Simplify]


Solving the recurrence relation is often ill-conditioned when using double precision. Therefore the resulting equations will be solved using the Boost.Multiprecision package with the cpp_dec_float back-end (header only and dependency free). The results have been tested accordingly to the proposal in [3]. For any reasonable number of collocation points a precision of up to 100 digits seems to be sufficient. On the other side the computation even with 100 digits is really fast. The eigenvalue calculation is carried out using standard double precision.

The implementation of the calibration routine follows the same way as for the Ornstein-Uhlenbeck kernel process. Again the calibration is pretty fast and accurate compared with other structured models even though it involves multiple precision arithmetic.

Example: Forward volatility skew

• Market prices are given by a Heston model with

$S_0=100, r=0.1, q=0.05, \nu_0=0.09, \kappa=1.0, \theta=0.06, \sigma=0.4, \rho=-0.75$.

• SquareRoot-CLV kernel process parameters are given by

$\kappa=0.2, \theta=0.09, \sigma=0.1, x_0=0.09$

The diagram below shows the implied volatility of an forward starting European option with moneyness varying from 0.5 to 2 and maturity date six month after the reset date.

Same plot but with the following parameters of the square root kernel process

$\kappa=0.1, \theta=0.09, \sigma=0.1, x_0=0.25$

Source code is available here. The code is in an early state and needs some more testing/clean-up before a pull request can be made out of it. Next step will be to calibrate such a CIR-CLV model to the forward skew dynamics of an Heston Stochastic Local Volatility model.

[1] A. Grzelak, 2016, The CLV Framework – A Fresh Look at Efficient Pricing with Smile

[2] M. Morandi Cecchi and M. Redivo Zaglia, Computing the coefficients of a recurrence formula for numerical integration by moments and modified moments.

[3] Walter Gautschi, How and How not to check Gaussian Quadrature Formulae.

# The Collocating Local Volatility Model

In a recent paper [1] Lech Grzelak has introduced his Collocating Local Volatility Model (CLV). This model utilises the so called collocation method [2] to map the cumulative distribution function of an arbitrary kernel process onto the true cumulative distribution function (CDF) extracted from option market quotes.

Starting point for the collocating local volatility model is the market implied CDF of an underlying $S_t$ at time $t_i$:

$F_{S(t_i)}(x) = 1 + e^{r t_i}\frac{\partial V_{call}(t_0, t_i, K)}{\partial K}\mid_{K=x} = e^{r t_i}\frac{\partial V_{put}(t_0, t_i, K)}{\partial K}\mid_{K=x}$

The prices can also be given by another calibrated pricing model, e.g. the Heston model or the SABR model. To increase numerical stability it is best to use OTM calls and puts.

The dynamics of the spot process $S_t$ should be given by some stochastic process $X_t$ and a deterministic mapping function $g(t, x)$ such that

$S_t=g(t, X_t)$

The mapping function $g(t, x)$ ensures that the terminal distribution of $S_t$ given by the CLV model matches the market implied CDF. The model then reads

$\begin{array}{rcl} S_t &=& g(t,X_t) \nonumber \\ dX_t &=& \mu(X_t)dt + \sigma(X_t)dW_t\nonumber \end{array}$

The choice of the stochastic process $X_t$ does not influence the model prices of vanilla European options – they are given by the market implied terminal CDF – but influences the dynamics of the forward volatility skew generated by the model and therefore the prices of more exotic options. It is also preferable to choose an analytical trackable process $X_t$ to reduce the computational efforts.

The collocation methods outlined in [2] defines an efficient algorithm to approximate the mapping function $g(t, x)$ based on a set of collocation points $x_{i,j}=x_j(T_i)$ for a given set of maturities $T_i, i=1,...,m$ and $j=1,...,n$ interpolation points per maturity $T_i$. Let $F_{S_{T_i}}(s)$ be the market implied CDF for a given expiry $T_i$. Then we get

$\begin{array}{rcl} F_{X_{T_i}}\left(x_{i,j}\right) &=& F_{S_{T_i}}\left(g(T_i, x_{i,j})\right) = F_{S_{T_i}} \left(s_{i,j}\right) \nonumber \\ \Rightarrow s_{i,j}&=&F^{-1}_{S_{T_i}}\left(F_{X_{T_i}}(x_{i,j})\right) \nonumber \end{array}$

for the collocation points with $s_{i,j}=g(T_i, x_{i,j})$.

The optimal collocation points are given by the abscissas of the Gaussian quadrature for the terminal distribution of $X_{T_i}$. The simplest choice is a normally distribute kernel process $X_t$ like the Ornstein-Uhlenbeck process

$dX_t = \kappa(\theta-X_t)dt + \sigma dW_t$.

The corresponding collocation points of the Normal-CLV model are then given by

$\begin{array}{rcl} x_j(t) &=& \mathbb{E}\left[X_t\right] + \sqrt{\mathbb{V}ar\left[X_t\right]} x_j^{\mathcal{N}(0,1)} \nonumber \\ &=& \theta + \left(X_0 - \theta)e^{-\kappa t}\right) + \sqrt{\frac{\sigma^2}{2\kappa}\left(1-e^{-2\kappa t}\right)} x_j^{\mathcal{N}(0,1)}, \ j=1,...,n\end{array}$

in which the collocation points $x_j^{\mathcal{N}(0,1)}$ of the standard normal distribution can be calculated by QuantLib’s Gauss-Hermite quadrature implementation

Array abscissas = std::sqrt(2)*GaussHermiteIntegration(n).x()


Lagrange polynomials [3] are an efficient interpolation scheme to interpolate the mapping function $g(t, x)$ between the collocation points

$g(t, X_t) = \sum_{j=1}^N s_j (t)\prod_{k=1, j\neq k}^N \frac{X(t)-x_j(t)}{x_k(t)-x_j(t)}$

Strictly speaking Lagrange polynomials do not preserve monotonicity and one could also use monotonic interpolation schemes supported by QuantLib’s spline interpolation routines. As outlined in [2] this method can also be used to approximate the inverse CDF of an “expensive” distributions.

Calibration of the Normal-CLV model to market prices is therefore pretty fast and straight forward as it takes the calibration of $g(t, x_t)$.

Monte-Carlo pricing means simulating the trackable process $X_t$ and evaluate the  Lagrange polynomial if the value of the spot process $S_t$ is needed. Pricing via partial differential equation involves the one dimensinal PDE

$\frac{\partial V}{\partial t} + \mu(x)\frac{\partial V}{\partial x} + \frac{1}{2}\sigma^2(x)\frac{\partial^2 V}{\partial x^2}-rV = 0$

with the terminal condition at maturity time $T$

$V(T, x_T) = \text{Payoff}\left(S_T=g(T,x_T)\right)$

For plain vanilla options the upper and lower boundary condition is

$\frac{\partial^2 V}{\partial x^2} = 0 \ \ \forall x\in\left\{x_{min},x_{max}\right\}$

Example 1: Pricing error for plain vanilla options

• Market prices are given by the Black-Scholes-Merton model with

$S_0=100, r=0.1, q=0.04, \sigma=0.25$.

• Normal-CLV process parameters are given by

$\kappa=1.0, \theta=0.1,\sigma=0.5,x_0=0.1$

Ten collocation points are used to define the mapping function $g(t, x)$ and the time to maturity is one year. The diagram below shows the deviation of the implied volatility of the Normal-CLV price based on the PDE solution from the true value of 25%

Even ten collocation points are already enough to obtain a very small pricing error. The advice in [2] to stretch the collocation grid has turned out to be very efficient if the number of collocation points gets larger.

Example 2: Forward volatility skew

• Market prices are given by the Heston model with

$S_0=100, r=0.1, q=0.05, \nu_o=0.09, \kappa=1.0, \theta=0.06, \sigma=0.4, \rho=-0.75$.

• Normal-CLV process parameters are given by

$\kappa=-0.075, \theta=0.05,\sigma=0.25,x_0=0.05$

The diagram below shows the implied volatility of an forward starting European option with moneyness varying from 0.5 to 2 and maturity date six month after the reset date.

The shape of the forward volatility surface of the Normal-CLV model shares important similarities with the surfaces of the more complex Heston or Heston Stochastic Local Volatility (Heston-SLV) model with large mixing angles $\eta$. But by the very nature of the Normal-CLV model, the forward volatility does not depend on the values of $\theta, \sigma$ or $x_0$, which limits the variety of different forward skew dynamics this model can create. CLV models with non-normal kernel processes will support a greater variety.

Example 3: Pricing of Double-no-Touch options

• Market prices are given by the Heston model with

$S_0=100, r=0.02, q=0.01, \nu_o=0.09, \kappa=1.0, \theta=0.06, \sigma=0.8\rho=-0.8$.

• Normal-CLV process parameters are given by different $\kappa$ values and

$\theta=100,\sigma=0.15,x_0=100.0$

Unsurprisingly the prices of 1Y Double-no-Touch options exhibit similar patterns with the Normal-CLV model and the Heston-SLV model as shown below in the “Moustache” graph. But the computational efforts using the Normal-CLV model are much smaller than the efforts for calibrating and solving the Heston-SLV model.

The QuantLib implementation of the Normal-CLV model is available as a pull request #117, the Rcpp based package Rclv contains the R interface to the QuantLib implementation and the demo code for all three examples.

[1] A. Grzelak, 2016, The CLV Framework – A Fresh Look at Efficient Pricing with Smile

[2] L.A. Grzelak, J.A.S. Witteveen, M.Suárez-Taboada, C.W. Oosterlee,
The Stochastic Collocation Monte Carlo Sampler: Highly efficient sampling from “expensive” distributions

[3] J-P. Berrut, L.N. Trefethen,  Barycentric Lagrange interpolation, SIAM Review, 46(3):501–517, 2004.

# Building R Packages with Rcpp and QuantLib on Windows

Using Rcpp and QuantLib to build new R packages is straight forward on Unix/Linux. More steps are needed to achieve the same on Windows. Please find here a short manual taking the RHestonSLV packages as an example.

# R/Finance 2016

Update 04.09.2016: Prebuild Windows x64 R package is available.

Please find here my talk about the Heston Stochastic Local Volatility Model at this year’s R/Finance 2016 conference in Chicago, USA. The source code of the RHestonSLV package including all examples is also available.

# LSMC: Need for fast and accurate Generalized Least Squares

Update 21.02.2016: Added values for QR decomposition with pivoting and QuantLib performance improvements.

Least Squares Monte Carlo simulations spend a significant amount of the total computation time on the generalized least squares especially if the problem itself has a high dimensional state.  Preferred techniques to solve the normal equations are the QR decomposition

$A = Q\cdot R$

where $Q$ is orthogonal and $R$ is upper triangular or the singular value decomposition (SVD)

$A = U\cdot w \cdot V^T$

where $U$ is column-orthogonal, $V$  is an orthogonal matrix and $w$ is a positive semi-definite diagonal matrix. The Cholesky Factorization

$A = L \cdot L^T$

where $L$ is a lower triangular is the fasted method but often numerically unstable. The author in [1] summaries all methods and also outlines the computational efforts involved for $m$ observations and $n$ parameters as

• Chlolesky Factorization: costs $\sim mn^2 + n^3/3$ flops
• QR decomposition: costs $\sim 2mn^2 - 2n^3/3$ flops
• Singular value decomposition: costs $2mn^2 + 11n^3$ flops

For LSMC simulations we have $m\gg n$ and therefore the QR decomposition has no computational advantages over the SVD.  Since a QR decomposition without column pivoting has numerical problems if $A$ is rank-deficient, the singular value decomposition is often the method of choice for LSMC simulations.

All three decomposition methods are available in QuantLib, in LAPACK (with or without optimized OpenBLAS library) and in Intel’s MKL library. A standard Swing option valuation via LSMC should serve as a test bed to measure the performance of the QR decomposition with and without column pivoting and of the SVD algorithm. For LAPACK and MKL the methods dgels and dgesvd have been used to implement the SVD and the QR decomposition without pivoting whereas QR with pivoting is based on dgeqp3, dormqr and dtrsm.  In order to keep results comparable the single thread performance was measured in all cases. The reference prices are calculated via finite difference methods and all LSMC implementations have led to the same price in line with the reference price. The current QR implementation in QuantLib 1.7 has a performance issue if the number of rows is much larger than the number of columns. For these tests an improved version of QuantLib’s QR decomposition has been used.

As expected MKL is often the fasted library but the difference between MKL and  LAPACK plus OpenBLAS is small.

[1] Do Q Lee, 2012, Numerically Efficient Methods for Solving Least Squares Problems

# Monte-Carlo Calibration of the Heston Stochastic Local Volatiltiy Model

Solving the Fokker-Planck equation via finite difference methods is not the only way to calibrate the Heston stochastic local volatility model

$\begin{array}{rcl} d \ln{S_t} &=& \left(r_t - q_t - \frac{1}{2}L\left(S_t, t\right)^2\nu_t \right)dt + L(S_t, t)\sqrt{\nu_t} dW_t^S \\ \nonumber d\nu_t &=& \kappa\left(\theta - \nu_t\right)dt + \eta\sigma\sqrt{\nu_t}dW_t^\nu \\ \nonumber \rho dt &=& dW_t^\nu dW_t^S \end{array}$

The basic equation to calibrate the leverage function $L(S_t,t)$ for a local volatility surface $\sigma_{LV}(S_t,t)$ and a set of Heston parameters $\{\nu_0, \kappa, \theta, \sigma, \rho, \eta \}$ is given by

$L(S_t, t)=\frac{\sigma_{LV}(S_t, t)}{\sqrt{\mathop{\mathbb{E}}[\nu_t|S=S_t]}}= \sigma_{LV}(S_t, t)\sqrt{\frac{\int_{\mathop{\mathbb{R}}^+} p(S_t,\nu, t) d\nu}{\int_{\mathop{\mathbb{R}}^+}\nu p(S_t,\nu,t)d\nu}}$

Key problem here is to calculate the expectation value $\mathop{\mathbb{E}}[\nu_t|S=S_t]$. This can either be done via the Fokker-Planck equation as outlined in [3] and the references in there or via Monte-Carlo simulations as shown in [2].  Getting the solution of the Fokker-Planck equation via finite difference methods is anything but easy if the Feller constraint is violated. In this case the Monte-Carlo approach promises less numerical problems to overcome. Starting point for an efficient Monte-Carlo calibration is a fast and accurate simulation scheme for a stochastic local volatility (SLV) model. The variance part of the SLV can be sampled exactly using the non-central $\chi^2$ distribution.

$Pr(\nu_{t+\Delta t} < \nu | \nu_t) = F_{\chi^{\prime }2}\left(\nu \frac{4\kappa}{\eta^2\sigma^2\left(1-e^{-\kappa \Delta t}\right)}; \frac{4\kappa\theta}{\eta^2\sigma^2}, \frac{4\kappa}{\eta^2\sigma^2\left(1-e^{-\kappa \Delta t}\right)}\nu_t e^{-\kappa \Delta t}\right)$

The boost library supports the inverse of the cumulated non-central $\chi^2$ distribution but the calculation itself is rather slow. The Heston QE scheme outlined in [1] comes with an fast and accurate sampling scheme for the square root process. In the case $\gamma_1=\gamma_2=\frac{1}{2}$ it reads

$\begin{array}{rcl} m &=& \theta+(\nu_t - \theta)e^{-\kappa \Delta t} \\[6pt] \nonumber \psi &=& \frac{\nu_t \frac{\eta^2\sigma^2 e^{-\kappa\Delta t}}{\kappa}\left(1-e^{-\kappa\Delta t} \right) + \frac{\theta\eta^2\sigma^2}{2\kappa}\left(1-e^{-\kappa\Delta t}\right)^2}{m^2}\\[6pt] \nonumber \beta &=& \frac{2}{\psi} - 1 + \sqrt{\frac{2}{\psi}\left(\frac{2}{\psi}-1 \right)} \\[6pt] \nonumber p &=& \frac{\psi-1}{\psi+1} \\[6pt] \nonumber Z_\nu &\sim& \mathcal{N}(0,1) , u \sim \mathcal{U}(0,1) \\[6pt] \nonumber \nu_{t+\Delta t} &=& \begin{cases} \frac{m}{1+\beta}\left(\sqrt{\beta}+Z_\nu\right)^2 & \text{if } \psi < 1.5 \\[6pt] \begin{cases} 0, & \text{if } u \leq p \\[6pt] \ln{\left(\frac{1-p}{1-u}\right)}\frac{m}{1-p} & \text{otherwise} \end{cases} & \text{if } \psi \geq 1.5 \end{cases} \end{array}$

The authors in [2] suggesting a sampling scheme for $X_t = \ln{S_t}$. The scheme below is a bit closer to the original QE scheme and without a rigorous proof it has shown slightly smaller discretisation  errors for the examples.

$\begin{array}{rcl} X_{t+\Delta t} &=& X_t + \left(r_t - q_t - \frac{1}{4}\left(\nu_t+\nu_{t+\Delta t}\right)L(S_t, t)^2 \right)\Delta t \\[6pt] \nonumber &&+ \frac{\rho}{\sigma}L(S_t, t) \left(\nu_{t+\Delta t} -\nu_t - \kappa\theta\Delta t + \frac{1}{2}\left(\nu_t + \nu_{t + \Delta t}\right)\kappa\Delta t\right) \\[6pt] \nonumber &&+\sqrt{\frac{1}{2}\left(1-\rho^2\right)\left(\nu_t + \nu_{t+\Delta t}\right) Z_x \Delta t} \\[6pt] \nonumber Z_x &\sim& \mathcal{N}(0,1) \end{array}$

To get from the Monte-Carlo paths to the leverage function the authors in [2] using a binning technique in every time step. QuantLib supports two ways to generate the random draws, either based on i.i.d normal variables or based on Sobol quasi Monte-Carlo simulations with brownian bridges.

Test case definition: Feller constraint is violated.

$\begin{array}{rcl} S_{0} &=& 100 \\ \nonumber r_t &=& 5\% \\ \nonumber q_t &=& 2\% \\ \nonumber \sigma_{LV}(S_t, t) &=& 30\% \\ \nonumber \kappa &=& 1.0\\ \nonumber \theta &=& 0.06\\ \nonumber \rho &=& -0.75\\ \nonumber \sigma &=& 0.4\\ \nonumber \nu_{0} &=& 0.09 \end{array}$

Effectively the leverage function $L(S_t, t)$ will remove the skew and term structure introduced by the Heston model and shifts volatility to 30%. In order to measure the calibration quality the price mismatches expressed in volatility basis point differences are calculated for OTM calls and puts with prices above 0.02 and for maturity from one month until two year. The SLV prices are calculated by solving the Feynman-Kac backward equation using finite difference methods.

The diagram above shows the average pricing error for different number of Monte-Carlo calibration paths and for different bin sizes. As expected the optimal bin size depends on the number of Monte-Carlo simulations. Especially deep OTM options benefit from a large number of bins. The usage of Sobol quasi Monte-Carlo simulations with brownian bridges does not give a significant advantage.

The average pricing error is decreasing with the square root of the number of Monte-Carlo calibration paths if one always choses the optimal bin size as shown in the diagram below.  Quasi Mont-Carlo path generation with brownian bridges gives a small advanage.

As expected the leverage function itself exposes some Monte-Carlo noise, see diagram below. The corresponding leverage function based on the finite difference method is smooth but the overall calibration error is of the same size.

The calibration for different mixing factors $\eta$ is straight forward. The diagram below e.g. shows the prices of a double no touch option for different distances of the (symemtric) lower and upper barriers expressed in terms of the corresponding Black-Scholes prices. This diagram type is derived from figure 8.8 in [4].

The source code is available on github as part of the test suite HestonSLVModelTest.

[1] Leif Andersen, Efficient Simulation of the Heston Stochastic Volatility Model

[2] Anthonie Van der Stoep, Lech Grzelak, Cornelis Oosterlee,
The Heston Stochastic-Local Volatility Model: Efficient Monte Carlo Simulation

[3] Johannes Goettker-Schnetmann, Klaus Spanderen, Calibrating the Heston Stochastic Local Volatility Model using the Fokker-Planck Equation

[4] Iain J. Clark, Foreign Exchange Option Pricing: A Practitioner’s Guide

# Parallel Unit Test Runner using MPI

Update: 10.03.2016: added performance results of latest QuantLib test-suite build on a 32 Core SMP machine using boost interprocess instead of MPI.

Running QuantLib’s test suite on a recent computer takes around 10min. The situation will improve a lot if the test runner utilises more than one core of today’s multi-core CPUs to run the tests in parallel. Unfortunately multi-threading won’t work because the boost unit test framework is not thread safe.  A reasonable way forward to speed-up the test suite is via multiprocessing using message passing between the compute nodes based on the master-slave paradigm. The cross platform standard MPI together with boost::mpi is tailor-made for this task.

The missing piece is an external, parallel unit test runner, which uses MPI for load balancing and to collect the test results. The runner for QuantLib ‘s test suite needs boost version 1.59 or higher and can be found here.  Please replace in quantlibtestsuite.cpp line 24

 #include <boost/test/unit_test.hpp>

by

 #include <mpiparalleltestrunner.hpp>

and do not link the executable with libboost_unit_test_framework.so because the new header file includes the header only version of the boost test framework (Thanks to Peter Caspers for the hint). Load balancing is a crucial point for the overall speed-up because  QuantLib’s test suite has a handful of long running tests (max. around 90 seconds), which should be scheduled first. Therefore the MPI test runner collects the statistics of every unit-test’s runtime and stores these in a local file to plan the schedule of the next runs.

The diagram above shows the runtime of QuantLib’s test suite for a different number of parallel processes and CPU configurations. The minimum runtime is set by the longest running test case, which is around 50 seconds. On a single CPU the performance scales pretty linear with the number of cores being utilised and also hyper-threading cores help. Using more than sixteen real cores does not improve the situation any further because the overall runtime is already dominated by the longest running test case.