High Performance American Option Pricing

A couple of years ago, Andersen, Lake and Offengenden introduced a new high performance algorithm to price American options under the Black-Scholes-Merton model [1][2]. The basis of the algorithm is an interpolation of the early exercise boundary

S^\star(t) = B(\tau),\ \tau=T-t

for a time reversed boundary function B(\tau) with B(0)=K. Knowing the boundary function B(\tau), the value of an American put option is given by

\begin{array}{rcl}\displaystyle V(\tau, S)&=& \displaystyle e^{-r\tau}K\Phi\left(-d_-(\tau,S/K)\right)-Se^{-q\tau}\Phi\left(-d_+(\tau, S/K)\right) \\[5pt] &&\displaystyle + \int_0^\tau r K e^{-r(\tau - u)} \Phi(-d_-(\tau-u, S/B(u) )) du \\[11pt] &&\displaystyle - \int_0^\tau q S e^{-q(\tau - u)} \Phi(-d_+(\tau-u, S/B(u))) du \\[10pt] d_\pm(\tau, x) &=& \displaystyle \frac{\ln{x}+(r-q)\tau \pm\frac{1}{2}\sigma^2\tau}{\sigma\sqrt{\tau}}\end{array}

and the value of an American call option can be evaluated via the call-put symmetry [3]. The algorithm consists of four separate tasks; the precision of each task is controlled by a separate parameter and by the choice of an integration algorithm.

  1. The early exercise boundary B(\tau) is interpolated on n Chebyshev nodes [4].
  2. The early exercise boundary B(\tau) is approximated using m fixed point iteration steps; the first iteration is a partial Jacobi-Newton step, the remaining m-1 steps are ordinary fixed point iterations.
  3. Integrations during the fixed point iterations are carried out using either Gauss-Legendre integration of order l or tanh-sinh integration with a given tolerance ε.
  4. A final integration along the early exercise boundary B(\tau) to obtain the American option value, using either Gauss-Legendre integration of order p or tanh-sinh integration with a given tolerance ν. A sketch of this last step follows below.
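
Assuming the boundary B(\tau) has already been computed, the last step maps directly onto a few lines of NumPy/SciPy. This is a minimal sketch of the put value formula above; the constant boundary at the bottom is a crude placeholder for illustration only, not the output of the fixed point iteration.

import numpy as np
from scipy.stats import norm

def d_pm(tau, x, r, q, sigma, sign):
    return (np.log(x) + (r - q) * tau
            + sign * 0.5 * sigma * sigma * tau) / (sigma * np.sqrt(tau))

def american_put(S, K, T, r, q, sigma, B, p=27):
    # European put part of the formula above
    v = (np.exp(-r * T) * K * norm.cdf(-d_pm(T, S / K, r, q, sigma, -1))
         - S * np.exp(-q * T) * norm.cdf(-d_pm(T, S / K, r, q, sigma, +1)))
    # early exercise premium: Gauss-Legendre nodes mapped from [-1,1] onto [0,T]
    x, w = np.polynomial.legendre.leggauss(p)
    u, w = 0.5 * T * (x + 1.0), 0.5 * T * w
    v += np.dot(w, r * K * np.exp(-r * (T - u))
                * norm.cdf(-d_pm(T - u, S / B(u), r, q, sigma, -1)))
    v -= np.dot(w, q * S * np.exp(-q * (T - u))
                * norm.cdf(-d_pm(T - u, S / B(u), r, q, sigma, +1)))
    return v

# placeholder boundary for a quick smoke test
print(american_put(100.0, 100.0, 1.0, 0.05, 0.04, 0.2,
                   lambda u: np.full_like(u, 80.0)))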

It is important for the overall performance that all sub-steps lead to a similar accuracy. The parameter space is best explored with the help of a benchmark portfolio, e.g. following [1],

\begin{array}{rcl} T&\in& \{ 30, 91, 182, 273, 365 \} \\ r &\in&  \{2\%, 4\%, 6\%, 8\%, 10\% \} \\ q&\in&\{ 0\%, 4\%, 8\%, 12\%\} \\ S&\in& \{25, 50, 80, 90, 100, 110, 120, 150, 175, 200\} \\ \sigma&\in&\{10\%, 20\%, 40\%, 60\%\}\end{array},

and by comparing the accuracy per runtime of different parameter configurations. Each dot in the diagram below corresponds to the root mean square error (RMSE) of a different parameter configuration. The RMSE is calculated over all 6000 option prices in the benchmark portfolio and plotted against the runtime of a single option pricing. For comparison, the results of the finite difference method have been added, but PDEs cannot compete with this algorithm (nor can the tree-based algorithms in QuantLib).

This data set can be used to define efficient parameter frontiers and to pick optimal parameter configurations from them. A fast configuration uses Gauss-Legendre integration for steps 3 and 4 with

(l, n, m, p) = (7,2,7,27),

an accurate configuration uses Gauss-Legendre integration for step 3 and tanh-sinh integration for step 4 with

(l, n, m, \nu) =(25, 5, 13, 10^{-8}),

and high precision is achieved with tanh-sinh integration for steps 3 and 4 with

(\epsilon, n, m, \nu) = (10^{-10}, 10, 30, 10^{-10}).

The implementation of this algorithm is not always straightforward and is part of PR #1495.

[1] L. Andersen, M. Lake and D. Offengenden: High Performance American Option Pricing

[2] L. Andersen and M. Lake: Fast American Option Pricing: The Double Boundary Case

[3] R. McDonald and M. Schroder: A Parity Result for American Options

[4] S.A. Sarra: Chebyshev Interpolation: An Interactive Tour

Python, QuantLib and Multi-Threading

Python is – thanks to the GIL – not the language of choice for modern multi-threaded programming, and QuantLib is not really thread-safe either, even though the thread-safe observer pattern implementation and the thread-safe singleton initialization help to mitigate the risks. (In short, as general advice: do not share objects between threads, do not change global state, e.g. the evaluation date, in an uncontrolled manner, and enable QL_ENABLE_THREAD_SAFE_OBSERVER_PATTERN.)

Even then – unlike in Java/Scala/C# or C++ – there is not much to gain in Python by default, because the SWIG-generated interface code does not release the GIL while running C++ code. Hence all threads except one will block during long-lasting QuantLib computations. But not all hope is lost: SWIG can be instructed to release the GIL before calling QuantLib methods via the command line parameter “-threads”. The only caveat is that some of QuantLib’s SWIG interface files contain low-level Python C API calls like e.g. Py_XDECREF. Every block with such calls has to be guarded with the SWIG_PYTHON_THREAD_BEGIN_BLOCK macro at the beginning of the block to re-acquire the GIL – 60 times in total.

As a reward, the GIL is released while QuantLib code is running and multi-threading can be used to speed up multiple long-lasting QuantLib calls in the same manner as in the other languages.
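
A minimal sketch, assuming the bindings have been rebuilt with “swig -threads”: a thread pool then gives a real speed-up for independent, long-running pricings. The instrument set-up and FD grid sizes are just an arbitrary, deliberately slow example.

from concurrent.futures import ThreadPoolExecutor
import QuantLib as ql

today = ql.Date(15, ql.May, 2024)
ql.Settings.instance().evaluationDate = today  # global state: set once, up front

def price_put(strike):
    # each task creates its own objects; nothing is shared between threads
    process = ql.BlackScholesMertonProcess(
        ql.QuoteHandle(ql.SimpleQuote(100.0)),
        ql.YieldTermStructureHandle(ql.FlatForward(today, 0.0, ql.Actual365Fixed())),
        ql.YieldTermStructureHandle(ql.FlatForward(today, 0.05, ql.Actual365Fixed())),
        ql.BlackVolTermStructureHandle(
            ql.BlackConstantVol(today, ql.TARGET(), 0.2, ql.Actual365Fixed())))
    option = ql.VanillaOption(
        ql.PlainVanillaPayoff(ql.Option.Put, strike),
        ql.AmericanExercise(today, today + ql.Period(1, ql.Years)))
    # a deliberately long-lasting pricing to make the parallel speed-up visible
    option.setPricingEngine(ql.FdBlackScholesVanillaEngine(process, 800, 800))
    return option.NPV()

with ThreadPoolExecutor(max_workers=4) as pool:
    print(list(pool.map(price_put, [80.0, 90.0, 100.0, 110.0, 120.0])))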

Heston PDE Grid: How small can we make it?

The aim is to apply the techniques used in Arbitrary Number of Stencil Points to the partial differential equation (PDE) of the Heston model with x_t = \ln S_t

\displaystyle  0 = \frac{\partial u}{\partial t} +\frac{1}{2}\nu\frac{\partial^2 u}{\partial x^2}  + \left(r_t-q_t -\frac{\nu}{2}\right)\frac{\partial u}{\partial x} + \rho\sigma\nu \frac{\partial^2 u}{\partial \nu \partial x} +\frac{1}{2}\sigma^2\nu\frac{\partial^2 u}{\partial \nu^2} + \kappa(\theta - \nu) \frac{\partial u}{\partial \nu}-r_tu

in order to find – for a given accuracy – the smallest possible lattice grids. The Heston model parameters at hand are

\displaystyle S_0 = 100, v_0 = 0.04, \kappa = 1, \theta = v_0, \sigma=0.2, \rho=-0.75, r=5\%, q=0\%,

the strikes of the benchmark options are given by

\displaystyle k\in \{50, 75, 90, 100, 110, 125, 150, 200\}

and the maturity of the options should be one year. First, let's use a five-point stencil in the x direction. As expected, the order of convergence increases from second to fourth order.

The same experiment, but now with a five-point stencil in the \nu direction as well, is shown below. Due to the high convergence speed, finite lattice size effects show up already at relatively small lattices.

In the time direction, Richardson extrapolation can be used to increase the order of convergence from two to three.
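
The extrapolation step itself is a one-liner; a sketch for a second order scheme, combining the solutions obtained with n and 2n time steps:

def richardson(u_n, u_2n, order=2):
    # cancel the leading O((T/n)^order) error term of the two approximations
    k = 2.0**order
    return (k * u_2n - u_n) / (k - 1.0)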

Finally, for this parameter set, if the aim is to keep the average pricing error below \displaystyle 10^{-3}, then a lattice size of

\displaystyle x\times\nu\times T = 50\times 10 \times 20

seems to be sufficient when using five-point stencil operators in the x and \nu directions and Richardson extrapolation in the time direction. Unfortunately, for a given accuracy this technique is slower than the usual suspects, e.g. operator splitting, because the matrix inversion of the Crank-Nicolson scheme has to be carried out by an iterative solver, namely BiCGstab. The code for the numerical experiments can be found in the test case

NthOrderDerivativeOpTest::testHigerOrderHestonOptionPricing().

A Novel Control Variate for the Heston Model

Semi-analytical valuation methods for the Heston model often fall short in very high vol of vol scenarios. The reason for these difficulties becomes clearer when looking at the asymptotic behaviour of the integrand for a call option price in the Carr-Madan formulation for \alpha=1/2 (model definition and further references can be seen here):

\begin{array}{rcl} C(S_0, K, T) &=& Fe^{-rT} - \frac{\sqrt{Se^{(r-q)T}K}e^{-rT}}{\pi}\displaystyle\int\limits_{0}^{\infty}{\Re\left( e^{i u \left(\ln\frac{S}{K}+(r-q)T\right) } \frac{\phi_T\left(u-\frac{i}{2}\right)}{u^2+\frac{1}{4}} \right)  \mathrm{d}u} \\ \phi_T(z) &=& \exp\left\{\frac{v_0}{\sigma^2}\frac{1-e^{-DT}}{1-Ge^{-DT}}\left(\kappa-i\rho\sigma z-D\right) + \frac{\kappa\theta}{\sigma^2}\left(\left(\kappa-i\rho\sigma z-D\right)T-2\ln\frac{1-Ge^{-DT}}{1-G}\right) \right\} \\  D&=&\sqrt{\left(\kappa - i\rho\sigma z\right)^2+\left(z^2+iz\right)\sigma^2} \\ G&=&\displaystyle\frac{\kappa -i\rho\sigma z-D}{\kappa -i\rho\sigma z + D}\end{array}.

A lengthy calculation of the asymptotic expansion up to \mathcal{O}(u) gives

\phi_T\left(u-\frac{i}{2}\right)\xrightarrow[]{u\rightarrow \infty} e^{\phi u + \psi}

with

\displaystyle \begin{array}{rcl} \phi &=& -\frac{1}{\sigma}\left(v_0 + \kappa\theta T \right)(\sqrt{1-\rho^2} + i\rho)\nonumber \\ \psi &=& \frac{1}{\sigma^2}\left[(\kappa - \rho\sigma/2)(v_0+\kappa\theta T) + \kappa\theta\log\left(4-4\rho^2\right) - i\left(\rho^2\sigma/2- \kappa\rho\right)\frac{v_0+\kappa\theta T}{1-\rho^2} + 2i\kappa\theta\arctan{\frac{\rho}{\sqrt{1-\rho^2}}} \right]\end{array}

A high vol of vol leads to a small \left|\phi\right| and in turn to a very slowly decaying integrand, which renders any type of Black-Scholes control variate

\displaystyle \phi_{BS}(z)=e^{-\frac{1}{2}\sigma_{BS}^2 T(z^2+iz)}

useless. On the other hand, the integral over the asymptotic expansion has an easy to evaluate semi-analytical solution

\displaystyle\int\limits_{0}^{\infty}{ \frac{e^{\phi u + \psi}}{u^2+\frac{1}{4}}\mathrm{d}u} =  e^\psi \left( -2 \mathrm{Ci}(-\phi/2)\sin{(\phi/2)} + \cos{(\phi/2)}\left(\pi + 2 \mathrm{Si}(\phi/2)\right)\right)

with the trigonometric integrals

\displaystyle\mathrm{Ci}(x)=-\int\limits_{x}^{\infty}\frac{\cos{t}}{t}\mathrm{d}t\ ,\ \ \mathrm{Si}(x)=\int\limits_{0}^{x}\frac{\sin{t}}{t}\mathrm{d}t .
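
The closed form maps directly onto scipy.special.sici. A quick sanity check against brute-force quadrature for real, negative \phi (in the Heston application \phi and \psi are complex; here the identity is checked for real arguments only, using Si(-x)=-Si(x)):

import numpy as np
from scipy.special import sici
from scipy.integrate import quad

def asymptotic_integral(phi, psi):
    # e^psi * (-2 Ci(-phi/2) sin(phi/2) + cos(phi/2) (pi + 2 Si(phi/2)))
    si, ci = sici(-0.5 * phi)    # Si and Ci evaluated at -phi/2 > 0
    return np.exp(psi) * (-2.0 * ci * np.sin(0.5 * phi)
                          + np.cos(0.5 * phi) * (np.pi - 2.0 * si))

phi, psi = -1.5, 0.25
brute, _ = quad(lambda u: np.exp(phi * u + psi) / (u * u + 0.25), 0.0, np.inf)
print(asymptotic_integral(phi, psi), brute)   # the two values should agree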

This function can act as a highly efficient control variate for these otherwise difficult to handle high vol of vol parameter sets. A comparison with the Black-Scholes control variate is shown in the diagram below, using Gauss-Laguerre integration of different orders N.

As a rule of thumb the control variate based on the asymptotic expansion of the characteristic function outperforms the standard Black-Scholes control variate if 

\displaystyle T > 0.02 \  \wedge \ \frac{v_0 + T\kappa\theta}{\sigma} \sqrt{1-\rho^2}< 0.055.

The QuantLib implementation of the algorithm is part of PR #898.

Optimized Heston Model Integration: Exponentially-Fitted Gauss-Laguerre Quadrature Rule

Aim: Develop an exponentially-fitted Gauss-Laguerre quadrature rule to price European options under the Heston model which outperforms the Gauss-Lobatto, Gauss-Laguerre and other pricing method implementations available in QuantLib.

Status quo: Efficient pricing routines for the Heston model

\begin{array}{rcl} d \ln S_t&=& \left(r_t - q_t - \frac{\nu_t}{2}\right)dt + \sqrt{\nu_t} dW^{S}_t \\ d\nu_t&=& \kappa\left(\theta-\nu_t \right ) dt + \sigma\sqrt{\nu_t} dW^{\nu}_t \\ \rho dt &=& dW^{S}_tdW^{\nu}_t \end{array}

are based on the integration over the normalized characteristic function in the Gatheral formulation

\phi_t(z) = \exp\left\{\frac{v_0}{\sigma^2}\frac{1-e^{-Dt}}{1-Ge^{-Dt}}\left(\kappa-i\rho\sigma z-D\right) + \frac{\kappa\theta}{\sigma^2}\left(\left(\kappa-i\rho\sigma z-D\right)t-2\ln\frac{1-Ge^{-Dt}}{1-G}\right) \right\}

\begin{array}{rcl} D&=&\sqrt{\left(\kappa - i\rho\sigma z\right)^2+\left(z^2+iz\right)\sigma^2} \nonumber \\ G&=&\displaystyle\frac{\kappa -i\rho\sigma z-D}{\kappa -i\rho\sigma z + D}\end{array}

in combination with a Black-Scholes control variate to improve the numerical stability of Lewis’s formula [1][2][3]. The normalized characteristic function of the Black-Scholes model is given by

\phi^{BS}_T(z)=e^{-\frac{1}{2}\sigma_{BS}^2 T (z^2 + iz)}

and the price of a vanilla call option can then be calculated based on

\begin{array}{rcl} C(S_0, K, T)&=&BS\left(S_0, K, T, \sigma_{BS}\right) \nonumber \\ &+& \frac{\sqrt{Se^{(r-q)T}K}e^{-rT}}{\pi}\displaystyle\int\limits_{0}^{\infty}{\Re\left( \frac{\phi^{BS}_T\left(u-\frac{i}{2}\right) - \phi_T\left(u-\frac{i}{2}\right)}{u^2+\frac{1}{4}} e^{i u \left(\ln\frac{S}{K}+(r-q)T\right) } \right)  \mathrm{d}u}\end{array}

where BS\left(S_0, K, T, \sigma_{BS}\right) is the corresponding Black-Scholes price. Different choices for the volatility of the control variate are discussed in the literature. For the following examples the volatility will be defined by either

\sigma_{BS}^2 = \frac{1}{\kappa T}\left(1-e^{-\kappa T}\right)\left(\nu_0-\theta\right) + \theta

or

\displaystyle \sigma_{BS}^2 = -\frac{8}{T} \ln{\Re\left(\phi_T\left(-i/2\right)\right)}.

The first choice matches the overall variance for \sigma=0 whereas the latter matches the value of the characteristic functions at the starting point z=-i/2 of the integration; usually the latter gives better results. Looking at the integrand above, one can directly spot a weak point of the algorithm for deep in-the-money/deep out-of-the-money options: in this case the integrand becomes a highly oscillating function due to the term

e^{i u \left(\ln\frac{S}{K}+(r-q)T\right) }.

Second, when using Gauss-Laguerre quadrature, the overlap of the weight function e^{-x} with the characteristic function becomes sub-optimal for very short maturities or small effective volatilities, as can already be seen with the Black-Scholes characteristic function.

Third, for very small \sigma the integrand becomes prone to subtractive cancellation errors.

The last issue is easy to overcome by using a second order Taylor expansion of the normalized characteristic function for small \sigma. The corresponding Mathematica script can be seen here.

The second issue can be improved by rescaling the integrand, i.e. by rewriting the integral in terms of x = \eta u with

\eta = \max\left(0.01, \min\left(10, \frac{1}{\sqrt{8(1-e^{-\kappa t})(v_0-\theta)/\kappa + \theta t}}\right)\right)
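
Before turning to the first problem, here is a compact NumPy sketch of the status quo for reference: Lewis's formula with the second control variate choice and a plain (\mu=0) Gauss-Laguerre rule, without the small-\sigma Taylor safeguard; all function names are illustrative.

import numpy as np
from scipy.stats import norm

def phi_heston(z, t, v0, kappa, theta, sigma, rho):
    # normalized characteristic function in the Gatheral formulation above
    beta = kappa - 1j * rho * sigma * z
    D = np.sqrt(beta * beta + (z * z + 1j * z) * sigma * sigma)
    G = (beta - D) / (beta + D)
    return np.exp(v0 / sigma**2 * (1 - np.exp(-D * t)) / (1 - G * np.exp(-D * t))
                  * (beta - D) + kappa * theta / sigma**2
                  * ((beta - D) * t - 2 * np.log((1 - G * np.exp(-D * t)) / (1 - G))))

def call_price(S0, K, T, r, q, v0, kappa, theta, sigma, rho, n=64):
    # sigma_BS matching the characteristic function at z = -i/2
    vol = np.sqrt(-8.0 / T * np.log(
        phi_heston(-0.5j, T, v0, kappa, theta, sigma, rho).real))
    d1 = (np.log(S0 / K) + (r - q + 0.5 * vol * vol) * T) / (vol * np.sqrt(T))
    bs = (S0 * np.exp(-q * T) * norm.cdf(d1)
          - K * np.exp(-r * T) * norm.cdf(d1 - vol * np.sqrt(T)))
    # plain Gauss-Laguerre rule for int_0^inf e^{-x} g(x) dx
    u, w = np.polynomial.laguerre.laggauss(n)
    # phi_BS(u - i/2) = exp(-vol^2 T (u^2 + 1/4) / 2) is real-valued
    h = (np.exp(-0.5 * vol * vol * T * (u * u + 0.25))
         - phi_heston(u - 0.5j, T, v0, kappa, theta, sigma, rho)) / (u * u + 0.25)
    g = (h * np.exp(1j * u * (np.log(S0 / K) + (r - q) * T))).real * np.exp(u)
    scale = np.sqrt(S0 * np.exp((r - q) * T) * K) * np.exp(-r * T) / np.pi
    return bs + scale * np.dot(w, g)

For strikes near the money this converges quickly; for deep ITM/OTM strikes the oscillating factor spoils the convergence, which is exactly the first problem addressed next.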

The first problem, the highly oscillating integrand, can be tackled with the exponentially fitted Gauss-Laguerre quadrature rule [4]. The numerical integration of two smooth functions f_1(x) and f_2(x) with

I = \displaystyle \int_0^\infty e^{-x} \left(f_1(x) \sin(\mu x) + f_2(x)\cos(\mu x)\right)\mathrm{d}x = \int_0^\infty f(x)\mathrm{d}x

is approximated by a Gauss-Laguerre quadrature rule of the form

I \approx \sum_{i=1}^N w_i(\mu) f(x_i(\mu)).

In this use case the frequency \mu is

\mu =\ln\frac{S}{K}+(r-q)T,

and the weights w_i and nodes x_i of the Gauss-Laguerre quadrature rule become functions of the frequency \mu. The algorithm outlined in [4] is too slow to be used at runtime and in its original form is only usable up to N=8. In order to use it for the Heston model, we pre-calculate the weights and nodes for a fixed set of frequencies \{0 \leq \mu_i\ \leq 50\}. The pre-calculated \mu_j closest to a given frequency \mu is then used to evaluate the integral, as sketched below. Of course it would be optimal to use the weights and nodes for \mu itself, but this is still far better than assuming \mu=0 as the conventional Gauss-Laguerre quadrature rule does.
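
Once the tables exist, the runtime part is a trivial nearest-neighbour lookup; in this sketch mu_grid, nodes and weights are assumed to hold the pre-computed data described above.

import numpy as np

def fitted_laguerre(g, mu, mu_grid, nodes, weights):
    # pick the pre-calculated rule whose frequency mu_j is closest to mu
    j = int(np.argmin(np.abs(mu_grid - mu)))
    return np.dot(weights[j], g(nodes[j]))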

Two techniques are key to getting to larger N for the pre-calculated weights and nodes:

  • Start the algorithm with \mu=0 to get the standard Gauss-Laguerre weights/nodes and gently increase \mu afterwards. Use the weights and nodes from up to 20 previous steps to generate, via Lagrange interpolating polynomials, the starting vector for the next Newton solve.
  • Increase the “numerical head room” by using the Boost multiprecision package instead of double precision. This is only relevant for the pre-calculation; the resulting weights and nodes are stored as ordinary double precision values.

The source code to generate the weights and nodes for the exponentially fitted Gauss-Laguerre quadrature rule is available here.

The reference results for the comparison with the other Heston pricing methods are generated using a Boost multiprecision version of the Gauss-Laguerre pricing algorithm of order N=2000. Comparing with the results for N=1500 and N=2500 ensures the correctness of the reference values. The exponentially fitted Gauss-Laguerre quadrature rule is always carried out with N=48 nodes, corresponding to only 48 evaluations of the characteristic function. The first example model is

\displaystyle s=1.0, t=1,v_0=0.04, \kappa=2.5,\theta=0.06, \sigma=0.75,\rho=-0.1


Exponential fitting outperforms the other methods, especially when taking the total number of characteristic function evaluations into consideration. As expected, the method works remarkably well for deep OTM/ITM options and keeps the absolute pricing error below 10^{-15} for an extremely large range of option strikes. Next round: same model, but with t=10, \rho=-0.75.


Again the pricing error of the exponential fitting stays below 10^{-14} for all moneynesses between -15 and 15, well below that of the other methods. Next test: same model, but with the very short maturity t=\frac{1}{365}. The COS method with 250 calls of the characteristic function and the exponential fitting with N=48 are close together and outperform all other methods.


The same graphs are shown for a variety of Heston parameters used in the literature in the following document. The exponentially fitted Gauss-Laguerre quadrature rule with only 64 nodes solves the problem almost always with \epsilon < 10^{-13} for moneynesses between -20 and 20 and maturities between one day and 10 years, and outperforms the other methods, especially when taking the number of characteristic function calls into consideration. The method can also be extended to larger numbers of nodes.

The QuantLib implementation of the algorithm is part of PR #812.

[1] A. Lewis: A Simple Option Formula for General Jump-Diffusion and Other Exponential Lévy Processes

[2] F. Le Floc’h: Fourier Integration and Stochastic Volatility Calibration

[3] L. Andersen and V. Piterbarg: Interest Rate Modeling, Volume I: Foundations and Vanilla Models, Atlantic Financial Press, 2010

[4] D. Conte, L. Ixaru, B. Paternoster and G. Santomauro: Exponentially-fitted Gauss–Laguerre quadrature rule for integrals over an unbounded interval

Moment Matching Option Pricing: A Case Study

The moment generating function of a random variable Z is

M_Z(\theta) := E\left(e^{\theta Z}\right) , \ \theta\in \mathbb{R}

and the n-th moment of Z is then given by

\displaystyle m_n=E\left(Z^n\right) = M_Z^{(n)}(0) = \frac{d^n M_Z(\theta)}{d\theta^n}\bigg\rvert_{\theta=0}

The moment generating function is known for many financial models. Hence it is natural to derive approximations to exact pricing formulas based on the moment generating function.
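
Since the approximations below only need the first few derivatives at \theta=0, analytically derived moments can be cross-checked numerically with central difference stencils. A small sketch, where M is any moment generating function:

def raw_moments(M, h=1e-2):
    # m_1 ... m_4 as derivatives of M at theta = 0, standard central stencils;
    # h balances truncation against round-off, good enough for a sanity check
    f = lambda k: M(k * h)
    m1 = (f(1) - f(-1)) / (2 * h)
    m2 = (f(1) - 2 * f(0) + f(-1)) / h**2
    m3 = (f(2) - 2 * f(1) + 2 * f(-1) - f(-2)) / (2 * h**3)
    m4 = (f(2) - 4 * f(1) + 6 * f(0) - 4 * f(-1) + f(-2)) / h**4
    return m1, m2, m3, m4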

The Kluge model [1] with exponential jump size distribution may serve here as a test bed for this approach. It is defined by

\displaystyle \begin{array}{rcl} S_t &=& \exp(f_t + X_t + Y_t) \\ dX_t &=& -\alpha X_tdt + \sigma dW_t^x \\ dY_t &=& -\beta Y_{t-}dt+J_tdN_t \\ \omega(J) &=& \eta e^{-\eta J} \end{array}

where N_t is a Poisson process with jump intensity \lambda and \eta is the reciprocal of the mean jump size. To match today's forward curve F_0^t, the function f_t is given by

\displaystyle f_t = \ln F_0^t -X_0 e^{-\alpha t}-Y_0 e^{-\beta t} -\frac{\sigma^2}{4\alpha}\left(1-e^{-2\alpha t} \right ) - \frac{\lambda}{\beta}\ln\left( \frac{\eta-e^{-\beta t}}{\eta-1}\right), \quad \eta\ge 1

The moment generating function for the log spot process

Z_t = \ln S_t = f_t+X_t+Y_t

is then given by [1]

\displaystyle M_{Z_t}(\theta) = \exp\left( \theta f_t + \theta X_0 e^{-\alpha t} + \theta^2 \frac{\sigma^2}{4\alpha}\left(1-e^{-2\alpha t}\right) + \theta Y_0e^{-\beta t}\right) \left( \frac{\eta-\theta e^{-\beta t}}{\eta-\theta}\right)^\frac{\lambda}{\beta}

The pricing approximation formulas are based on the central moments

\displaystyle \mu_n = E\left[\left(Z-E(Z)\right)^n\right] = \sum_{i=0}^n {n\choose i} m_i (-E(Z))^{n-i}

Mathematica calculates the first four central moments as

\begin{array}{rcl} \displaystyle \mu_1&=&0 \\ \displaystyle \mu_2 &=& \frac{\sigma^2 \left(1-e^{-2 \alpha t}\right)}{2\alpha }+\frac{\lambda \left(1- e^{-2 \beta t}\right)}{\beta \eta ^2}\\ \mu_3 &=& \frac{\lambda \left(2-2 e^{-3 \beta t}\right)}{\beta \eta ^3} \\ \displaystyle \mu_4 &=& \frac{3 e^{-5 t (\alpha +\beta )} \left(16 \alpha ^2 \beta \lambda e^{5 \alpha t+3 \beta t} \sinh (2 \beta t)+e^{t (\alpha +\beta )} \left(\beta \eta ^2 \sigma^2 \left(e^{2 \alpha t}-1\right) e^{2 \beta t}+2 \alpha \lambda e^{2 \alpha t} \left(e^{2 \beta t}-1\right)\right)^2\right)}{4 \alpha ^2 \beta ^2 \eta ^4}.\end{array}

Approximations to Black-Scholes-like log-normal models with higher moments are summarized in [2]. These extensions are parameterized in terms of the Fisher parameters for skewness and kurtosis

\displaystyle \gamma_3=\frac{\mu_3}{\mu_2^{3/2}}, \ \gamma_4 = \frac{\mu_4}{\mu_2^2}-3

Corrado and Su [3] obtain for a European call option price with third and fourth order corrections

\begin{array}{rcl} \displaystyle C_{CS}&=&C_{BS}(F_0^t, K, \sigma, r, t) +\gamma_3 Q_3 + \gamma_4 Q_4 \\ Q_3 &=& \frac{1}{3!}F_0^t\sigma\sqrt{t}\left(2\sigma\sqrt{t}-d\right)\varphi(d)\\ Q_4&=& \frac{1}{4!}F_0^t\sigma\sqrt{t}\left(d^2-3d\sigma\sqrt{t}-1\right)\varphi(d)\\ \varphi(d) &=& \frac{1}{\sqrt{2\pi}}e^{-\frac{d^2}{2}}\\ d &=& \frac{\ln\left(F_0^t/K\right)+\sigma^2 t/2}{\sigma \sqrt{t}}\\ \sigma &=& \sqrt{\mu_2 / t},\end{array}

whereas in 1998 Rubinstein [4] derived the following approximation based on a normal Edgeworth series expansion:

\begin{array}{rcl} \displaystyle C_R &=&C_{BS}(F_0^t, K, \sigma, r, t) +\gamma_3 Q_3 + \gamma_4 Q_4 + \gamma_3^2 Q_5 \\ Q_5&=& \frac{10}{6!}F_0^t\sigma\sqrt{t}\left(d^4-5d^3\sigma\sqrt{t}-6d^2+15d\sigma\sqrt{t}+3\right)\varphi(d). \end{array}
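
Both corrections are easy to implement given the central moments. A sketch that follows the formulas above literally, using the discounted Black formula on the forward F_0^t as C_{BS} (the discounting convention for the correction terms is glossed over, as in the text):

import numpy as np
from scipy.stats import norm

def corrado_su_call(F, K, t, r, mu2, mu3, mu4, rubinstein=False):
    sigma = np.sqrt(mu2 / t)
    g3, g4 = mu3 / mu2**1.5, mu4 / mu2**2 - 3.0   # Fisher skewness/kurtosis
    s = sigma * np.sqrt(t)
    d = (np.log(F / K) + 0.5 * s * s) / s
    bs = np.exp(-r * t) * (F * norm.cdf(d) - K * norm.cdf(d - s))
    q3 = F * s * (2.0 * s - d) * norm.pdf(d) / 6.0
    q4 = F * s * (d * d - 3.0 * d * s - 1.0) * norm.pdf(d) / 24.0
    price = bs + g3 * q3 + g4 * q4
    if rubinstein:  # add the gamma_3^2 Q_5 Edgeworth term
        q5 = F * s * (d**4 - 5.0 * d**3 * s - 6.0 * d * d
                      + 15.0 * d * s + 3.0) * norm.pdf(d) * 10.0 / 720.0
        price += g3 * g3 * q5
    return price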

Two regimes are considered to test the quality of the different approximations for the Kluge model: first, a regime with relatively small jumps but changing jump intensity, and second, a regime with a few jumps but changing jump size. The test parameters are

F_0=30, K=F_0, t=0.5, X_0=Y_0=0, \\ \alpha=4.0, \sigma=1.0, \beta=5.0, \eta=5.0, \lambda=4.0

The reference results for European vanilla calls are generated with the corresponding finite difference pricing engine and Richardson extrapolation. The first test case varies the number of jumps by increasing \lambda from zero to 40. The Corrado/Su fourth order approximation wins only for very small jump intensities, whereas for larger jump intensities the Corrado/Su approximation up to third order competes head-to-head with the Rubinstein formula. A similar picture emerges in the second test case, few jumps with increasing jump size: for small jumps the fourth order approximation takes the lead, but most of the time the Rubinstein approximation gives the best results.

The source code for these results can be found in PR #728.

[1] T. Kluge: Pricing Swing Options and other Electricity Derivatives
[2] E. Jurczenko, B. Maillet and B. Negrea: Multi-moment Approximate Option Pricing Models: A General Comparison (Part 1)
[3] C. Corrado and T. Su: Implied Volatility Skews and Stock Return Skewness and Kurtosis Implied by Stock Option Prices, European Journal of Finance 3, 73-85, 1997
[4] M. Rubinstein: Edgeworth Binomial Trees, Journal of Derivatives 5 (3), 20-27, 1998

 

Almost exact SABR Interpolation using Neural Networks and Gradient Boosted Trees

Update 03-11-2019: Added arbitrage free SABR calibration based on neural networks.

Very efficient approximations exist for the SABR model

\begin{array}{rcl} \displaystyle dF_t &=& \alpha_t F_t^\beta dW_t \\ d\alpha_t &=& \nu \alpha_t dZ_t \\ \rho dt &=& \langle dW_t, dZ_t\rangle \end{array}

like the original Hagan et al. formula [1] or variants of it [2], but these analytic formulas are in general not arbitrage-free. Solving the corresponding partial differential equation leads to an arbitrage-free solution

\displaystyle \frac{\partial u}{\partial t} - \frac{\nu^2}{2}\frac{\partial u}{\partial x} + \frac{1}{2}e^{2x}F_t^{2\beta}\frac{\partial^2 u}{\partial F^2}+\frac{\nu^2}{2}\frac{\partial^2 u}{\partial x^2}+\rho\nu e^{x_t}F_t^\beta\frac{\partial^2 u}{\partial F \partial x} -ru = 0

but is computationally demanding. The basic idea here is to use a neural network or gradient boosted trees to interpolate (predict) the difference between the analytic approximation and the exact result from the partial differential equation for a large variety of model parameters.

The first step is to reduce the number of dimensions of the parameter space \{F_0,  \alpha, \beta, \nu, \rho \} by utilizing the scaling symmetry of the SABR model [3]

\begin{array}{rcl} \displaystyle F_t &\rightarrow& \lambda F_t \\ \alpha_t &\rightarrow& \lambda^{1-\beta}\alpha_t \end{array}

so that we can focus on the case F_0=1.0 without loss of generality:

\displaystyle  \sigma_{BS}\left( K, F_0, T, \alpha, \beta, \nu, \rho \right) = \sigma_{BS}\left(\lambda K, \lambda F_0, T, \lambda^{1-\beta}\alpha, \beta, \nu, \rho \right).

This in turn also limits the “natural” parameter space for \{ \alpha, \beta, \nu \}, which will be set to

\displaystyle \alpha \in [0, 1], \ \beta \in [0, 1], \ \nu \in [0, 1].

Next on the list is to set up an efficient PDE solver to prepare the training data. The QuantLib solver already supports the two standard error reduction techniques, namely adaptive grid refinement around important points and cell averaging around special points of the payoff. The latter ensures smooth second order convergence in the spatial direction [4]. The Hundsdorfer-Verwer ADI scheme is also of second order in the time direction, and additional Rannacher smoothing steps at the beginning ensure smooth convergence in the time direction as well [5]. Hence Richardson extrapolation can be used to improve the convergence order of the overall algorithm. An example pricing for

F_0=1.0, K=0.466, T=0.6, \alpha = 0.825, \beta=0.299, \nu=0.410, \rho=-0.166

is shown in the diagram below to demonstrate the efficiency of the Richardson extrapolation. The original grid size for scaling factor 1.0 is \{ T, S, v \} = \{20, 200, 25\}.

The training data has been generated from a five-dimensional quasi Monte-Carlo Sobol sequence over the parameter ranges

\displaystyle \alpha \in [0, 1], \ \beta \in [0, 1], \ \rho \in [-1, 1], \ \nu \in [0, 1], T\in [\frac{1}{12}, 1].
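
A sketch of this sampling step with scipy's quasi Monte-Carlo module; the PDE pricing of each sample is omitted:

from scipy.stats import qmc

sampler = qmc.Sobol(d=5, scramble=True, seed=42)
u = sampler.random_base2(m=16)              # 2^16 low discrepancy points
lower = [0.0, 0.0, -1.0, 0.0, 1.0 / 12.0]   # alpha, beta, rho, nu, T
upper = [1.0, 1.0, 1.0, 1.0, 1.0]
samples = qmc.scale(u, lower, upper)        # shape (65536, 5)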

The strikes are equally distributed between the 1\% and 99\% quantiles of the risk-neutral density w.r.t. the ATM volatility of the SABR model. The PDE solver calculates the fair value not only for F_0=1.0 but for a range of spot values around F_0. Thanks to the scaling symmetry of the SABR model, these prices can be reused as additional samples with K'=\lambda K and \alpha'=\lambda^{1-\beta}\alpha for F_0=1.0.

The training set includes 617K sample values. The network is trained to fit the difference between the correct SABR volatility from the solution of the partial differential equation and the Floc’h-Kennedy approximation. No large neural network is needed to interpolate the parameter space; e.g. the following Tensorflow/Keras model definition with 46K parameters has been used in the examples below.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()
model.add(Dense(20, activation='linear', input_shape=(7, )))
model.add(Dense(100, activation='linear'))
model.add(Dense(400, activation='sigmoid'))
model.add(Dense(10, activation='tanh'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='mae', optimizer='adam')

As always, it is important for the predictive power of the neural network to normalize the input data, e.g. by using sklearn.preprocessing.MinMaxScaler as sketched below. The out-of-sample mean absolute error of the neural network is around 0.00025 in annualized volatility, far better than the Kennedy-Floc’h or Hagan et al. approximations.
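
A sketch of that normalization step; train_X and test_X are assumed to hold the raw input parameters row-wise:

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
train_X_scaled = scaler.fit_transform(train_X)  # fit on the training set only
test_X_scaled = scaler.transform(test_X)        # reuse the same scaling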

The diagram below shows the difference between the correct volatility and the different approximations using the parameter example from the previous post.

One could also use gradient tree boosting algorithms like XGBoost or LightGBM. For example, the models

import xgboost as xgb
import lightgbm as lgb

# train_X/train_Y and the validation set lgb_eval are defined as before
xgb_model = xgb.XGBRegressor(nthread=-1,
                             max_depth=50,
                             n_estimators=100,
                             eval_metric="mae")

gbm_model = lgb.train({'objective': 'mae',
                       'num_leaves': 500},
                      lgb.Dataset(train_X, train_Y),
                      num_boost_round=2000,
                      valid_sets=lgb_eval,
                      early_stopping_rounds=20)

result in similar out-of-sample mean absolute errors of 0.00030 for XGBoost and 0.00035 for LightGBM. At first glance the interpolation looks smooth, as can be seen in the diagram below using the same SABR model parameters, but zooming in exposes non-differentiable points, which defeats the objective of stable Greeks.

The average run time of the different approximations is shown in the table below.

\begin{tabular}{|l|c|} \hline \textbf{Algorithm} & \textbf{Run Time} \\ \hline Hagan et al & 0.16us \\ \hline Floc'h-Kennedy & 1.86us \\ \hline Tensorflow DNN & 12.39us \\ \hline XGBoost & 4.61us \\ \hline LightGBM & 21.84us \\ \hline PDE \& Richardson & 1.23s \\ \hline \end{tabular}

With these highly efficient pricing routines, calibration of the full SABR model can be done in a fraction of a second. To test this approach, several SABR parameter configurations have been used to calculate the implied volatilities of 15 benchmark options for a single expiry. The full SABR model has been calibrated against these volatility sets with the help of a standard Levenberg-Marquardt optimizer, using either the PDE pricer or the neural network pricer. The neural network calibration routine took only 0.2 seconds, whereas the PDE calibration took over half an hour on average.

[1] P. Hagan, D. Kumar, A. Lesniewski, D. Woodward: Managing Smile Risk.
[2] F. Le Floc’h, G. Kennedy: Explicit SABR Calibration through Simple Expansions.
[3] H. Park: Efficient valuation method for the SABR model.
[4] K. in’t Hout: Numerical Partial Differential Equations in Finance Explained.
[5] K. in’t Hout, M. Wyns: Convergence of the Hundsdorfer–Verwer scheme for two-dimensional convection-diffusion equations with mixed derivative term.

 

Finite-Difference Solver for the SABR Model

Despite being based on a fairly simple stochastic differential equation

\begin{array}{rcl} \displaystyle dF_t &=& \alpha_t F_t^\beta dW_t \\ d\alpha_t &=& \nu \alpha_t dZ_t \\ \rho dt &=& \langle dW_t, dZ_t\rangle \end{array}

the corresponding partial differential equation

\displaystyle \frac{\partial u}{\partial t} - \frac{\nu^2}{2}\frac{\partial u}{\partial x} + \frac{1}{2}e^{2x}F_t^{2\beta}\frac{\partial^2 u}{\partial F^2}+\frac{\nu^2}{2}\frac{\partial^2 u}{\partial x^2}+\rho\nu e^{x_t}F_t^\beta\frac{\partial^2 u}{\partial F \partial x} -ru = 0

for the SABR model – derived using the variable transformation x_t = \ln \alpha_t together with Ito’s lemma and the Feynman-Kac formula – is quite difficult to solve numerically. Part of the problem is the process of the underlying, which corresponds to a constant elasticity of variance (CEV) model if \nu is zero. This model exhibits a variety of different behaviours depending on the value of \beta and on the boundary conditions. The authors of [1] give a comprehensive overview of this topic. To limit the possible model zoo, let's define

\displaystyle X\left(F_t\right) = X_t=\frac{F_t^{2(1-\beta)}}{\alpha^2(1-\beta)^2} \ \wedge \  \delta=\frac{1-2\beta}{1-\beta}

and assume absorbing boundary conditions at X=0 if \delta < 2. The first step in implementing a finite difference scheme is to find efficient limits for the discretization grid. These limits can e.g. be derived from the cumulative distribution function of the underlying process:

\textrm{Pr}(F \le F_t | F_0) = \begin{cases} 1-{\chi '}^{2}\left( \frac{X_0}{t}; 2-\delta, \frac{X_t}{t}\right), & \textrm{if} \ \delta < 2 \\ 1 - {\chi '}^{2}\left( \frac{X_t}{t}; \delta, \frac{X_0}{t}\right), & \textrm{if} \ \delta > 2 \end{cases}

Note that in the first case \delta < 2 the underlying X(F_t) appears in the non-centrality parameter of the non-central chi-squared distribution {\chi '}^{2}, hence the calculation of the inverse can only be carried out by a numerical root-finding algorithm. Sankaran's approximation of the non-central chi-squared distribution can be used to speed up this method [2]. In the second case \delta > 2 the inverse of the equation \displaystyle \textrm{Pr}(F \le F_t | F_0) = q is

F_t= \left[t \left({\chi'}^2\right)^{-1} \left(1-q; \delta, \frac{X_0}{t}\right) \alpha^2(1-\beta)^2\right]^{\frac{1}{2(1-\beta)}}

which can be computed using the Boost library; a scipy-based sketch follows below. In addition, adaptive grid refinement around important points and cell averaging around special points of the payoff at maturity [3] improve the accuracy of the CEV finite difference scheme. F_T is only a local martingale when \delta > 2 or, equivalently, \beta > 1. In this case the call-put parity reads [1]

P_{\delta > 2} = C_{\delta>2} + K - F_0\Gamma\left(-\nu; \frac{X_0}{2t}\right)

and acts as a litmus test for the implementation of the boundary conditions and the finite difference scheme.
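
As a sketch, the \delta > 2 branch of the grid-limit calculation maps directly onto scipy's non-central chi-squared distribution; the \delta < 2 branch needs a numerical root search (e.g. with scipy.optimize.brentq) instead.

import numpy as np
from scipy.stats import ncx2

def cev_quantile(p, F0, alpha, beta, t):
    # grid limit F_t with Pr(F <= F_t | F_0) = p for delta > 2, i.e. beta > 1
    delta = (1.0 - 2.0 * beta) / (1.0 - beta)
    x0 = F0**(2.0 * (1.0 - beta)) / (alpha * alpha * (1.0 - beta)**2)
    x = t * ncx2.ppf(1.0 - p, delta, x0 / t)
    return (x * alpha * alpha * (1.0 - beta)**2)**(0.5 / (1.0 - beta))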

The variance direction x_t = \ln \alpha_t matches a Brownian motion, hence the discretization is straightforward. Operator splitting schemes like Craig-Sneyd or Hundsdorfer-Verwer [4] are tailor-made to solve the two-dimensional partial differential equation.

The finite difference solver can now be used to compare different approximations with the correct model behaviour, and also against highly accurate Monte-Carlo simulations [2]. The model configuration [5]

F_0=1.0, \alpha=0.35, \beta=0.25, \nu=1.0, \rho=0.25, T=2.0

should act as a test bed. As can be seen in the diagram below, the Monte-Carlo results are in line with the results from the finite difference method within very small error bars. The standard Hagan et al. formula as well as the Le Floc’h-Kennedy [6] log-normal formula deviate significantly from the correct implied volatilities.

The approximations are not arbitrage-free: the probability densities for small strikes turn negative. Hagan's formula exhibits this behaviour already at larger strikes than the Le Floc’h-Kennedy formula, as can be seen in the following diagram. As expected, the finite difference solution does not produce negative probabilities.

A note on the Floc’h-Kennedy approximation: the formula becomes numerically unstable around ATM strike levels, hence a second order Taylor expansion is used for moneyness

m=\frac{F}{K}-1.0 \in [-0.0025, 0.0025].

The following diagram shows the difference between the second order Taylor expansion and the formula evaluated using IEEE-754 double precision.


The source code is part of PR #589 and available here.

[1] D.R. Brecher, A.E. Lindsay: Results on the CEV Process, Past and Present.
[2] B. Chen, C.W. Oosterlee, H. van der Weide: Efficient unbiased simulation scheme for the SABR stochastic volatility model.
[3] K. in’t Hout: Numerical Partial Differential Equations in Finance explained.
[4] K. in’t Hout, S. Foulon: ADI Finite Difference Schemes for Option Pricing in the Heston Model with Correlation.
[5] P. Hagan, D. Kumar, A. Lesniewski, D. Woodward: Arbitrage free SABR.
[6] F. Le Floc’h, G. Kennedy: Explicit SABR Calibration through Simple Expansions.

 

Compile QuantLib on an Android Mobile

Termux is an Android terminal emulator and Linux environment app which can be installed directly without rooting the device. It comes with a minimal Linux base system, and additional packages can be installed afterwards. The following script compiles QuantLib and runs the test suite in a Termux session:

# install the build tool chain and dependencies
pkg install clang autoconf automake libtool boost-dev nano git
# fetch the latest QuantLib sources
git clone https://github.com/lballabio/quantlib.git
cd quantlib
# build the library and run the test suite
./autogen.sh
./configure --disable-static CXXFLAGS='-O2' CXX='clang++'
make
./test-suite/quantlib-test-suite