Monte-Carlo Calibration of the Heston Stochastic Local Volatility Model

Solving the Fokker-Planck equation via finite difference methods is not the only way to calibrate the Heston stochastic local volatility model

\begin{array}{rcl} d \ln{S_t} &=& \left(r_t - q_t - \frac{1}{2}L\left(S_t, t\right)^2\nu_t \right)dt + L(S_t, t)\sqrt{\nu_t} dW_t^S \\ \nonumber d\nu_t &=& \kappa\left(\theta - \nu_t\right)dt + \eta\sigma\sqrt{\nu_t}dW_t^\nu \\ \nonumber \rho dt &=& dW_t^\nu dW_t^S \end{array}

The basic equation to calibrate the leverage function L(S_t,t) for a local volatility surface \sigma_{LV}(S_t,t) and a set of Heston parameters \{\nu_0, \kappa, \theta, \sigma, \rho, \eta \} is given by

L(S_t, t)=\frac{\sigma_{LV}(S_t, t)}{\sqrt{\mathop{\mathbb{E}}[\nu_t|S=S_t]}}= \sigma_{LV}(S_t, t)\sqrt{\frac{\int_{\mathop{\mathbb{R}}^+} p(S_t,\nu, t) d\nu}{\int_{\mathop{\mathbb{R}}^+}\nu p(S_t,\nu,t)d\nu}}

The key problem here is to calculate the conditional expectation \mathop{\mathbb{E}}[\nu_t|S=S_t]. This can be done either via the Fokker-Planck equation, as outlined in [3] and the references therein, or via Monte-Carlo simulations, as shown in [2]. Solving the Fokker-Planck equation with finite difference methods is anything but easy if the Feller constraint is violated. In this case the Monte-Carlo approach promises fewer numerical problems to overcome. The starting point for an efficient Monte-Carlo calibration is a fast and accurate simulation scheme for a stochastic local volatility (SLV) model. The variance part of the SLV model can be sampled exactly using the non-central \chi^2 distribution.

Pr(\nu_{t+\Delta t} < \nu | \nu_t) = F_{\chi'^2}\left(\nu \frac{4\kappa}{\eta^2\sigma^2\left(1-e^{-\kappa \Delta t}\right)}; \frac{4\kappa\theta}{\eta^2\sigma^2}, \frac{4\kappa}{\eta^2\sigma^2\left(1-e^{-\kappa \Delta t}\right)}\nu_t e^{-\kappa \Delta t}\right)
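For illustration, this exact sampling step could be carried out by inverting the CDF with boost::math (a minimal sketch; the function name and the bundling of \eta\sigma into one parameter are illustrative, and as noted below this inverse CDF evaluation is comparatively slow):

#include <boost/math/distributions/non_central_chi_squared.hpp>
#include <cmath>

// Exact sampling of the square root process by inverting the
// non-central chi-squared CDF above. u is a U(0,1) draw and
// etaSigma denotes the product eta*sigma.
double sampleVarianceExact(double nu_t, double kappa, double theta,
                           double etaSigma, double dt, double u) {
    const double c = etaSigma*etaSigma
        *(1.0 - std::exp(-kappa*dt))/(4.0*kappa);
    const double df  = 4.0*kappa*theta/(etaSigma*etaSigma);
    const double ncp = nu_t*std::exp(-kappa*dt)/c;

    const boost::math::non_central_chi_squared_distribution<double>
        dist(df, ncp);
    return c*boost::math::quantile(dist, u);
}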

The boost library supports the inverse of the cumulative non-central \chi^2 distribution but the calculation itself is rather slow. The Heston QE scheme outlined in [1] comes with a fast and accurate sampling scheme for the square root process. In the case \gamma_1=\gamma_2=\frac{1}{2} it reads

\begin{array}{rcl} m &=& \theta+(\nu_t - \theta)e^{-\kappa \Delta t} \\[6pt] \nonumber \psi &=& \frac{\nu_t \frac{\eta^2\sigma^2 e^{-\kappa\Delta t}}{\kappa}\left(1-e^{-\kappa\Delta t} \right) + \frac{\theta\eta^2\sigma^2}{2\kappa}\left(1-e^{-\kappa\Delta t}\right)^2}{m^2}\\[6pt] \nonumber \beta &=& \frac{2}{\psi} - 1 + \sqrt{\frac{2}{\psi}\left(\frac{2}{\psi}-1 \right)} \\[6pt] \nonumber p &=& \frac{\psi-1}{\psi+1} \\[6pt] \nonumber Z_\nu &\sim& \mathcal{N}(0,1) , u \sim \mathcal{U}(0,1) \\[6pt] \nonumber \nu_{t+\Delta t} &=& \begin{cases} \frac{m}{1+\beta}\left(\sqrt{\beta}+Z_\nu\right)^2 & \text{if } \psi < 1.5 \\[6pt] \begin{cases} 0, & \text{if } u \leq p \\[6pt] \ln{\left(\frac{1-p}{1-u}\right)}\frac{m}{1-p} & \text{otherwise} \end{cases} & \text{if } \psi \geq 1.5 \end{cases} \end{array}
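A minimal sketch of one QE step for the variance process might look as follows (assuming etaSigma stands for \eta\sigma; function and variable names are illustrative):

#include <cmath>
#include <random>

// One QE step for the variance process following the scheme above.
double sampleVarianceQE(double nu_t, double kappa, double theta,
                        double etaSigma, double dt, std::mt19937& gen) {
    const double e = std::exp(-kappa*dt);
    const double m = theta + (nu_t - theta)*e;
    const double s2 = nu_t*etaSigma*etaSigma*e*(1.0-e)/kappa
        + 0.5*theta*etaSigma*etaSigma*(1.0-e)*(1.0-e)/kappa;
    const double psi = s2/(m*m);

    if (psi < 1.5) {
        // quadratic sampling branch
        const double beta = 2.0/psi - 1.0
            + std::sqrt(2.0/psi*(2.0/psi - 1.0));
        const double z = std::normal_distribution<double>()(gen);
        const double t = std::sqrt(beta) + z;
        return m/(1.0 + beta)*t*t;
    }
    else {
        // exponential sampling branch
        const double p = (psi - 1.0)/(psi + 1.0);
        const double u = std::uniform_real_distribution<double>()(gen);
        return (u <= p) ? 0.0 : std::log((1.0-p)/(1.0-u))*m/(1.0-p);
    }
}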

The authors in [2] suggest a sampling scheme for X_t = \ln{S_t}. The scheme below stays a bit closer to the original QE scheme and, without a rigorous proof, it has shown slightly smaller discretisation errors for the examples considered.

\begin{array}{rcl} X_{t+\Delta t} &=& X_t + \left(r_t - q_t - \frac{1}{4}\left(\nu_t+\nu_{t+\Delta t}\right)L(S_t, t)^2 \right)\Delta t \\[6pt] \nonumber &&+ \frac{\rho}{\eta\sigma}L(S_t, t) \left(\nu_{t+\Delta t} -\nu_t - \kappa\theta\Delta t + \frac{1}{2}\left(\nu_t + \nu_{t + \Delta t}\right)\kappa\Delta t\right) \\[6pt] \nonumber &&+ L(S_t, t)\sqrt{\frac{1}{2}\left(1-\rho^2\right)\left(\nu_t + \nu_{t+\Delta t}\right)\Delta t}\, Z_x \\[6pt] \nonumber Z_x &\sim& \mathcal{N}(0,1) \end{array}
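The corresponding log-spot update can be sketched like this (with nu and nuNext coming from the variance sampling above, lev the interpolated leverage value L(S_t, t), and z a standard normal draw; all names are illustrative):

#include <cmath>

// One step of the log-spot scheme above.
double nextLogSpot(double x, double lev, double nu, double nuNext,
                   double r, double q, double rho, double etaSigma,
                   double kappa, double theta, double dt, double z) {
    const double nuBar = 0.5*(nu + nuNext); // trapezoidal variance average
    return x
        + (r - q - 0.5*nuBar*lev*lev)*dt
        + rho/etaSigma*lev
            *(nuNext - nu - kappa*theta*dt + nuBar*kappa*dt)
        + lev*std::sqrt((1.0 - rho*rho)*nuBar*dt)*z;
}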

To get from the Monte-Carlo paths to the leverage function, the authors in [2] use a binning technique in every time step, as sketched below. QuantLib supports two ways to generate the random draws, either based on i.i.d. normal variates or based on Sobol quasi Monte-Carlo sequences with Brownian bridges.
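A minimal sketch of such a binning step, which estimates \mathop{\mathbb{E}}[\nu_t|S=S_t] from the simulated paths at one time step (the equal-size bins and all names are illustrative assumptions):

#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Estimate E[nu | S in bin] by sorting the paths by spot level and
// averaging the variance within equally sized bins.
std::vector<double> conditionalVariance(
    std::vector<std::pair<double, double> > paths, // (S_t, nu_t) per path
    std::size_t nBins) {

    std::sort(paths.begin(), paths.end());
    const std::size_t binSize = paths.size()/nBins;

    std::vector<double> nuAvg(nBins, 0.0);
    for (std::size_t b = 0; b < nBins; ++b) {
        for (std::size_t i = b*binSize; i < (b+1)*binSize; ++i)
            nuAvg[b] += paths[i].second;
        nuAvg[b] /= binSize;
    }
    // leverage on bin b: L = sigma_LV(S_b, t)/sqrt(nuAvg[b])
    return nuAvg;
}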

Test case definition: the Feller constraint is violated.

\begin{array}{rcl} S_{0} &=& 100 \\ \nonumber r_t &=& 5\% \\ \nonumber q_t &=& 2\% \\ \nonumber \sigma_{LV}(S_t, t) &=& 30\% \\ \nonumber \kappa &=& 1.0\\ \nonumber \theta &=& 0.06\\ \nonumber \rho &=& -0.75\\ \nonumber \sigma &=& 0.4\\ \nonumber \nu_{0} &=& 0.09 \end{array}

Effectively the leverage function L(S_t, t) will remove the skew and term structure introduced by the Heston model and shift the volatility to 30%. To measure the calibration quality, the price mismatches, expressed as volatility differences in basis points, are calculated for OTM calls and puts with prices above 0.02 and maturities from one month up to two years. The SLV prices are calculated by solving the Feynman-Kac backward equation using finite difference methods.

[Figure avgpriceerror: average pricing error for different numbers of Monte-Carlo calibration paths and bin sizes]

The diagram above shows the average pricing error for different numbers of Monte-Carlo calibration paths and different bin sizes. As expected, the optimal bin size depends on the number of Monte-Carlo simulations. Especially deep OTM options benefit from a large number of bins. The use of Sobol quasi Monte-Carlo simulations with Brownian bridges does not give a significant advantage here.

The average pricing error decreases with the square root of the number of Monte-Carlo calibration paths if one always chooses the optimal bin size, as shown in the diagram below. Quasi Monte-Carlo path generation with Brownian bridges gives a small advantage.

[Figure optimalbinsize: average pricing error vs. number of calibration paths for the optimal bin size]

As expected, the leverage function itself exhibits some Monte-Carlo noise, see the diagram below. The corresponding leverage function based on the finite difference method is smooth, but the overall calibration error is of the same size.

[Figure mc_leverage: leverage function L(S_t, t) from the Monte-Carlo calibration]

The calibration for different mixing factors \eta is straightforward. As an example, the diagram below shows the prices of a double no touch option for different distances of the (symmetric) lower and upper barriers, expressed in terms of the corresponding Black-Scholes prices. This type of diagram is derived from figure 8.8 in [4].

[Figure moustache: double no touch prices vs. barrier distance for different mixing factors \eta]

The source code is available on GitHub as part of the test suite HestonSLVModelTest.

[1] Leif Andersen, Efficient Simulation of the Heston Stochastic Volatility Model

[2] Anthonie Van der Stoep, Lech Grzelak, Cornelis Oosterlee, The Heston Stochastic-Local Volatility Model: Efficient Monte Carlo Simulation

[3] Johannes Goettker-Schnetmann, Klaus Spanderen, Calibrating the Heston Stochastic Local Volatility Model using the Fokker-Planck Equation

[4] Iain J. Clark, Foreign Exchange Option Pricing: A Practitioner’s Guide

Parallel Unit Test Runner using MPI

Update 10.03.2016: added performance results of the latest QuantLib test-suite build on a 32 core SMP machine using boost interprocess instead of MPI.

Running QuantLib’s test suite on a recent computer takes around 10 minutes. The situation improves a lot if the test runner utilises more than one core of today’s multi-core CPUs to run the tests in parallel. Unfortunately multi-threading won’t work because the boost unit test framework is not thread safe. A reasonable way to speed up the test suite is therefore multiprocessing, using message passing between the compute nodes based on the master-slave paradigm. The cross-platform standard MPI together with boost::mpi is tailor-made for this task.
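In essence, the master-slave paradigm with boost::mpi boils down to the following skeleton (a sketch only; tags, names and the string result type are illustrative):

#include <boost/mpi.hpp>
#include <boost/serialization/string.hpp>
#include <string>

// Master-slave skeleton: rank 0 dispatches test names to the
// worker processes and collects the results afterwards.
int main(int argc, char* argv[]) {
    boost::mpi::environment env(argc, argv);
    boost::mpi::communicator world;

    const int tagJob = 0, tagResult = 1;
    if (world.rank() == 0) {
        for (int r = 1; r < world.size(); ++r)
            world.send(r, tagJob, std::string("testCaseName"));

        for (int r = 1; r < world.size(); ++r) {
            std::string result;
            world.recv(r, tagResult, result);
        }
    }
    else {
        std::string testName;
        world.recv(0, tagJob, testName);
        // run the unit test here and report the outcome
        world.send(0, tagResult, std::string("passed"));
    }
    return 0;
}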

The missing piece is an external, parallel unit test runner which uses MPI for load balancing and to collect the test results. The runner for QuantLib’s test suite needs boost version 1.59 or higher and can be found here. Please replace in quantlibtestsuite.cpp line 24

 #include <boost/test/unit_test.hpp> 

by

 #include <mpiparalleltestrunner.hpp> 

and do not link the executable with libboost_unit_test_framework.so, because the new header file includes the header-only version of the boost test framework (thanks to Peter Caspers for the hint). Load balancing is a crucial point for the overall speed-up because QuantLib’s test suite has a handful of long running tests (max. around 90 seconds), which should be scheduled first. Therefore the MPI test runner collects the statistics of every unit test’s runtime and stores these in a local file to plan the schedule of the next runs.
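The scheduling idea itself is simple: dispatch the longest recorded jobs first (a sketch; names are illustrative):

#include <algorithm>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Dispatch order for the next run: longest recorded runtime first,
// so that the long running tests do not delay the end of the run.
std::vector<std::string> scheduleOrder(
    std::vector<std::pair<double, std::string> > runtimes) {

    std::sort(runtimes.begin(), runtimes.end(),
              std::greater<std::pair<double, std::string> >());

    std::vector<std::string> order;
    order.reserve(runtimes.size());
    for (std::size_t i = 0; i < runtimes.size(); ++i)
        order.push_back(runtimes[i].second);
    return order;
}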

[Figure parallel_smp: test suite runtime vs. number of parallel processes for different CPU configurations]

The diagram above shows the runtime of QuantLib’s test suite for different numbers of parallel processes and CPU configurations. The minimum runtime is set by the longest running test case, which takes around 50 seconds. On a single CPU the performance scales pretty linearly with the number of cores being utilised, and hyper-threading cores help as well. Using more than sixteen real cores does not improve the situation any further because the overall runtime is then dominated by the longest running test case.

QuantLib 1.6.2 Multithreading Patch for JVM/.NET

Update 23.11.2015: A modified version of the patch is now part of the official QuantLib Release 1.7.

The implementation of QuantLib’s observer pattern does not tolerate a parallel garbage collector running in a different thread. This can lead to random crashes or to “pure virtual function call” errors when using QuantLib from JVM and .NET languages (e.g. Java/Scala and C#/F#) via the SWIG interface. An original description of this problem can be found e.g. here and within the references.

With version 1.58 and higher the boost library has extended the interface of the class boost::enable_shared_from_this<T> by the method

weak_ptr<T> weak_from_this() noexcept;

which gives access to the underlying weak pointer used to implement the “shared pointer from this” functionality. This method now allows for a much cleaner and easier-to-install patch to make QuantLib’s observer pattern thread safe. In particular it is no longer necessary to define preprocessor symbols or to edit boost files.

The implementation is backed by boost::signals2 to get the observer pattern working in a thread safe manner. The boost::signals2 library works purely based on shared_ptr. Therefore the first step is to add an Observer::Proxy class to QuantLib’s Observer, which can be registered with boost::signals2 and which forwards the update calls to the corresponding observer. In addition the class Observer is now derived from boost::enable_shared_from_this.

class Observer : public boost::enable_shared_from_this<Observer> {
    friend class Observable;
  public:
    // public interface remains untouched

    virtual ~Observer() {
        boost::lock_guard<boost::recursive_mutex> lock(mutex_);
        if (proxy_) {
            proxy_->deactivate();

            for (iterator i=observables_.begin(); 
                 i!=observables_.end(); ++i) 
                (*i)->unregisterObserver(proxy_);
        }
    }
  private:
    class Proxy {
      public:
        Proxy(Observer* const observer)
         : active_  (true),
           observer_(observer) {
        }

        void update() const {
            boost::lock_guard<boost::recursive_mutex> lock(mutex_);
            if (active_) {
                const boost::weak_ptr<Observer> o
                    = observer_->weak_from_this();
                if (!o._empty()) {
                    boost::shared_ptr<Observer> obs(o.lock());
                    if (obs)
                        obs->update();
                }
                else {
                    observer_->update();
                }
            }
        }

        void deactivate() {
            boost::lock_guard<boost::recursive_mutex> lock(mutex_);
            active_ = false;
        }

    private:
        bool active_;
        mutable boost::recursive_mutex mutex_;
        Observer* const observer_;
    };

    boost::shared_ptr<Proxy> proxy_;
    mutable boost::recursive_mutex mutex_;

    std::set<boost::shared_ptr<Observable> > observables_;
    typedef std::set<boost::shared_ptr<Observable> >::iterator
        iterator;
};

The critical path for the race condition in the original implementation is when the destructor of an observer is called while an observable triggers an update call. Now the destructor of an observer deactivates the proxy once it is able to acquire the same mutex lock as the proxy’s update method. A deactivated proxy will not forward any update calls to the observer which is about to be deleted.

If the update call manages to acquire the mutex lock first, then it will delay any concurrent observer destructor calls. It is therefore safe to retrieve the underlying weak pointer from the observer linked to the proxy via weak_from_this. This weak pointer can be empty if no shared pointer has been created from the observer in the past. If a weak pointer exists, then the proxy tries to get the shared pointer and calls the update method from there. This part follows the same methodology as outlined in [1]. The rest of the implementation inherits its thread safety from the boost::signals2 library.

The patch for QuantLib 1.6.2 can be found here. It also contains Riccardo’s thread-safe singleton patch and it requires boost version 1.58 or higher.

[1] Shuo Chen, Where Destructors meet Threads

Heston Model Calibration using Adjoint Algorithmic Differentiation

Algorithmic Differentiation has become more and more popular in financial engineering since the method was first brought to the attention of a wider audience in [1]. Key factors for this popularity are

  • Adjoint Algorithmic Differentiation (AAD): the computational cost of calculating all first order partial derivatives of a function (or a computer program) with this method is, loosely speaking, three to six times larger than the cost of evaluating the function itself. If the function has a large number of first order partial derivatives then this method clearly overtakes the finite difference method, for which the computational cost is proportional to the number of partial derivatives.
  • Easy-to-use C++ libraries, e.g. CppAD or ADOL-C, among others
  • Accuracy: The partial derivatives are the direct result of a function evaluation and do not depend on an arbitrary bumping parameter.

Adjoint Algorithmic Differentiation was first mentioned in conjunction with QuantLib in Sebastian and Jan’s talk at the 2013 User Group Meeting [2]. The topic has gained further momentum with Peter’s initial blog on Adjoint Greeks and with Alexander’s announcement that CompatibL is working on an AAD port of QuantLib. Alexander will give a talk at this year’s Global Derivatives Conference on the techniques involved in porting QuantLib.

Almost all efficient local optimisation algorithms used for model calibration, like the Levenberg-Marquardt algorithm, are based on gradient methods and therefore need to calculate the Jacobian matrix of the target function Z. The target function for the Heston model calibration is defined by the goodness of fit measure

\displaystyle \zeta = \min_\Theta \|Z\|^2 = \min_\Theta \sum_{i=1}^N Z_i^2 = \min_\Theta \sum_{i=1}^N \left( \sigma_i^{model}(K_i, T_i, \Theta) - \sigma_i^{market}(K_i, T_i) \right)^2

for the model parameters

\Theta = \{ x_1, \ldots, x_5\} = \{ \nu_0, \kappa, \theta, \sigma, \rho\}

The model prices of the calibration options are evaluated using Gauss-Laguerre integration of the characteristic functions \phi_j(u) (j=1,2):

\begin{array}{rcl} \displaystyle \Pi_j &=& \displaystyle \frac{1}{2} + \frac{1}{\pi} \int_0^\infty\Re\left[\frac{\phi_j(u)\mathrm{e}^{-iu \ln K}}{iu}\right]\mathrm{d}u \\[0.9em] \nonumber \displaystyle P_j(\varphi) &=& \displaystyle \frac{1-\varphi}{2} + \varphi \Pi_j \\[0.9em] \nonumber \displaystyle \mathrm{NPV} &=& \varphi\left[\mathrm{e}^{-qT}SP_1(\varphi) - \mathrm{e}^{-rT}KP_2(\varphi)\right] \end{array}

with the binary variable \varphi=+1 for a call and \varphi=-1 for a put. The first step in order to use AAD for the model calibration is an implementation of the Gauss-Laguerre integration based on the CppAD library. The only change needed is to replace the data type Real by CppAD::AD<Real> in

template <class F>
CppAD::AD<Real> GaussianADQuadrature::operator()(const F& f) 
    const {
    CppAD::AD<Real> sum = 0.0;
    for (Integer i = order()-1; i >= 0; --i) {
        sum += w_[i] * f(x_[i]);
    }
    return sum;
}

Using this AAD version of the Gauss-Laguerre integration the method

CppAD::AD<Real>
AnalyticHestonADEngine::Fj_Helper::operator()(Real phi) const;

can be ported in a similar manner and the AnalyticHestonADEngine::doCalculation method now reads

std::vector<CppAD::AD<Real> > params;
params += spotPrice, v0, kappa, theta, sigma, rho;
CppAD::Independent(params);

std::vector<CppAD::AD<Real> > y(1);

// untouched code ...

const std::vector<Real> moreResults 
    = CppAD::ADFun<Real>(params, y)
          .Reverse(1, std::vector<Real>(1, 1.0));

results.value = CppAD::Value(y[0]);
results.additionalResults["v0"]    = moreResults[1];
results.additionalResults["kappa"] = moreResults[2];
results.additionalResults["theta"] = moreResults[3];
results.additionalResults["sigma"] = moreResults[4];
results.additionalResults["rho"]   = moreResults[5];

All first order Greeks for the calibration instruments can now be calculated using AAD. The Jacobian of the target function Z can then be obtained from these first order Greeks and the Black-Scholes vega via

\begin{array}{rcl} \displaystyle\frac{\partial Z_i}{\partial x_j} &=& \displaystyle \frac{\partial \sigma_i^{model}(\Theta)}{\partial x_j} = \displaystyle \frac{1}{\frac{\partial NPV_i}{\partial \sigma_i}}\frac{\partial NPV_i}{\partial x_j} \\[0.9em] \nonumber &=& \displaystyle \frac{1}{\nu_{BS}}\frac{\partial NPV_i}{\partial x_j} \end{array}

The advantage of using AAD for the Heston model is not calculation speed but precision. In fact the AAD version of the Heston model calibration is slower than the finite difference based method, but the AAD method does not need an arbitrary, fine-tuned bumping parameter or any higher order finite difference schemes to come up with high precision first order derivatives. The diagram below shows the relative difference between the AAD value and several finite difference approximations for \frac{\partial NPV}{\partial \nu_0} of an ATM option with two years to maturity and

\nu_0=0.1, \kappa=1.0, \theta=0.1, \sigma=0.5, \rho=-0.75

Only the six point central finite difference scheme with optimal bumping size reproduces the AAD value within/close to machine precision. The two point forward scheme, which is used by default in the MINPACK implementation of the Levenberg-Marquardt algorithm, reproduces only the first eight digits. A more detailed analysis of the forward scheme can be found in [3] or [4]. Especially the latter paper calculates the optimal choice of the bumping size for the different schemes. Please find the source code for the AAD pricing engine here.

[Figure plot: relative difference between the AAD value and the finite difference approximations vs. bumping size]

The rate of convergence can be improved further by using Richardson extrapolation. For the diagram below the same analysis was repeated including a Richardson extrapolation step.

[Figure plotRE: relative difference between the AAD value and the finite difference approximations with Richardson extrapolation]
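For reference, a central difference estimate combined with one Richardson extrapolation step might look like this (a sketch; combining the estimates for h and h/2 cancels the leading O(h^2) error term):

#include <functional>

// Central difference with one Richardson extrapolation step,
// yielding an O(h^4) accurate derivative estimate.
double centralDiffRichardson(const std::function<double(double)>& f,
                             double x, double h) {
    const double d_h  = (f(x + h)     - f(x - h))    /(2.0*h);
    const double d_h2 = (f(x + 0.5*h) - f(x - 0.5*h))/h;
    return (4.0*d_h2 - d_h)/3.0;
}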

[1] Giles, M. and Glasserman, P. (2006) Smoking adjoints: fast Monte Carlo Greeks. Risk, 19:88–92.

[2] Schlenkrich, S. and Riehme J., (2013) Design Patterns for Algorithmic Differentiation

[3] Kopecky, K. (2007) Numerical Differentiation, Lecture Notes

[4] Numerical Differentiation in Integration, Lecture Notes

QuantLib-1.5 SWIG Patch for JVM/.NET

Update 23.11.2015: A modified version of the patch is now part of the official QuantLib Release 1.7.

Update 22.09.2015: Please find the latest and improved version of the patch for QuantLib 1.6.2 here.

The usage of QuantLib in JVM and .NET languages (e.g. Java/Scala and C#/F#) via the SWIG interface has a known shortcoming. The implementation of QuantLib’s observer pattern does not tolerate a parallel garbage collector running in a different thread. As a result programs randomly crash or produce “pure virtual function call” errors. A detailed description of this problem can be found e.g. here and within the references.

Please find here a patch for QuantLib 1.5 to fix this issue.