Parallel Model Calibration using MPI and Boost.MPI

The message passing standard MPI is a language-independent communication protocol. MPI supports the parallelization of numerical algorithms on both massive parallel computers and on symmetric multi processor systems. MPI is standardized, highly portable and the de facto standard on massive parallel supercomputers. Even though MPI can be used in a multi-threading environment it is normally used in a multi-process environment. Therefore MPI is tailor-made to parallelize algorithms based on the non thread-safe QuantLib.

The roots of the MPI specification are going back to the early 90’s and you will feel the age if you use the C-API, which is designed to achieve maximum performance. The Boost.MPI library – quoting from the web page – “is a C++ friendly interface to the standard Message Passing Interface… Boost.MPI can build MPI data types for user-defined types using the Boost.Serialization library”.

Model calibration can be a very time-consuming task, e.g. the calibration of a Heston or a Heston-Hull-White model using American puts with discrete dividends. The class MPICalibrationHelper acts as a MPI wrapper for a given CalibrationHelper and allows to parallelize an existing model calibration routine (hopefully with minimal impact/effort). The source code is available here. It contains the MPICalibrationHelper class and as an example the parallel version of the DAXCalibration test case (part of test-suite/hestonmodel.cpp). The code depends on QuantLib 1.0 or higher, Boost.Thread and Boost.MPI.

The diagram above shows the speed-up of a Heston-Hull-White calibration with discrete dividends on an eight core machine using a finite difference pricing engine. The main reason for the sub-linear scaling is the limited memory bandwidth between the CPUs and the main memory and not the MPI communication overhead.

Advertisement

Pricing of a Virtual Power Plant on a GPU

Even the pricing of  a simple virtual power plant (VPP) is challenging. Main reasons are the high number of possible states of the VPP and the large number of possible exercise dates because often a VPP is priced as a bermudan-style option with hourly exercise rights. The implementation effort for an exact pricing engine based on finite difference methods (see e.g.[1]) or based on least squares Monte-Carlo is comparable large. As shown in [1] Monte-Carlo combined with  perfect foresight optimization can result in a very good approximation. The algorithm consists of a Monte-Carlo path generator and a dynamic programming optimization part, which calculates the optimal load schedule plan for each path separately. The stochastic processes involved are outlined in [1].

The CUDA based GPU implementation is available here.  It depends on the latest QuantLib version from the SVN trunk or the upcoming QuantLib 1.2 release and CUDA 4.0. The corresponding C++ implementation is a speed-optimized version of the test-case VPPTest::testVPPPricing. This version also supports multi-threading. The following hardware was used to compare both implementations:

  • CPU: Core i5@3.0 GHz, quad-core
  • GPU: GTX560@810/1620MHz, 336 cores

As can be seen in the diagram below the GPU outperforms the CPU roughly by a factor 100 for single precision and a factor of 50 if the GPU is using double precision.

The CUDA implementation consists of the following files:

gpuvpppricingengine.hpp / gpuvpppricingengine.cpp
A QuantLib pricing engine for a simple VPP based on a Monte-Carlo simulation and perfect foresight optimization via dynamic programming. The physical size of the Monte-Carlo simulation is controlled by the following parameters of the constructor

  1. Size nSimulations: number of Monte-Carlo simulations carried out.
  2. bool antithetic: enables/disables antithetic sampling
  3. Size blockSize: number of threads in a CUDA block.
  4. Size gridSize: number of CUDA blocks that are grouped together in a simulation kernel.

gpuvpppricingengine_kernel.hpp
gpuvpppricingengine_kernel.cu/ gpuvpppricingengine_kernel.def
The CUDA implementation consists of two kernels. The first kernel is the Monte-Carlo path generator, which calculates the paths on hourly granularity and stores them in the global memory of the graphic card.. The technics used are outlined e.g. in [2], [3] and [4]. The second kernel performs the optimization of the load schedule based on dynamic programming. The memory layout of this step depends on the number of possible states of the VPP because every possible state is stored in the shared memory of the GPU. The number of states is given by N_{states} = 2t_{up}+t_{down}. CUDA does not support efficient dynamic shared memory allocation. Therefore the sizes of all shared memory arrays must be given at compile time. To allow an optimal use of the limited shared memory capacity different kernels with different N_{states} values are generated using X-macros and the appropriate kernel is chosen at runtime.

cudatype.hpp
defines basic CUDA types, especially the typedef for the type “real” can be used to compile the code either for single or double precision.

gpurand.hpp
C++ interface for a GPU random number generator

gpucurand.hpp / gpucurand.cpp / gpucurand_kernel.hpp / gpucurand_kernel.cu
implementation of the GPURand interface based on the CURAND library, which is part of CUDA 4.0.

[1] this blog, VPP Pricing III: Exact Pricing based on Finite Difference Methods.

[2] L. Howes, D. Thomas, Efficient Random Number Generation and Application Using CUDA.

[3] A. Bernemann, R. Schreyer, K. Spanderen, Accelerating Exotic Option Pricing and Model Calibration Using GPUs

[4] M. Joshi, Graphical Asian Options

VPP Pricing III: Exact Pricing based on Finite Difference Methods

The total value of a virtual power pant (VPP) can be decomposed in an intrinsic part plus an extrinsic part. The intrinsic value is given by the cash-flows that the VPP would generate based on the current power and gas forward curve. Therefore the intrinsic value can be calculated without defining a stochastic model for the power and gas prices using either linear or dynamic optimization methods. Calculating the extrinsic value implies pricing the VPP exactly to calculate the total value for a given stochastic process. The model in use here is outlined in the article VPP Pricing I: Stochastic Processes & Partial Integro Differential Equation. Exact pricing can  be done using least square Monte-Carlo or finite difference methods using dynamic programming for the local optimization [1].

The focus is on the finite difference method, which involves solving the three-dimensional partial integro differential equation

\begin{array}{rcl} rV&=&\frac{\partial V}{\partial t}+\frac{\sigma_x^2}{2}\frac{\partial^2 V}{\partial x^2}-\alpha x\frac{\partial V}{\partial x}-\beta y \frac{\partial V}{\partial y} \\[6pt]&+&\frac{\sigma_u^2}{2}\frac{\partial^2 V}{\partial u^2}- \kappa u\frac{\partial V}{\partial u} +\rho\sigma_x\sigma_u\frac{\partial^2 V}{\partial x\partial u}\\[6pt] &+&\lambda\int_\mathbb{R}\left(V(x,y+z,u,t)-V(x,y,u,t) \right )\omega(z)dz \\ \end{array}

An additional fourth dimension is needed to keep track of the different states of the VPP. Based on the characteristics of the VPP type outlined in the article VPP Pricing II: Mixed Integer Linear Programming a VPP with three possible load levels P \in \{0, P_{min}, P_{max} \} has 2 t_{up} + t_{down} different states.

Pricing via Monte-Carlo and perfect foresight involves simulating the stochastic process

\begin{array}{rcl} P_t &=& \exp(p_t + X_t + Y_t) \\ dX_t &=& -\alpha X_tdt + \sigma_x dW_t^x \\ dY_t &=& -\beta Y_{t-}dt+J_tdN_t \\ \omega(J) &=& \eta e^{-\eta J} \\ G_t &=& \exp(g_t + U_t) \\ dU_t &=& -\kappa U_tdt+\sigma_udW_t^u \\ \rho &=& \mathrm{corr} (dW_t^x, dW_t^u) \end{array}

and optimize the power plant load schedule for each path separately in the same way the intrinsic value is calculated. This procedure will result in an upper bound for the exact price of the VPP. The prices for a 4 weeks VPP contract based on this two methods and with the parameters outlined in VPP Pricing II show almost no differences between the “exact” finite difference method and the perfect foresight  upper bound value beside the Monte-Carlo error (s. diagram below, compare with [1]).

The size of the lattice for the finite difference method was

(t, nPower, nJump, nGas, nStates) = (672, 101, 51, 20, 14).

The code is available here. It depends on the latest QuantLib version from the SVN trunk or the upcoming QuantLib 1.2 release. If you want to generate the plot you’ll also need R

[1] H. van Dijken,  D. van Abbena, H.S. Los, C. de Jong, The value of starting up the power plant.

VPP Pricing II: Mixed Integer Linear Programming

The next two steps are defining a simple VPP contract (or a simplified gas-run power plant) and setting up a mixed integer linear programming optimization (MIP) to calculated the intrinsic value and an upper bound for the extrinsic value based on a Monte-Carlo simulation and assuming perfect foresight. The third step outlined in the next part will then be the “exact” pricing of the extrinsic value using dynamic programming and finite difference methods.

The set-up of the simplified gas-run power plant is similar to the one explained in chapter 4.2.3 of the text-book [1]. In general the power plant has three power output level:

  • Plant is off, P=0
  • Generation at minimum load P=P_{min}
  • Generation at maximum load P=P_{max}

The power plant has a fixed efficiency rate

\zeta=\frac{MWh Power Output}{MWh Heat}.

Ramp rates will be neglected, but the power plant has a minimum uptime t_{up} and a minimum downtime t_{down}. The start-up costs are given by a fixed start-up cost \eta (in €) and the price of the gas needed to produce the start-up heat \theta (in MWh).

The mixed integer linear optimization is running in one hour blocks and is using three decision variables per hour i. The binary decision variable \beta_i is true if the power plant is running at minimum load P_{min} or at maximum load P_{max} and \beta_i is false if the plant is off. The real decision variable 0\le s_i \le 1 is equal to one if the plant is started in hour i, which is implied by the following constraint

\beta_i - \beta_{i-1} \le s_i.

The minimum up-time t_{up} and the minimum down-time t_{down} is a consequence of the constraints

\begin{array}{rcl} \beta_i &\ge& \sum_{t=i-t_{up}+1}^{t=i} s_t \\[7pt] \beta_i &\le& 1-\sum_{t=i+1}^{t=i+t_{down}} s_t \end{array}

The real decision variable 0\le \gamma \le 1 is equal to one if the power plant is running at maximum load P_{max} and zero if the power plant is either running at minimum load P_{min} or if the plant is off, that means

\beta_i \ge \gamma_i

Let P_i be the power price, G_i be the gas price and CO_2^i be the carbon dioxide price at hour i. The objective function is then given by

P\& L = \sum_{t=1}^N\left[\left(\gamma_iP_{max} + P_{min}(\beta_i-\gamma_i)\right) \left(P_i - \frac{G_i+CO_2^i}{\zeta}\right) - s_i\left(\eta + \theta (G_i+CO_2^i)\right)\right]

For a one year span the problem consists of 3\cdot 365 \cdot 24 = 26280 decision variables \{\beta_i, \gamma_i, s_i\} and 4 \cdot 365\cdot 24 = 35040 constraints. This comparable small problem can be solved using e.g. the Gnu Linear Programming Kit (GLPK). For an overview on open source linear/mixed integer programming solver see [2].

The model parameters and the example forward curves are outlined in the previous entry VPP Pricing I. The diagram below shows the intrinsic value and the upper bound for the total value (intrinsic plus extrinsic value) based on Monte-Carlo, perfect foresight and MIP for different  power plant efficiencies \zeta. The parameters of the VPP contract are given by

t_{up}=t_{down}=2h, P_{min}=8MW, P_{max}=40MW, \eta=300 EUR, \theta=20MWh,

the (fixed) carbon dioxide price is 3.0€ per MWh heat.

The source code is available here. It depends on GLPK and the latest QuantLib version from the SVN trunk or the next QuantLib 1.2 release.

It is now quite easy to add and price time-integral constraints, e.g. the following constraint restricts the number of starts within a year to be less than or equal to a given number

\sum_{t=1}^N s_i \le \#Starts.

The following diagram shows the results for \#Starts \le 25 and a minimum load P_{min}=25MW.

The source code is available here. It depends on QuantLib 1.1 and if  you want to generate the plot directly from the C++ program you’ll also need R, RCPP and RInside.

[1] M. Burger, B. Graeber, G. Schindlmayr, Managing Energy Risk, ISDN 978-0-470-ß2962-6

[2] S. R. Thorncraft, Evaluation of Open-Source LP Optimization Codes in Solving Electricity Spot Market Optimization Problems.

Swing Option: Linear vs. Dynamic Programming II

In order to get an improved impression on the differences between Monte-Carlo/linear programming and  finite difference methods/dynamic programming  a more realistic swing option with hourly payoff profile is evaluated based on a German hourly forward curve. The forward curve is taken from the Kyos example download page. The parameterization of the Kluge model is outlined in [1].

With maturity of twelve weeks the example swing option provides 12*7*24=2016 exercise opportunities. The number of overall exercised swing opportunities is constraint by

100 \le \sum_{i=1}^{2016} \beta_i \le 500.

The size of two dimensions of the finite difference method (dynamic programming) is therefore already been given. 2016 steps are needed in time direction and 500 steps are needed in the “consumed exercises” direction. Together with the two other directions – one for the power price and one dimension for the jump process – and without further simplidsfications this forms a pretty large finite difference problem. The Monte-Carlo based linear programming approach reduces the computational burden but will lead to an upper bound of the swing option price. The diagram below shows the corresponding results.



The code is available here. It depends on the GNU Linear Programming Kit, the Boost Thread library for parallelization and at the time of writing on the latest QuantLib version from the SVN trunk. If you want to generate the plot directly out of the C++ program you also need R, RCPP and RInside. To utilize all CPU cores please use the -DNTHREADS=(number of CPU cores)  compiler switch. In addition to run the program you have to download the forward curve German power from the Kyos web page and convert it into a text file EEX_2010-2013.txt of the folliwing format

4-Oct-2010 36.73 32.09 27.32 23.22 25.71 35.60 47.32 …

5-Oct-2010 38.12 31.12 22.76 25.65 27.87 34.60 50.01 …

[1] T. Kluge, Pricing Swing Options and other Electricity Derivatives

Swing Option: Linear vs. Dynamic Programming I

Finding the optimal exercise strategy for a swing option is a challenging task, even for simple payoff structures. In the absence of complicated time-integral constraints dynamic programming can be used to calculate the optimal exercise strategy. Linear programming applied ex post over the whole spot price path is capable to deal with more complicated time-integral constraints but the algorithm leads only to an upper bound of the real option price [1].

A basic payoff structure is e.g.: On the exercise dates t_i, i\in\{1..N\} the decision variables \beta_i\in\{0,1\} indicate whether to exercise (\beta_i=1) or not to exercise (\beta_i=0) the option at a given point t_i in time. Payoff is either \beta_i (P(t_i)-K) for a call or \beta_i(K-P(t_i)) for a put option. Here K is the strike price and P(t_i) is the spot price . The number of exercise opportunities is constraint by

E_{min} \le \sum_{i=1}^N \beta_i \le E_{max}.

The price of a swing call option is then given by

\text{npv} = \max_{\beta}\mathbb{E}^\mathbb{Q} \left[ \sum_{i=1}^N \beta_i \left( P(t_i) - K\right)e^{rt_i}\right]

Let the dynamics of the spot price in the risk neutral measure \mathbb{Q} be given by the Kluge model [2]:

\begin{array}{rcl} S &=& exp(X_t + Y_t) \\ dX_t &=& \alpha(\mu(t)-X_t)dt + \sigma dW_t \\ dY_t &=& -\beta Y_{t-}dt + J_tdN_t \\ \omega(J)&=& \eta_u e^{-\eta_u J} \end{array}

This model is composed of an Ornstein-Uhlenbeck (OU) process X_t, a deterministic period function \mu(t) characterizing the seasonality and a mean reverting process Y_t with jumps to incorporate spikes. N_t is a Poisson  process with jump intensity \lambda and the jump size itself is exponentially distributed. The random variables W_t, N_t and J_t are independent. The Monte-Carlo path simulation will be built on top of standard components for an OU process and a Poisson jump-diffusion process [3]. The dynamic programming approach can either be carried out using least squares Monte-Carlo ansatz (Longstaff Schwartz algorithm) or using finite difference methods. We prefer finite difference methods to avoid e.g. the problem of choosing an appropriate basis function set. The Feynman-Kac theorem leads to the corresponding partial integro differential equation (PIDE):

\begin{array}{rcl} rV =&&\frac{\partial V}{\partial t}+\frac{\sigma^2}{2}\frac{\partial^2 V}{\partial x^2} + \alpha(\mu(t)-x)\frac{\partial V}{\partial x}-\beta y\frac{\partial V}{\partial y}\\&+&\lambda \int_\mathbb{R} \left( V(x,y+z,t)-V(x,y,t)\right )\omega(z) dz \end{array}

A Gauss-Laguerre quadrature is appropriate to calculate the integral part of the PIDE. Beside this two-dimensional PIDE an additional dimension is needed to keep track of the already consumed exercise rights. The three-dimensional formulation together with Bellman’s principle of optimality transforms the global optimization problem into a local optimization problem.

Target of the linear programming approach is the upper bound

\text{npv} \le \mathbb{E}^\mathbb{Q}\left[ \max_{\beta} \sum_{i=1}^N \beta_i\left(P(t_i) - K \right) e^{rt_i}\right]

The linear programming algorithm will calculate the optimal exercise strategy \left\{ \beta \right\} on each Monte-Carlo path separately (perfect foresight) with respect to the given constraints. For this basic type of swing option linear programming is sort of over-engineering because a simple sort algorithm will also reproduce the optimal exercise strategy. But introducing linear programming and mixed integer programming now will enable us later on to deal with more complicated time-integral constraints. Libraries for this task are freely available, e.g. the GNU Linear Programming Kit.

The test parameterization of the model is

\alpha=1.0, \sigma=200\%, \mu(t)=3.0,\beta=5.0, \eta=2.0, \lambda=1.0, X_0=3.0, Y_0=0.0,

the example swing call option has maturity of one year, strike price is equal 40, one exercise opportunity per month and the minimum number of exercise rights is equal to the maximum number of exercise rights, E_{min} = E_{max}. Interest rates are at r=10\%.

As the diagram above shows for this set-up the upper bound for the swing option price calculated using linear programming differs significantly from the correct price calculated based on dynamic programming on a lattice. For twelve exercise rights both algorithms have to provide the same results because the constraint forces to always exercise on each exercise date. Next to come is to rerun the simulation with a more realistic forward curve and process parameters.

The code is available here. It depends on the GNU Linear Programming Kit, the Boost Thread library for parallelization and at the time of writing on the latest QuantLib version from the SVN trunk. If you want to generate the plot directly out of the C++ program you also need R, RCPP and RInside.

[1] M. Burger, B. Graeber, G. Schindlmayr, Managing Energy Risk, ISDN 978-0-470-ß2962-6

[2] T. Kluge, Pricing Swing Options and other Electricity Derivatives

[3] P. Glasserman, Monte Carlo Methods in Financial Engineering.  ISBN-0387004513

Calibration: Comparison of deterministic Optimizers

The gaol is to calibrate a time dependent Heston model defined by the following SDE

The parameter set \{\kappa_t, \theta_t, ,\sigma_t, \rho_t\} is supposed to be piecewise constant in time. This model has a semi-closed solution for plain vanilla European put/call options based on the characteristic function method [1].

The DAX implied volatility surface based on July 5, 2002 data taken from [2] defines the “benchmark” calibration problem. The benchmark model parameters for the optimization problem are given by

\Theta = \{\nu_0, \kappa_{0\leq t < 0.25}, \kappa_{0.25 \leq t},\theta, \sigma, \rho\}.

The non-linear least square optimization problem is defined by the goodness of fit measure

\zeta=\min_{\Theta} \sum_{i=1}^N \left( \sigma_i^{market}(K,T) - \sigma_i^{model}(K,T, \Theta) \right)^2

where \sigma_i^{market}(K,T) is the market implied volatility for strike K and maturity T and \sigma_i^{model}(K,T, \Theta) is the corresponding Black-Scholes volatility implied from the model price. The optimal solution for this problem is

\Theta_{min} = \{ 0.2231, 39.651, 7.546, 0.0954, 5.1865, -0.5004 \}

leading to a goodness of fit measure of \zeta_{min}=74.4731 (Please keep in mind that this result is the outcome of a naive calibration procedure. Due to the large \kappa and \sigma values I’d not use these parameters to price a derivative.).


The diagram above shows the “goodness of fit”-surface for the parameter sets in

\Theta = \{ 0.2231, 39.651, \kappa_{0.25 \leq t} \in [0, 16], 0.0954, \sigma \in [0, 16], -0.5004 \}

To be able to compare a larger number of deterministic optimizers the model calibration will be carried out using R and with help of the additional packages minpack.lm and minqa.

Non-linear Least Square Optimization:

  1. nls.lm: Levenberg-Marquardt algorithm(based on MINPACK, also available in QuantLib)
  2. nls: Gauss-Newton algorithm
  3. nl2sol:based on the PORT library.

Non-linear (trusted region) Minimization:

  1. nlm:Newton style minizer
  2. bfgs: quasi-Newton method published by Broyden, Fletcher, Goldfarb and Shanno
  3. l-bfgs-b: limited memory BFGS algorithm incl.box constraints
  4. cg: conjugate gradient algorithm
  5. nm: Nelder-Mead method
  6. bobyqa:trust region method that forms quadratic models by interpolation.
  7. newuoa: trust region method that forms quadratic models by interpolation.
  8. uobyqa:trust region method that forms quadratic models by interpolation.

The particular result dependents on the starting vector but the following diagram shows a common outcome.


The best methods are the non-linear least square algorithms nls.lm and nls followed by nl2sol. Algorithms not included in the diagram have performed even worse than “nm” for this problem.

The goodness of fit measure is calculated in C++ based on the QuantLib and exposed to R using RCPP. The C++ code and the R scripts to perform the optimizations and to create the plots can be found here.

1] A. Elices, Models with time-dependent parameters using transform methods: application to Heston’s model,

[2]  A. Sepp, Pricing European-Style Options under Jump Diffusion Processes with Stochastic Volatility: Applications of Fourier Transform.