Solving Sparse Linear Systems using CUSP and CUDA

A finite difference equation can be represented and solved based on a sparse linear system. When using schemes with implicit parts to solve the equation one needs to calculate the inverse of this sparse matrix. Iterative solvers like the BiCGStab algorithm (plus preconditioner) are tailor-made for these kind of problems. In addition these solvers can easily be adapted to parallel computing architectures like modern GPUs. The cusp library [1] implements a set of common iterative solvers for sparse matrices based on CUDA and thrust [2].

Alternatively operator splitting techniques can be used to solve the problem. In this case only tridiagonal linear systems have to be inverted. The Thomas algorithm (named after Liewellyn Thomas) can solve such systems in $O(n)$ operations instead of $O(n^3)$ operations required by the Gaussian elimination. But it is not easy to parallelize the Thomas algorithm.

On a CPU operator splitting usually outperforms iterative solvers like BiCGStab for the implicit part. The VPP pricing problem from the previous blog entries will now serve as a benchmark to compare the solver performance on the CPU against a GPU. As can be seen in the diagram below on a CPU operator splitting is faster than the BiCGStab solver plus preconditioner. But the BiCGStab on the GPU clearly gains the lead even though the corresponding preconditioner for the GPU has not been implemented yet. Please find the interface class between QuantLib/boost::ublas and cusp here.

[2] thrust, Code at the speed of light

VPP Pricing V: Least-Squares Monte-Carlo

For the sake of completeness please find here the code for the evaluation of the virtual power plant (VPP) using a least-squares Monte-Carlo algorithm. The code depends on the latest QuantLib version from the SVN trunk or the upcoming QuantLib 1.2 release. The model and power plant specifications can be found in the previous blog entries. A more general descriptionÂ  of the problem and the algorithms can be found e.g. here [1]. Test forward curves can be taken e.g. from the Kyos example download page.

The regression polynomials are of third order in the spark spread $x$ and the stochastic component of the gas price $y$.

$p_a(x,y)=a_0 + a_1 x + a_2 x^2 +a_3x^3 + a_4y + a_5y^2 + a_6y^3 + a_7xy+a_8x^2y+a_9xy^2$

The regression is carried out for every exercise right (every hour) and every possible VPP state separately. The calibration phase is based on ordinary Monte-Carlo scenarios, whereas the pricing is done using Quasi Monte-Carlo scenarios (Sobol sequence) and a Brownian Bridge (BB).

The following table summarizes the performance of the different pricing algorithms for the example contract and maturity of six month. Target accuracy is around 1% relative error in the NPV. The timings are given for a Core i5@3GHz CPU using four threads or a GTX560@0.8/1.6GHz GPU with 336 cores.

$\footnotesize{ \begin{tabular}{|c|c|c|c|c|c|} \hline Algorithm & Optimisaton & Approximation & Hardware & Runtime & Comment\\ \hline \hline Quasi-MC + BB & dyn. prog. & perfect foresight & GPU & 0.19s & single precision \\ Quasi-MC + BB & dyn. prog. & perfect foresight & CPU & 20.3s & \\ Monte-Carlo & dyn. prog. & perfect foresight & CPU & 286.3s & \\ Finite Difference & dyn. prog.& no & CPU & 487.7s & \\ Least-Squares MC & dyn. prog. & no & CPU & 645.6s & \\ Quasi-MC + BB & linear prog. & perfect foresight & CPU & 4198s & using GLPK \\ \hline \end{tabular} }$

I don’t know the reason for the bad performance of the Gnu Linear Programming Kit for these kind of problems. Some commercial linear optimizer are much faster but they can not compete with dynamic programming for a simple VPP. As soon as e.g. time integral constraints are involved linear programming might become the method of choice.

[1] H. van Dijken, The value of starting up the power plant.