A finite difference equation can be represented and solved based on a sparse linear system. When using schemes with implicit parts to solve the equation one needs to calculate the inverse of this sparse matrix. Iterative solvers like the BiCGStab algorithm (plus preconditioner) are tailor-made for these kind of problems. In addition these solvers can easily be adapted to parallel computing architectures like modern GPUs. The cusp library  implements a set of common iterative solvers for sparse matrices based on CUDA and thrust .
Alternatively operator splitting techniques can be used to solve the problem. In this case only tridiagonal linear systems have to be inverted. The Thomas algorithm (named after Liewellyn Thomas) can solve such systems in operations instead of operations required by the Gaussian elimination. But it is not easy to parallelize the Thomas algorithm.
On a CPU operator splitting usually outperforms iterative solvers like BiCGStab for the implicit part. The VPP pricing problem from the previous blog entries will now serve as a benchmark to compare the solver performance on the CPU against a GPU. As can be seen in the diagram below on a CPU operator splitting is faster than the BiCGStab solver plus preconditioner. But the BiCGStab on the GPU clearly gains the lead even though the corresponding preconditioner for the GPU has not been implemented yet. Please find the interface class between QuantLib/boost::ublas and cusp here.
 cusp, Generic Parallel Algorithms for Sparse Matrix and Graph Computations
 thrust, Code at the speed of light