Parallel Unit Test Runner using MPI

Update 10.03.2016: added performance results of the latest QuantLib test-suite build on a 32-core SMP machine, using boost interprocess instead of MPI.

Running QuantLib’s test suite on a recent computer takes around 10 minutes. The situation improves a lot if the test runner utilises more than one core of today’s multi-core CPUs to run the tests in parallel. Unfortunately multi-threading won’t work because the boost unit test framework is not thread-safe. A reasonable way to speed up the test suite is therefore multiprocessing with message passing between the compute nodes, based on the master-slave paradigm. The cross-platform standard MPI together with boost::mpi is tailor-made for this task.
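To illustrate the master-slave idea, the sketch below shows how a master rank could hand out test names to idle worker ranks via boost::mpi. This is an assumption for illustration only, not the runner’s actual code: the test names and the runTest() stub are placeholders, and the example needs at least two MPI processes.

 #include <boost/mpi.hpp>
 #include <string>
 #include <vector>

 namespace mpi = boost::mpi;

 // stand-in for executing a single named test case
 void runTest(const std::string&) {}

 int main(int argc, char* argv[]) {
     mpi::environment env(argc, argv);
     mpi::communicator world;

     const std::vector<std::string> tests = { "testA", "testB", "testC" };

     if (world.rank() == 0) {
         // master: answer every work request with the next test name,
         // or with an empty string once all tests have been handed out
         std::size_t next = 0;
         int stopped = 0;
         while (stopped < world.size() - 1) {
             mpi::status s = world.recv(mpi::any_source, 0);
             if (next < tests.size()) {
                 world.send(s.source(), 1, tests[next++]);
             } else {
                 world.send(s.source(), 1, std::string());
                 ++stopped;
             }
         }
     } else {
         // worker: request jobs until the empty stop marker arrives
         for (;;) {
             world.send(0, 0);
             std::string job;
             world.recv(0, 1, job);
             if (job.empty()) break;
             runTest(job);
         }
     }
     return 0;
 }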

The missing piece is an external, parallel unit test runner, which uses MPI for load balancing and for collecting the test results. The runner for QuantLib’s test suite needs boost version 1.59 or higher and can be found here. Please replace line 24 in quantlibtestsuite.cpp,

 #include <boost/test/unit_test.hpp> 

by

 #include <mpiparalleltestrunner.hpp> 

and do not link the executable with libboost_unit_test_framework.so, because the new header file includes the header-only version of the boost test framework (thanks to Peter Caspers for the hint). Load balancing is crucial for the overall speed-up because QuantLib’s test suite has a handful of long-running tests (max. around 90 seconds), which should be scheduled first. Therefore the MPI test runner collects the runtime statistics of every unit test and stores them in a local file to plan the schedule of the next runs.
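A minimal sketch of this scheduling idea, assuming a simple longest-runtime-first ordering based on the recorded statistics (the runner’s actual planning logic may differ):

 #include <algorithm>
 #include <string>
 #include <utility>
 #include <vector>

 // name of a test case and its runtime (in seconds) from the previous run
 typedef std::pair<std::string, double> TestRuntime;

 // dispatch the long-running tests first so they do not delay the finish
 std::vector<std::string> scheduleLongestFirst(std::vector<TestRuntime> history) {
     std::sort(history.begin(), history.end(),
               [](const TestRuntime& a, const TestRuntime& b) {
                   return a.second > b.second; // descending runtime
               });
     std::vector<std::string> order;
     for (const TestRuntime& t : history)
         order.push_back(t.first);
     return order;
 }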

[Figure: runtime of QuantLib’s test suite for different numbers of parallel processes and CPU configurations]

The diagram above shows the runtime of QuantLib’s test suite for different numbers of parallel processes and CPU configurations. The minimum runtime is set by the longest-running test case, which takes around 50 seconds. On a single CPU the performance scales pretty linearly with the number of cores being utilised, and hyper-threading cores help as well. Using more than sixteen real cores does not improve the situation any further because the overall runtime is already dominated by the longest-running test case.

Parallel Model Calibration using MPI and Boost.MPI

The message passing standard MPI is a language-independent communication protocol. MPI supports the parallelization of numerical algorithms both on massively parallel computers and on symmetric multiprocessor systems. MPI is standardized, highly portable and the de facto standard on massively parallel supercomputers. Even though MPI can be used in a multi-threading environment, it is normally used in a multi-process environment. Therefore MPI is tailor-made to parallelize algorithms based on the non-thread-safe QuantLib.

The roots of the MPI specification go back to the early 90s, and you will feel its age if you use the C API, which is designed for maximum performance. The Boost.MPI library – quoting from the web page – “is a C++ friendly interface to the standard Message Passing Interface… Boost.MPI can build MPI data types for user-defined types using the Boost.Serialization library”.
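As a small illustration of the quoted feature, a user-defined struct can be sent between ranks once it is made serializable. The TestResult type below is a hypothetical payload for illustration only, not part of the test runner:

 #include <boost/mpi.hpp>
 #include <boost/serialization/string.hpp>
 #include <string>

 struct TestResult {              // hypothetical payload
     std::string name;
     double runtime;
     bool passed;

     template <class Archive>
     void serialize(Archive& ar, const unsigned int /*version*/) {
         ar & name & runtime & passed;
     }
 };

 int main(int argc, char* argv[]) {
     boost::mpi::environment env(argc, argv);
     boost::mpi::communicator world;

     if (world.size() < 2) return 0;  // example needs at least two ranks

     if (world.rank() == 0) {
         TestResult r;
         world.recv(boost::mpi::any_source, 0, r); // master collects a result
     } else if (world.rank() == 1) {
         TestResult r = { "hestonModelTest", 42.0, true };
         world.send(0, 0, r);                      // worker reports back
     }
     return 0;
 }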

Model calibration can be a very time-consuming task, e.g. the calibration of a Heston or a Heston-Hull-White model using American puts with discrete dividends. The class MPICalibrationHelper acts as an MPI wrapper for a given CalibrationHelper and makes it possible to parallelize an existing model calibration routine (hopefully with minimal impact/effort). The source code is available here. It contains the MPICalibrationHelper class and, as an example, the parallel version of the DAXCalibration test case (part of test-suite/hestonmodel.cpp). The code depends on QuantLib 1.0 or higher, Boost.Thread and Boost.MPI.
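The following is a simplified sketch of the distribution pattern only, not the actual MPICalibrationHelper implementation: each rank prices a round-robin subset of the helpers (the priceHelper() stub stands in for the expensive model valuation) and the partial results are combined on every rank, so the optimizer sees the complete vector of model values on every process.

 #include <boost/mpi.hpp>
 #include <cstddef>
 #include <functional>
 #include <vector>

 // stand-in for the expensive model valuation of calibration helper i
 double priceHelper(std::size_t i) {
     return static_cast<double>(i) * 0.001;
 }

 int main(int argc, char* argv[]) {
     boost::mpi::environment env(argc, argv);
     boost::mpi::communicator world;

     const std::size_t nHelpers = 64;
     std::vector<double> local(nHelpers, 0.0);

     // round-robin split: rank r prices helpers r, r+size, r+2*size, ...
     for (std::size_t i = world.rank(); i < nHelpers; i += world.size())
         local[i] = priceHelper(i);

     // sum up the partial vectors; every rank ends up with all model values
     std::vector<double> modelValues(nHelpers, 0.0);
     boost::mpi::all_reduce(world, local.data(), static_cast<int>(nHelpers),
                            modelValues.data(), std::plus<double>());
     return 0;
 }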

The diagram above shows the speed-up of a Heston-Hull-White calibration with discrete dividends on an eight-core machine using a finite difference pricing engine. The main reason for the sub-linear scaling is the limited memory bandwidth between the CPUs and the main memory, not the MPI communication overhead.