Inter-Thread Communication using Lock-Free Algorithms

In his blog Martin Fowler discusses the LMAX architecture, a high throughput retail financial trading platform. A remarkable detail of the architecture is the central business logic processor, which is implemented as a single threaded Java program. The supporting pre- and post-processing is running as a multi threaded application using lock-free ring buffers.

In general lock-free algorithms are implemented using atomic read-modify-write primitives which are provided by modern CPUs. Probably the most used lock-free algorithm in the boost library is the reference counting in boost::shared_ptr. On popular hardware platforms these reference count updates are based on atomic increments/decrements usingĀ  compare-and-swap (CAS) operations. The performance improvements over normal mutex exclusion locks are significant as the following little experiment shows. Basis for this experiment is a function which increments an integer counter 500 million times in a loop [1]. This loop is carried out in

  • simple single threaded loop
  • single thread loop using atomic increments
  • single thread loop acquiring a mutex exclusion lock for every pass.
  • two threads using atomic increments
  • two threads using mutex exclusion locks

A Java and the C++ implementation is available here. The Java code uses the package java.util.concurrent.atomic and the C++ code uses boost::detail::atomic_count to implement the atomic increments. The run times are measured on a Core i3@3GHz.

Lock-free algorithms are difficult to implement and to debug. Tim Blechmann has written a little gem library boost::lockfree (Parts of the library are now in the boost release 1.53.0). Among others this library contains a wait-free single-producer/single-consumer ring buffer. This ring buffer is e.g. tailor-made to separate the Monte-Carlo path generation from the pricing of a derivate. With a few lines of code the path generation can then run in a different thread.

The code available here contains a BufferedDataFactory based on the boost::lockfree::ringbuffer class and a slightly modified version of the QuantLib test case HybridHestonHullWhiteProcessTest::testZeroBondPricing. In this version of the test case the path generation is running in a separated thread and uses the BufferedDataFactory to hand over the data to the pricing thread.

[1] Disruptor: High performance alternative to bounded queues for exchanging data between concurrent threads.