View on GitHub

Concurrent Lock-free Binary Search Tree

Download this project as a .zip file Download this project as a tar.gz file

Summary

We are going to implement two variants of a synchronized binary search tree (BST) data structure on a shared memory system, and compare the performance of both. Specifically, we will implement a fine-grained locking version, and a lock-free version of the BST, compare the performance of both on a variety of traces. The BST itself will support insert, search and delete operations. The performance will be tested under a variety of conditions - different tree sizes, different workloads and different degrees of contention. The lock-free algorithm will use atomic primitives like compare and swap (CAS), test and set, etc.

Background

Multi-core, multi-processor architectures have become the norm these days. Hence, concurrent data structures are becoming increasingly important. Concurrent data structures can be accessed by multiple threads simultaneously. Lock-free programming is programming without locks. A lock-free implementation of a concurrent data structure guarantees that some thread can complete an operation in a finite number of steps regardless of the execution of other threads. Lock‑free programming uses atomic operations, such as atomic swap, test-and-set, fetch-and-add, compare-and-swap, load-link/store-conditional, etc., to maintain consistent state. Lock‑free programming can improve system throughput while avoiding issues with traditional locking based algorithms such as - deadlocks, livelocks, etc.

The Challenge

The primary goal of this project is to implement an efficient lock-free BST and an efficient fine-grain-locked BST, and to analyze the performance of both under various workloads. Naturally, this means that both our BST implementations will be modified by several threads in parallel, and so the challenge would be to ensure that both BST implementations have a final state that is consistent with some interleaving of the various operations performed on it by the various threads. Given the dependencies associated between various nodes in a BST, and the insert, search and delete operations we plan to perform on the tree, we anticipate synchronization to be a challenge, especially lock-free synchronization.

Resources

We plan to test our implementation on both the GHC cluster machines and latedays. The configuration of both are as follows:

We are starting from scratch on this project.

Goals and Deliverables

What we plan to achieve and deliver

  1. Documented source code for multi-threaded, fine-grained locking version of BST
  2. Documented source code for multi-threaded, lock-free version of BST
  3. Test harness code for performance comparison of both the above implementations

What we hope to achieve

If we find ourselves ahead of schedule we plan to scale up the project by implementing a lock free version for another (simpler) data structure.

Platform Choice

To test our implementation we need a shared memory system with high-core count so that we can vary the degree of contention by increasing and decreasing the number of threads accessing the BST. We are going to be using atomic primitives like CAS, test-and-set which are supported by almost all the Intel processors and so we can work with any of the GHC or latedays cluster machines.

Schedule

We are already behind schedule by almost 2 weeks as a result of our proposal change. We plan to engage in a 3 or 4 day coding marathon for the next few days to design our basic test harness, so that we can get back on track.

Authors and Contributors

Swapnil Pimpale, Romit Kudtarkar