<div dir="ltr"><font face="monospace, monospace">As part of our ongoing effort to improve the reliability and robustness of GlusterD, we are also targeting concurrency-related issues. This proposal addresses those issues.<br><br>The Big-lock<br>------------<br><br>GlusterD was originally designed as a single-threaded application that could handle just one transaction at a time. It was made multi-threaded to improve responsiveness and to support handling multiple transactions at a time. This was needed for newer features like volume snapshots, which could otherwise leave GlusterD unresponsive for some periods of time.<br><br>Making GlusterD multi-threaded required a thread synchronization mechanism to protect the shared data structures (mainly everything under the GlusterD configuration, the glusterd_conf_t struct) from concurrent access by multiple threads. This was accomplished using the Big-lock.<br><br>The Big-lock is an exclusive lock, so any thread that needs to use the protected data must obtain the Big-lock first and release it once done.<br><br><br>Problem with Big-lock<br>---------------------<br><br>The Big-lock was added to the GlusterD code to solve the problems that arose when GlusterD was made multi-threaded. It was supposed to be a quick solution that would allow GlusterD to be shipped.<br><br>The Big-lock, as the name suggests, is a coarse-grained lock. The coarseness of the lock makes threads contend even when they are accessing unrelated data, and this has led to some deadlocks.<br><br>One example of such a deadlock involves transactions and RPC. If a thread holding the Big-lock blocks on network I/O, a deadlock may result. This could happen when the remote endpoint is disconnected. The callback code would be executed in the same thread that has acquired the Big-lock. All network I/O handlers, including callbacks, are implemented to acquire the Big-lock before executing. 
Taken together, these two facts produce a deadlock: the callback tries to acquire the Big-lock that its own thread already holds.<br><br>To avoid this, we release the Big-lock whenever a thread could block on network I/O. This comes at a price: it opens a window of time during which other threads can update the shared data structures, leading to inconsistencies.<br><br>The Big-lock, in its current state, doesn’t even fully solve the problem it set out to solve, and it introduces more problems on top of that. These problems are only going to grow as new features and new code are added to GlusterD.<br><br><br>Possible solutions<br>------------------<br><br>The most obvious solution would be to split the Big-lock into finer-grained locks. We could go one step further and replace the mutex locks (the Big-lock is a mutex lock) with readers-writer locks. This would bring more flexibility and finer-grained control, at the cost of additional overhead, mainly in implementation complexity.<br><br>As an alternative to readers-writer locks, we propose to use RCU as the synchronization mechanism. RCU provides several advantages over readers-writer locks while providing similar synchronization features. These advantages make it preferable to readers-writer locks, even though the implementation complexity remains nearly the same for both approaches.<br><br><br>RCU<br>---<br><br>RCU, short for Read-Copy-Update, is a synchronization mechanism that can be used as an alternative to readers-writer locks.<br><br>A good introduction to RCU can be found in this series of articles on LWN, [1][1] and [2][2]. The articles cover the usage of RCU in the Linux kernel, where it is used heavily.<br><br>The advantages that make RCU preferable to readers-writer locks are the following:<br><br>- Wait-free reads<br>  RCU readers have no wait overhead. They can never be blocked by writers. 
RCU readers do need to announce when they are in their critical sections, but this notification is much lighter than taking a lock.<br><br>- Provides existence guarantees<br>  RCU guarantees that RCU-protected data accessed in a reader&#39;s critical section will remain in existence until the end of that critical section. This is achieved by having writers work on a copy of the data instead of modifying the existing data in place.<br><br>- Concurrent readers and writers<br>  Wait-free reads and the existence guarantee mean that readers and writers can execute concurrently. Any readers already executing when a writer starts will continue working with the original copy of the data. The writer will work on a copy, and will use RCU methods to swap/replace the original data without affecting existing readers. Any readers that start after the writer has published its update will see the new data.<br>  This does mean that some readers will continue to work with stale data, but this isn&#39;t too big a problem, as the data at least remains consistent until the reader finishes.<br><br>- Read-side deadlock immunity<br>  RCU readers always run in deterministic time, as they never block. This means that they can never become part of a deadlock.<br><br>- No writer starvation<br>  As RCU readers don&#39;t block, writers can never starve.<br><br><br>### Userspace RCU<br><br>The kernel uses features provided by the processor to implement its RCU. Userspace applications cannot make use of these features, but can instead use the Userspace RCU library.<br><br>liburcu [3][3] provides a userspace implementation of RCU that is portable across multiple platforms and operating systems. liburcu also provides some common data structures, along with RCU-protected APIs to use them.<br><br>An introduction to URCU and its APIs can be found in this article on LWN [4][4].<br><br><br>Proposed implementation<br>-----------------------<br><br>&gt; NOTE: This is still a high-level concept. 
We haven&#39;t yet worked out the details of the implementation.<br><br>The Big-lock is currently used mainly to protect access to the various configuration lists in GlusterD, including the peers list, volumes list, bricks lists and snapshots list. These lists currently use the list API provided by libglusterfs.<br><br>For the initial implementation, we will replace these lists with the RCU-protected list data structures and APIs provided by liburcu. If implemented correctly, this should by itself solve a majority of the problems we have. After this first change, we&#39;ll go on to protect other data structures in GlusterD with RCU.<br><br>If everything goes well, we hope to make RCU a part of the GlusterFS library and use it elsewhere in our codebase.<br><br>We are prototyping the implementation using the bullet-proof flavour of liburcu [5][5]. We&#39;ll share our findings shortly.<br><br>### Open issues<br><br>1. Availability of liburcu on different distributions and flavours of Unix.<br>2. Choice of liburcu flavour for the main implementation.<br><br>[1]: <a href="https://lwn.net/Articles/262464/">https://lwn.net/Articles/262464/</a> &quot;What is RCU, fundamentally?&quot;<br>[2]: <a href="https://lwn.net/Articles/263130/">https://lwn.net/Articles/263130/</a> &quot;What is RCU? Part 2: Usage&quot;<br>[3]: <a href="http://urcu.so/">http://urcu.so/</a> &quot;Userspace RCU&quot;<br>[4]: <a href="https://lwn.net/Articles/573424/">https://lwn.net/Articles/573424/</a> &quot;User-space RCU&quot;<br>[5]: <a href="https://lwn.net/Articles/573424/#URCU%20Flavors">https://lwn.net/Articles/573424/#URCU%20Flavors</a> &quot;URCU flavours&quot;</font></div>