I have something of an unfortunate fondness for indefensible hacks.
Like I discussed in my last post, RCU is a synchronization mechanism that excels at protecting read mostly data. It is a particularly useful technique in operating system kernels because full control of the scheduler permits many fairly simple and very efficient implementations of RCU.
In userspace, the situation is trickier, but still manageable. Mathieu Desnoyers and Paul E. McKenney have built a Userspace RCU library that contains a number of different implementations of userspace RCU. For reasons I won’t get into, efficient read side performance in userspace seems to depend on having a way for a writer to force all of the reader threads to issue a memory barrier. The URCU library has one version that does this using standard primitives: it sends signals to all other threads; in their signal handlers the other threads issue barriers and indicate so; the caller waits until every thread has done so. This is very heavyweight and inefficient because it requires running all of the threads in the process, even those that aren’t currently executing! Any thread that isn’t scheduled now has no reason to execute a barrier: it will execute one as part of getting rescheduled. Mathieu Desnoyers attempted to address this by adding a membarrier() system call to Linux that would force barriers in all other running threads in the process; after more than a dozen posted patches to LKML and a lot of back and forth, it got silently dropped.
While pondering this dilemma I thought of another way to force other threads to issue a barrier: by modifying the page table in a way that would force an invalidation of the Translation Lookaside Buffer (TLB) that caches page table entries! This can be done pretty easily with mprotect or munmap.
Full details in the patch commit message.