11 Feb 2011

How to Bootstrap a Threading API

This article is not intended to be a full-blown tutorial. It is meant to be a relatively concise set of notes, quick reference, with some rehearsal to get the reader's mind to the "track".

Just like any Boolean logic operation (read: digital circuit) can be assembled from Boolean inversion and conjunction, all of the threading API, except the pausing and awakening of sleeping threads, can be assembled if a thread-blocking sleep function exists and a mechanism for forking threads exists.

Once the sleep and fork exist, the rest is all subject to subjective taste, except the pausing and awakening of sleeping threads. The remaining part of this article is just one, somewhat inefficient and largely improvable, version of "the rest".

To create a threading API that does not support the pausing and awakening of sleeping threads I add 2 things:

1) critical sections
2) thread collision detection

The critical sections are used for maintaining consistency of the values of a set of variables.

For the sake of rehearsal, or getting "in the mood": if the set of variables consists of an organism type and its coating, let's say, organism==giraffe, coating==fur, then if the set gets updated to a parrot, i.e. first organism:=parrot, and after that coating:=feathers, then before the value of the coating is updated, there's a situation, where organism==parrot, but the coating==fur. It's better that no thread reads the inconsistent values of the pair, (organism,coating), before the whole pair has been updated and that's where the critical sections become useful.

Thread collision detection is for handling edge-situations, where no thread has yet locked the critical section, but more than one thread wants to enter it.

To allow the capabilities of multi-CPU hardware to be leveraged as much as possible, one should avoid putting the threads to a wait, idle, state. To minimize the waiting time of threads, one wants to maximize the number of threads that can concurrently access the shared set of variables, i.e. one wants to allow as many threads to concurrently enter the critical sections as possible. One way to do that is to use 2 types of critical sections:

A) critical section for only reading
B) critical section for reading and writing


The idea is that it's OK for more than one thread to concurrently read the same set of shared variables, as long as there's no thread writing to the set of shared variables at that very same moment. On the other hand, no thread should be reading the set of shared variables while some other thread is writing to them. Neither should there be more than one thread concurrently writing to the set of shared variables.

At the implementation level it means that each set of the shared variables has 2 locks: one set and released by the read-only critical sections and the other set and released by the read-write critical sections. Both of the critical section implementations read both of the locks.

One possible way to implement the thread collision detection is like this:

b_done=false
while(!b_done)
        i_lock++
        if(i_lock==1){
                enter critical section
                i_lock--
                b_done=true
        } else {
                i_lock--
                sleep some semi-random time
        } // else
} // while

The sleeping time of different blocked threads has to differ, preferably change at every iteration, because otherwise they might collide with each other at every iteration, resulting an infinite loop. One way to assure the sleeping time difference is to use some semi-random numbers. This entails that a programming language can have guite functional threading support without having a statistically proper rand function.