By Grace Shih, Alex Wu, Josephine Chen
Observability is a measure of the global state of a system. A system is said to be observable if its current state can be determined using globally-visible outputs. Observable actions change the global state.
Thus, on Unix, ways to change and observe the global state are:
Note that these are global with respect to a single process.
T1: T2:
0 x = 0; /* global variable, this assignment happens before the threads run */
----------------------------------------------------------
1 x += 5; A x += 5;
2 printf("%d\n", x); B printf("%d\n", x);
Assume that each line of code above executes atomically. What are the possible outputs for these threads running in parallel? It turns out there are 6 possible orders: 12AB, 1A2B, 1AB2, AB12, A1B2, and A12B. These orders give rise to two possible outputs.
5 10
This is the output for orders 12AB and AB12.
10 10
This is the output for orders 1A2B, 1AB2, A1B2, and A12B.
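The count of six orders and two outputs can be checked mechanically. The following is a minimal sketch, assuming each thread is modeled as two atomic lines ("add 5 to x", then "print x"); the names `explore`, `record`, `count_outputs`, `outputs`, and `n_orders` are illustrative and not part of the lecture.

```c
#include <stdio.h>
#include <string.h>

#define MAX_OUTPUTS 8

/* Distinct outputs found so far, written as "first second". */
static char outputs[MAX_OUTPUTS][16];
static int n_outputs = 0;
static int n_orders = 0;

/* Called once per complete interleaving. */
static void record(const int printed[2]) {
    char buf[16];
    snprintf(buf, sizeof buf, "%d %d", printed[0], printed[1]);
    n_orders++;
    for (int i = 0; i < n_outputs; i++)
        if (strcmp(outputs[i], buf) == 0)
            return;
    if (n_outputs < MAX_OUTPUTS)
        strcpy(outputs[n_outputs++], buf);
}

/* pc1 and pc2 count how many of its two lines each thread has run. */
static void explore(int pc1, int pc2, int x, int printed[2], int n_printed) {
    if (pc1 == 2 && pc2 == 2) {
        record(printed);
        return;
    }
    for (int t = 0; t < 2; t++) {
        int pc = (t == 0) ? pc1 : pc2;
        if (pc == 2)
            continue;                 /* this thread has finished */
        int nx = x, np = n_printed;
        if (pc == 0)
            nx = x + 5;               /* line 1 or line A */
        else
            printed[np++] = x;        /* line 2 or line B */
        explore(pc1 + (t == 0), pc2 + (t == 1), nx, printed, np);
    }
}

/* Explores every interleaving and returns the number of distinct outputs. */
int count_outputs(void) {
    int printed[2] = {0, 0};
    n_outputs = n_orders = 0;
    explore(0, 0, 0, printed, 0);
    return n_outputs;
}
```

After `count_outputs()` returns, `n_orders` holds the number of interleavings explored and `outputs` holds the distinct output strings in the order they were first seen.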
If we make the example a bit more realistic, and consider how printf() is implemented, we can add some possible outputs. The single printf("%d\n", x); function call breaks into a number of sub-steps:
2.1 movl x, %eax
2.2 pushl %eax
2.3 pushl "%d\n"    # address of string
2.4 call printf
2.5 (in printf:) create string buffer from arguments (many steps)
2.6 (in printf:) call write() system call
2.7 (in printf:) return
Assume these sub-steps are executed atomically, but line 2 itself might not be. Then we achieve one more output:
10 5
| T1 | T2 | x | Output |
|---|---|---|---|
| 1 | | 5 | |
| 2.1-2.3 | | 5 | |
| | A | 10 | |
| | B.1-B.7 | 10 | 10 |
| 2.4-2.7 | | 10 | 5 |
But no other output sequence is possible. That means that any other output is a failure. For example,
5 5
would represent a synchronization failure. We've seen such a failure before: it is due to the fact that x += 5; is implemented in multiple instructions.
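The claim that "10 5" becomes possible while "5 5" stays impossible can also be checked by exhaustive search. This is a rough sketch under a simplifying assumption: each thread is reduced to three atomic steps (the `x += 5`, the load of x in sub-step 2.1, and the write() in sub-step 2.6). The function names `search` and `reachable` are hypothetical.

```c
#include <stdio.h>
#include <string.h>

#define STEPS 3     /* 0: x += 5, 1: load x into a register, 2: write it out */
#define OUT_MAX 32

/* Depth-first search over interleavings; returns 1 if some complete
   interleaving produces exactly the output string `target`. */
static int search(int pc[2], int reg[2], int x, char *out, const char *target) {
    if (pc[0] == STEPS && pc[1] == STEPS)
        return strcmp(out, target) == 0;
    size_t len = strlen(out);
    for (int t = 0; t < 2; t++) {
        if (pc[t] == STEPS)
            continue;
        int old_reg = reg[t], new_x = x;
        if (pc[t] == 0) {
            new_x = x + 5;                      /* atomic add */
        } else if (pc[t] == 1) {
            reg[t] = x;                         /* sub-step 2.1: load */
        } else if (len == 0) {
            snprintf(out, OUT_MAX, "%d", reg[t]);            /* first write */
        } else {
            snprintf(out + len, OUT_MAX - len, " %d", reg[t]); /* second write */
        }
        pc[t]++;
        int found = search(pc, reg, new_x, out, target);
        pc[t]--;                                /* undo this step */
        reg[t] = old_reg;
        out[len] = '\0';
        if (found)
            return 1;
    }
    return 0;
}

int reachable(const char *target) {
    int pc[2] = {0, 0}, reg[2] = {0, 0};
    char out[OUT_MAX] = "";
    return search(pc, reg, 0, out, target);
}
```

Under this model, `reachable("10 5")` is true because one thread can load 5 and then be delayed past the other thread's write, while `reachable("5 5")` is false because the second load always follows both additions.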
The system calls open() and close() might be implemented using code like the following:
int open__find_available_fd(proc_t *p) {
1:    int i;
2:    for (i = 0; i < MAXFD; i++) {
3:        acquire(&p->fdtable_mutex);
4:        if (p->fdtable[i] == 0) { /* this slot in the fdtable is not used */
5:            p->fdtable[i] = 1;
6:            release(&p->fdtable_mutex);
7:            return i;
8:        }
9:        release(&p->fdtable_mutex);
10:   }
11:   return -1;
}
void close__fd(proc_t *p, int fd) {
1:    acquire(&p->fdtable_mutex);
2:    p->fdtable[fd] = 0;
3:    release(&p->fdtable_mutex);
}
Last lecture, the professor argued that this open() implementation is observably different from what is desired, because the mutex is acquired and released separately for each file descriptor slot rather than held across the whole scan. Another thread in the same process could therefore call close() on a slot that open() has already examined, making a file descriptor available without open() noticing. Let's evaluate this claim.
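For contrast, here is a sketch of what a scan that holds the lock the whole time might look like in user space. It is not the lecture's code: a pthread mutex stands in for acquire()/release(), and `MAXFD`, `proc_init`, `open_find_available_fd_atomic`, and `close_fd` are names assumed for illustration.

```c
#include <pthread.h>

#define MAXFD 4   /* deliberately tiny table, chosen for the example */

typedef struct proc {
    pthread_mutex_t fdtable_mutex;
    int fdtable[MAXFD];            /* 0 = free slot, 1 = slot in use */
} proc_t;

void proc_init(proc_t *p) {
    pthread_mutex_init(&p->fdtable_mutex, NULL);
    for (int i = 0; i < MAXFD; i++)
        p->fdtable[i] = 0;
}

/* The mutex is held across the entire scan, so a concurrent close_fd()
   runs either entirely before or entirely after the scan. */
int open_find_available_fd_atomic(proc_t *p) {
    int fd = -1;
    pthread_mutex_lock(&p->fdtable_mutex);
    for (int i = 0; i < MAXFD; i++) {
        if (p->fdtable[i] == 0) {
            p->fdtable[i] = 1;
            fd = i;
            break;
        }
    }
    pthread_mutex_unlock(&p->fdtable_mutex);
    return fd;
}

void close_fd(proc_t *p, int fd) {
    pthread_mutex_lock(&p->fdtable_mutex);
    p->fdtable[fd] = 0;
    pthread_mutex_unlock(&p->fdtable_mutex);
}
```

With this variant, a failed open means the table really was full at one instant. The rest of this section asks whether a program can tell the difference between this and the per-slot version.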
Let T1 and T2 be:
T1:                              T2:
int t = 0; //local variable
if (open(file) < 0)              close(0);
    t += 2;
Suppose open(file) returns -1 with error code EMFILE, which means the process has no more file descriptors available. But say that close(0) returns in absolute time before open() does. Is this behavior of the open() implementation observable? Not for this program, which uses only local variables -- that is, non-global state. So T2's close could happen before or after the open, and no one could tell.
Now let T1 and T2 be:
T1: T2:
0 x = 0; //global variable
-------------------------------------------------------
1 x = open(file); A if (x == 0) {
2 printf("%d\n", x); B close(0);
C printf("c\n");
D }
Assume that every line of code executes indivisibly. Then the bad output here is "c, -1": that is, a file descriptor got closed, and T1 still got an error when it tried to open. Such an output would not be possible, again assuming that every line executes indivisibly. Let's look at some cases to see what would be possible.
If the execution order is:
| T1 | T2 | Output |
|---|---|---|
| 0 | | |
| | A | |
| | B | |
| | C | c |
| | D | |
| 1 | | |
| 2 | | 0 |
open() was successful so the expected output would be:
c
0
If the execution order is:
| T1 | T2 | Output |
|---|---|---|
| 0 | | |
| 1 | | |
| | A | |
| | D | |
| 2 | | -1 |
T2 never executes B and C because x is -1 in this case.
Here, open() failed so the expected output would be:
-1
But again, assuming every line of code is executed indivisibly, we cannot see c, -1. The job of our system call implementation is to provide the illusion of indivisibility. That is, no matter what interleaving of instructions we choose inside the system call implementations, we should see either c 0 or -1. Let's look at what the implementation actually does. Can we get a different output? Yes, if we execute lines like this:
| T1 | T2 | Output | Notes |
|---|---|---|---|
| 0 | | | |
| 1.1-1.10 | | | Thread 1 executes the for () loop in the kernel's open implementation, but stops before returning |
| | A | | |
| | B | | |
| | C | c | |
| | D | | |
| 1.11 | | | This step assigns the return value, -1, to x |
| 2 | | -1 | |
We've observed an unacceptable output.
But is this difference important? If we again consider how user code is actually executed, then NO, it is not observably different! Line 1, the statement x = open();, will not execute indivisibly on a real computer, even if the system call appears to execute indivisibly. This is because system calls do not return their values atomically to global variables, like x. In fact, the statement x = open(); will be executed something like this, in two distinct steps:
... set up arguments for open ...
1 int $48 # system call; return value will be stored in %eax register
1X movl %eax, x # moves return value to memory
Line 1, which changes only a thread-local value (a register), does not change global state. Thus, a failed call to open() changes no global state -- it leaves the fdtable untouched and returns its value in a thread-local variable -- and cannot possibly be observable. (A student mentioned errno, a global variable that says what error occurred in the system call, as potentially observable global state, but in fact errno is thread-local too.) So with this expansion, one can achieve c -1 output even with atomic system calls:
| T1 | T2 | Output | Notes |
|---|---|---|---|
| 0 | | | |
| 1 | | | |
| | A | | |
| | B | | |
| | C | c | |
| | D | | |
| 1X | | | This step assigns the return value, -1, to x |
| 2 | | -1 | |
Thus, our implementation is not observably different from an atomic open, given these properties.
We can actually make our implementation observably different by adding new types of observation. For example, consider a system call uint32_t nsyscalls(void) that returns the number of system calls that have completed for this process (i.e. the system call has effectively returned, even if the return value hasn't been given back to the process yet -- not counting any nsyscalls system calls). This "number of system calls" becomes a piece of global state that can be observed.
Let T1 and T2 be:
T1: T2:
0 x = nsyscalls();
------------------------------------------------------------------------
1 int t = open(file); A if (x == nsyscalls()) {
2 printf("%d\n", t); B close(0);
C printf("c %d\n", nsyscalls());
D }
If system calls execute atomically/indivisibly, we would expect to see one of the following types of output. Each output is paired with the sequence of steps that can generate it. Assume that line 0 sets x to 0 (because before that line, the process has executed no system calls), and assume that printf() doesn't count as a system call.
| Output | Steps | Notes |
|---|---|---|
| -1 | 1, 2, A, D | |
| c 1, 0 | A, B, C, D, 1, 2 | The close() system call increments the nsyscalls() counter, so T2 prints "c 1". |
| 0, c 1 | A, B, C (nsyscalls() call), 1, 2, C (printf()), D | Line C is not itself a system call, so its nsyscalls() call can return 1 before open() runs while its printf() runs after line 2. |
| c 2, 0 | A, B, 1, C, D, 2 | Both open() and close() increment the nsyscalls() counter. |
| 0, c 2 | A, B, 1, 2, C, D | |
| -1, c 2 | A, 1, 2, B, C, D | The open() returns before the close() can occur, but the nsyscalls() counter must see both. |
| c 2, -1 | A, 1, B, C, D, 2 | |
But there is one type of output that cannot happen:
c 1, -1
If c 1 is printed, then only one system call has returned by the time of line C. That single system call must have been close(), since it precedes the printf(). Thus, open() has not happened yet. We are assuming that open() happens atomically, so it must happen after close(): it will see the empty file descriptor and return 0.
But c 1, -1 is possible for our implementation. The following execution order would print exactly that output.
| T1 | T2 | Output | Notes |
|---|---|---|---|
| 0 | | | |
| 1.1-1.10 | | | |
| | A | | |
| | B | | |
| | C | c 1 | The nsyscalls() counter will return 1, since open() has not yet returned! |
| | D | | |
| 1.11 | | | |
| 2 | | -1 | |
Since we introduced an event count, the nsyscalls() counter, an open that fails has now become an observable change to a global state.
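The schedule above can be replayed deterministically in a single thread. The sketch below is a toy model, not kernel code: it assumes a full table of `MAXFD` slots, splits open() into a hypothetical `open_scan()` (the for loop) and `open_return()` (the point where the call counts as completed), and uses invented names throughout.

```c
#define MAXFD 4

static int fdtable[MAXFD];
static unsigned completed;       /* the value nsyscalls() reports */

static void reset_full_table(void) {
    for (int i = 0; i < MAXFD; i++)
        fdtable[i] = 1;          /* every descriptor is in use */
    completed = 0;
}

unsigned nsyscalls(void) { return completed; }

void sys_close(int fd) { fdtable[fd] = 0; completed++; }

/* First part of open(): scan for a free slot. */
int open_scan(void) {
    for (int i = 0; i < MAXFD; i++)
        if (fdtable[i] == 0) { fdtable[i] = 1; return i; }
    return -1;
}

/* Second part of open(): the call completes and hands back its result. */
int open_return(int result) { completed++; return result; }

typedef struct { unsigned c; int t; } observation_t;

/* T1 scans, T2 runs lines A-D, then T1's open() returns. */
observation_t replay_split_open(void) {
    observation_t obs = {0, 0};
    reset_full_table();
    unsigned x = nsyscalls();            /* line 0 */
    int pending = open_scan();           /* T1: the loop inside open() */
    if (x == nsyscalls()) {              /* T2: line A */
        sys_close(0);                    /* T2: line B */
        obs.c = nsyscalls();             /* T2: line C prints "c <obs.c>" */
    }
    obs.t = open_return(pending);        /* T1: open() returns; line 2 prints obs.t */
    return obs;
}

/* Same program, but open() runs as one indivisible step after close(). */
observation_t replay_atomic_open(void) {
    observation_t obs = {0, 0};
    reset_full_table();
    unsigned x = nsyscalls();
    if (x == nsyscalls()) {
        sys_close(0);
        obs.c = nsyscalls();
    }
    obs.t = open_return(open_scan());
    return obs;
}
```

In this model `replay_split_open()` observes c = 1 together with t = -1, while `replay_atomic_open()` observes c = 1 together with t = 0, which matches the argument in the text.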
Deadlock is the condition where one or more threads can never make progress because each is waiting for a resource held exclusively by another thread.
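Here are the four well-known necessary and sufficient conditions for deadlock:
1. Mutual exclusion: a resource can be held by at most one thread at a time.
2. Hold and wait: a thread holds one resource while waiting to acquire another.
3. No preemption: a resource cannot be forcibly taken away from the thread that holds it.
4. Circular wait: there is a cycle of threads in which each waits for a resource held by the next.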
Wait-for graphs are used to detect circular wait. A wait-for graph has two kinds of nodes, locks/resources and threads, and two kinds of edges: one indicating that a thread is waiting to acquire a lock, and one indicating that a lock is held by a thread.
When there is a loop in the wait-for graph, we have detected circular wait. This means that there is a chain of threads and resources T1, R1, T2, R2, ..., Tn, Rn where T1 wants to acquire R1, which is held by T2, which wants to acquire R2, which ... is held by Tn, which wants to acquire Rn, which is held by T1.
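Loop detection in such a graph can be sketched in a few lines. The version below assumes every thread waits for at most one lock and every lock is held by at most one thread, so each node has at most one outgoing edge; the type `wait_for_graph_t`, the fixed sizes, and the function names are assumptions made for this example.

```c
#include <stdbool.h>

#define NTHREADS 4
#define NLOCKS 4
#define NONE (-1)

typedef struct {
    int waits_for[NTHREADS];  /* lock each thread is waiting for, or NONE */
    int held_by[NLOCKS];      /* thread holding each lock, or NONE */
} wait_for_graph_t;

void graph_init(wait_for_graph_t *g) {
    for (int t = 0; t < NTHREADS; t++) g->waits_for[t] = NONE;
    for (int l = 0; l < NLOCKS; l++) g->held_by[l] = NONE;
}

/* Follows thread -> lock -> thread edges from `start`; returns true if the
   chain leads back to `start`, i.e. `start` is part of a circular wait. */
bool on_cycle(const wait_for_graph_t *g, int start) {
    int t = start;
    for (int steps = 0; steps < NTHREADS; steps++) {
        int lock = g->waits_for[t];
        if (lock == NONE) return false;     /* t is not waiting */
        int holder = g->held_by[lock];
        if (holder == NONE) return false;   /* lock is free */
        if (holder == start) return true;   /* chain closed the loop */
        t = holder;
    }
    return false;  /* chain ran into a loop that does not include start */
}

/* True if any thread is part of a circular wait. */
bool has_circular_wait(const wait_for_graph_t *g) {
    for (int t = 0; t < NTHREADS; t++)
        if (on_cycle(g, t)) return true;
    return false;
}
```

For instance, if thread 0 waits for lock 0 held by thread 1, and thread 1 waits for lock 1 held by thread 0, `has_circular_wait` returns true.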

Lock ordering is a technique used to avoid deadlock when threads use multiple locks.
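To create a lock ordering, assign every lock a position in a single global order, and require every thread that needs several locks to acquire them in that order.
This breaks circular wait, since there's no sequence of locks lock1...lockn where order(lock1) > order(lock2) > ... > order(lockn) > order(lock1).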
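One common way to get a global order is to sort locks by address. The sketch below uses POSIX threads and a hypothetical two-account transfer; `account_t`, `transfer`, and `run_opposing_transfers` are illustrative names, and address ordering is just one possible choice of order.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    pthread_mutex_t lock;
    long balance;
} account_t;

/* Always lock the account with the lower address first. */
static void lock_pair(account_t *a, account_t *b) {
    if ((uintptr_t)a < (uintptr_t)b) {
        pthread_mutex_lock(&a->lock);
        pthread_mutex_lock(&b->lock);
    } else {
        pthread_mutex_lock(&b->lock);
        pthread_mutex_lock(&a->lock);
    }
}

static void unlock_pair(account_t *a, account_t *b) {
    pthread_mutex_unlock(&a->lock);
    pthread_mutex_unlock(&b->lock);
}

void transfer(account_t *from, account_t *to, long amount) {
    if (from == to)
        return;                      /* avoid locking the same mutex twice */
    lock_pair(from, to);
    from->balance -= amount;
    to->balance += amount;
    unlock_pair(from, to);
}

static account_t acct_a = { PTHREAD_MUTEX_INITIALIZER, 100 };
static account_t acct_b = { PTHREAD_MUTEX_INITIALIZER, 100 };
static int rounds_per_thread;

static void *a_to_b(void *arg) {
    (void)arg;
    for (int i = 0; i < rounds_per_thread; i++)
        transfer(&acct_a, &acct_b, 1);
    return NULL;
}

static void *b_to_a(void *arg) {
    (void)arg;
    for (int i = 0; i < rounds_per_thread; i++)
        transfer(&acct_b, &acct_a, 1);
    return NULL;
}

/* Two threads transfer in opposite directions; returns the final total. */
long run_opposing_transfers(int rounds) {
    pthread_t t1, t2;
    rounds_per_thread = rounds;
    pthread_create(&t1, NULL, a_to_b, NULL);
    pthread_create(&t2, NULL, b_to_a, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return acct_a.balance + acct_b.balance;
}
```

If `transfer` instead locked `from` and then `to`, the two threads could each hold one lock and wait for the other. With address ordering both threads request the locks in the same sequence, so that cycle cannot form.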
So far we have been creating mutexes that are waited upon by polling. Polling wastes CPU time, so a blocking mechanism is desired.
Specifications for a blocking mutex:
typedef struct bmutex {
    mutex_t l;               /* polling mutex (spinlock) protecting the fields below */
    int locked;              /* nonzero while the blocking mutex is held */
    proc_t blocked_list[];   /* processes waiting for the blocking mutex */
} bmutex_t;
void acquire(bmutex_t *l) {
1:    while (1) {
2:        acquire(&l->l); /* normal, polling acquire! (spinlock) */
3:        current process state is blocked;
4:        add current process to l->blocked_list[];
5:        if (!l->locked) {
6:            l->locked = 1;
7:            set current state to runnable;
8:            remove current process from l->blocked_list[];
9:            release(&l->l);
10:           return;
11:       } else {
12:           release(&l->l);
13:           schedule(); /* will block this process, UNLESS its state was set to runnable already */
14:       }
15:   }
}
void release(bmutex_t *l) {
    acquire(&l->l);          /* polling acquire of the inner spinlock */
    set all l->blocked_list[] processes to runnable;
    l->locked = 0;
    release(&l->l);
}
This code avoids the sleep/wakeup race by setting a process's state to blocked while it still holds l->l, the spinlock that protects blocked_list[]. Thus, even if another thread calls release() after the acquiring thread executes line 12 but before it executes line 13, the acquiring thread is already on blocked_list[] and will be set to runnable by release(), so its call to schedule() will not block. (Note that some other line orders can work, but it is important to have the thread set to blocked and placed on the blocked_list[] before releasing the l->l spinlock mutex.)
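In user space the same idea can be approximated with POSIX threads. This is only an analogy to the kernel pseudocode, not a translation of it: a pthread mutex plays the role of the inner spinlock, a condition variable stands in for blocked_list[], and `pthread_cond_wait` provides the "release the inner lock and sleep" step as one atomic operation. The names `bmutex_acquire`, `bmutex_release`, and `run_counter_demo` are made up for the example.

```c
#include <pthread.h>
#include <stddef.h>

typedef struct {
    pthread_mutex_t guard;     /* protects `locked`, like the inner spinlock */
    pthread_cond_t  unlocked;  /* waiters sleep here, like blocked_list[] */
    int locked;
} bmutex_t;

#define BMUTEX_INITIALIZER \
    { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0 }

void bmutex_acquire(bmutex_t *m) {
    pthread_mutex_lock(&m->guard);
    while (m->locked)
        /* Atomically releases guard and sleeps, so a wakeup sent between
           "check" and "sleep" cannot be lost. */
        pthread_cond_wait(&m->unlocked, &m->guard);
    m->locked = 1;
    pthread_mutex_unlock(&m->guard);
}

void bmutex_release(bmutex_t *m) {
    pthread_mutex_lock(&m->guard);
    m->locked = 0;
    pthread_cond_broadcast(&m->unlocked);  /* wake every waiter */
    pthread_mutex_unlock(&m->guard);
}

static bmutex_t counter_lock = BMUTEX_INITIALIZER;
static long counter;
static int increments_per_thread;

static void *bump(void *arg) {
    (void)arg;
    for (int i = 0; i < increments_per_thread; i++) {
        bmutex_acquire(&counter_lock);
        counter++;
        bmutex_release(&counter_lock);
    }
    return NULL;
}

/* Two threads increment a shared counter under the blocking mutex. */
long run_counter_demo(int increments) {
    pthread_t t1, t2;
    counter = 0;
    increments_per_thread = increments;
    pthread_create(&t1, NULL, bump, NULL);
    pthread_create(&t2, NULL, bump, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return counter;
}
```

Waking every waiter mirrors release() setting all blocked processes to runnable: each woken thread rechecks `locked` in its loop, and all but one go back to sleep.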
Semaphores are locking/synchronization mechanisms which can be used to derive all other mechanisms. The semaphore was created by Edsger Dijkstra and is considered the original blocking synchronization mechanism.
typedef struct {
    int s;
} semaphore_t;
// to acquire lock (the test and the decrement must happen atomically)
void P(semaphore_t *s) {
    while (s->s == 0)
        block;
    s->s--;
}
// to release lock (the increment must happen atomically)
void V(semaphore_t *s) {
    s->s++;
}
// implementing a blocking mutex with a semaphore
typedef struct {
    semaphore_t s; /* initially 1 */
} bmutex_t;
void acquire(bmutex_t *m) {
    P(&m->s);
}
void release(bmutex_t *m) {
    V(&m->s);
}
The semaphore is locked when it equals 0 and unlocked when it is greater than 0. This lets a semaphore act as a more permissive mutex that can be held by more than one thread at a time: start the semaphore out at N if you want the lock to be available for acquisition N times concurrently.
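To make the "acquire N times concurrently" behavior concrete, here is a rough sketch of a counting semaphore built from a pthread mutex and condition variable. It is one possible user-space construction, not the lecture's implementation; `csem_t`, `csem_init`, `csem_P`, `csem_try_P`, and `csem_V` are hypothetical names, and `csem_try_P` is a non-blocking variant added only so the behavior can be checked without extra threads.

```c
#include <pthread.h>
#include <stdbool.h>

typedef struct {
    pthread_mutex_t guard;    /* makes each operation atomic */
    pthread_cond_t  nonzero;  /* waiters sleep here while count == 0 */
    int count;
} csem_t;

void csem_init(csem_t *s, int initial) {
    pthread_mutex_init(&s->guard, NULL);
    pthread_cond_init(&s->nonzero, NULL);
    s->count = initial;
}

/* P: block while the count is 0, then decrement it. */
void csem_P(csem_t *s) {
    pthread_mutex_lock(&s->guard);
    while (s->count == 0)
        pthread_cond_wait(&s->nonzero, &s->guard);
    s->count--;
    pthread_mutex_unlock(&s->guard);
}

/* Non-blocking P: returns false instead of sleeping when the count is 0. */
bool csem_try_P(csem_t *s) {
    pthread_mutex_lock(&s->guard);
    bool ok = s->count > 0;
    if (ok)
        s->count--;
    pthread_mutex_unlock(&s->guard);
    return ok;
}

/* V: increment the count and wake one waiter, if any. */
void csem_V(csem_t *s) {
    pthread_mutex_lock(&s->guard);
    s->count++;
    pthread_cond_signal(&s->nonzero);
    pthread_mutex_unlock(&s->guard);
}
```

Initialized to 2, the semaphore admits two `csem_P` calls without blocking. A third attempt fails (or blocks, for `csem_P`) until some holder calls `csem_V`.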