====== Lecture 10 Scribe Notes ======
//by Derek Yang, Justin Lu, and Peter Peterson//
==== Pipe Buffer ====
Last lecture, we introduced the pipe buffer construct, which we implemented as a circular array. The basic idea is to have two pointers, head and tail, which point to the first filled slot and the next empty slot respectively. These pointers are incremented with [[wp>modulo arithmetic]], meaning that they loop back to zero at some interval instead of increasing infinitely. The size of the pipe buffer is usually a power of two, and in the following examples it will be eight (N = 8). (If the size weren't a power of two, we'd have to worry about ''head'' and ''tail'' wrapping around from 2^32-1 to 0; but if N is a power of 2, then 2^32 is a multiple of N, so the slot index continues seamlessly across the wrap and there are no problems!) The following is a simple implementation of the pipebuf structure and its associated read/write functions.
  typedef struct pipebuf {
      char buf[N];
      unsigned head;   // index of the first filled slot
      unsigned tail;   // index of the next empty slot
  } pipebuf_t;
  void writec(pipebuf_t *p, char c) {
      p->buf[p->tail % N] = c;
      p->tail++;
      return;
  }
  int readc(pipebuf_t *p) {
      int c = p->buf[p->head % N];
      p->head++;
      return c;
  }
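The remark about power-of-two sizes can be checked with a few lines of C. This is only a sketch: the helper names ''slot'' and ''filled'' are made up for illustration, and the comments assume a 32-bit ''unsigned''.

```c
#include <limits.h>

/* Slot of the circular array that a free-running counter maps to. */
unsigned slot(unsigned counter, unsigned n) {
    return counter % n;
}

/* Number of filled slots. Unsigned subtraction stays correct even when
   tail has already wrapped past UINT_MAX and head has not. */
unsigned filled(unsigned head, unsigned tail) {
    return tail - head;
}
```

With n = 8 the counter values UINT_MAX and UINT_MAX + 1 (which wraps to 0) map to slots 7 and 0, so the sequence of slots is unbroken. With n = 6 and a 32-bit ''unsigned'', UINT_MAX maps to slot 3 and the next value maps to slot 0, so slots 4 and 5 are skipped at the wrap.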
This code, although simple, has many problems. The most apparent is that the read/write calls do not check for invalid conditions. Invalid conditions are situations in which a read/write either cannot be done or does not make sense. For example, in this code read does not check to see if the buffer is empty (tail == head), and write does not check to see if the buffer is full (tail - head == N). Performing the operation anyway would break a condition that must always hold for the buffer; conditions that must always be true are called invariants.
==== Invariants ====
An invariant is a boolean expression over the state of a data structure that must be true for the data structure to be valid.
In this case the invariant is that the difference between tail and head must be between 0 and N. (0 <= (p->tail - p->head) <= N)
  void writec(pipebuf_t *p, char c) {
      while (1) {                                //A
          if ((p->tail - p->head) < N) {         //B
              p->buf[p->tail % N] = c;           //C
              p->tail++;                         //D
              return;                            //E
          }
      }
  }
  int readc(pipebuf_t *p) {
      while (1) {                                //V
          if (p->tail != p->head) {              //W
              int c = p->buf[p->head % N];       //X
              p->head++;                         //Y
              return c;                          //Z
          }
      }
  }
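The invariant stated above can also be written as a predicate that debugging code could assert after each operation. A minimal sketch, assuming the same structure layout as in these notes; the function name ''pipebuf_invariant_holds'' is made up here.

```c
#include <stdbool.h>

#define N 8

typedef struct pipebuf {
    char buf[N];
    unsigned head;   /* index of the first filled slot (free-running) */
    unsigned tail;   /* index of the next empty slot (free-running) */
} pipebuf_t;

/* True when 0 <= tail - head <= N. The lower bound is automatic for
   unsigned arithmetic: a head that has run past tail shows up as a huge
   difference, which the upper bound rejects. */
bool pipebuf_invariant_holds(const pipebuf_t *p) {
    return p->tail - p->head <= N;
}
```

Because the difference is computed in unsigned arithmetic, the check keeps working after ''tail'' has wrapped around to a small value while ''head'' is still near the top of the range.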
In reality, invariants can be violated… as long as //nobody notices//! As a trivial example, in our original code, the head pointer could be atomically set to 1000 and then back to its original value. This is clearly a violation of an invariant, but in this case it cannot affect the operation of the circular array.
1. writec(p,'A') \\
2. writec(p,'B') \\
3. writec(p,'C') \\
1. readc() \\
2. readc() \\
3. readc() \\
Let's say three threads try to write a character at the same time, then try to read from the pipe buffer later at the same time. If the read end of the pipe is connected to standard out, the allowed outputs on the terminal would be all permutations of the characters ABC. Non-allowed outputs are AAA, A, AAB, SPLUNGE, or anything else. However, for our code the actual output can be different due to race conditions. The following are two examples of bad output due to race conditions.
**Case 1:**\\
^ 1 ^ 2 ^ 3 ^ 1st char ^ 2nd char ^ 3rd char ^ Tail (after operation) ^ Comment ^
| |A-C| |B | | | 1st |writes "B", but does not increment tail |
|A-E| | |A | | | 2nd |overwrites "B" with "A", then increments tail and returns |
| |D-E| |A | | | 3rd |increments tail and returns |
| | |A-E|A | |C | 4th |writes "C" into the third slot, then increments tail and returns |
If we call readc() three times sequentially, we will see the output "A C" on the screen. Letter "B" is not in the pipe buffer because it was overwritten with the letter A.
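The Case 1 schedule can be replayed deterministically in a single thread, which makes the lost "B" easy to see. This is only a sketch: ''replay_case1'' is a made-up helper, and it pre-fills the buffer with spaces so that the slot nobody wrote is visible (in the real race that slot would hold whatever was there before).

```c
#define N 8

typedef struct pipebuf {
    char buf[N];
    unsigned head;
    unsigned tail;
} pipebuf_t;

/* Executes lines C and D of writec in the order of the Case 1 table. */
void replay_case1(pipebuf_t *p) {
    for (int i = 0; i < N; i++)
        p->buf[i] = ' ';
    p->head = p->tail = 0;

    p->buf[p->tail % N] = 'B';   /* thread 2, line C: tail is still 0 */
    p->buf[p->tail % N] = 'A';   /* thread 1, line C: overwrites 'B'  */
    p->tail++;                   /* thread 1, line D: tail becomes 1  */
    p->tail++;                   /* thread 2, line D: tail becomes 2  */
    p->buf[p->tail % N] = 'C';   /* thread 3, line C: lands in slot 2 */
    p->tail++;                   /* thread 3, line D: tail becomes 3  */
}
```

After the replay the first three slots hold 'A', ' ' and 'C', which matches the output described above.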
**Case 2:**\\
To show how we obtain the actual output on case 2, first we need to understand that line D is actually not atomic.\\
Line D can be split into the following three lines:\\
D~: x = p->tail; \\
D`: x = x + 1; \\
D*: p->tail = x; \\
^ 1 ^ 2 ^ 3 ^ 1st char ^ 2nd char ^ 3rd char ^ Tail (after operation) ^ Comment ^
| |A-C| |B | | | 1st |writes "B", but does not increment tail|
| | |A-C|C | | | 1st |overwrites "B" with "C", but does not increment tail|
|A-E| | |A | | | 2nd |overwrites "C" with "A", increments tail and returns|
| |D~ |D~ |A | | | 2nd |both load the same tail pointer into x|
| |D` |D` |A | | | 2nd |both increment x|
| |D* |D* |A | | | 3rd |both store the same value back into tail, so the tail pointer is incremented only once|
| |E |E |A | | | 3rd |both return|
In this case, if we call readc() three times sequentially, we will obtain the output "A " on the screen, and the third readc() will block because the pipe buffer only contains two characters.
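The lost update in Case 2 comes entirely from the load, add and store steps of line D interleaving. The sketch below replays that interleaving in straight-line C with one local variable per thread; both function names are made up for illustration.

```c
/* Two increments of tail whose three sub-steps interleave as in Case 2. */
unsigned interleaved_increments(unsigned tail) {
    unsigned x2 = tail;   /* thread 2 loads tail                     */
    unsigned x3 = tail;   /* thread 3 loads the same value           */
    x2 = x2 + 1;          /* thread 2 adds one to its private copy   */
    x3 = x3 + 1;          /* thread 3 adds one to its private copy   */
    tail = x2;            /* thread 2 stores                         */
    tail = x3;            /* thread 3 stores the same value again    */
    return tail;
}

/* The same two increments when each runs to completion before the next. */
unsigned serialized_increments(unsigned tail) {
    unsigned x = tail;
    x = x + 1;
    tail = x;             /* thread 2 finished */
    x = tail;
    x = x + 1;
    tail = x;             /* thread 3 finished */
    return tail;
}
```

Starting from the same value, the interleaved version advances tail by one and the serialized version advances it by two.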
We conclude that this program has a serious synchronization issue. To solve this problem, we need to look for a critical section and put a lock around the critical section. So where are the critical sections?
==== Critical Sections ====
If we look at the code, we can see that line B reads variables that can be changed by other threads. To avoid branching on a stale value, line B has to be part of the critical section. Lines C and D are also part of the critical section because they write to state that is shared with the other threads. To create a critical section, we call acquire() before the critical section and release() after it. Note that we need to call release() inside the if //and// outside. Why do we need two release() calls? Imagine that we only have the first release call. If the buffer is full, the process will not run the code inside the if and will therefore skip the release call. Now the process will loop back and try to acquire a lock that it already holds. It will spin forever, and no one else will ever be able to acquire the lock. The following is the improved code with the addition of the lock.
  mutex_t bank_lock;
  void writec(pipebuf_t *p, char c) {
      while (1) {                                //a
          acquire(&bank_lock);                   // <= while (tas(&bank_lock, 1)) ;
          if ((p->tail - p->head) < N) {         // if buffer not full
              p->buf[p->tail % N] = c;
              p->tail++;
              release(&bank_lock);
              return;
          }
          release(&bank_lock);                   // buffer was full: let others run, then retry
      }
  }
The implementation of the lock in readc is similar to that of writec.
  int readc(pipebuf_t *p) {
      while (1) {
          acquire(&bank_lock);
          if (p->tail != p->head) {              // if buffer not empty
              int c = p->buf[p->head % N];
              p->head++;
              release(&bank_lock);
              return c;
          }
          release(&bank_lock);                   // buffer was empty: let others run, then retry
      }
  }
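For readers who want something concrete behind acquire() and release(), here is a minimal spinlock sketch built on the C11 ''atomic_flag'' type, whose test-and-set operation plays the role of the ''tas'' instruction mentioned in the comments. The type layout, the ''MUTEX_INIT'' macro and the ''try_acquire'' helper are assumptions made for this example, not the definitions used in class.

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_flag held;            /* set while some thread owns the lock */
} mutex_t;

#define MUTEX_INIT { ATOMIC_FLAG_INIT }

/* One attempt: atomically sets the flag and reports whether it was clear
   before, i.e. whether this caller is the one that took the lock. */
bool try_acquire(mutex_t *m) {
    return !atomic_flag_test_and_set(&m->held);
}

/* Spins until an attempt succeeds. */
void acquire(mutex_t *m) {
    while (!try_acquire(m))
        ;
}

/* Clears the flag so that the next test-and-set succeeds. */
void release(mutex_t *m) {
    atomic_flag_clear(&m->held);
}
```

A lock would be declared as ''mutex_t bank_lock = MUTEX_INIT;''. Note that a second acquire() by the thread that already holds the lock spins forever, which is exactly the failure described above for a missing release().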
Intuitively, we might think of the critical sections in writec() and readc() as being one large critical section, similar to our deposit and withdrawal example from previous lectures. However, in this case, we can use two separate critical sections because readc() and writec() modify disjoint portions of the state: writec() only changes tail and readc() only changes head. writec() only has a problem if the buffer is full, but readc() can only make the buffer less full. Therefore it doesn't matter if writec() sees an older value of head: at worst it believes the buffer is still full and tries again. Similarly, readc() only has a problem if the buffer is empty, but writec() can only make it less empty. Therefore, it is safe to run readc() and writec() in parallel.
**ONE WRITER PRINCIPLE:** If each variable has only one writer, synchronization becomes easier.
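One way to picture the two separate critical sections is to give readers and writers their own lock. The sketch below is an illustration under several assumptions: it uses C11 atomics for ''head'' and ''tail'' (so that reading the other side's counter is well defined), the field names ''rlock'' and ''wlock'' are made up, and the functions return false instead of spinning when the buffer is full or empty.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define N 8

typedef struct pipebuf {
    char buf[N];
    atomic_uint head;        /* written only by readers */
    atomic_uint tail;        /* written only by writers */
    atomic_flag rlock;       /* serializes readers among themselves */
    atomic_flag wlock;       /* serializes writers among themselves */
} pipebuf_t;

static void spin_acquire(atomic_flag *f) { while (atomic_flag_test_and_set(f)) ; }
static void spin_release(atomic_flag *f) { atomic_flag_clear(f); }

void pipebuf_init(pipebuf_t *p) {
    atomic_init(&p->head, 0);
    atomic_init(&p->tail, 0);
    atomic_flag_clear(&p->rlock);
    atomic_flag_clear(&p->wlock);
}

bool try_writec(pipebuf_t *p, char c) {
    bool ok = false;
    spin_acquire(&p->wlock);
    unsigned tail = atomic_load(&p->tail);
    /* A stale head can only make the buffer look fuller than it is. */
    if (tail - atomic_load(&p->head) < N) {
        p->buf[tail % N] = c;
        atomic_store(&p->tail, tail + 1);   /* publish after the slot is written */
        ok = true;
    }
    spin_release(&p->wlock);
    return ok;
}

bool try_readc(pipebuf_t *p, char *out) {
    bool ok = false;
    spin_acquire(&p->rlock);
    unsigned head = atomic_load(&p->head);
    /* A stale tail can only make the buffer look emptier than it is. */
    if (atomic_load(&p->tail) != head) {
        *out = p->buf[head % N];
        atomic_store(&p->head, head + 1);   /* free the slot after reading it */
        ok = true;
    }
    spin_release(&p->rlock);
    return ok;
}
```

Each counter has exactly one kind of writer, so a reader and a writer never need the same lock.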
//Note:// It is not necessary to combine the two critical sections into one. Combining them is perfectly safe, but the resulting critical section is not minimal.
This lock implementation is a coarse-grained lock: a single global lock protects every pipe buffer. It is easy to use and saves space, but it has bad utilization, because many threads spend a lot of time in the acquire operation. How can we make it fine grained? Well, we observe that each operation touches only one pipe buffer. This means that two ''writec'' operations will never conflict if they touch different pipe buffers (just as two ''deposit'' operations never conflict if they touch different accounts). Therefore, we can implement finer-grained locking by having one lock per pipe buffer.
  typedef struct pipebuf {
      unsigned head;
      unsigned tail;
      char buf[N];
      mutex_t lock;                              // <=== NEW!
  } pipebuf_t;
  void writec(pipebuf_t *p, char c) {
      while (1) {                                //a
          acquire(&p->lock);                     // <= while (tas(&p->lock, 1)) ;
          if ((p->tail - p->head) < N) {         // if buffer not full
              p->buf[p->tail % N] = c;
              p->tail++;
              release(&p->lock);
              return;
          }
          release(&p->lock);
      }
  }
  int readc(pipebuf_t *p) {
      while (1) {
          acquire(&p->lock);
          if (p->tail != p->head) {              // if buffer not empty
              int c = p->buf[p->head % N];
              p->head++;
              release(&p->lock);
              return c;
          }
          release(&p->lock);
      }
  }
This is all great, but we still have bad utilization. When we examine what's happening, we observe that many threads are constantly spinning in the ''acquire'' implementation, which has a ''while'' loop. These spinlocks are hurting us. Instead, we need the ability to block and schedule rather than spinning. This leads us to our next topic...
==== Blocking Mutexes ====
Utilization can be increased by avoiding spinning and instead using a blocking mutex. Therefore, we want an acquire operation that blocks until the mutex is available. By using a blocking mutex, other processes can run which means we don't waste precious CPU cycles and as a result have higher utilization of our computing resources.
Blocking mutexes are usually used for large / slow critical sections. In class, we have been discussing the atomicity of simple operations (like i++) because they are easy to conceptualize, examples are readily available, and common examples occur in threaded applications. Spinning for these locks is not a huge performance hit because the locks are only held for a few instructions. However, in the kernel, it is common to have large sections of code that run for seconds or more but must also execute atomically. If we were to spin during these sections, we would be wasting seconds of CPU time. Instead, we want to be able to do something useful while we wait, so we want to put the process to sleep and block rather than spin.
Blocking mutexes are simple in concept but implementations are non-trivial.
We need to change acquire and release for our new blocking mutex. Acquire will put processes to sleep, and release will wake processes up.
In order to do this, we'll create a wait queue of struct wait objects. The head of the queue has a mutex lock to protect the shared state of the queue, and each element of the list has a process descriptor and a link to the next struct wait.
Here's some pseudocode:
  typedef struct wait {
      procdescriptor_t *p;
      struct wait *next;
  } wait_t;
  typedef struct blockmutex {
      unsigned val;
      wait_t *waiting;
  } blockmutex_t;
  acquire(blockmutex_t *b) {
      wait_t w;
      w.p = current;
      while (test_and_set(&b->val, 1)) {
          w.next = b->waiting;                   // race
          b->waiting = &w;                       // condition!
          block();
      }
  }
  release(blockmutex_t *b) {
      while (b->waiting) {                       // while there are waiting processes
          wake(b->waiting->p);                   // wake it up
          b->waiting = b->waiting->next;         // next please!
      }
      b->val = 0;                                // the lock is unlocked!
  }
Imagine three threads, 1, 2 and 3. Thread 1 tries to acquire the lock. It is currently unlocked, so thread 1 succeeds. Thread 2 now tries to acquire the lock, but it is unavailable, so thread 2 adds itself to the wait queue. When thread 3 tries to acquire the lock, it will add itself to the queue after thread 2. Threads 2 and 3 now block because the resource they need is unavailable. When thread 1 calls release, it unblocks threads 3 and 2, it sets the mutex to 0, and either thread 2 or 3 will be able to acquire the lock. By this method, threads 2 and 3 do not have to continually check to see if they can acquire the lock -- instead they trust that the system will wake them up when they have a chance of acquiring the resource.
Unfortunately, there are race conditions in the above code. By now it should be easy to conceptualize why locking is important and how code that seems simple enough (e.g., i++) is not inherently atomic. What is also true (and maybe unexpected) is that acquiring and releasing locks must also be atomic. Otherwise, the same problems we have interleaving deposits and withdrawals can happen when interleaving locks! Making lock acquisition and release atomic is not always easy.
One problem in the above code is that the waiting structure is not protected by locks -- adding and removing elements from that queue must be atomic -- therefore the queue needs its own critical section.
Another problem is a larger issue known as the "sleep / wake up race".
=== Sleep / Wake Up Race ===
What happens if thread 2 tries to acquire the lock at exactly the moment thread 1 releases it?
Imagine that thread 2 takes the first steps towards putting itself on the wait queue, but before it can safely get on the wait queue and go to sleep, thread 1 wakes up all processes (and subsequently clears the wait queue). Thread 2 will then pick up where it left off and finish adding itself to the empty wait queue (after being woken up for no reason). But what's worse is that because thread 1 has just cleared the queue it will //never wake up the queue again//. Thread 2 is now permanently blocked and forgotten. It's like if you put yourself in cryogenic storage but forgot to tell anyone to wake you up.
To avoid this catastrophe, thread 2 should get ready to sleep by first putting itself on the wait queue and then setting its status to blocked and **only then** actually going to sleep by calling schedule. This way, if the wait queue is woken up before thread 2 is fully "asleep", it will simply wake up and continue running instead of being permanently put to sleep. This avoids the sleep / wakeup race.
New pseudocode:
  typedef struct wait {
      procdescriptor_t *p;
      struct wait *next;
  } wait_t;
  typedef struct blockmutex {
      unsigned val;
      wait_t *waiting;
      mutex_t waitlock;                          // a normal spinlock, protecting small critical sections
  } blockmutex_t;
  acquire(blockmutex_t *b) {
      wait_t w;
      w.p = current;
      while (1) {
          acquire(&b->waitlock);                 // spinlock acquire, not this function
          if (b->val == 0) {                     // if i can acquire the lock
              b->val = 1;                        // grab it
              release(&b->waitlock);
              return;
          }
          w.next = b->waiting;                   // otherwise, add myself to the queue
          b->waiting = &w;
          w.p->state = BLOCKED;                  // set my state to BLOCKED for the next time schedule is called
          release(&b->waitlock);
          schedule();                            // now go and schedule
      }
  }
  release(blockmutex_t *b) {
      acquire(&b->waitlock);
      while (b->waiting) {                       // while there are items on the queue
          b->waiting->p->state = RUNNABLE;       // set a process to runnable
          b->waiting = b->waiting->next;         // next, please!
      }
      b->val = 0;                                // we're now unlocked
      release(&b->waitlock);
  }
In this code, we've added a separate spinlock to our blockmutex_t to protect the waiting list. This ensures that once a process begins going to sleep, no other process can touch the waiting list until it has finished. It's acceptable that this is a spinlock because it protects such a small section of code. The check of ''val'' in acquire is protected by the same lock, which ensures that a process will not be put to sleep and never woken up.
This is great: since the queue manipulation is protected by the internal lock, acquire and release can't interleave there, so thread 2 can't be interrupted while putting itself on the queue. But will all this fancy footwork really work? What happens if we try to make the sleep / wake up race happen by having thread 1 release the lock immediately before thread 2 blocks? Fortunately for thread 2, release marks it RUNNABLE just before it would have gone to sleep, so schedule simply runs it again. This is exactly the behavior we desire, because thread 2 //should// be run instead of sleeping if thread 1 has indeed released the lock.
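The ordering argument can be traced with a small single-threaded model of the wait queue. This is a toy sketch, not kernel code: ''proc_t'', ''queue_t'' and the three functions are made-up stand-ins, and ''schedule_would_sleep'' only models the decision that schedule makes from the process state.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { RUNNABLE, BLOCKED } state_t;
typedef struct proc { state_t state; } proc_t;
typedef struct wait { proc_t *p; struct wait *next; } wait_t;
typedef struct { wait_t *waiting; } queue_t;

/* What acquire does under the waitlock: get on the queue, then mark
   the process BLOCKED. It has not stopped running yet. */
void enqueue_and_mark_blocked(queue_t *q, wait_t *w) {
    w->next = q->waiting;
    q->waiting = w;
    w->p->state = BLOCKED;
}

/* What release does under the waitlock: mark every waiter RUNNABLE
   and empty the queue. */
void wake_all(queue_t *q) {
    while (q->waiting) {
        q->waiting->p->state = RUNNABLE;
        q->waiting = q->waiting->next;
    }
}

/* Model of schedule: only a process still marked BLOCKED is taken off
   the CPU. */
bool schedule_would_sleep(const proc_t *p) {
    return p->state == BLOCKED;
}
```

If ''wake_all'' runs between ''enqueue_and_mark_blocked'' and the call to schedule, the process is RUNNABLE again by the time schedule looks at it, so it keeps running and retries the lock instead of sleeping forever.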
The "preparations" in our acquire correspond to the prepare_to_wait() function in Linux. The wait queue is a series of linked wait structures, hanging off the lock. prepare_to_wait gets everything ready, and declarewait is what actually makes the process wait.
==== Condition Variables ====
Going back to our circular array example, imagine many threads reading and writing to the pipe buffer. We don't have high utilization, because our circular array spins instead of blocks. When the buffer is full, it will keep checking and waiting... and wasting a lot of CPU time reading status that is not likely to change. Read will also spin when the buffer is empty. While our original code is a safe and correct implementation, it exhibits poor utilization because the CPU spends a lot of time "busy waiting".
Notionally, for read we’d like to do something like, “block until not empty” and for write something like "block until not full". Our utilization problem is that read and write spin wait for some condition (e.g. tail != head). We want to remove the spinning and block in order to increase utilization. In other words, we want to block on a condition. We'd love to be able to do this in hardware, but it is difficult or impossible. To do this we have to solve it in software by changing the state of the condition to an object we can test on, like a lock. This synchronization object is called a “condition variable”.
Condition variables have two operations: wait and notify.
The wait operation says, “put me to sleep until notify happens.” The notify operation says, “wake up everyone that’s waiting.” However, in practice, we have to make sure that we can do this without a “sleep / wake up race”.
To do this, wait takes a condition variable and a mutex.
Pseudocode:
  wait(condvar_t *c, mutex_t *m) {
      release(m);                                // these two lines are
      block until someone calls notify(c);       // performed atomically
      acquire(m);                                // only after notify has been called
  }
  notify(condvar_t *c) {
      wakeup any thread blocked on wait(c);
  }
These are "wake up all" semantics -- everything is woken up. (There are other variants, such as "wake up one".)
Ok -- now that we have it, how can we use it? Our two conditions are: "is the buffer full?" and "is the buffer empty?" These become our two condition variables nonfull and nonempty.
  typedef struct pipebuf {
      char buf[N];
      unsigned head;
      unsigned tail;
      blockmutex_t lock;
      condvar_t nonfull;
      condvar_t nonempty;
  } pipebuf_t;
  void write(pipebuf_t *p, char c) {
      acquire(&p->lock);
      while (1) {
          if ((p->tail - p->head) < N) {         // is the buffer not full?
              p->buf[p->tail % N] = c;           // hey, now we can add some stuff...
              p->tail++;
              release(&p->lock);
              notify(&p->nonempty);              // notify readers that p is no longer empty
              return;
          }
          wait(&p->nonfull, &p->lock);           // else it was full: wait until p is not full
      }
  }
  int read(pipebuf_t *p) {
      acquire(&p->lock);
      while (1) {
          if (p->tail != p->head) {              // if the buffer is not empty
              int c = p->buf[p->head % N];
              p->head++;
              release(&p->lock);
              notify(&p->nonfull);               // hey, the buffer is not full
              return c;
          }
          wait(&p->nonempty, &p->lock);          // else it was empty: wait until p is not empty
      }
  }
Now, when the state of a condition variable changes, all the processes that are waiting on some condition will be notified so that they can wake up and try to obtain the resource.
(As a side note, there are often loops in synchronization operations to make sure that by the time we come back around to read/write, the conditions have not changed.)
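The same structure can be written against POSIX threads, where ''pthread_cond_wait'' has the wait(c, m) shape described above (it releases the mutex while blocked and reacquires it before returning) and ''pthread_cond_broadcast'' gives the "wake up all" behavior of notify. This is a sketch, not the implementation from class; ''pipebuf_init'' is a helper added for the example.

```c
#include <pthread.h>
#include <stddef.h>

#define N 8

typedef struct pipebuf {
    char buf[N];
    unsigned head;
    unsigned tail;
    pthread_mutex_t lock;
    pthread_cond_t nonfull;
    pthread_cond_t nonempty;
} pipebuf_t;

void pipebuf_init(pipebuf_t *p) {
    p->head = p->tail = 0;
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->nonfull, NULL);
    pthread_cond_init(&p->nonempty, NULL);
}

void writec(pipebuf_t *p, char c) {
    pthread_mutex_lock(&p->lock);
    /* Re-check in a loop: another writer may have refilled the buffer
       between the notify and the moment we get the lock back. */
    while (p->tail - p->head == N)
        pthread_cond_wait(&p->nonfull, &p->lock);
    p->buf[p->tail % N] = c;
    p->tail++;
    pthread_mutex_unlock(&p->lock);
    pthread_cond_broadcast(&p->nonempty);   /* readers may proceed */
}

int readc(pipebuf_t *p) {
    pthread_mutex_lock(&p->lock);
    while (p->tail == p->head)
        pthread_cond_wait(&p->nonempty, &p->lock);
    int c = p->buf[p->head % N];
    p->head++;
    pthread_mutex_unlock(&p->lock);
    pthread_cond_broadcast(&p->nonfull);    /* writers may proceed */
    return c;
}
```

A reader on an empty buffer now sleeps inside ''pthread_cond_wait'' instead of spinning, and the loop around the wait is the re-check mentioned in the side note above.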
==== Deadlock (X_X) ====
Bill Gates says that he wants Word to have an insane operation on pipe buffers – swapping the contents of two bounded buffers:
  swap(pipebuf_t *p, pipebuf_t *q) {           // swaps contents atomically
  A   acquire(&p->lock);
  B   acquire(&q->lock);
  C   p->head <==> q->head;
  D   p->tail <==> q->tail;
  E   p->buf <==> q->buf;
  F   release(&p->lock);
  G   release(&q->lock);
  }
It //seems// like this swap happens with the appropriate locking -- so what's the problem?
In a single threaded environment, this is ok. But in a multithreaded environment, this could be a serious problem: Imagine if thread 1 calls swap(x,y) and another calls swap(y,x)! This could be a deadlock -- a permanent state where a process is waiting for itself, or waiting for a sequence of processes that are in the end waiting on the first process. This is called circular wait.
Thread 1 executes instruction A (acquiring x's lock), then thread 2 executes A (acquiring y's lock). At instruction B, each thread is now waiting for the other to release. The system is in a state where some set of processes will never make progress -- a deadlock.
There are 4 conditions for deadlock:\\
1. mutual exclusion - only one thread may hold a given lock (be in the same critical section) at a time \\
2. hold and wait - a thread may block/spin on an acquire operation while holding a lock \\
3. no lock preemption - a thread can only release a lock voluntarily \\
4. circular wait - e.g. a is waiting for b, which is waiting for c, which is waiting for a \\
Eliminate any of the above to solve the problem; #4 is easiest.
The solution (in this case) is for the complete swap to be in a single lock so that there cannot be any circular wait.
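One possible reading of "a single lock" is a global lock that every swap takes before touching either buffer lock, so that at most one thread at a time is ever holding one buffer lock while waiting for another. The sketch below follows that reading with POSIX mutexes; the name ''swap_lock'' and the early return for ''p == q'' are assumptions added for the example.

```c
#include <pthread.h>
#include <string.h>

#define N 8

typedef struct pipebuf {
    char buf[N];
    unsigned head;
    unsigned tail;
    pthread_mutex_t lock;
} pipebuf_t;

/* Hypothetical global lock: taken by every swap before any buffer lock. */
static pthread_mutex_t swap_lock = PTHREAD_MUTEX_INITIALIZER;

void swap(pipebuf_t *p, pipebuf_t *q) {
    if (p == q)
        return;                      /* locking the same mutex twice would hang */

    pthread_mutex_lock(&swap_lock);  /* only one swapper past this point */
    pthread_mutex_lock(&p->lock);
    pthread_mutex_lock(&q->lock);

    unsigned t;
    char tmp[N];
    t = p->head; p->head = q->head; q->head = t;
    t = p->tail; p->tail = q->tail; q->tail = t;
    memcpy(tmp, p->buf, N);
    memcpy(p->buf, q->buf, N);
    memcpy(q->buf, tmp, N);

    pthread_mutex_unlock(&q->lock);
    pthread_mutex_unlock(&p->lock);
    pthread_mutex_unlock(&swap_lock);
}
```

With this arrangement swap(x,y) and swap(y,x) can no longer each hold one buffer lock while waiting for the other, because the second caller is still waiting on ''swap_lock'' and holds nothing.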
How do we find the circular wait in our code? Draw a waits-for graph where we draw (threads) and [locks], diagramming which threads are waiting for which locks.
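When each thread waits for at most one lock and each lock has at most one holder, the waits-for graph can be stored as two small arrays and a circular wait found by following the chain. The sketch below assumes that representation; the array names and the function ''in_circular_wait'' are made up for illustration.

```c
#include <stdbool.h>

#define NONE (-1)

/* waits_for[t]: index of the lock that thread t is blocked on, or NONE.
   held_by[l]:   index of the thread that holds lock l, or NONE.
   Returns true if following thread -> lock -> holder edges from t
   leads back to t. */
bool in_circular_wait(int t, const int *waits_for, const int *held_by,
                      int nthreads) {
    int cur = t;
    for (int steps = 0; steps < nthreads; steps++) {
        int lock = waits_for[cur];
        if (lock == NONE)
            return false;            /* cur is running, the chain ends */
        int holder = held_by[lock];
        if (holder == NONE)
            return false;            /* the lock is free, cur can proceed */
        if (holder == t)
            return true;             /* the chain came back to t */
        cur = holder;
    }
    return false;                    /* any cycle reached does not include t */
}
```

For the swap example, thread 0 holds lock 0 and waits for lock 1 while thread 1 holds lock 1 and waits for lock 0, so ''waits_for'' is {1, 0}, ''held_by'' is {0, 1}, and the function reports a circular wait for either thread.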