You are expected to understand this. CS 111 Operating Systems Principles, Fall 2007

Lecture 9 Scribe Notes

by Nazia Habib, Jeffrey Tan, and Michael Vysin

Synchronization II

Pipe Buffer

Consider a process P2 that reads what another process P1 writes.

Since this pipe can only handle one transfer at a time, the following situation will occur:

  1. P1 blocks until P2 reads from it.
  2. When P2 reads, it copies characters from P1’s memory into its memory.
  3. P1 restarts when its characters are copied by P2.

This approach works, but forcing P1 to block until P2 reads from it provides low utilization, as P1 could be writing even as P2 performs other tasks.

How can we improve utilization?

We can allow P1 to write its data into a buffer and let P2 read the data from the buffer.

Buffering improves utilization by reducing the time required for I/O operations. It reduces per-request overhead by combining smaller requests into larger ones. It results in less system call and synchronization overhead.

A typical pipe buffer works as follows:

  1. A process can copy characters into the write end of the pipe buffer until the buffer is full.
  2. When the buffer is full, the writing process will be blocked until space becomes available again.
  3. Another process can copy characters from the buffer's read end until the buffer is empty.
  4. When the buffer is empty, the reading process will be blocked until characters become available to read again.
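
As a rough illustration of the decoupling that buffering provides, the sketch below uses the POSIX pipe, write, and read calls in a single process. The write completes before anything has been read, because the kernel's pipe buffer holds the data. The helper name pipe_roundtrip is made up for this example, and it assumes the message is much smaller than the pipe buffer.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: writes msg into a fresh pipe, then reads it back.
 * Returns the number of bytes read, or -1 on error. */
static ssize_t pipe_roundtrip(const char *msg, char *out, size_t outsz) {
    int fd[2];                      /* fd[0] = read end, fd[1] = write end */
    assert(msg != NULL && out != NULL);
    if (pipe(fd) != 0)
        return -1;

    size_t len = strlen(msg);
    /* This write returns even though nobody has read yet: the bytes sit
     * in the kernel's pipe buffer. Assumes len is far below its capacity. */
    ssize_t w = write(fd[1], msg, len);
    close(fd[1]);
    if (w != (ssize_t)len) {
        close(fd[0]);
        return -1;
    }

    /* The reader copies the bytes out of the buffer at its own pace. */
    ssize_t r = read(fd[0], out, outsz);
    close(fd[0]);
    return r;
}
```

Calling pipe_roundtrip("hello", out, sizeof out) should return 5 and leave the five characters in out. With the unbuffered scheme described first, the same single-process sequence could not work, since the writer would block until a reader arrived.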

Pipe buffers in Linux are set to a limited size and are called bounded buffers.

How can we achieve synchronization of pipe reads and writes? Applications' read and write system calls are synchronized by the kernel, so we require sequential consistency at the level of system calls.

Let's look at the old way of getting kernel-level sequential consistency. Back in the day, when a process made a system call, the kernel:

  • turned off interrupts;
  • locked out all other processes (so only one processor could run kernel code at a time);
  • turned interrupts back on and unlocked processes only when the syscall completed or blocked.

The above method is called coarse-grained locking. It implements mutual exclusion for groups of code much larger than the minimal sufficient critical sections. It is relatively easy to implement, but not very efficient. When this approach was used on Linux, this big lock was known as the "Big Kernel Lock". A more efficient, but also more difficult, method is fine-grained locking, which implements mutual exclusion much closer to the minimal critical sections.

Bounded Buffer

A bounded buffer is a finite-size array used to store data for processes during both reads and writes. A simple implementation of a bounded buffer is as follows:

#define N 8
 
typedef struct {
   char buf[N]; /* N = 8 */
   unsigned head;
   unsigned tail;
   spinlock_t lock;
} pipebuf_t;
 

Here, we have a character array to store data, head and tail indices that mark the "bounds" of the buffered data, and of course, a synchronization mechanism: a spinlock. We could also split the spinlock into a write lock and a read lock. That would allow more throughput, because reads and writes would only need to be synchronized with operations of the same kind.

Even though we are dealing with just an array of characters, the implementation treats the array as circular. What is special about a circular buffer is that the head and tail indices move along the circle and can wrap around to the other side of the buffer, sometimes even meeting at the same index. This leads to a few special cases, namely...

  1. head = tail (this means the buffer is currently empty)
  2. tail = head + 8 (this means the buffer is full; 8 is the size of the buffer, N)
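
The two special cases can be written as small predicates. This is a sketch that follows the convention of the code in these notes, where writers advance tail and readers advance head, so the number of stored characters is tail - head. The helper names pb_count, pb_empty, and pb_full are made up for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define N 8   /* buffer size; assumed to be a power of two */

/* Number of characters currently stored. Unsigned subtraction wraps
 * modulo UINT_MAX + 1, so the result stays right even after tail has
 * wrapped around past UINT_MAX while head has not. */
static unsigned pb_count(unsigned head, unsigned tail) {
    return tail - head;
}

/* Empty: the reader has caught up with the writer. */
static bool pb_empty(unsigned head, unsigned tail) {
    return head == tail;
}

/* Full: the writer is exactly N characters ahead of the reader. */
static bool pb_full(unsigned head, unsigned tail) {
    return pb_count(head, tail) == N;
}
```

For example, with head = UINT_MAX - 2 and tail = head + 5, tail wraps to 2 but pb_count still reports 5. Because N divides UINT_MAX + 1, the slot index (value % N) also steps from 7 to 0 across the wrap without skipping a slot.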

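Notice that if head and tail should ever overflow 32 bits (that is, after 4 gigabytes have passed through the buffer), the behaviour will still be correct, because the size of the buffer is a power of two: unsigned arithmetic wraps modulo 2^32, which is a multiple of N, so both tail - head and the index modulo N remain consistent across the wrap. Simple read and write operations for these types of buffers will also have their special problems to deal with. For example,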

// This implementation is WRONG!
 
void writec(pipebuf_t *p, char c) {
   while (1) {
       if (p->tail - p->head < N) {
           p->buf[p->tail % N] = c;
           p->tail++;
           return;
       }
   }
}
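
To make one failure of the unlocked writec above concrete, the sketch below replays, in a single thread, one possible interleaving of two writers. It is not a real concurrent test: the function replay_two_writers is hypothetical, and the locals t1 and t2 stand in for each writer's private copy of tail (for instance, in a register).

```c
#include <assert.h>

#define N 8

typedef struct {
    char buf[N];
    unsigned head;
    unsigned tail;
} pipebuf_t;   /* lock omitted: this replays the unlocked version */

/* Replays two unlocked writec calls whose steps interleave. */
static void replay_two_writers(pipebuf_t *p, char c1, char c2) {
    unsigned t1 = p->tail;   /* writer 1 passes the space check, reads tail */
    unsigned t2 = p->tail;   /* writer 2 does the same before writer 1 stores */

    p->buf[t1 % N] = c1;     /* writer 1 stores its character */
    p->buf[t2 % N] = c2;     /* writer 2 overwrites the very same slot */

    p->tail = t1 + 1;        /* writer 1 finishes its tail++ */
    p->tail = t2 + 1;        /* writer 2 finishes its tail++ with the same value */
}
```

Starting from an empty buffer, two characters were "written", yet tail ends at 1 and only the second character is in the buffer: the first one is lost.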

This polling implementation of a write-to-buffer operation is incorrect. If we inspect the function more closely, we see that nothing stops two processes from executing it at the same time: two writers can both pass the space check, store into the same slot, and lose a character. To fix this, we use our trusty spinlocks like so:

void writec(pipebuf_t *p, char c) {
   while (1) {
       acquire(&p->lock);  // Acquire the bounded buffer's spinlock
       if (p->tail - p->head < N) {
           p->buf[p->tail % N] = c;
           p->tail++;
           release(&p->lock); // We must ALWAYS release the lock before we 
                              // exit the function (to prevent deadlocks)
           return;
       }
       release(&p->lock); // Release here so other processes can have a chance at running.
   }
}

This code now shows correct handling of critical sections. These critical sections are namely any part of the code that reads or modifies a variable that is also accessible to other processes. In the above example, everything inside the while(1) loop is considered critical. Another example using spinlocks is the corresponding readc operation for the buffer.

char readc(pipebuf_t *p) {
   while (1) {
       acquire(&p->lock);                  // Begin Critical Section
       if (p->tail != p->head) {
           char c = p->buf[p->head % N];
           p->head++;
           release(&p->lock);
           return c;
       }                                   // End Critical Section
       release(&p->lock);
   }
}

These simple examples give rise to a fundamental rule of thumb known as...

One Writer Rule - Synchronization is easier if each variable only has one writer.
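
One way to see the rule at work, going a step beyond the lecture code: if there is exactly one reading thread and exactly one writing thread, then head is only ever written by the reader and tail only by the writer, and the buffer can be shared without a lock. The sketch below assumes C11 atomics. The names spsc_t, spsc_try_write, and spsc_try_read are hypothetical, and the functions return false instead of spinning when the buffer is full or empty.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define N 8

typedef struct {
    char buf[N];
    atomic_uint head;   /* written only by the single reader */
    atomic_uint tail;   /* written only by the single writer */
} spsc_t;

/* Called by the one writer. Returns false if the buffer is full. */
static bool spsc_try_write(spsc_t *q, char c) {
    unsigned tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&q->head, memory_order_acquire);
    if (tail - head == N)
        return false;
    q->buf[tail % N] = c;
    /* Publish the slot only after the character has been stored. */
    atomic_store_explicit(&q->tail, tail + 1, memory_order_release);
    return true;
}

/* Called by the one reader. Returns false if the buffer is empty. */
static bool spsc_try_read(spsc_t *q, char *out) {
    unsigned head = atomic_load_explicit(&q->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&q->tail, memory_order_acquire);
    if (head == tail)
        return false;
    *out = q->buf[head % N];
    /* Free the slot only after the character has been copied out. */
    atomic_store_explicit(&q->head, head + 1, memory_order_release);
    return true;
}
```

As soon as a second writer or a second reader is allowed, tail or head has more than one writer and this scheme breaks, which is why the lecture's writec and readc take a lock.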

Great! We have a basic implementation of a spinlocking read/write buffer but we need to address a few problems with this code.

Two problems that can occur

  1. Polling/Spinning Implementation (we should block when waiting for the lock)
    • Our current code wastes CPU cycles in the while loop, when we should block.
  2. Starvation

Starvation and the Bounded Wait Problem

The problem with the current implementation is that there is no guarantee each process waiting for the lock will wake up. Suppose processes B and C are competing for a lock already owned by process A. When process A releases the lock, random chance may cause process B to acquire the lock. Then suppose process A waits on the lock again, and random chance causes process A to acquire the lock when process B releases it. Notice that the above scenario could continue indefinitely - it is possible that process C would never acquire the lock. That is called starvation.

The idea behind Bounded Wait is that all processes will wake up eventually. Stated more formally, if a given lock is released an infinite number of times, then any acquire operation will eventually succeed.

We satisfy bounded wait by using a Wait Queue. A wait queue has the following properties:

  • It is a queue of processes waiting for a lock.
  • When the lock is unlocked, the processes acquire the lock in queue order.
  • Processes on the queue are in the blocked state.

This type of synchronization object is known as a mutex, short for mutual exclusion.

Implementing a Mutex

Let's start by defining the mutex structure:

typedef struct wait {
    procdescriptor_t *p;           /* process to wake */
    struct wait *next;             /* points to next element of wait queue */
} wait_t;
 
typedef struct mutex {
    unsigned locked;
    wait_t *wfirst;                /* head of wait queue */
    wait_t *wlast;                 /* tail of wait queue */
    /* something else later */
} mutex_t;

What we have created is a mutex with a simple locked/unlocked state variable. To keep track of what processes are waiting on the lock, we use a singly-linked list along with its head and tail pointers. When a process tries to acquire the lock and fails, it will be added to this list. When the lock is released, the next process on the list, if any, is woken up so it can acquire the lock.
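
The queue operations described here can be sketched on their own, separately from any blocking logic. In the sketch below, wq_push and wq_pop are hypothetical helper names, procdescriptor_t is reduced to a stand-in with only a pid field, and no synchronization is attempted.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's process descriptor (assumption). */
typedef struct procdescriptor { int pid; } procdescriptor_t;

typedef struct wait {
    procdescriptor_t *p;    /* process to wake */
    struct wait *next;      /* next element of wait queue */
} wait_t;

typedef struct mutex {
    unsigned locked;
    wait_t *wfirst;         /* head of wait queue */
    wait_t *wlast;          /* tail of wait queue */
} mutex_t;

/* Append entry w, describing process p, to the tail of m's wait queue. */
static void wq_push(mutex_t *m, wait_t *w, procdescriptor_t *p) {
    w->p = p;
    w->next = NULL;
    if (m->wlast)
        m->wlast->next = w;
    else
        m->wfirst = w;      /* queue was empty */
    m->wlast = w;
}

/* Remove and return the entry at the head, or NULL if the queue is empty. */
static wait_t *wq_pop(mutex_t *m) {
    wait_t *w = m->wfirst;
    if (w == NULL)
        return NULL;
    m->wfirst = w->next;
    if (m->wfirst == NULL)
        m->wlast = NULL;    /* queue became empty */
    return w;
}
```

Entries pushed in the order A, B, C come back out in the same order, which is the property bounded wait relies on.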

A First Attempt

Given the idea behind the implementation, let's implement the bacquire function. (not to be confused with acquire for spinlocks)

void bacquire(mutex_t *m) {
 
   if (m->locked == 0 && m->wfirst == NULL) {         /* Grab the lock if unlocked and */
       m->locked = 1;                                 /* the wait queue is empty */
       return;
   }
 
   /* create an entry in the linked list of waiting processes...
    * note that w is a local variable */
   wait_t w;
   w.p = current;
   w.next = NULL;
   if (m->wlast)
       m->wlast->next = &w;
   else
       m->wfirst = &w;
   m->wlast = &w;
 
   current->state = BLOCKED;
 
   schedule();
 
   /* we are woken up by brelease when the lock has been
    * released and it is our turn to grab it */
 
   assert(m->locked == 0);
   assert(m->wfirst == &w);
 
   /* remove ourselves from the list */
   m->locked = 1;
   if (w.next != NULL)
       m->wfirst = w.next;
   else
       m->wfirst = m->wlast = NULL;
}

And here's the brelease function:

void brelease(mutex_t *m) {
   if (m->wfirst)
       m->wfirst->p->state = RUNNABLE;   /* wake the first waiting process */
   m->locked = 0;
}

There are two things to notice about this code. One is an aside from the primary purpose of the code, but is worth pointing out nonetheless. The wait_t structure in bacquire exists on the stack, and yet is added to a linked list external to this function. In most cases this would not work, but remember that we are in kernel mode - when we call schedule, our stack (and the wait_t structure on it) is set aside while other processes execute. Thus passing around pointers to stack variables is, in this case, safe to do.

The second point is central to the discussion of synchronization. Notice that the wfirst, wlast, and locked fields of the mutex_t structure are shared state. (If you don't see that, imagine what would happen if two processes called bacquire simultaneously.) Thus there is a need for synchronization. We can use our old friends the spinlocks to implement this.

Note that we are building higher level synchronization objects on top of lower level objects. This is a key concept, and is worth remembering as a principle of handling synchronization.
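
These notes use spinlock_t, acquire, and release without defining them. As a reminder of what the lower-level object might look like, here is a minimal sketch built on the C11 atomic_flag type. It is an assumption for illustration, not the kernel's actual spinlock, and try_acquire is an extra hypothetical helper added so the behaviour can be checked without spinning forever.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_flag held;   /* set while some thread owns the lock */
} spinlock_t;

/* Spin until the flag was previously clear, i.e. until we set it. */
static void acquire(spinlock_t *l) {
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire)) {
        /* busy-wait: someone else holds the lock */
    }
}

/* One attempt only: returns true if the lock was obtained. */
static bool try_acquire(spinlock_t *l) {
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

/* Clear the flag so that a spinning thread can get in. */
static void release(spinlock_t *l) {
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

A lock declared as spinlock_t l = { ATOMIC_FLAG_INIT }; starts out free. The atomic test-and-set is the single indivisible step on which everything else in this lecture is layered.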

An Improved Version

Start by adding a spinlock to the mutex_t structure:

typedef struct {
   unsigned locked;
   wait_t *wfirst;             /* head of wait queue */
   wait_t *wlast;              /* tail of wait queue */
   spinlock_t l;               /* <--- ADDED */
} mutex_t;

Now we need to acquire and release that spinlock around the critical sections. In this case, the critical sections are the entire bacquire and brelease functions. The spinlock must be acquired at the beginning of the function and released at the end.

There is one more location where the spinlock must be released. Look back at the implementation of bacquire before continuing to try and figure out where it is. Remember that one of the functions called by bacquire is different from the others.

Here's the final implementation of bacquire and brelease in all their glory:

void bacquire(mutex_t *m) {
   acquire(&m->l);                            /* <--- ADDED */
   if (m->locked == 0 && m->wfirst == NULL) {
       m->locked = 1;
       release(&m->l);
       return;
   }
 
   wait_t w;
   w.p = current;
   w.next = NULL;
   if (m->wlast)
       m->wlast->next = &w;
   else
       m->wfirst = &w;
   m->wlast = &w;
 
   current->state = BLOCKED;
 
   release(&m->l);                            /* <--- ADDED */
   schedule();
   acquire(&m->l);                            /* <--- ADDED */
 
   assert(m->locked == 0);
   assert(m->wfirst == &w);
   m->locked = 1;
   if (w.next != NULL)
       m->wfirst = w.next;
   else
       m->wfirst = m->wlast = NULL;
   release(&m->l);                            /* <--- ADDED */
}
 
void brelease(mutex_t *m) {
   acquire(&m->l);                            /* <--- ADDED */
   if (m->wfirst)
       m->wfirst->p->state = RUNNABLE;        /* wake the first waiting process */
   m->locked = 0;
   release(&m->l);                            /* <--- ADDED */
}

Notice we release the spinlock while we are blocked. Otherwise a deadlock will result because the spinlock must be acquired to release the mutex. For this reason, in general spinlocks cannot be held across calls to schedule.

Notice that this implementation solves the two problems that were pointed out earlier.

  1. bacquire will block if it needs to wait for the lock. This improves utilization because it doesn't waste CPU time waiting for the lock to become available.
  2. The bounded wait condition is satisfied. The wait queue ensures that processes waiting for the mutex are served in first-come, first-served order.

Historically, this functionality took a long time to design correctly. There are many locations for subtle race conditions to creep in. An example is the sleep-wakeup race.

The Sleep-Wakeup Race

The sleep-wakeup race is, quite expectedly, a race condition that results from an improper implementation of blocking. When the sleep-wakeup race exists, it is possible that a process can go to sleep indefinitely and never be woken up.

Such a race condition can exist in quite subtle ways. As an example, take the bacquire function, implemented above, and change the code as follows:

    // in the bacquire function... this is BAD CODE... do NOT use this!
 
    //current->state = BLOCKED;      // <--- This line commented out
    release(&m->l);
    current->state = BLOCKED;        // <--- This line added
 
    schedule();
 
    acquire(&m->l);

With a simple swap of two lines of code, we have introduced the sleep-wakeup race into the mutex code. Here's why:

  1. Process A, trying to acquire the mutex, reaches the code snippet above. It releases the spinlock, and immediately afterwards gets preempted by process B.
  2. Process B, which owns the mutex, tries to release the mutex. It succeeds, because the spinlock is free.
  3. Process B, when releasing the mutex, grabs the first process on the wait queue, which is Process A, and sets its state to RUNNABLE.
  4. When Process A regains control, it sets its state to BLOCKED, overwriting the wakeup, and calls schedule, waiting to be woken up.

Since the mutex is now free, but Process A is still blocked on the wait queue, Process A will remain blocked indefinitely.

Notice the implications of this: the process's state is part of the critical section, even though the process's state is not part of the mutex_t structure.
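
The interleaving can be replayed step by step in a single thread. The sketch below is only a model: world_t, b_release, replay_buggy, and replay_correct are hypothetical names, there is no real scheduler, and the comments mark where the spinlock operations and the preemption would occur.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { RUNNABLE, BLOCKED } pstate_t;

typedef struct {
    pstate_t a_state;    /* state of process A, the waiter */
    bool mutex_locked;   /* models m->locked */
    bool a_on_queue;     /* A is the first entry on the wait queue */
} world_t;

/* Models process B's brelease: wake the first waiter, then unlock. */
static void b_release(world_t *w) {
    if (w->a_on_queue)
        w->a_state = RUNNABLE;
    w->mutex_locked = false;
}

/* Buggy order: A releases the spinlock before marking itself BLOCKED. */
static world_t replay_buggy(void) {
    world_t w = { RUNNABLE, true, true };  /* A is already queued */
    /* A: release(&m->l); A is preempted right here */
    b_release(&w);                         /* B wakes A, but A is not asleep yet */
    w.a_state = BLOCKED;                   /* A resumes and overwrites the wakeup */
    /* A: schedule(); nobody is left to wake A */
    return w;
}

/* Correct order: A marks itself BLOCKED while still holding the spinlock. */
static world_t replay_correct(void) {
    world_t w = { RUNNABLE, true, true };
    w.a_state = BLOCKED;
    /* A: release(&m->l); only now can B get into brelease */
    b_release(&w);                         /* the wakeup lands after BLOCKED */
    /* A: schedule(); A is RUNNABLE, so it will be run again */
    return w;
}
```

In the buggy replay the final state has the mutex unlocked while A is BLOCKED and still on the queue, which is the lost wakeup. In the correct replay A ends up RUNNABLE.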

Where Next?

Now that we have a blocking mutex, we are able to replace the acquire and release functions in the circular pipe buffer with the mutex functions. We have slightly improved the utilization of the readc and writec functions, because one polling wait has been replaced with a blocking wait.

However, the main source of inefficiency in readc and writec still remains: the polling loop to determine if the pipe buffer is ready for either reading or writing. That, however, requires waiting on a condition, not on a lock. We have seen how we can build complex synchronization objects on top of less-complex ones. A similar trick will be used to solve this problem as well.

 
notes/lec9.txt · Last modified: 2007/12/13 11:26 by kohler
 