You are expected to understand this. CS 111 Operating Systems Principles, Fall 2006

Lecture 7 notes on Synchronization II

By Sabah Chaudhry, Kaushal Naik, Emilia Wirjakartika

Topics covered in the lecture:

  • Building Locks
  • Critical Sections
  • Synchronization Objects

----

Considering the examples from the last lecture:

    typedef struct pipebuf {
       char buf[N];
       unsigned head;
       unsigned tail;
    } pipebuf_t;
 
    void writec(pipebuf_t *p, char c) {
1:      while (p->tail - p->head == N) 
2:          /* */;	
3:      p->buf[p->tail % N] = c;
4:      p->tail++;
    }
 
    char readc(pipebuf_t *p) {
1:      while (p->head == p->tail)
2:          /* */;
3:      char c = p->buf[p->head % N];
4:      p->head++;
5:      return c;
    }

This code is correctly synchronized, actually, as long as at most one thread calls writec() at any time and at most one thread calls readc() at any time. Single calls to writec() and readc() can run in parallel, but two writec()s must never run in parallel (and similarly two readc()s must never run in parallel). This is because of the One Writer Principle: If each variable has only one writer, coordination becomes easier. With the given constraints, this bounded buffer implementation follows the one-writer principle: p->tail is modified only by the single writec(), and p->head only by the single readc().
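The head and tail counters never reset; they only grow, and the buffer index is taken modulo N. A minimal sketch of why that arithmetic is safe, assuming N is a power of two (so that it divides the range of an unsigned counter and `% N` stays consistent when a counter wraps). The helper names `pipe_len`, `pipe_full`, `pipe_empty`, and `demo_wraparound` are hypothetical and not part of the lecture code:

```c
#include <assert.h>
#include <limits.h>

#define N 4   /* assumed buffer size: a power of two */

typedef struct pipebuf {
    char buf[N];
    unsigned head;   /* characters read so far; written only by the reader */
    unsigned tail;   /* characters written so far; written only by the writer */
} pipebuf_t;

/* Number of characters currently stored. Unsigned subtraction gives the
   right answer even after tail has wrapped past UINT_MAX. */
unsigned pipe_len(const pipebuf_t *p) {
    return p->tail - p->head;
}

int pipe_full(const pipebuf_t *p)  { return pipe_len(p) == N; }
int pipe_empty(const pipebuf_t *p) { return pipe_len(p) == 0; }

/* Start both counters just below the wrap point and write three characters. */
unsigned demo_wraparound(void) {
    pipebuf_t p = { .head = UINT_MAX - 1, .tail = UINT_MAX - 1 };
    for (int i = 0; i < 3; i++) {
        assert(!pipe_full(&p));
        p.buf[p.tail % N] = (char) ('a' + i);
        p.tail++;                /* wraps from UINT_MAX to 0 on the second pass */
    }
    return pipe_len(&p);         /* three characters are stored */
}
```

If N were not a power of two, `p->tail % N` would jump at the wrap point and the buffer slots would no longer be used in order.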

Examples

But what if there is more than one writer/reader in parallel? Disaster. Given the following five threads, running code as follows:

 W1   writec(1); 
 W2   writec(2); 
 R1   printf("%d\n", readc());
 R2   printf("%d\n", readc()); 
 R3   printf("%d\n", readc());

We would expect output of either "2, 1" or "1, 2" (separated by newlines), with the third reader left waiting for a character that never arrives. But the output can also be "1, 1". This happens if the instructions are executed in the following sequence:

 W1   writec(1)   1..4
 W2   writec(2)   1..4 

[Figure block1.jpg: state of the buffer after these instructions]

 R1   readc()     1..3	(this thread's c == 1)
 R2   readc()     1..3	(this thread's c == 1)
 R1   readc()     4,5

[Figure hd1.jpg]

 R2   readc()     4,5

[Figure end.jpg]

Something is very wrong.
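The schedule above can be replayed deterministically in a single thread by splitting readc() at the point where the threads interleave. This is only a sketch: the function names are hypothetical, N is chosen arbitrarily, and the waiting loops are replaced by an assertion because the schedule guarantees the buffer is non-empty.

```c
#include <assert.h>

#define N 4   /* assumed buffer size */

typedef struct pipebuf {
    char buf[N];
    unsigned head;
    unsigned tail;
} pipebuf_t;

/* Lines 1..3 of readc(): the wait test (known to pass here) and the load. */
char readc_lines_1_to_3(const pipebuf_t *p) {
    assert(p->head != p->tail);
    return p->buf[p->head % N];
}

/* Line 4 of readc(): advance head. */
void readc_line_4(pipebuf_t *p) {
    p->head++;
}

/* Replays the schedule. Stores the values R1 and R2 would print and
   returns the final value of head. */
unsigned replay_bad_schedule(int *r1_out, int *r2_out) {
    pipebuf_t p = { .head = 0, .tail = 0 };

    p.buf[p.tail % N] = 1; p.tail++;     /* W1  writec(1)  1..4 */
    p.buf[p.tail % N] = 2; p.tail++;     /* W2  writec(2)  1..4 */

    char c1 = readc_lines_1_to_3(&p);    /* R1  readc()  1..3 */
    char c2 = readc_lines_1_to_3(&p);    /* R2  readc()  1..3: head has not moved */
    readc_line_4(&p);                    /* R1  readc()  4 */
    readc_line_4(&p);                    /* R2  readc()  4 */

    *r1_out = c1;
    *r2_out = c2;
    return p.head;
}
```

Both readers get 1, the character 2 is never delivered, and head ends up equal to tail, so the buffer looks empty and R3 waits forever.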

Critical Section

A critical section is a set of instructions such that indivisibility is preserved as long as at most one thread's instruction pointer (IP) is in the set at any time.

To achieve synchronization -- that is, to achieve indivisibility, and thus make it possible to model multiple virtual processors on a single physical processor -- any critical sections in the operating system code must be enforced. That is, we must actively prevent more than one thread's IP from entering a critical section at any time.

Enforcing critical sections has two important subproblems:

1. Mutual Exclusion. Two threads' IPs can never be in the critical section simultaneously.
2. Bounded Wait (No Starvation). Every thread that is attempting to enter a critical section waits at most a bounded amount of time. This bound prevents starvation of any single thread.

Whenever we think about critical sections, we should ask ourselves these questions:

  • What kinds of concurrency do we want to prevent?
  • How many Critical Sections are there?
  • How big are the Critical Sections?
  • What are the Critical Sections?

----

Critical Sections in the above code:

First Choice: There is one critical section, consisting of Line 4 in both functions.

This choice is not sufficient, as the example shows: R1 and R2 both executed lines 1..3, and read the same character, before either of them reached line 4.

Second Choice:

All of writec() is a single critical section, and all of readc() is another critical section.

This enforces the One Writer Principle, above. It prevents multiple concurrent writers/readers, but still allows one readc() and one writec() to execute in parallel. This successfully enforces mutual exclusion. However, these critical sections are not minimal. If a lot of code is in a critical section, performance suffers (because fewer things can happen in parallel) and bounded wait is usually harder to achieve. This is especially true when a critical section contains a long loop, like the busy-wait while loops here: a readc() spinning on an empty buffer inside its critical section keeps every other reader out for as long as it waits. If critical sections are minimal -- if most code can run in parallel -- other threads usually wait less, which makes bounded wait easier to achieve.

If we change the writec() function to:

    void writec(pipebuf_t *p, char c) {
1:      while (1) { 
2:          if (p->tail - p->head < N) {
3:              p->buf[p->tail % N] = c;
4:              p->tail++;
5:              return;
            }
        }
    }

Then our choices for Critical Section would also change.

Suppose we choose lines 3 and 4 as the critical section. This is not a sufficient critical section, as shown by this table:

[Table tab.jpg]

The correct Minimal Critical Section for writec() consists of lines 2, 3, and 4.
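One schedule that shows why line 2 has to be inside the critical section (an illustration only, not necessarily the schedule the lecture used): two writers both run the check on line 2 while exactly one slot is free, and then both run lines 3 and 4. The sketch below replays that schedule in a single thread; the function names are hypothetical and N is set to 2 to keep the example small.

```c
#include <assert.h>

#define N 2   /* assumed buffer size, kept tiny on purpose */

typedef struct pipebuf {
    char buf[N];
    unsigned head;
    unsigned tail;
} pipebuf_t;

/* Line 2 of the rewritten writec(): is there room? */
int writec_line_2(const pipebuf_t *p) {
    return p->tail - p->head < N;
}

/* Lines 3 and 4: the store and the increment. */
void writec_lines_3_4(pipebuf_t *p, char c) {
    p->buf[p->tail % N] = c;
    p->tail++;
}

/* Returns the number of characters the buffer claims to hold afterwards,
   and reports what ended up in slot 0. */
unsigned replay_two_writers(char *slot0) {
    pipebuf_t p = { .head = 0, .tail = 0 };
    writec_lines_3_4(&p, 'a');           /* setup: 'a' is stored, one slot left */

    int w1_ok = writec_line_2(&p);       /* W1 line 2: sees the free slot */
    int w2_ok = writec_line_2(&p);       /* W2 line 2: sees the same free slot */
    assert(w1_ok && w2_ok);

    writec_lines_3_4(&p, 'b');           /* W1 lines 3-4: fills the last slot */
    writec_lines_3_4(&p, 'c');           /* W2 lines 3-4: overwrites the unread 'a' */

    *slot0 = p.buf[0];
    return p.tail - p.head;              /* N + 1, more than the buffer can hold */
}
```

Each writer ran lines 3 and 4 without interference, yet an unread character was lost, because the decision made on line 2 was already stale by the time line 3 ran.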

Now readc() can be rewritten as:

    char readc(pipebuf_t *p) {
1:      while (1) {
2:          if (p->head != p->tail) {
3:              char c = p->buf[p->head % N];
4:              p->head++;
5:              return c;
	    }
        }
    }

Likewise, the minimal critical section here consists of lines 2, 3, and 4.

Looking at the W1,W2,R1,R2,R3 example from above with these critical sections, the only possible outputs are "1,2" and "2,1". This does not violate indivisibility!

How to implement Critical Section

We have several different options for implementing the critical sections.

1. Uniprocessor Machine

Let's assume that readc() and writec() are system calls, and further assume that no interrupts ever happen when the processor is running kernel code. In such a situation readc() and writec() already enforce the necessary critical sections! On a uniprocessor machine without interrupts in kernel mode, the code in a system call cannot be interrupted by anything else, so there is already mutual exclusion. However, the above implementations of readc() and writec() wouldn't work well in such a kernel, since they can block forever, waiting for a character to read or for space to write. To avoid blocking, we might rewrite readc() to a non-blocking version:

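     int readc_cs(pipebuf_t *p) {
         if (p->head != p->tail) {
             /* Return the character as a non-negative int so that it can
                never be confused with the -1 "empty" result. */
             unsigned char c = p->buf[p->head % N];
             p->head++;
             return c;
         } else
             return -1;    /* buffer empty: nothing to read */
     }

2. Uniprocessor + Timer Interrupts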

However, a kernel that never takes interrupts is obviously impractical. Let's alter our uniprocessor machine so that interrupts can happen in the kernel. Now we actually need to enforce the critical sections ourselves, with real code. But in this machine, the only thing that can interrupt a critical section is -- an interrupt. So we enforce critical sections by disabling interrupts before the critical section, and enabling interrupts right after the critical section is over. We want to leave interrupts disabled for as little time as possible, since interrupts are very important for the operating system's correct functioning.

    void writec(pipebuf_t *p, char c) {
1:	while (1) { 
2:	    disable_interrupts();
3:          if (p->tail - p->head < N) {
4:              p->buf[p->tail % N] = c;
5:              p->tail++;
6:		enable_interrupts();
7:              return;
8:          }
9:          enable_interrupts();
        }
    }

Notice that we reenable interrupts on both line 6 and line 9. This is because there are two branches out of the critical section. We need to put the enforcement code on every critical section entry point and every exit point.

    char readc(pipebuf_t *p) {
1:      while (1) {
2:	    disable_interrupts();
3:          if (p->head != p->tail) {
4:              char c = p->buf[p->head % N];
5:              p->head++;
6:              enable_interrupts();
7:              return c;
8:	    }
9:	    enable_interrupts();
        }
    }  

If we had left out the enable_interrupts() on line 9, then whenever the buffer is empty (or, in writec(), full), the code would leave interrupts disabled and therefore spin forever, since on this machine nothing else could ever run to change the buffer. Yuck!
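One way to make a missed exit point harder to write is to give the critical section a single exit. The sketch below is an assumption-laden illustration, not the lecture's code: `disable_interrupts()` and `enable_interrupts()` are stand-ins that only flip a flag (the real operations are privileged), and `try_writec` is a hypothetical helper holding one pass of the loop body.

```c
#include <assert.h>

#define N 4   /* assumed buffer size */

typedef struct pipebuf {
    char buf[N];
    unsigned head;
    unsigned tail;
} pipebuf_t;

/* Stand-ins for the privileged operations: they only track a flag. */
int interrupts_enabled = 1;
void disable_interrupts(void) { interrupts_enabled = 0; }
void enable_interrupts(void)  { interrupts_enabled = 1; }

/* One pass through the loop body of writec().
   Returns 1 if c was stored, 0 if the buffer was full. */
int try_writec(pipebuf_t *p, char c) {
    int stored = 0;
    disable_interrupts();
    assert(p->tail - p->head <= N);   /* invariant of the bounded buffer */
    if (p->tail - p->head < N) {
        p->buf[p->tail % N] = c;
        p->tail++;
        stored = 1;
    }
    enable_interrupts();              /* the only exit from the critical section */
    return stored;
}

void writec(pipebuf_t *p, char c) {
    while (!try_writec(p, c))
        /* interrupts are enabled between attempts */;
}
```

The behavior matches the two-exit version above; the difference is that both branches now funnel through one enable_interrupts() call.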

3. Multiprocessor

Interrupt disabling is very important no matter what, but if multiple processors share the same memory, it is not sufficient. Now interrupts aren't the only way for multiple threads to enter the same critical section: threads running on different processors can enter the critical section simultaneously regardless of interrupt state.

To solve the issue, we need a Mutex.

Mutex (Mutual Exclusion Lock)

Here are the mutex's operations.

  typedef /* some type */ mutex_t;
  void acquire(mutex_t *m);
  void release(mutex_t *m);

acquire(m): the calling thread waits until no other thread has acquired m, then acquires m.

release(m): allows other threads to acquire m.

Here's an attempt at a mutex implementation using integers:

    typedef int mutex_t;    /* initially 0 */

    void acquire(mutex_t *m) {
        while (*m != 0)
            /* */;
        *m = 1;
    }

    void release(mutex_t *m) {
        *m = 0;
    }

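What's wrong with this? Well, these functions have critical sections in themselves! Two threads can both test *m, both see 0, and both set *m to 1, so both believe they hold the mutex. To resolve this we need to rely on underlying support for locking from the processor. To see what support we need, let's rewrite the acquire code.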
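The failure of the integer mutex can be replayed in a single thread by separating the test from the store. This is a sketch with hypothetical helper names (`sees_free`, `take`, `replay_broken_acquire`), not a real multithreaded test:

```c
#include <assert.h>

typedef int mutex_t;   /* 0 = free, 1 = held */

/* The test in acquire(): does the mutex look free? */
int sees_free(const mutex_t *m) { return *m == 0; }

/* The store in acquire(): take the mutex. */
void take(mutex_t *m) { *m = 1; }

/* Returns how many threads believe they hold the mutex. */
int replay_broken_acquire(void) {
    mutex_t m = 0;
    int holders = 0;

    int t1_free = sees_free(&m);   /* T1 leaves its while loop */
    int t2_free = sees_free(&m);   /* T2 leaves its while loop too: m is still 0 */

    if (t1_free) { take(&m); holders++; }   /* T1: *m = 1 */
    if (t2_free) { take(&m); holders++; }   /* T2: *m = 1 */

    assert(m == 1);
    return holders;
}
```

The function returns 2: mutual exclusion is violated even though every individual load and store behaved correctly.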

    void acquire(mutex_t *m) {
1:      while (1) {
2:          if (*m == 0) {
3:              *m = 1;
4:              return;
            }
        }
    }

We need lines 2 and 3 to execute in a critical section of their own. Processors already enforce mini-critical sections, in that simple single instructions (such as loading an aligned word from memory, or storing an aligned word into memory) execute indivisibly. This is called read-write coherence. But in assembly language, lines 2 to 4 will be written as more than one instruction:

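    movl  m, %eax        # load the mutex word
    testl %eax, %eax     # set the flags: is it 0?
    jnz   .L1            # nonzero: the mutex is held
    movl  $1, %eax
    movl  %eax, m        # store 1 into the mutex word
.L1: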

RSM: Read & Set Memory

Processor designers heard our cries for help and implemented a single machine instruction that does the test and the store of lines 2-3 for us. This single machine instruction, t_a_s (test and set), has the same behavior as the following implementation, except that it always executes indivisibly.

    int t_a_s(int *a, int new_v) {
	int old = *a;
	*a = new_v;
	return old;
    }

Since the multiple processors in a computer share memory, this test-and-set can be used in a loop to wait until the other processor is done. When the code leaves the critical section, it clears the flag, and another thread can then enter.
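In portable C there is no t_a_s() function as such, but C11's `<stdatomic.h>` offers `atomic_exchange`, which has the same contract: store a new value and return the old one as one indivisible step. A minimal sketch, assuming a C11 compiler and representing the mutex as an `atomic_int` (that representation and the self-check function are assumptions, not the lecture's definitions):

```c
#include <assert.h>
#include <stdatomic.h>

typedef atomic_int mutex_t;   /* assumed representation: 0 = free, 1 = held */

/* Same contract as the t_a_s() shown above, but the load and the store
   cannot be separated by another thread. */
int t_a_s(mutex_t *a, int new_v) {
    return atomic_exchange(a, new_v);
}

/* Hypothetical self-check of the contract, run in a single thread. */
void t_a_s_selfcheck(void) {
    mutex_t m;
    atomic_init(&m, 0);
    assert(t_a_s(&m, 1) == 0);    /* was free: the caller now holds it */
    assert(t_a_s(&m, 1) == 1);    /* already held: the caller must retry */
    assert(t_a_s(&m, 0) == 1);    /* storing 0 releases it */
    assert(atomic_load(&m) == 0);
}
```

On x86, compilers typically turn this exchange into a single xchg instruction.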

The acquire function will become:

    void acquire(mutex_t *m) {
        while (t_a_s(m, 1) == 1)
            /* */;
    }

This acquire function is not a privileged function, and it can run at user level.
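Because acquire() needs no privileged instructions, it can be tried out in an ordinary user-level program. The sketch below assumes POSIX threads and C11 `<stdatomic.h>`, neither of which appears in the lecture; `worker`, `run_two_workers`, and `ITERATIONS` are hypothetical names. Two threads increment a shared counter, each increment protected by the spinning mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

typedef atomic_int mutex_t;   /* assumed representation: 0 = free, 1 = held */

static int t_a_s(mutex_t *a, int new_v) {
    return atomic_exchange(a, new_v);
}

void acquire(mutex_t *m) {
    while (t_a_s(m, 1) == 1)
        /* spin until the previous holder releases */;
}

void release(mutex_t *m) {
    atomic_store(m, 0);
}

#define ITERATIONS 100000

static mutex_t counter_mutex;   /* static storage: starts at 0, i.e. free */
static long counter;

static void *worker(void *arg) {
    (void) arg;
    for (int i = 0; i < ITERATIONS; i++) {
        acquire(&counter_mutex);
        counter++;               /* the critical section */
        release(&counter_mutex);
    }
    return NULL;
}

/* Runs two workers to completion and returns the final counter value. */
long run_two_workers(void) {
    pthread_t t1, t2;
    counter = 0;
    int rc = pthread_create(&t1, NULL, worker, NULL);
    assert(rc == 0);
    rc = pthread_create(&t2, NULL, worker, NULL);
    assert(rc == 0);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return counter;
}
```

With the mutex in place the result is always 2 * ITERATIONS; without the acquire() and release() calls, increments from the two threads could be lost. Depending on the toolchain, the program may need to be built with the -pthread option.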

The writec() function will become:

    void writec(pipebuf_t *p, char c) {
1:      while (1) { 
2:          acquire(&mutex);
3:          if (p->tail - p->head < N) {
4:              p->buf[p->tail % N] = c;
5:              p->tail++;
6:              release(&mutex);
7:              return;
8:          }
9:          release(&mutex);
        }
    }

Similarly readc() will become:

    char readc(pipebuf_t *p) {
1:      while (1) { 
2:          acquire(&mutex);
3:          if (p->tail != p->head) {
4:              char c = p->buf[p->head % N];
5:              p->head++;
6:              release(&mutex);
7:              return c;
8:          }
9:          release(&mutex);
        }
    }

Both functions use the same mutex, a global mutex object. But that enforces a larger critical section than the minimal ones we identified. In each function the mutex is held from the acquire on line 2 until the release on line 6 or line 9, and since readc() and writec() share the mutex, a read and a write can't happen at the same time, even on different pipe buffers. This arrangement of mutexes, protecting much more than the minimal critical sections, is called:

Coarse grained locking

Enforces a small number of large critical sections.

+ It is a lot easier to program.
- Performance suffers because fewer threads can run simultaneously. Starvation, in which a process never gets sufficient resources to run to completion, can also happen.

Fine Grained Locking

Enforces minimal critical sections.

+ Good performance, since many processes can usually run in parallel (but beware of the cost of acquiring and releasing many fine-grained locks!)
- It is harder to maintain correctness since mistakes can be made easily due to the larger number of locks

Luckily, it's obvious how to make the bounded buffer use finer-grained locking: add a mutex per pipebuf_t, since the critical sections are per pipe buffer.

    typedef struct pipebuf {
	char buf[N];
	unsigned head;
	unsigned tail;
        mutex_t m;
    } pipebuf_t;
 
    void writec(pipebuf_t *p, char c) {
1:      while (1) { 
2:          acquire(&p->m);
3:          if (p->tail - p->head < N) {
4:              p->buf[p->tail % N] = c;
5:              p->tail++;
6:              release(&p->m);
7:              return;
8:          }
9:          release(&p->m);
        }
    }
 
    char readc(pipebuf_t *p) {
1:      while (1) { 
2:          acquire(&p->m);
3:          if (p->tail != p->head) {
4:              char c = p->buf[p->head % N];
5:              p->head++;
6:              release(&p->m);
7:              return c;
8:          }
9:          release(&p->m);
        }
    }
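With one mutex per pipe buffer, a thread working on one pipe no longer blocks threads working on another. A single-threaded sketch of that property, assuming C11 atomics for the mutex; `try_acquire` (one attempt of acquire() that reports success instead of spinning), `pipebuf_init`, and `pipes_are_independent` are hypothetical helpers:

```c
#include <assert.h>
#include <stdatomic.h>

#define N 4   /* assumed buffer size */

typedef atomic_int mutex_t;   /* assumed representation: 0 = free, 1 = held */

typedef struct pipebuf {
    char buf[N];
    unsigned head;
    unsigned tail;
    mutex_t m;
} pipebuf_t;

/* One attempt of acquire(): returns 1 on success, 0 if the mutex is held. */
int try_acquire(mutex_t *m) {
    return atomic_exchange(m, 1) == 0;
}

void release(mutex_t *m) {
    assert(atomic_load(m) == 1);   /* releasing a free mutex would be a bug */
    atomic_store(m, 0);
}

/* Empty buffer, mutex free. */
void pipebuf_init(pipebuf_t *p) {
    p->head = 0;
    p->tail = 0;
    atomic_init(&p->m, 0);
}

/* Returns 1 if holding a's mutex leaves b's mutex available. */
int pipes_are_independent(void) {
    pipebuf_t a, b;
    pipebuf_init(&a);
    pipebuf_init(&b);

    int got_a = try_acquire(&a.m);         /* some thread is inside a's critical section */
    int got_b = try_acquire(&b.m);         /* another thread can still enter b's */
    int got_a_again = try_acquire(&a.m);   /* but nobody else can enter a's */

    release(&a.m);
    release(&b.m);
    return got_a && got_b && !got_a_again;
}
```

With the single global mutex from the coarse-grained version, the second attempt would have failed as well.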

Finer Grained Locking

But even this enforces a single critical section among both readc and writec, which is more than the minimum. The finest-grained locking arrangement uses different mutexes for read and write: p->head is modified only while holding rm and p->tail only while holding wm, so the One Writer Principle still holds. The above code becomes:

    typedef struct pipebuf {
	char buf[N];
	unsigned head;
	unsigned tail;
        mutex_t rm;
        mutex_t wm;
    } pipebuf_t;
 
    void writec(pipebuf_t *p, char c) {
1:      while (1) { 
2:          acquire(&p->wm);
3:          if (p->tail - p->head < N) {
4:              p->buf[p->tail % N] = c;
5:              p->tail++;
6:              release(&p->wm);
7:              return;
8:          }
9:          release(&p->wm);
        }
    }
 
    char readc(pipebuf_t *p) {
1:      while (1) { 
2:          acquire(&p->rm);
3:          if (p->tail != p->head) {
4:              char c = p->buf[p->head % N];
5:              p->head++;
6:              release(&p->rm);
7:              return c;
8:          }
9:          release(&p->rm);
        }
    }
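With separate mutexes, readc() reads p->tail while a writer may be changing it, and writec() reads p->head while a reader may be changing it. The lecture relies on aligned loads and stores being indivisible; in C11 terms, one way to state that requirement explicitly is to declare the two counters atomic. The sketch below is an assumption-heavy variant, not the lecture's code: `try_writec` and `try_readc` perform a single pass of each loop body so that they can be exercised from one thread, and `pipebuf_init` is a hypothetical initializer.

```c
#include <assert.h>
#include <stdatomic.h>

#define N 4   /* assumed buffer size */

typedef atomic_int mutex_t;   /* assumed representation: 0 = free, 1 = held */

typedef struct pipebuf {
    char buf[N];
    atomic_uint head;   /* changed only while holding rm; also read by writers */
    atomic_uint tail;   /* changed only while holding wm; also read by readers */
    mutex_t rm;
    mutex_t wm;
} pipebuf_t;

void acquire(mutex_t *m) { while (atomic_exchange(m, 1) == 1) /* spin */; }
void release(mutex_t *m) { atomic_store(m, 0); }

void pipebuf_init(pipebuf_t *p) {
    atomic_init(&p->head, 0);
    atomic_init(&p->tail, 0);
    atomic_init(&p->rm, 0);
    atomic_init(&p->wm, 0);
}

/* One pass of the writec() loop body: 1 if c was stored, 0 if full. */
int try_writec(pipebuf_t *p, char c) {
    int stored = 0;
    acquire(&p->wm);
    unsigned tail = atomic_load(&p->tail);
    unsigned used = tail - atomic_load(&p->head);
    assert(used <= N);
    if (used < N) {
        p->buf[tail % N] = c;
        atomic_store(&p->tail, tail + 1);   /* publish only after buf is filled */
        stored = 1;
    }
    release(&p->wm);
    return stored;
}

/* One pass of the readc() loop body: the character as a non-negative int,
   or -1 if the buffer was empty. */
int try_readc(pipebuf_t *p) {
    int result = -1;
    acquire(&p->rm);
    unsigned head = atomic_load(&p->head);
    if (head != atomic_load(&p->tail)) {
        result = (unsigned char) p->buf[head % N];
        atomic_store(&p->head, head + 1);   /* free the slot only after reading it */
    }
    release(&p->rm);
    return result;
}
```

A blocking writec() or readc() is then just a loop around the corresponding try function, as in the versions above.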
 
2006fall/notes/lec7.txt · Last modified: 2007/09/28 00:25 (external edit)
 