You are expected to understand this. CS 111 Operating Systems Principles, Fall 2006

Lecture 10 notes

  • Synchronization
  • Memory

Introduction

Why have we been studying synchronization for so long? Recall that one main goal of the operating system is to create the process abstraction. This abstraction virtualizes the machine's resources, allowing multiple applications to share the same hardware resources while preserving hard modularity. That way, we can scale up the number of applications we run without necessarily buying one computer per application, and while keeping the effects of application bugs relatively isolated. The three parts of the Von Neumann computer -- I/O devices, processors, and primary memory -- all need to be virtualized to make the process abstraction work. We first spent some time on files, which abstract a computer's I/O devices. Synchronization is the next required piece! It lets multiple "virtual processors" share a single physical processor (or a couple of physical processors, in a symmetric multiprocessing machine) without stepping on each other's toes. We build up high-level synchronization objects from low-level properties like read-write coherence, as indicated in this diagram.

mini-diagram.jpg

Utilization

But we want multiple processes to share a single computer's resources efficiently, without significant overhead. (Processes wouldn't be a useful abstraction if you could never run more than one!) So we need a metric that measures how efficiently a computer's resources are being used. That metric is utilization.

Utilization refers to the fraction of a machine's resources being used productively.

W - Amount of resource spent doing useful work
N - Total amount of resource
Utilization = W/N

The "resource" is often time, so that W and N are measured in seconds, but other units are possible too, such as cycles, disk seeks, and so forth. An important component of utilization is the definition of "useful work". This can be different depending on how you look at it. For example, consider processor utilization. From the kernel's point of view, "useful work" is any time spent running on behalf of any application. From a single application's point of view, "useful work" is the time that that application gets to run. (The application doesn't care about other applications.)

Example: If W = 10 units, and N = 10000 units, then Utilization = 1/1000.
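
The formula and the two points of view above can be sketched in C. This is only an illustration: the 10-second window and the per-application run times are made-up numbers, and the helper names kernel_view and app_a_view are hypothetical.

```c
/* Utilization = W / N: W is the resource spent on useful work, N is the
 * total resource, both in the same units (seconds here). */
double utilization(double w, double n) {
    return w / n;
}

/* Hypothetical 10-second window: application A runs for 4 s, application B
 * for 3 s, and the remaining 3 s are lost to spinning or idling. */

/* Kernel's view: time spent on behalf of any application is useful. */
double kernel_view(void) { return utilization(4.0 + 3.0, 10.0); }

/* Application A's view: only A's own run time is useful. */
double app_a_view(void) { return utilization(4.0, 10.0); }
```

With these numbers the kernel sees a utilization of about 0.7, while application A sees about 0.4 for the same ten seconds.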

Utilization drops when one or more threads/processes are spinning or blocking instead of doing useful work. For example, lock contention reduces utilization, because many threads are waiting for a lock rather than doing work. Spinning is usually worse than blocking for utilization, because as a thread spins (i.e., polls over and over again to see whether a lock is free), it is preventing other threads from doing work; whereas when a thread blocks (i.e., becomes nonrunnable), it allows other threads to make progress.

One way to improve utilization is to allow multiple threads to run in parallel. For example, imagine three processes, P1, P2, and P3, that all access the same shared variable, V. These processes are running on a 2-processor machine. Each process reads the variable 99% of the time, and writes the variable 1% of the time; and the processes do nothing else. If we protect the variable with a conventional mutex, each process will see a utilization of about 33%, because no two processes can run simultaneously. But if we use a read/write lock, two processes, running on the two different processors, can read the variable at the same time. The utilization will be much closer to 66%.

Blocking on a Condition

We know now that spinning is bad for utilization. But our existing bounded buffer implementation spins!

typedef struct pipebuf {
	mutex_t l; 
	char buf[N];  
	int head;   
	int tail;
} pipebuf_t;
 
void writec (pipebuf_t *pb, char c) {
    while (1) {
        acquire(pb->l);
        if (pb->tail - pb->head < N) {
            pb->buf[pb->tail % N] = c;
            pb->tail++;
            release(pb->l);
            return;
        }
        release(pb->l);
    }
}
 
char readc(pipebuf_t *pb) {
    while (1) {
        acquire(pb->l);
        if (pb->head != pb->tail) {
            char c = pb->buf[pb->head % N];
            pb->head++;
            release(pb->l);
            return c;
        }
        release(pb->l);
    }
}

When the buffer is empty, readc will constantly be spinning, waiting for the condition pb->head != pb->tail. If the buffer is full, writec will constantly be spinning, waiting for the condition pb->tail - pb->head < N. This will reduce utilization because the threads that spin take up processor time without doing useful work.

What if we use a blocking mutex in place of a regular mutex?

typedef struct pipebuf {
	bmutex_t l;       
	char buf[N];
	int head;
	int tail;
} pipebuf_t;

In general, blocking mutexes improve utilization by releasing the processor to do more useful work.

However, this code spins even if we use a blocking mutex for pb->l. The problem is that each function spins waiting for a condition. The threads aren't blocking for very long on the acquire(pb->l) operation, since other threads release that mutex quickly after acquiring it. That's not the problem. Threads spin independently of whether the mutex is held, because they are waiting for something to become true of the relationship between pb->head and pb->tail.

So let's introduce a new function: wait_for(expression). This function is impossible to implement, but we'll see why it's useful, and then see how to change it to an implementable version.

  • Will block until expression is true.

Let us try replacing the if statements inside writec and readc with the wait_for function call.

void writec(pipebuf_t *pb, char c) {
	acquire(pb->l);
	wait_for(pb->tail - pb->head < N);
	pb->buf[pb->tail % N] = c;
	pb->tail++;
	release(pb->l);
}
 
char readc(pipebuf_t *pb) {
	acquire(pb->l);
	wait_for(pb->head != pb->tail);
	int c = pb->buf[pb->head % N];
	pb->head++;
	release(pb->l);
	return c;
}

Problem with this implementation: We acquire a lock using the acquire function and then block until the expression holds true. This leads to a deadlock because the lock is held while the function is blocked. For example, if one thread blocks in readc, then another thread's later attempt to writec will block while trying to acquire pb->l. No further progress can be made.

Let us try moving the acquire function call after the wait_for function.

void writec(pipebuf_t *pb, char c) {
	wait_for(pb->tail - pb->head < N);
	acquire(pb->l);
	pb->buf[pb->tail % N] = c;
	pb->tail++;
	release(pb->l);
}
 
char readc(pipebuf_t *pb) {
R1	wait_for(pb->head != pb->tail);
R2	acquire(pb->l);
R3	int c = pb->buf[pb->head % N];
R4	pb->head++;
R5	release(pb->l);
R6	return c;
}

As you can see, the acquire was shifted down after the wait_for function. This will prevent the system from deadlocking like in the previous example.

Problem with this implementation:
This creates a race condition between wait_for and acquire.

For example, assume that two threads, T1 and T2, try to read a character from an empty pipe. They will both block on line R1. Now another thread, T3, writes a single character to the pipe. Both T1 and T2 will wake up, because the expression pb->head != pb->tail becomes true. They will both then continue on:

Thread T1   Thread T2   Thread T3   pb->head   pb->tail   Notes
R1          R1                      0          0          T1 and T2 block
                        writec      0          1          T1 and T2 both wake up
R2-R6                               1          1          T1 runs first, reads the character
            R2-R6                   2!         1          T2 runs second, reads a nonexistent character

Uh oh!

This code, on the other hand, actually works:

void writec(pipebuf_t *pb, char c) {
	while (1) {
		wait_for(pb->tail - pb->head < N);
		acquire(pb->l);
		if (pb->tail - pb->head < N) {
			pb->buf[pb->tail % N] = c;
			pb->tail++;
			release(pb->l);
			return;
		}
		release(pb->l);
	}
}
 
char readc(pipebuf_t *pb) {
	while (1) {
		wait_for(pb->head != pb->tail);
		acquire(pb->l);
		if (pb->head != pb->tail) {
			int c = pb->buf[pb->head % N];
			pb->head++;
			release(pb->l);
			return c;
		}
		release(pb->l);
	}
}

We get around the race condition by checking the wait_for condition again, inside the critical section. But we check it using an if statement, not a wait_for, because wait_for might block and lead to deadlock.

Condition Variables

The condition variable synchronization object makes this pattern practical. Waiting on an arbitrary expression, like wait_for does, is really hard to implement correctly. A condition variable represents a specific condition. Threads can block until this condition is true. The downside is that now, threads must manually notify the condition variable when the condition has become true.

Using a condition variable generally involves three parts:

  • A boolean expression that defines the condition. This is like what would get passed to wait_for.
  • A [blocking] mutex that protects everything that affects the boolean expression. That is, if thread T has acquired the mutex, then only thread T can cause the expression to change value.
  • A condition variable that represents the condition. Threads that want to wait until the expression is true will wait for the condition variable instead. Threads that change the expression to true must notify the condition variable, which causes any waiting threads to wake up.

The condition variable is used in combination with the mutex.

Condition Variable Operations

  • wait(condvar_t *cv, bmutex_t *l)
    • Also known as cond_wait
    • On entry, l must have been acquired and the condition expression is false
    • Will release l, then block until a thread calls notify or broadcast on cv
    • Re-acquires l before returning to caller
  • notify(condvar_t *cv)
    • Also known as cond_notify
    • A thread calls notify when it thinks it has changed the expression to true
    • Wakes up the first thread waiting on cv
  • broadcast(condvar_t *cv)
    • Also known as cond_broadcast
    • Wakes up every thread that is waiting on cv
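
For comparison, these operations map fairly directly onto POSIX threads. The sketch below is one possible mapping, not the lecture's implementation: the wrapper names cv_wait, cv_notify, and cv_broadcast are hypothetical, chosen to avoid clashing with the unrelated POSIX wait() call.

```c
#include <pthread.h>

typedef pthread_mutex_t bmutex_t;   /* blocking mutex */
typedef pthread_cond_t  condvar_t;  /* condition variable */

/* Each wrapper returns 0 on success, like the pthread call it wraps. */
int acquire(bmutex_t *l) { return pthread_mutex_lock(l); }
int release(bmutex_t *l) { return pthread_mutex_unlock(l); }

/* Caller must hold l. Atomically releases l and blocks; l is held
 * again by the time this returns. */
int cv_wait(condvar_t *cv, bmutex_t *l) { return pthread_cond_wait(cv, l); }

/* Wakes at least one thread waiting on cv, if there is one. */
int cv_notify(condvar_t *cv) { return pthread_cond_signal(cv); }

/* Wakes every thread waiting on cv. */
int cv_broadcast(condvar_t *cv) { return pthread_cond_broadcast(cv); }
```

POSIX also allows pthread_cond_wait to return without any notification (a spurious wakeup), which is one more reason for callers to recheck the condition expression after waiting.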

An implementation of condition variables is considered fine-grained when there is one condition variable per blocking condition. There are two condition variables for the pipebuf structure: one variable for the blocking condition of an empty buffer and another variable for the blocking condition of a full buffer. In the following example, we have two condition variables labeled: nonfull, nonempty.

typedef struct pipebuf {
      bmutex_t l;
      char buf[N];
      int head;
      int tail;
      condvar_t nonfull, nonempty;
} pipebuf_t;

Each function now blocks on a condition variable instead of spinning. In writec, writing a character advances the tail pointer. When pb->tail - pb->head == N, the buffer is full and writec cannot write, so it blocks in wait(&pb->nonfull, &pb->l) until a readc advances the head pointer and notifies pb->nonfull. Once notified, writec wakes up, rechecks the condition, and performs its write. Thus, the pb->nonfull condition variable corresponds to the condition expression pb->tail - pb->head < N. Similarly, readc blocks if the buffer is empty. It waits until a writec adds a character and notifies pb->nonempty. The pb->nonempty condition variable corresponds to the condition expression pb->tail != pb->head. Note that a condition variable holds no value of its own, so we still must check the actual condition expression!

void writec(pipebuf_t *pb, char c) {
	acquire(&pb->l);
	while (1) {
		if (pb->tail - pb->head < N) {
			pb->buf[pb->tail % N] = c;
			pb->tail++;
			notify(&pb->nonempty);  
			release(&pb->l);
			return;
		}
		wait(&pb->nonfull, &pb->l);
	}
}
 
char readc(pipebuf_t *pb) {
	acquire(&pb->l);
	while (1) {
		if (pb->head != pb->tail) {
			int c = pb->buf[pb->head % N];
			pb->head++;
			notify(&pb->nonfull);
			release(&pb->l);
			return c;
		}
		wait(&pb->nonempty, &pb->l);
	}
}

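The notifying thread still holds the mutex when it calls notify. The woken thread must re-acquire the mutex before wait returns, but another thread (a second reader, say) might enter the critical section first and consume the character. By the time wait returns, the condition may be false again. Therefore, we must recheck the condition and keep waiting inside the while loop.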
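
To try the pattern on a real system, the same bounded buffer can be written against POSIX threads. This is a sketch under a few assumptions: N is set to 4 so that writers actually block, and pipebuf_init and write_hello are hypothetical helpers added for the demonstration.

```c
#include <pthread.h>

#define N 4  /* buffer capacity; kept tiny so that writers really block */

typedef struct pipebuf {
    pthread_mutex_t l;
    char buf[N];
    int head;
    int tail;
    pthread_cond_t nonfull, nonempty;
} pipebuf_t;

/* Hypothetical helper: set up an empty buffer. */
void pipebuf_init(pipebuf_t *pb) {
    pthread_mutex_init(&pb->l, NULL);
    pthread_cond_init(&pb->nonfull, NULL);
    pthread_cond_init(&pb->nonempty, NULL);
    pb->head = pb->tail = 0;
}

void writec(pipebuf_t *pb, char c) {
    pthread_mutex_lock(&pb->l);
    while (pb->tail - pb->head >= N)       /* full: block, then recheck */
        pthread_cond_wait(&pb->nonfull, &pb->l);
    pb->buf[pb->tail % N] = c;
    pb->tail++;
    pthread_cond_signal(&pb->nonempty);    /* a reader may now proceed */
    pthread_mutex_unlock(&pb->l);
}

char readc(pipebuf_t *pb) {
    pthread_mutex_lock(&pb->l);
    while (pb->head == pb->tail)           /* empty: block, then recheck */
        pthread_cond_wait(&pb->nonempty, &pb->l);
    char c = pb->buf[pb->head % N];
    pb->head++;
    pthread_cond_signal(&pb->nonfull);     /* a writer may now proceed */
    pthread_mutex_unlock(&pb->l);
    return c;
}

/* Hypothetical demo thread: writes "hello", which is longer than N, so
 * it must block until a reader drains some characters. */
void *write_hello(void *arg) {
    const char *s = "hello";
    for (int i = 0; s[i] != '\0'; i++)
        writec(arg, s[i]);
    return NULL;
}
```

A caller would initialize a pipebuf_t with pipebuf_init, start write_hello with pthread_create, and call readc five times to receive the characters in order. The while loops play the same role as the while (1) loops above: the condition is rechecked every time wait returns.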

For more examples, visit this website

Virtual Memory

Definition

An imaginary memory area supported by some operating systems (for example, Windows but not DOS) in conjunction with the hardware. You can think of virtual memory as an alternate set of memory addresses. Programs use these virtual addresses rather than real addresses to store instructions and data. When the program is actually executed, the virtual addresses are converted into real memory addresses.

The purpose of virtual memory is to enlarge the address space, the set of addresses a program can utilize. For example, virtual memory might contain twice as many addresses as main memory. A program using all of virtual memory, therefore, would not be able to fit in main memory all at once. Nevertheless, the computer could execute such a program by copying into main memory those portions of the program needed at any given point during execution.

Definition of virtual memory taken from webopedia.com.

Lecture

Virtual memory is used to provide isolation between processes. It prevents processes from hacking each other's allocated memory.

Prof. Kohler Asks:

Q: Is it possible to implement virtual memory without processor support?
A: Not nicely.

Suppose we have 512MB memory and two processes, namely P1 and P2. How do we keep P1 from accessing P2's memory?

mem.jpg

Methods to Isolate Memory

Binary Translation

One method of preventing processes from accessing each other's memory is binary translation. This method rewrites each instruction that accesses memory into a sequence of instructions that first checks whether the access should be allowed.

Example of binary translation:

original instruction:

movl x, %eax

translated instruction:

cmpl $x, 2^25
jge error
movl x, %eax
error: kill process

Are there any trade-offs to using binary translation?

Depending on how it is implemented, execution is 2x to 40x slower. Lower utilization!

Are there any other options to protect process memory?

Hardware support: domains (segmentation)

Idea: A process has access to a set of domains, each a contiguous region of memory bounded by low and high addresses. The processor faults on any memory access outside a valid domain.

movl x, %eax

     if (x >= low && x+4 <= high)
         movl x, %eax
     else
         fault;

(*) x+4 because movl accesses 4 bytes
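
The check the processor performs can be modeled in software. The sketch below is only a model of the idea, not how any particular processor implements it: the domain_t type and the access_ok name are hypothetical, and a domain is assumed to cover the addresses from low up to but not including high.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A domain covers the addresses [low, high). */
typedef struct domain {
    uintptr_t low;
    uintptr_t high;
} domain_t;

/* Returns true if the size-byte access starting at address x lies
 * entirely inside the domain. Written as "size <= high - x" rather than
 * "x + size <= high" so that the addition cannot overflow. */
bool access_ok(const domain_t *d, uintptr_t x, size_t size) {
    return x >= d->low && x <= d->high && size <= d->high - x;
}
```

For a movl, size is 4: an access that starts inside the domain but runs past high is rejected, just as one that starts below low is.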

Things Processes Can't do

(or isolation would fail)

  • Change low & high
  • Change OTHER processes' low and high
  • Change kernel memory

(^ Processor allows these only when current protection level is KERNEL)

So what can the processor do? Let's create a new domain.

fork() - how much memory does the domain need?

Unfortunately, this creates an allocation problem.

Example

Suppose we have 1024 MB memory, 512 MB of it reserved for Kernel. P1 needs 128 MB, P2 needs 256 MB and P3 needs 128 MB. So the memory map looks something like this:

memory1.jpg

Suppose P1 and P3 exit, freeing 256 MB of memory.

memory2.jpg

Another process, P4, enters and needs 256 MB. Perfect! But not really. Although we technically have enough free space to accommodate P4, P4 can't start, because there is no contiguous 256 MB region! The free space left by P1 and P3 is separated by P2's allocation. This is called external fragmentation.
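
The situation can be checked with a small model of the memory map. This sketch assumes the regions are laid out in the order kernel, P1, P2, P3 (the text only says that P2 sits between P1 and P3); the region_t type and both function names are hypothetical.

```c
#include <stddef.h>

/* One entry in the memory map, listed in address order. */
typedef struct region {
    int size_mb;
    int in_use;   /* 1 = allocated, 0 = free */
} region_t;

/* Total free memory, contiguous or not. */
int total_free(const region_t *map, size_t n) {
    int total = 0;
    for (size_t i = 0; i < n; i++)
        if (!map[i].in_use)
            total += map[i].size_mb;
    return total;
}

/* Size of the largest contiguous free run; adjacent free regions merge. */
int largest_hole(const region_t *map, size_t n) {
    int best = 0, run = 0;
    for (size_t i = 0; i < n; i++) {
        if (map[i].in_use) {
            run = 0;              /* an allocation ends the current run */
        } else {
            run += map[i].size_mb;
            if (run > best)
                best = run;
        }
    }
    return best;
}
```

For the map after P1 and P3 exit (kernel 512 MB in use, 128 MB free, P2 256 MB in use, 128 MB free), total_free reports 256 MB but largest_hole reports only 128 MB, so a contiguous 256 MB request cannot be satisfied.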

Fragmentation

Fragmentation is available space that can't be allocated.

  • External Fragmentation: space is between other allocations
  • Internal Fragmentation: space is inside allocations

So, What Should we do?

  • Compaction - move P2 so that the free space isn't fragmented. This would work, but it is costly, since we would need to copy 256 MB of data, and complex, since we would have to update every pointer into the moved memory.
  • Same-size allocation - every allocation is exactly N contiguous bytes. If a process asks for x <= N bytes, give it N bytes; if x > N, fail. This avoids external fragmentation, but causes internal fragmentation whenever x < N.
  • Paged virtual memory - a solution!
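
The same-size allocation rule and the internal fragmentation it causes can be sketched as follows. The block size of 4096 bytes and the names N_BYTES, fixed_alloc, and internal_waste are assumptions made for this example.

```c
#include <stddef.h>

#define N_BYTES 4096  /* the fixed allocation size N (assumed value) */

/* Same-size allocation: a request for x bytes is granted exactly N_BYTES
 * if it fits, and fails (returns 0) if x > N_BYTES. */
size_t fixed_alloc(size_t x) {
    return x <= N_BYTES ? N_BYTES : 0;
}

/* Internal fragmentation: bytes granted but not requested. */
size_t internal_waste(size_t x) {
    size_t granted = fixed_alloc(x);
    return granted ? granted - x : 0;
}
```

A 100-byte request is granted 4096 bytes and wastes 3996 of them inside the allocation, while a 5000-byte request simply fails.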

Paged Virtual Memory

Recall our example problem: process P4 wants contiguous memory, but there is external fragmentation.

So hardware designers introduced a map. The addresses a process uses (virtual addresses) are different from physical addresses, and a map function translates between them: virtual address => physical address.

Pages

Simplify the address map by working with blocks of memory, called pages, that are PAGESIZE bytes long (x86 PAGESIZE = 2^12 bytes = 4 KB).

addrmap (va) = pagemap( floor ( va/ PAGESIZE)) * PAGESIZE + (va mod PAGESIZE);
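
The addrmap formula can be written out in C. This is a minimal sketch: the pagemap here is a plain array with a hypothetical size of 16 entries, whereas a real page table is a hardware-defined structure.

```c
#include <stdint.h>

#define PAGESIZE 4096u   /* 2^12 bytes, as on x86 */
#define NPAGES   16      /* size of this toy page map (assumed) */

/* pagemap[v] holds the physical page number for virtual page number v. */
static uint32_t pagemap[NPAGES];

/* Translate a virtual address to a physical address. The caller must
 * ensure that va / PAGESIZE < NPAGES. */
uint32_t addrmap(uint32_t va) {
    uint32_t vpn = va / PAGESIZE;      /* floor(va / PAGESIZE) */
    uint32_t offset = va % PAGESIZE;   /* va mod PAGESIZE */
    return pagemap[vpn] * PAGESIZE + offset;
}
```

For example, if virtual page 1 maps to physical page 5, then virtual address 0x1234 (page 1, offset 0x234) translates to physical address 0x5234. Only the page number changes; the offset within the page is preserved.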
 
2006fall/notes/lec10.txt · Last modified: 2007/09/28 00:25 (external edit)
 