CS 111 Operating Systems Principles, Spring 2007

Lecture 9 Scribe Notes

Brought to you by: The Incomparable Alex Honda, World's Finest: Oscar Chan, and Phillip "Phillip Tao" Tao

Hardware Atomicity

Last lecture, we said that hardware provides sequential consistency at the level of instructions: instructions are atomically isolated. But this is a simplification, and there are a couple of cases where real hardware does not isolate individual instructions. In the first section of this lecture, we'll look at some instances of non-atomic instructions.

Consider the simple implementation of the deposit function for the Pennyworth Bank shown below.

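deposit(account_t *a, unsigned amt) {   // assume %eax holds a and %ebx holds amt
   movl 4(%eax), %edx    // read the balance (4 bytes into *a) into %edx
   addl %ebx, %edx       // add the deposit amount
   movl %edx, 4(%eax)    // write the new balance back to memory
}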

The account structure (account_t) is stored at memory address 58. The amount of money in the account is stored 4 bytes after the start of the structure. Notice what the process does:
- It reads a value from memory
- adds the deposit amount
- and writes the new value back

There are two synchronization problems with this example. The first appears if a timer interrupt occurs after the read from memory and before the write. If another process runs and modifies the account balance before this process resumes, this process then writes back a value computed from the stale balance, and the other process's update is lost.
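The lost update can be replayed deterministically in C. This is a minimal sketch, not the lecture's code: the account_t layout, the function name simulate_interleaved_deposits, and the amounts are hypothetical, and the two "processes" are simulated by hand in one thread, in the order a badly timed interrupt would produce.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout: the balance follows a 4-byte field, matching the
   lecture's "4 bytes after the start" when unsigned is 4 bytes wide. */
typedef struct account {
    unsigned id;
    unsigned balance;
} account_t;

/* Replays one bad interleaving of two deposits, A and B:
   A reads the balance and is interrupted, B runs its whole deposit,
   then A finishes using the stale value it read. Returns the final balance. */
unsigned simulate_interleaved_deposits(account_t *a, unsigned amt_a, unsigned amt_b) {
    assert(a != NULL);
    unsigned reg_a = a->balance;   /* A: movl 4(%eax), %edx */
    /* ...timer interrupt: B runs... */
    unsigned reg_b = a->balance;   /* B: reads the same old balance */
    reg_b += amt_b;                /* B: addl %ebx, %edx */
    a->balance = reg_b;            /* B: movl %edx, 4(%eax) */
    /* ...A resumes with its stale register value... */
    reg_a += amt_a;                /* A: addl %ebx, %edx */
    a->balance = reg_a;            /* A: overwrites B's deposit */
    return a->balance;
}
```

Starting from a balance of 100, depositing 10 (A) and 20 (B) this way leaves 110 rather than 130: B's deposit is lost.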

Solutions to this problem were presented in the last lecture, so from now on assume that the deposit code runs without being interrupted. The second problem appears on a multiprocessor. Consider the four-processor model below, and assume that the account is located at address 58 (%eax = 58), so the balance lives at address 62.

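4 Processor Model (l denotes the memory operand 4(%eax), i.e. address 62)

              Processor 1       Processor 2   Processor 3   Processor 4
Instruction   movl $327685, l   movl $6, l    movl $97, l   movl 4(%eax), %edx
Hex value     0x50005           0x6           0x61          value = 0x50061
Cache index   62-65 for every instruction (62-63 lie in cache line 0-63, 64-65 in cache line 64-127)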

The table shows each processor, the instruction it executes, the hexadecimal value of each numeric constant, and the byte addresses each instruction touches. Here l stands for the memory operand 4(%eax), i.e. address 62. Now suppose P1, P2, and P3 run in quick succession on separate processors that share the same memory, close enough together that their stores overlap, and that P4 then performs its load. What is the value of %edx?

If you understand the concept of a cache line, feel free to skip the next section.

Cache Lines

A cache is a small, fast memory that the processor accesses directly and that acts as a temporary buffer for main memory. Often there are several levels of cache. The closer a cache level is to the processor, the faster, and thus the more expensive, it is; the farther away it is, the cheaper and more plentiful it is. The diagram below shows how multiple cache levels work.
[Figure: cache hierarchy illustration]

Each hardware component is labeled on top, and the relative pricing is shown at the bottom. The processor works on a subset (green) of the L1 cache. The L1 cache holds a subset (blue) of the L2 cache, and the L2 cache in turn holds a subset (purple) of main memory.

A cache line is the unit in which data is copied between memory and a cache. On modern processors a cache line is typically 64 bytes, and lines are aligned, so they cover byte addresses 0-63, 64-127, 128-191, 192-255, and so on. When the processor accesses memory, it reads the data not from main memory but from the cached copy of the line that contains it, because that is faster.
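To make the line arithmetic concrete, here is a small C sketch. It assumes 64-byte lines aligned at multiples of 64, as described above; the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64u   /* assumption: 64-byte, 64-aligned lines */

/* Index of the cache line containing byte address addr. */
uintptr_t cache_line_of(uintptr_t addr) {
    return addr / CACHE_LINE_SIZE;
}

/* First byte address of the cache line containing addr. */
uintptr_t cache_line_start(uintptr_t addr) {
    return addr - addr % CACHE_LINE_SIZE;
}

/* Nonzero if the size-byte object at addr spans more than one cache line. */
int straddles_cache_line(uintptr_t addr, size_t size) {
    assert(size > 0);
    return cache_line_of(addr) != cache_line_of(addr + size - 1);
}
```

For the account at address 58, the balance occupies bytes 62-65, so straddles_cache_line(62, 4) is true: bytes 62-63 are in line 0 and bytes 64-65 are in line 1. That is the situation in the four-processor example.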

Atomicity (cont'd)

So what happens in the 4 processor example mentioned earlier?

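P1 begins
0x50005 (that is, 0x00050005) is stored at address 62. Bytes 62-65 straddle two cache lines (0-63 and 64-127), so the store is carried out as two separate 2-byte pieces: 0x0005 goes into bytes 62 and 63, and 0x0005 goes into bytes 64 and 65.
P2 begins
Its store is split the same way; bytes 64 and 65 are loaded with 0x0006.
P3 begins
Bytes 64 and 65 are loaded with 0x0061.
P4 begins
%eax (58) plus 4 gives address 62.
The movl moves the current values of bytes 62-65 into %edx:
bytes 62 and 63 are still 0x0005
bytes 64 and 65 are now 0x0061

Because the two cache lines are updated independently, the pieces of the three stores can reach each line in any order. In this interleaving, P1's piece is the last to reach bytes 62-63 (the 0x0000 pieces from P2 and P3 land there first), while P3's piece is the last to reach bytes 64-65. These notes draw the high-order half at the lower address; x86 is actually little-endian, which swaps the halves but does not change the conclusion.

The end result for register %edx is 0x50061, a value that none of the processors stored.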
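The outcome can be reproduced with a small single-threaded C simulation. It models, as an assumption rather than a description of any specific CPU, a 4-byte store that straddles the line boundary as two independent 2-byte stores, one per cache line, and it follows the notes' convention of drawing the high-order half at bytes 62-63. The type, helper names, and exact interleaving are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* The word at address 62, split across two cache lines. */
typedef struct {
    uint16_t line0_half;   /* bytes 62-63, cache line 0-63 (high half, as drawn in the notes) */
    uint16_t line1_half;   /* bytes 64-65, cache line 64-127 (low half) */
} split_word;

static void store_high(split_word *w, uint32_t v) { w->line0_half = (uint16_t)(v >> 16); }
static void store_low(split_word *w, uint32_t v)  { w->line1_half = (uint16_t)(v & 0xFFFFu); }

static uint32_t load(const split_word *w) {
    return ((uint32_t)w->line0_half << 16) | w->line1_half;
}

/* One possible ordering of the pieces of P1's, P2's and P3's stores,
   followed by P4's load. */
uint32_t simulate_four_processors(void) {
    split_word w = { 0, 0 };
    store_high(&w, 6);        /* P2's piece for 62-63 (0x0000) */
    store_high(&w, 97);       /* P3's piece for 62-63 (0x0000) */
    store_high(&w, 327685);   /* P1's piece for 62-63 (0x0005), last in line 0 */
    store_low(&w, 327685);    /* P1's piece for 64-65 (0x0005) */
    store_low(&w, 6);         /* P2's piece for 64-65 (0x0006) */
    store_low(&w, 97);        /* P3's piece for 64-65 (0x0061), last in line 1 */
    return load(&w);          /* P4: movl 4(%eax), %edx */
}
```

The result is 0x50061, which matches none of the three values that were actually stored.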

Hardware guarantees atomicity only for an access that fits within a single cache line. This is a hardware problem, but it has a software solution:

Alignment: the address of an object should be a multiple of its size
This ensures that the object fits in a single cache line (for power-of-2 sizes up to the line size)
The compiler does this for you
Object sizes are rounded up to the nearest power of 2
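A quick C check of the alignment rule, under the same 64-byte-line assumption. aligned_objects_never_straddle exhaustively tests every aligned position below a limit, and compiler_aligned_u32 relies on the C guarantee that an object's address is a multiple of its type's alignment (_Alignof, C11). All helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64u   /* assumption: 64-byte, 64-aligned lines */

/* Nonzero if the size-byte object at addr lies within one cache line. */
int fits_in_one_line(uintptr_t addr, size_t size) {
    return addr / CACHE_LINE_SIZE == (addr + size - 1) / CACHE_LINE_SIZE;
}

/* Places an object of power-of-2 size at every multiple of its size below
   limit and returns 1 if none of those placements straddles a line. */
int aligned_objects_never_straddle(size_t size, uintptr_t limit) {
    assert(size > 0 && size <= CACHE_LINE_SIZE && (size & (size - 1)) == 0);
    for (uintptr_t addr = 0; addr < limit; addr += size)
        if (!fits_in_one_line(addr, size))
            return 0;
    return 1;
}

/* The compiler aligns ordinary variables for us. */
int compiler_aligned_u32(void) {
    static uint32_t balance;
    return (uintptr_t)&balance % _Alignof(uint32_t) == 0;
}
```

A misaligned placement such as address 62 for a 4-byte object is exactly what the check rules out.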

Read-and-Modify Instructions

It turns out that x86 assembly allows us to write the three lines of code using a single instruction that reads and modifies memory! Let's try using that.

alternate_deposit_with_compressed_code(account_t *a, unsigned amt) {   // assume %eax holds a, %ebx holds amt
   addl %ebx, 4(%eax)    // read, add to, and write memory in one instruction
}

Applying the same test, we find that the same problem occurs. What is going on? All we have really done is compress the same operations into a single assembly instruction (as the function name hints). Internally, the processor still reads the old value, adds to it, and writes the result back, and another processor can access the same memory between the read and the write. A single instruction is therefore not necessarily atomic; only certain instructions guarantee atomicity.

List of Atomic Instructions
Aligned loads from memory
Aligned stores to memory
Operations on registers
Synchronization Instructions
-prefixed with lock;
-Note: 100-1000x slower
-test_and_set
-fetch_and_add
-compare_and_swap

We must not assume that any instruction is atomic unless we know for a fact that it is.
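The lock-prefixed instructions are what make a multiprocessor deposit safe. A minimal sketch in C11, assuming a hypothetical atomic_account_t whose balance is declared _Atomic: atomic_fetch_add performs a fetch_and_add, and on x86 compilers typically emit a lock-prefixed instruction for it.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical account whose balance may be updated by many processors. */
typedef struct {
    unsigned id;
    _Atomic unsigned balance;
} atomic_account_t;

/* Atomically adds amt to the balance and returns the balance before the
   deposit. Concurrent callers can never lose each other's deposits. */
unsigned atomic_deposit(atomic_account_t *a, unsigned amt) {
    assert(a);
    return atomic_fetch_add(&a->balance, amt);
}
```

The price is the one noted in the list above: these instructions are much slower than ordinary loads and stores, so they are used only where atomicity is needed.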

How can we use these synchronization instructions? First we define the operation of compare_and_swap(): it is the hardware equivalent of the following C code, executed atomically:

int compare_and_swap(unsigned *val, unsigned oldval, unsigned newval) {
   if(*val==oldval) {
      *val=newval;
      return 1;
   }
   return 0;
}
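In portable C11 the same operation is available as atomic_compare_exchange_strong. The wrapper below is a sketch that keeps the lecture's interface; unlike the lecture's version, it assumes the target is declared _Atomic.

```c
#include <assert.h>
#include <stdatomic.h>

/* Returns 1 if *val equaled oldval and has been replaced by newval,
   0 otherwise; the compare and the store happen as one atomic step. */
int compare_and_swap(_Atomic unsigned *val, unsigned oldval, unsigned newval) {
    assert(val);
    /* On failure the call also copies the current *val into oldval;
       this wrapper discards that, matching the lecture's interface. */
    return atomic_compare_exchange_strong(val, &oldval, newval);
}
```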

How can we use compare_and_swap? Consider an esoteric example: a function that atomically multiplies a value by 6 and subtracts 4.

void atomic_multiply_6_sub_4(unsigned *val) {
   unsigned oldval;
   do {
      oldval = *val;   // snapshot the current value
   } while (!compare_and_swap(val, oldval, oldval*6 - 4));   // retry if *val changed in between
}

This particular function has very few applications, but the same pattern (read the value, compute a new one, compare-and-swap, retry on failure) can be used to build other, more useful atomic operations.
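For example, fetch_and_add can be built from a compare-and-swap loop. A minimal sketch in C11; the name cas_fetch_and_add is hypothetical, and real code would simply call atomic_fetch_add.

```c
#include <assert.h>
#include <stdatomic.h>

/* Atomically adds delta to *val and returns the value before the addition. */
unsigned cas_fetch_and_add(_Atomic unsigned *val, unsigned delta) {
    assert(val);
    unsigned oldval = atomic_load(val);
    /* If another thread changed *val since we read it, the exchange fails,
       oldval is refreshed with the current value, and we try again. */
    while (!atomic_compare_exchange_weak(val, &oldval, oldval + delta))
        ;
    return oldval;
}
```

The weak form may fail spuriously, which is harmless here because the loop simply retries.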

Reasons for Synchronizing
1. When two processes compete for data
2. When a process is multithreaded
3. When signals or interrupts are enabled
4. When accessing I/O (within system calls)
-The kernel is responsible for synchronizing I/O
-Every system call defines its own synchronization behavior
-Usually system call synchronization behavior is isolation atomicity

What is synchronization?

Synchronization is the protection of resources shared by two or more processes or threads. It ensures that they do not alter the same data at the same time in a way that lets one overwrite the effects of another. Such a bug, where the outcome depends on the timing of the processes, is called a race condition.

When must we synchronize?

For user-level processes, only two kinds of resources are shared: memory and I/O devices.

For synchronizing memory, we have two main concerns: multithreaded processes and signals. Synchronization of multithreaded processes is covered in Lecture 8.

What are signals?

Signals are a way for the kernel to asynchronously interrupt a process when an event occurs. When a signal arrives, the process stops what it is doing immediately (though never in the middle of an atomic instruction), and execution switches to a signal handler. When the signal handler finishes, the process resumes where it left off.

How does this relate to synchronization?

Because signals arrive asynchronously, they can cause race conditions: a handler may run while the main program is in the middle of updating shared data, and while the handler is running, another signal could interrupt it. One way to limit this problem is to keep signal handler code minimal. For example, two possible signal handlers are:

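#include <signal.h>
#include <unistd.h>

// Option 1: terminate immediately
void handle(int sig) {
	_exit(0);   // _exit, unlike exit, is async-signal-safe
}

// Option 2 (alternative definition): just record that the signal happened
volatile sig_atomic_t sig_happened = 0; // global variable checked by the main program
void handle(int sig) {
	sig_happened = 1;
}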

As you can see, this is not an ideal solution to the race conditions caused by signals. Because signal handlers must be kept as short as possible, they cannot do anything very significant, so they lose much of their power. But signals are just poorly designed in general, and there’s not much we can do about it.
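Within those limits, the flag-setting handler is the usual pattern. A minimal runnable version, assuming a POSIX system: it installs the handler with sigaction, uses SIGUSR1 as the example signal, and a hypothetical demo_signal_flag function sends the signal to its own process with raise. The flag is declared volatile sig_atomic_t, the type the C standard lets a handler assign safely.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* The only thing the handler touches. */
static volatile sig_atomic_t sig_happened = 0;

static void handle(int sig) {
    (void)sig;                 /* unused */
    sig_happened = 1;          /* record the event; do the real work elsewhere */
}

/* Installs handle() for SIGUSR1, sends SIGUSR1 to this process, and returns
   the flag as seen by ordinary code afterwards (or -1 if setup fails). */
int demo_signal_flag(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = handle;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;
    raise(SIGUSR1);            /* raise() does not return until the handler has run */
    return sig_happened;       /* the main program checks the flag and reacts here */
}
```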

All access to I/O devices by user processes occurs through system calls. Therefore it is the kernel’s job to synchronize access to I/O devices.

The kernel synchronizes access to I/O devices by allowing each system call to define its own synchronization behavior. For the most part, system calls implement Isolation Atomicity. For example:

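#include <sys/types.h>
#include <unistd.h>

int main(void) {
    pid_t p;
    write(1, "Where am I?\n", 12);   // parent prints this (12 bytes, newline included)
    p = fork();
    if (p == 0)
        write(1, "Child\n", 6);      // child prints this
    else
        write(1, "Parent\n", 7);     // parent prints this
    return 0;
}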

The expected output is one of the following:

Where am I?
Parent
Child

or

Where am I?
Child
Parent

Without isolation atomicity, the writes of the child process and the parent process might be interleaved, producing output like:

Where am I?
CPharildent

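If we change the write statements to printf calls, we lose isolation atomicity. printf implements buffered I/O: output is collected in a buffer in the process’s own memory and written out later. If "Where am I?\n" is still sitting in that buffer when fork() is called (as happens, for example, when stdout is redirected to a file or pipe and is therefore fully buffered), the child gets a copy of the buffer, and both processes eventually print it, producing output like the following: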

Where am I?
Where am I?
Child
Parent
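One common fix is to flush stdout before calling fork(), so there is nothing left in the buffer for the child to inherit. A minimal sketch, assuming a POSIX system; the function name print_then_fork is hypothetical, and it waits for the child so it can report the child's exit status.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Prints "Where am I?" exactly once, then forks. Returns the child's exit
   status, or -1 on error. */
int print_then_fork(void) {
    printf("Where am I?\n");
    fflush(stdout);                 /* empty the buffer before the process is copied */
    pid_t p = fork();
    if (p < 0)
        return -1;                  /* fork failed */
    if (p == 0) {                   /* child */
        printf("Child\n");
        fflush(stdout);
        _exit(0);                   /* exit at once; stdio is not flushed again */
    }
    printf("Parent\n");             /* parent */
    fflush(stdout);
    int status;
    if (waitpid(p, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

"Child" and "Parent" may still appear in either order, but "Where am I?" is printed only once.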

The main synchronization issue facing kernel code is interrupt handling.

What are interrupts?

Much like signals, interrupts allow execution of a kernel process to be asynchronously halted, and the execution switched to an interrupt handler. A common technique to prevent race conditions during execution of an interrupt handler is to turn off interrupts while that handler is executing. For example, an interrupt handler might look like the following:

  1. Turn off interrupts
  2. Access Data
  3. Turn on interrupts

Step 1 is a dangerous instruction. However, once interrupts are off, the rest of the handler becomes a critical section: on a single processor nothing can interrupt it, so it can execute without fear of synchronization issues.
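Turning interrupts off requires privileged instructions (cli and sti on x86) that user programs cannot execute. A user-space analogue, swapped in here so the sketch can run, is to block a signal with sigprocmask around the critical section: a signal that arrives meanwhile stays pending and is delivered when it is unblocked. This assumes a POSIX system; the names handled, on_usr1, and critical_section_demo are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t handled = 0;

static void on_usr1(int sig) { (void)sig; handled = 1; }

/* Returns 1 if the handler did NOT run inside the critical section
   but DID run once the signal was unblocked. */
int critical_section_demo(void) {
    struct sigaction sa;
    sigset_t block, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return 0;

    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    sigprocmask(SIG_BLOCK, &block, &old);   /* 1. "turn off interrupts" */
    raise(SIGUSR1);                          /*    the signal arrives and stays pending */
    int ran_inside = handled;                /* 2. access data undisturbed */
    sigprocmask(SIG_SETMASK, &old, NULL);    /* 3. "turn on interrupts": pending signal delivered */
    return ran_inside == 0 && handled == 1;
}
```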

Another solution is the Big Kernel Lock. This involves acquiring a single lock that prevents the rest of the kernel from doing anything else while the interrupt handler is executing. This leads to very poor utilization.

Implementing a pipe (bounded buffer)

Consider the following example, where we want process P to read whatever process Q writes.

A naive design would work as follows:

  1. Process Q is blocked until the process P reads from it.
  2. When process P reads, it copies characters from process Q’s memory into its memory.
  3. Process Q will restart when its characters are copied by process P.

This approach works, but it gives low utilization because process Q is blocked unnecessarily: Q could be writing more data, but it must wait until P reads again.

So what does Linux use to avoid this low utilization issue?

Linux instead has process Q write its data into a pipe buffer in the kernel, and has process P read the data from that buffer.

But what if process Q does write(1, buffer, 40000000000)?

Linux limits the size of each pipe buffer, which is why it is called a bounded buffer.

A pipe buffer works as follows:

On the pipe buffer’s write end, a process can copy characters into the buffer until the buffer is full. Once it is full, a process writing to the write end is blocked until space becomes available again. On the read end, a process can copy characters out of the buffer until it is empty. Once it is empty, a process reading from the read end is blocked until characters are available to read again.
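The bound is easy to observe on a real pipe. This sketch, assuming a POSIX system, puts the write end in non-blocking mode so that a full buffer makes write fail with EAGAIN instead of putting the writer to sleep; pipe_capacity is a hypothetical helper name. The pipe(7) man page gives the Linux capacity as 16 pages (65,536 bytes with 4 KiB pages) since Linux 2.6.11, but the exact figure varies by system, so the code only reports what it measures.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Fills a fresh pipe one byte at a time and returns how many bytes it
   accepted before reporting "full", or -1 on error. */
long pipe_capacity(void) {
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    int flags = fcntl(fds[1], F_GETFL);
    fcntl(fds[1], F_SETFL, flags | O_NONBLOCK);   /* fail instead of blocking */

    char byte = 'x';
    long total = 0;
    for (;;) {
        ssize_t n = write(fds[1], &byte, 1);
        if (n == 1) { total++; continue; }
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            break;                                /* the bounded buffer is full */
        total = -1;                               /* unexpected error */
        break;
    }
    close(fds[0]);
    close(fds[1]);
    return total;
}
```

In normal (blocking) mode, the write that returns EAGAIN here would instead block until a reader drained some of the buffer.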

Circular array implementation

For demonstration purposes, we will show a buffer with max size of 8.

The properties of the circular array buffer are the following:

  1. The tail points to the next empty slot.
  2. The head points to the first filled slot.
  3. Both move around the array clockwise, wrapping from slot 7 back to slot 0.
  4. When the head is equal to the tail, the buffer is empty.
  5. When tail - head = 8, the buffer is full.

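#include <pthread.h>

typedef struct pipebuf {
	char buf[8];
	unsigned head;         // characters read so far; next read is buf[head % 8]
	unsigned tail;         // characters written so far; next write is buf[tail % 8]
	pthread_mutex_t lock;  // the lecture writes mutex_t; not used yet
} pipebuf_t;

void writec(pipebuf_t *p, char c) {
	p->buf[p->tail % 8] = c;         // no check for a full buffer yet
	p->tail++;
}

char readc(pipebuf_t *p) {
	char c = p->buf[p->head % 8];    // no check for an empty buffer yet
	p->head++;
	return c;
}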

The code above implements this bounded buffer. It does not yet check whether the buffer is full or empty, and it does not use the lock; we will discuss the synchronization issues in writing to and reading from this buffer next time.
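The full and empty conditions from the property list can be added as checks. This single-threaded sketch (hypothetical names ring_t, try_writec, try_readc) returns 0 where a real pipe would block, and still leaves locking for next time. Because 8 divides 2^32, letting the unsigned head and tail counters wrap around does not disturb either tail - head or the % 8 slot index.

```c
#include <assert.h>

#define PIPEBUF_SIZE 8

typedef struct {
    char buf[PIPEBUF_SIZE];
    unsigned head;   /* characters read so far */
    unsigned tail;   /* characters written so far */
} ring_t;

/* Stores c and returns 1, or returns 0 if the buffer is full. */
int try_writec(ring_t *p, char c) {
    assert(p);
    if (p->tail - p->head == PIPEBUF_SIZE)   /* full: tail - head = 8 */
        return 0;
    p->buf[p->tail % PIPEBUF_SIZE] = c;
    p->tail++;
    return 1;
}

/* Stores the next character in *c and returns 1, or returns 0 if empty. */
int try_readc(ring_t *p, char *c) {
    assert(p && c);
    if (p->head == p->tail)                  /* empty: head = tail */
        return 0;
    *c = p->buf[p->head % PIPEBUF_SIZE];
    p->head++;
    return 1;
}
```

Writing eight characters fills the buffer; the ninth write fails until a read frees a slot, and the next write then wraps around to slot 0.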

 
2007spring/notes/lec9.txt · Last modified: 2007/09/28 00:28 (external edit)
 