Why have we been studying synchronization for so long? Recall that one main goal of the operating system is to create the process abstraction. This abstraction virtualizes the machine's resources, allowing multiple applications to share the same hardware resources while preserving hard modularity. That way, we can scale up the number of applications we run without buying one computer per application, while keeping the effects of application bugs relatively isolated. The three parts of the Von Neumann computer (I/O devices, processors, and primary memory) all need to be virtualized to make the process abstraction work. We first spent some time on files, which abstract a computer's I/O devices. Synchronization is the next required piece! It lets multiple "virtual processors" share a single physical processor (or a few physical processors, in a symmetric multiprocessing machine) without stepping on each other's toes. We build up high-level synchronization objects from low-level properties like read-write coherence, as indicated in this diagram.
But we want multiple processes to share a single computer's resources efficiently, without significant overhead. (Processes wouldn't be a useful abstraction if you could never run more than one!) So we need a metric that measures how efficiently a computer's resources are being used. That metric is utilization.
Utilization refers to the fraction of a machine's resources being used productively.
W - Amount of resource spent doing useful work
N - Total amount of resource
Utilization = W/N
The "resource" is often time, so that W and N are measured in seconds, but other units are possible too, such as cycles, disk seeks, and so forth. An important component of utilization is the definition of "useful work". This can be different depending on how you look at it. For example, consider processor utilization. From the kernel's point of view, "useful work" is any time spent running on behalf of any application. From a single application's point of view, "useful work" is the time that that application gets to run. (The application doesn't care about other applications.)
Example: If W = 10 units, and N = 10000 units, then Utilization = 1/1000.
Utilization drops when one or more threads/processes are spinning or blocking instead of doing useful work. For example, lock contention reduces utilization, because many threads are waiting for a lock rather than doing work. Spinning is usually worse than blocking for utilization, because as a thread spins (i.e., polls over and over again to see whether a lock is free), it is preventing other threads from doing work; whereas when a thread blocks (i.e., becomes nonrunnable), it allows other threads to make progress.
One way to improve utilization is to allow multiple threads to run in parallel. For example, imagine three processes, P1, P2, and P3, that all access the same shared variable, V. These processes are running on a 2-processor machine. Each process reads the variable 99% of the time, and writes the variable 1% of the time; and the processes do nothing else. If we protect the variable with a conventional mutex, each process will see a utilization of about 33%, because no two processes can run simultaneously. But if we use a read/write lock, two processes, running on the two different processors, can read the variable at the same time. The utilization will be much closer to 66%.
We know now that spinning is bad for utilization. But our existing bounded buffer implementation spins!
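```c
typedef struct pipebuf {
    mutex_t l;
    char buf[N];
    int head;   /* index of the next character to read */
    int tail;   /* index of the next free slot to write */
} pipebuf_t;

void writec(pipebuf_t *pb, char c) {
    while (1) {
        acquire(&pb->l);
        if (pb->tail - pb->head < N) {     /* buffer not full */
            pb->buf[pb->tail % N] = c;
            pb->tail++;
            release(&pb->l);
            return;
        }
        release(&pb->l);                   /* full: release and try again (spin) */
    }
}

char readc(pipebuf_t *pb) {
    while (1) {
        acquire(&pb->l);
        if (pb->head != pb->tail) {        /* buffer not empty */
            char c = pb->buf[pb->head % N];
            pb->head++;
            release(&pb->l);
            return c;
        }
        release(&pb->l);                   /* empty: release and try again (spin) */
    }
}
```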
When the buffer is empty, readc will constantly spin, waiting for the condition pb->head != pb->tail. If the buffer is full, writec will constantly spin, waiting for the condition pb->tail - pb->head < N. This reduces utilization because the spinning threads take up processor time without doing useful work.
What if we use a blocking mutex in place of a regular mutex?
```c
typedef struct pipebuf {
    bmutex_t l;   /* blocking mutex instead of a spinning one */
    char buf[N];
    int head;
    int tail;
} pipebuf_t;
```
In general, blocking mutexes improve utilization by releasing the processor to do more useful work.
However, this code spins even if we use a blocking mutex for pb->l. The problem is that each function spins waiting for a condition. The threads aren't blocking for very long in acquire, since other threads release the mutex quickly after acquiring it. That's not the problem. Threads spin regardless of whether the mutex is held, because they are waiting for something to become true about the relationship between pb->head and pb->tail.
So let's introduce a new function: wait_for(expression). This function is impossible to implement, but we'll see why it's useful, and then see how to change it into an implementable version.
Let us try replacing the if statements inside writec and readc with the wait_for function call.
```c
void writec(pipebuf_t *pb, char c) {
    acquire(&pb->l);
    wait_for(pb->tail - pb->head < N);   /* blocks while holding pb->l! */
    pb->buf[pb->tail % N] = c;
    pb->tail++;
    release(&pb->l);
}

char readc(pipebuf_t *pb) {
    acquire(&pb->l);
    wait_for(pb->head != pb->tail);      /* blocks while holding pb->l! */
    char c = pb->buf[pb->head % N];
    pb->head++;
    release(&pb->l);
    return c;
}
```
Problem with this implementation: We acquire a lock using the acquire function and then block until the expression holds true. This leads to a deadlock because the lock is held while the function is blocked. For example, if one thread blocks in readc, then another thread's later attempt to writec will block while trying to acquire pb->l. No further progress can be made.
Let us try moving the acquire function call after the wait_for function.
```c
void writec(pipebuf_t *pb, char c) {
    wait_for(pb->tail - pb->head < N);
    acquire(&pb->l);
    pb->buf[pb->tail % N] = c;
    pb->tail++;
    release(&pb->l);
}

char readc(pipebuf_t *pb) {
    wait_for(pb->head != pb->tail);      /* R1 */
    acquire(&pb->l);                     /* R2 */
    char c = pb->buf[pb->head % N];      /* R3 */
    pb->head++;                          /* R4 */
    release(&pb->l);                     /* R5 */
    return c;                            /* R6 */
}
```
As you can see, the acquire call now comes after wait_for. Since no thread holds pb->l while it waits, this version avoids the deadlock in the previous example.
Problem with this implementation:
This creates a race condition between wait_for and acquire.
For example, assume that two threads try to read a character from an empty pipe. They will both block on line R1. Now another thread writes a single character to the pipe. Both threads will wake up, because the expression pb->head != pb->tail becomes true. They will both then continue on:
| Thread T1 | Thread T2 | Thread T3 | pb->head | pb->tail | Notes |
|---|---|---|---|---|---|
| R1 | R1 | | 0 | 0 | T1 and T2 block |
| | | writec | 0 | 1 | T1 and T2 both wake up |
| R2-R6 | | | 1 | 1 | T1 runs first, reads the character |
| | R2-R6 | | 2! | 1 | T2 runs second, reads a nonexistent character |
Uh oh!
This code, on the other hand, actually works:
```c
void writec(pipebuf_t *pb, char c) {
    while (1) {
        wait_for(pb->tail - pb->head < N);
        acquire(&pb->l);
        if (pb->tail - pb->head < N) {     /* recheck inside the critical section */
            pb->buf[pb->tail % N] = c;
            pb->tail++;
            release(&pb->l);
            return;
        }
        release(&pb->l);                   /* another writer got there first */
    }
}

char readc(pipebuf_t *pb) {
    while (1) {
        wait_for(pb->head != pb->tail);
        acquire(&pb->l);
        if (pb->head != pb->tail) {        /* recheck inside the critical section */
            char c = pb->buf[pb->head % N];
            pb->head++;
            release(&pb->l);
            return c;
        }
        release(&pb->l);                   /* another reader got there first */
    }
}
```
We get around the race condition by checking the wait_for condition again, inside the critical section. But we check it using an if statement, not a wait_for, because wait_for might block and lead to deadlock.
The condition variable synchronization object makes this pattern practical. Waiting on an arbitrary expression, like wait_for does, is really hard to implement correctly. A condition variable represents a specific condition. Threads can block until this condition is true. The downside is that now, threads must manually notify the condition variable when the condition has become true.
Using a condition variable generally involves three parts:
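1. A condition expression, such as pb->head != pb->tail, that a thread wants to wait for.
2. A mutex that protects the variables used in the condition expression.
3. A condition variable, which stands in for the expression in place of wait_for.

The condition variable is used in combination with the mutex.

| Operation | Also called | Description |
|---|---|---|
| wait(condvar_t *cv, bmutex_t *l) | cond_wait | Precondition: l must have been acquired and the condition expression is false. Atomically release l and block until a thread calls notify or broadcast on cv. Reacquire l before returning to the caller. |
| notify(condvar_t *cv) | cond_notify | A thread calls notify when it thinks it has changed the expression to true. Wakes up one thread blocked on cv. |
| broadcast(condvar_t *cv) | cond_broadcast | Wakes up all threads blocked on cv. |

A use of condition variables is considered fine-grained when there is one condition variable per blocking condition. The pipebuf structure has two blocking conditions, so it gets two condition variables: one for the blocking condition of an empty buffer and another for the blocking condition of a full buffer. In the following example, the two condition variables are named nonfull and nonempty.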
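```c
typedef struct pipebuf {
    bmutex_t l;
    char buf[N];
    int head;
    int tail;
    condvar_t nonfull;    /* stands for: pb->tail - pb->head < N */
    condvar_t nonempty;   /* stands for: pb->head != pb->tail */
} pipebuf_t;
```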
The two functions now use the condition variables to block until the condition they need might be true. In writec, writing a character advances the tail index. When pb->tail - pb->head == N, the buffer is full and writec cannot write, so it blocks in wait(&pb->nonfull, &pb->l) until a reader removes a character (advancing the head index) and calls notify(&pb->nonfull). The writer then wakes up, rechecks the condition, and performs its write. Thus, the pb->nonfull condition variable corresponds to the condition expression pb->tail - pb->head < N. Similarly, readc blocks on pb->nonempty while the buffer is empty, until writec adds a character and calls notify(&pb->nonempty). The pb->nonempty condition variable corresponds to the condition expression pb->tail != pb->head. A condition variable holds no value of its own; notify only signals that the expression may have become true. Note that we still must check the actual condition expression!
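```c
void writec(pipebuf_t *pb, char c) {
    acquire(&pb->l);
    while (1) {
        if (pb->tail - pb->head < N) {     /* check the real condition */
            pb->buf[pb->tail % N] = c;
            pb->tail++;
            notify(&pb->nonempty);         /* wake a reader waiting for data */
            release(&pb->l);
            return;
        }
        /* Full: release pb->l and block; pb->l is reacquired before wait returns. */
        wait(&pb->nonfull, &pb->l);
    }
}

char readc(pipebuf_t *pb) {
    acquire(&pb->l);
    while (1) {
        if (pb->head != pb->tail) {        /* check the real condition */
            int c = pb->buf[pb->head % N];
            pb->head++;
            notify(&pb->nonfull);          /* wake a writer waiting for space */
            release(&pb->l);
            return c;
        }
        /* Empty: release pb->l and block; pb->l is reacquired before wait returns. */
        wait(&pb->nonempty, &pb->l);
    }
}
```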
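notify is called while the mutex is held, so a woken thread cannot run immediately: before wait returns, it must reacquire the mutex. In the meantime, another reader may enter the critical section and consume the character, making the condition false again. Therefore, the woken thread must recheck the condition, which is why wait is called inside the while loop.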
An imaginary memory area supported by some operating systems (for example, Windows but not DOS) in conjunction with the hardware. You can think of virtual memory as an alternate set of memory addresses. Programs use these virtual addresses rather than real addresses to store instructions and data. When the program is actually executed, the virtual addresses are converted into real memory addresses.
The purpose of virtual memory is to enlarge the address space, the set of addresses a program can utilize. For example, virtual memory might contain twice as many addresses as main memory. A program using all of virtual memory, therefore, would not be able to fit in main memory all at once. Nevertheless, the computer could execute such a program by copying into main memory those portions of the program needed at any given point during execution.
Definition of virtual memory taken from webopedia.com.
Virtual memory is used to provide isolation between processes. It also prevents processes from tampering with each other's allocated memory.
Q: Is it possible to implement virtual memory without processor support?
A: Not nicely.
Suppose we have 512 MB of memory and two processes, P1 and P2. How do we keep P1 from accessing P2's memory?
One method of preventing processes from accessing each other's memory is binary translation. This method rewrites each instruction that accesses memory into a sequence of instructions that first checks whether the access should be allowed, and only then performs it.
Example of binary translation:
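```
# original instruction
movl x, %eax

# translated instructions (assumes %ecx is free to use as scratch)
leal x, %ecx              # ecx = the address x
cmpl $0x2000000, %ecx     # compare x with 2^25
jae  error                # x >= 2^25: outside this process's memory
movl x, %eax              # access allowed: perform the original load
...
error:                    # kill the process
```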
Depending on how it is implemented, the translated code runs 2x to 40x slower than the original instruction. Lower utilization!
Are there any other options to protect process memory?
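Idea: A process has access to a set of domains, which are contiguous regions of memory, each described by a low and a high bound. The processor faults on any memory access outside a valid domain. For example, the processor executes

```
movl x, %eax
```

as if it were

```
if (x >= low && x + 4 <= high)    (*)
    movl x, %eax
else
    fault;
```

(*) x + 4 because movl accesses 4 bytes (32 bits).

The instructions that change low and high must be privileged (or isolation would fail): the processor allows them only when the current protection level is KERNEL.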
So what happens when a new process starts? We create a new domain for it.

But with fork(), how much memory does the new domain need?

Unfortunately, this creates an allocation problem.
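Suppose we have 1024 MB of memory, 512 MB of which is reserved for the kernel. P1 needs 128 MB, P2 needs 256 MB, and P3 needs 128 MB. So the memory map looks something like this, with P2's allocation sitting between P1's and P3's:

Kernel (512 MB) | P1 (128 MB) | P2 (256 MB) | P3 (128 MB)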
Suppose P1 and P3 exit, freeing 256 MB of memory.

Another process, P4, arrives and needs 256 MB. Perfect? Not really. We technically have enough free space to accommodate P4, but P4 can't start, because there's no contiguous 256 MB region! The space freed by P1 and P3 is split in two by P2's allocation. This is called external fragmentation: free space that is available but can't be allocated.
Recall our example problem: process P4 wants contiguous memory, but there is external fragmentation.
So hardware designers introduced a map: a process's addresses (virtual addresses) are different from physical addresses, and a map function translates each virtual address into a physical address.
To simplify the address map, it works with blocks of memory PAGESIZE bytes long, called pages (on x86, PAGESIZE = 2^12 = 4 KB).
addrmap (va) = pagemap( floor ( va/ PAGESIZE)) * PAGESIZE + (va mod PAGESIZE);