edited by Calvin You, Alexandre Serriere, and Binh An
Recall from the previous lecture that if a page hasn't changed at all, there is no need to write it back to disk when it is swapped out. But what happens if the page does change?
First, what is a page fault? According to the book, a page fault (or missing-page exception) is the event that occurs when an addressed page isn't present in the primary device, so the virtual memory manager has to move the page in from a secondary device.
Now let's look at two functions from the previous lecture:
swapmap(proc, va) {
    either: return the disk address of page va of proc
    or:     return FAIL
}
/* Code copied from previous scribe note */
This function tells us whether the process proc has the page at virtual address va stored on disk, and if so, where.
pfault(va, cpl, atype) {
    if (swapmap(current, va) != FAIL) {
        (victim, victim_va) = removal_policy();
        pa = victim->pmap(victim_va);
        write the page at pa to disk at swapmap(victim, victim_va);
        set victim->pmap(victim_va) = FAULT;
        read swapmap(current, va) from disk into pa;
        set current->pmap(va) = pa;
    }
}
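To make the cost concrete, here is a toy C simulation of this pfault(). It is a sketch under several assumptions: a single process, a hypothetical swapmap() that stores page va at disk block va, FIFO as the removal policy, and free frames used before anything is evicted. It counts disk reads and writes instead of doing real I/O.

```c
#define NPAGES  8     /* virtual pages in the toy process */
#define NFRAMES 3     /* physical frames */
#define FAULT  (-1)   /* pmap value: page not in memory */
#define FAIL   (-1)   /* swapmap value: no such page on disk */

static int pmap[NPAGES];           /* va -> frame number, or FAULT */
static int frame_va[NFRAMES];      /* frame -> va it currently holds */
static int used_frames, next_victim;
int disk_reads, disk_writes;

/* Hypothetical swapmap(): page va lives at disk block va. */
static int swapmap(int va) {
    return (va >= 0 && va < NPAGES) ? va : FAIL;
}

void init(void) {
    for (int va = 0; va < NPAGES; va++)
        pmap[va] = FAULT;
    used_frames = next_victim = disk_reads = disk_writes = 0;
}

static void pfault(int va) {
    if (swapmap(va) == FAIL)
        return;                         /* a real kernel would kill the process */
    int pa;
    if (used_frames < NFRAMES) {
        pa = used_frames++;             /* use a free frame first */
    } else {
        pa = next_victim;               /* FIFO removal policy (assumption) */
        next_victim = (next_victim + 1) % NFRAMES;
        disk_writes++;                  /* victim is ALWAYS written back */
        pmap[frame_va[pa]] = FAULT;
    }
    disk_reads++;                       /* read swapmap(va) into frame pa */
    frame_va[pa] = va;
    pmap[va] = pa;
}

/* Touch a page, faulting it in if it is not mapped. */
void access_page(int va) {
    if (va >= 0 && va < NPAGES && pmap[va] == FAULT)
        pfault(va);
}
```

Touching pages 0 through 4 with 3 frames causes 5 reads and 2 writes, even though no page was ever modified.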
There is a performance problem with this implementation, though: pfault() always writes the victim page to disk, even when no changes were made to it! Unnecessary disk I/O shows up at two points:
-- When we start a program
-- When we swap out a page
So what we want to do here is implement demand paging.
Note: swapmap() can point to any region of the disk (not just swap space), such as the program's executable file.
A demand paging system moves a page into the primary device only after the application attempts to use it; in other words, if you don't use it, don't load it. This improves performance because pages that are never touched are never read from disk, which reduces disk traffic and saves loading cost. Here's how we can do this:
First we will start the program with demand paging by:
Then for each va in the program's text and data:
After starting the program with demand paging, we will change pfault() to work with demand paging:
pfault(va, cpl, atype) {
    if (atype == WRITE && current->pmap(va) == READ_ONLY && writes to va are allowed) {
        set current->pmap(va) = READ/WRITE;    // first write: the page is now dirty
        return;
    }
    if (swapmap(current, va) != FAIL) {
        (victim, victim_va) = removal_policy();
        pa = victim->pmap(victim_va);
        if (victim->pmap(victim_va) == READ/WRITE)    // write back only if modified
            write the page at pa to disk at swapmap(victim, victim_va);
        set victim->pmap(victim_va) = FAULT;
        read swapmap(current, va) from disk into pa;
        set current->pmap(va) = pa (READ_ONLY);
    }
}
As we can see, performance improves: a page is read in only when it is first used, and it is written back to disk only if it was modified (made READ/WRITE) while in memory.
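Here is a toy C sketch of starting a program with demand paging and of the read-only trick above. The names (struct pte, start_program, access_page, evict) and the 4096-byte page offsets into the executable are hypothetical, and eviction is triggered by hand rather than by a removal policy.

```c
#define NPAGES    6
#define PAGE_SIZE 4096L

enum prot { FAULT, READ_ONLY, READ_WRITE };

/* Hypothetical page-table entry: protection plus where the page lives on disk. */
struct pte {
    enum prot prot;
    long disk_offset;   /* what swapmap() would return: offset in the executable */
};

static struct pte pmap[NPAGES];
int pages_loaded, faults, pages_written_back;

/* Start with demand paging: load nothing, just remember where each page is. */
void start_program(void) {
    for (int va = 0; va < NPAGES; va++) {
        pmap[va].prot = FAULT;
        pmap[va].disk_offset = va * PAGE_SIZE;
    }
    pages_loaded = faults = pages_written_back = 0;
}

static void pfault(int va, int is_write) {
    faults++;
    if (pmap[va].prot == FAULT) {
        pages_loaded++;              /* read disk_offset into a frame (simulated) */
        pmap[va].prot = READ_ONLY;   /* even on a write: the retry faults again */
    } else if (is_write && pmap[va].prot == READ_ONLY) {
        pmap[va].prot = READ_WRITE;  /* first write: the page is now dirty */
    }
}

/* Retry the access until it is allowed, as the hardware would. */
void access_page(int va, int is_write) {
    while (pmap[va].prot == FAULT || (is_write && pmap[va].prot == READ_ONLY))
        pfault(va, is_write);
}

/* Evict a page: only READ_WRITE (modified) pages are written back. */
void evict(int va) {
    if (pmap[va].prot == READ_WRITE)
        pages_written_back++;
    pmap[va].prot = FAULT;
}
```

Reading page 0 and writing page 1 loads only those 2 of the 6 pages; evicting both writes back only page 1.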
We can compare the costs of starting a program with and without demand paging to get a better sense of when demand paging pays off. Let's say that:
If we were to:
[So if U = N, then starting a program without demand paging would be better than starting with demand paging.]
[But even if U ~ N, demand paging can still be better, because the program starts running as soon as its first page is loaded instead of waiting for all N pages!]
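One simple cost model that fits this comparison, assuming N = number of pages in the program, U = number of pages actually used (U ≤ N), C = cost of reading one page from disk, and F = overhead of handling one page fault:

```latex
\begin{aligned}
\text{Without demand paging:}\quad & \text{total cost} = N C, && \text{startup latency} = N C \\
\text{With demand paging:}\quad & \text{total cost} = U (C + F), && \text{startup latency} = C + F
\end{aligned}
```

With U = N the total cost is N(C + F) > NC, so loading everything up front wins on total work; the startup latency C + F is still far below NC, which is why demand paging can be attractive even then.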
Now that we know what demand paging is, let's take a look at the dirty bit. We'll use an example to understand it. One good example given in class was the Paperclip guy in MS Word. We don't want to keep writing the Paperclip's data to disk unless the data on its page has changed. So how can we avoid wasting valuable resources on this little Paperclip guy? We can use the dirty bit, which is set whenever a WRITE is made to the page. Because the dirty bit tracks WRITEs, we know whether the data on the page has changed, and therefore whether the page needs to be written to disk when it is swapped out!
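On real hardware the MMU sets the dirty bit itself whenever a store hits the page. Here is a minimal C sketch of how the OS might use it at eviction time; the bit positions mirror x86 page-table entries (present = bit 0, writable = bit 1, dirty = bit 6), while hw_store() and evict() are hypothetical stand-ins for the MMU and the kernel.

```c
#include <stdint.h>

/* Bit positions as in x86 page-table entries. */
#define PTE_P (1u << 0)   /* present */
#define PTE_W (1u << 1)   /* writable */
#define PTE_D (1u << 6)   /* dirty: set by the MMU on a store */

typedef uint32_t pte_t;
int writebacks;

/* Stand-in for the MMU: a store to the page sets the dirty bit. */
void hw_store(pte_t *pte) {
    *pte |= PTE_D;
}

/* Stand-in for the kernel evicting a page. */
void evict(pte_t *pte) {
    if (*pte & PTE_D)
        writebacks++;                 /* modified: must go back to disk */
    *pte &= ~(pte_t)(PTE_P | PTE_D);  /* no longer present, no longer dirty */
}
```

Only the page that was stored to is counted as a write-back; an untouched page (like the Paperclip's) is simply dropped.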
So far we've been talking about pages that belong to a single process, but what happens when a page is shared between processes, for example after fork()? To understand how pages can be shared, let's first look at memory-mapped files and the read function:
When read is called, it first looks up the file structure for the file descriptor, then reads into the buffer cache the sz bytes of file data starting at offset file->f_pos, and finally calls memcpy(buf, buffer_cache, sz). Since the data is already sitting in buffer-cache pages, the kernel could instead map those pages directly into the process's virtual address space, which is what the mmap function does. Note: memcpy() is expensive for large sizes, whereas setting up a mapping is cheap!
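The difference between copying and mapping can be seen with the real POSIX calls pread() and mmap(). This C sketch reads the same file both ways; make_temp_file() is a hypothetical helper for the demo, and error handling is kept minimal.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* read-style access: the kernel copies sz bytes from its cache into buf. */
ssize_t copy_with_read(int fd, char *buf, size_t sz) {
    return pread(fd, buf, sz, 0);
}

/* mmap-style access: the file's pages are mapped into our address space. */
const char *map_file(int fd, size_t *len) {
    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0)
        return NULL;
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED)
        return NULL;
    *len = (size_t)st.st_size;
    return p;
}

/* Hypothetical demo helper: an unlinked temporary file containing msg. */
int make_temp_file(const char *msg) {
    char path[] = "/tmp/mmapdemoXXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                 /* file goes away once fd is closed */
    size_t n = strlen(msg);
    if (write(fd, msg, n) != (ssize_t)n) {
        close(fd);
        return -1;
    }
    return fd;
}
```

After map_file() the process reads the file's bytes directly through the returned pointer; no memcpy into a user buffer is needed.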
Now we can look at what happens when a page is shared between processes, such as when a process is forked. First of all, what happens when we fork? Conceptually, fork copies the parent's process into the child; the lazy version below instead shares the parent's pages read-only:
fork(current) {
    child = new process descriptor;
    for each page va in current->pmap {
        pa = current->pmap(va);
        set current->pmap(va) = pa (READ_ONLY);    // dirty bit doesn't help here :-X
        set child->pmap(va) = pa (READ_ONLY);
    }
}
Now that a page is shared by two processes, we need to modify pfault() again so that we don't violate the isolation between them:
pfault(va, cpl, atype) {
    if (atype == WRITE && the page is shared between processes) {
        pa = allocate page;
        copy the data from current->pmap(va) to pa;
        set current->pmap(va) = pa (READ/WRITE);    // copy on WRITE
    }
}
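Here is a toy C simulation of a copy-on-write fork. Everything in it (struct proc, the refcnt array, cow_fork, write_byte) is hypothetical. It also adds one common refinement not in the pseudocode: if a frame is no longer shared, the write fault just makes it writable again without copying.

```c
#include <stdlib.h>
#include <string.h>

#define NPAGES  4
#define NFRAMES 16
#define PAGESZ  8      /* tiny pages keep the toy small */

/* Hypothetical per-process page map: frame number and writable flag per va. */
struct proc {
    int frame[NPAGES];
    int writable[NPAGES];
};

static char mem[NFRAMES][PAGESZ];   /* "physical memory" */
static int refcnt[NFRAMES];         /* how many pmaps point at each frame */
static int next_free;
int pages_copied;

static int alloc_frame(void) {
    if (next_free >= NFRAMES)
        abort();                    /* toy: out of physical memory */
    refcnt[next_free] = 1;
    return next_free++;
}

void proc_init(struct proc *p) {
    for (int va = 0; va < NPAGES; va++) {
        p->frame[va] = alloc_frame();
        p->writable[va] = 1;
    }
}

/* Lazy fork: share every frame read-only instead of copying it. */
void cow_fork(struct proc *parent, struct proc *child) {
    for (int va = 0; va < NPAGES; va++) {
        child->frame[va] = parent->frame[va];
        refcnt[parent->frame[va]]++;
        parent->writable[va] = child->writable[va] = 0;
    }
}

/* Write fault on a read-only page: copy it only if it is still shared. */
static void pfault_write(struct proc *p, int va) {
    int old = p->frame[va];
    if (refcnt[old] > 1) {
        int pa = alloc_frame();
        memcpy(mem[pa], mem[old], PAGESZ);
        refcnt[old]--;
        p->frame[va] = pa;
        pages_copied++;
    }
    p->writable[va] = 1;
}

void write_byte(struct proc *p, int va, int off, char c) {
    if (!p->writable[va])
        pfault_write(p, va);
    mem[p->frame[va]][off] = c;
}

char read_byte(const struct proc *p, int va, int off) {
    return mem[p->frame[va]][off];
}
```

The fork itself copies nothing; the first write by the child copies exactly one page, and the parent's later write copies nothing because it is by then the page's only owner.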
The pfault() modification above is called copy-on-write, since a shared page's data is copied only when a process writes to it! Now let's look at the numerical relationship as we did for demand paging:
If we were to:
Note: if the number of pages in the process is larger than the number of pages written, the lazy copy-on-write fork is preferred.
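A similar cost model, assuming N = pages in the process, W = pages written after the fork, C = cost of copying one page, and F = cost of handling one copy-on-write fault:

```latex
\begin{aligned}
\text{Eager fork (copy every page):}\quad & N C \\
\text{Copy-on-write fork:}\quad & W (C + F) \\
\text{COW is cheaper when}\quad & W (C + F) < N C \iff W < N \cdot \frac{C}{C + F}
\end{aligned}
```

Because C/(C + F) < 1, the lazy fork wins when only some of the pages are written. This model ignores the page-table setup that both versions of fork pay.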
Symantec tries to get infected:
- People who open email viruses
- Unprotected machines on the Internet waiting to get infected (honeypots)
Virtual machines: improve utilization by running multiple OSes per machine.
From the very beginning of this class, we have been concerned with performance and modularity in our programs, but the only way we could check them was by eyeballing our code. Now we can use numerical measurements to determine where we need to make improvements.
There are 3 types of performance metrics: latency, throughput, and utilization.
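For a loop that handles one request at a time, these metrics can be written as follows, assuming t_i are the times of the individual steps in one iteration and t_compute is the part of that time spent on useful computation:

```latex
\begin{aligned}
\text{latency} &= \sum_i t_i \\
\text{throughput} &= \frac{1}{\sum_i t_i} \quad \text{(requests per second)} \\
\text{utilization} &= \frac{t_{\text{compute}}}{\sum_i t_i}
\end{aligned}
```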
Let's do an example to help us understand these performance metrics a little better:
Read 40 bytes from a disk:
while (1) {
    char buf[40];
    read 40 bytes from disk into buf;
    compute(buf);
}
Given a 1 GHz CPU, the costs associated with each of these operations are:
And here are the calculations:
One thing to note here is that utilization is quite low. We want to improve performance by increasing utilization and decreasing latency. One way to do this is batching: combining several requests into one so that the per-request overhead is paid only once. Here's the new code:
#define BATCHSZ 21
while (1) {
    char buf[BATCHSZ * 40];
    read BATCHSZ requests' worth of data (840 bytes) into buf;
    for (i = 0; i < BATCHSZ; ++i)
        compute(&buf[i * 40]);
}
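Batching can be demonstrated with system calls standing in for disk commands, assuming each read() call carries a fixed overhead much like each disk command does. This C sketch (function names hypothetical) processes 21 requests of 40 bytes from a file descriptor, once with one read() per request and once with a single 840-byte read(), and counts the calls.

```c
#include <unistd.h>

#define REQ     40   /* bytes per request, as in the notes */
#define BATCHSZ 21   /* requests per batch, as in the notes */

static long compute(const char *buf) {
    long sum = 0;
    for (int k = 0; k < REQ; k++)
        sum += (unsigned char)buf[k];
    return sum;
}

/* Read exactly n bytes, counting every read() system call made. */
static int read_full(int fd, char *buf, size_t n, int *calls) {
    size_t got = 0;
    while (got < n) {
        ssize_t r = read(fd, buf + got, n - got);
        (*calls)++;
        if (r <= 0)
            return -1;
        got += (size_t)r;
    }
    return 0;
}

/* One read() per 40-byte request: pays the per-call overhead 21 times. */
long run_unbatched(int fd, int *calls) {
    char buf[REQ];
    long sum = 0;
    for (int i = 0; i < BATCHSZ; i++) {
        if (read_full(fd, buf, REQ, calls) < 0)
            return -1;
        sum += compute(buf);
    }
    return sum;
}

/* One 840-byte read() for the whole batch, then compute on each slice. */
long run_batched(int fd, int *calls) {
    char buf[BATCHSZ * REQ];
    long sum = 0;
    if (read_full(fd, buf, sizeof buf, calls) < 0)
        return -1;
    for (int i = 0; i < BATCHSZ; i++)
        sum += compute(&buf[i * REQ]);
    return sum;
}
```

Both versions do the same computation; the batched one just pays the per-call overhead once instead of 21 times.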
Now let's take a look at those performance metrics again:
Utilization went up! That's good! But latency went up to 950us! That's bad. So how can we keep utilization up and latency down? We can "hide" the latency by overlapping it with other requests, i.e., overlapping computation with I/O. Here's the revised code:
send command for first 40 bytes;
while (1) {
    send command for next 40 bytes;
    wait for disk to become ready;
    read 40 bytes into buffer;
    compute(buffer);
}
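Here is a sketch of the same overlap idea using POSIX threads, assuming a thread can stand in for the disk controller (real hardware would use DMA and interrupts instead). With double buffering, the "disk" fills one 40-byte buffer while the main loop computes on the other; the names and the simulated data are hypothetical.

```c
#include <pthread.h>
#include <string.h>

#define CHUNK   40
#define NCHUNKS 8

static char bufs[2][CHUNK];   /* two buffers: one being filled, one being used */
static int full[2];           /* 1 if bufs[i] holds data not yet consumed */
static pthread_mutex_t mu = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;

/* Simulated disk: fills buffers alternately, waiting until each is free. */
static void *disk_reader(void *arg) {
    (void)arg;
    for (int n = 0; n < NCHUNKS; n++) {
        int i = n % 2;
        pthread_mutex_lock(&mu);
        while (full[i])
            pthread_cond_wait(&cv, &mu);
        pthread_mutex_unlock(&mu);
        memset(bufs[i], 'a' + n, CHUNK);   /* "read 40 bytes" into bufs[i] */
        pthread_mutex_lock(&mu);
        full[i] = 1;
        pthread_cond_broadcast(&cv);
        pthread_mutex_unlock(&mu);
    }
    return NULL;
}

static long compute(const char *buf) {
    long sum = 0;
    for (int k = 0; k < CHUNK; k++)
        sum += buf[k];
    return sum;
}

/* Main loop: computes on one buffer while the reader fills the other. */
long run_overlapped(void) {
    pthread_t t;
    long total = 0;
    pthread_create(&t, NULL, disk_reader, NULL);
    for (int n = 0; n < NCHUNKS; n++) {
        int i = n % 2;
        pthread_mutex_lock(&mu);
        while (!full[i])
            pthread_cond_wait(&cv, &mu);
        pthread_mutex_unlock(&mu);
        total += compute(bufs[i]);         /* overlaps with filling bufs[1 - i] */
        pthread_mutex_lock(&mu);
        full[i] = 0;
        pthread_cond_broadcast(&cv);
        pthread_mutex_unlock(&mu);
    }
    pthread_join(t, NULL);
    return total;
}
```

The design choice here is two buffers rather than one: with a single buffer the "disk" would have to sit idle until compute() finished, which is exactly the serialization the overlapped loop is trying to avoid.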
And one more time, let's take a look at those performance metrics:
Utilization up, latency down. Nice! ;-)