You are expected to understand this. CS 111 Operating Systems Principles, Fall 2006

edited by Calvin You, Alexandre Serriere, and Binh An

Virtual memory II & Performance I

Recall from the previous lecture that if a page hasn't changed at all, then there is no need to do any swapping or demand paging. However, what happens if the page does change?

Demand Paging and Dirty-Only Swapping

First let's see what a page fault is: According to the book, a page fault, or missing-page exception, is the event when an addressed page isn't present in the primary device and the virtual memory manager has to move the page in from a secondary device.

Now let's look at two functions from the previous lecture:

	swapmap(proc, va) {
		either: return disk address
		or: return FAIL
	} /*Code copied from previous scribe note*/

Basically, this function just tells us whether or not the process (proc) has a page with the virtual address (va) swapped out to disk. If it does, the function returns the disk address of that page; otherwise it returns FAIL.

	pfault(va, cpl, atype)
	{
		if(swapmap(current, va) != FAIL) {
			/* pick a victim page to evict; victim_va is kept separate from the faulting va */
			(process, victim_va) = removal policy();
			pa = process->pmap(victim_va);
			write process->victim_va to disk;
			then set process->pmap(victim_va) = FAULT;
			...
			Read swapmap(current, va) into pa;
			...
			Set current->pmap(va) = pa;
			...
		}
	}

There is a slight performance problem with this implementation, though: it writes the evicted page to disk every time pfault swaps a page in, even when no changes have been made to that page! So what we want to do here is implement demand paging.

-- When we start a program

  • Read data from disk

-- To swap out a page

  • Write page to swap space

Note: swapmap() can point to any region of disk (not just swap space).

A demand paging system is basically a system that moves pages to the primary device only after the application attempts to use them; in other words, if you don't use it, don't load it. This will make the performance of pfault() much better because it reduces the number of swaps and saves some loading expenses. Here's how we can do this:

First we will start the program with demand paging by:

  • Setting the current->pmap(va) to FAULT.

Then for each va in the program's text and data:

  • swapmap(current, va) = the location of that data on the disk.

After starting the program with demand paging, we will change pfault() to work with demand paging:

	pfault(va, cpl, atype)
	{
		if(atype == WRITE && current->pmap(va) != FAULT && va access is allowed){
			if(swapmap(current, va) != FAIL) {
				/* pick a victim page to evict; victim_va is kept separate from the faulting va */
				(process, victim_va) = removal policy();
				pa = process->pmap(victim_va);
				/* only a page mapped READ/WRITE can have been modified */
				if(process->pmap(victim_va) == READ/WRITE)
					write process->victim_va to disk;
				then set process->pmap(victim_va) = FAULT;
				...
				Read swapmap(current, va) into pa;
				...
				Set current->pmap(va) = READ_ONLY;
				...
			}
		}
	}

As we can see, performance is improved: the evicted page is written to disk only if it has been modified since it was loaded (its mapping is READ/WRITE). A page that was loaded READ_ONLY and never written is simply dropped.

We can see the numerical relationship between using and not using demand paging to give us a better insight on when to use demand paging. Let's say that:

  • N : # of pages in the program.
  • C : cost of reading a page from disk.
  • U : # of program pages used.
  • F : cost of the page fault.

If we were to:

  • start a program without demand paging, then the total cost = NC
    • Latency to first instruction without demand = NC
  • start a program with demand paging, then the total cost = U(C+F)
    • Latency to first instruction with demand = C+F

[So if U = N, then starting a program without demand paging has the lower total cost: NC versus N(C+F).]
[But even if U ~ N, demand paging still starts the program sooner, since the latency to the first instruction is C+F instead of NC.]
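
The cost comparison above can be sketched in a few lines of C. This is only an illustration of the formulas NC and U(C+F); the function names and the sample numbers in the usage note are assumptions, not values from the lecture.

```c
/* Total cost of loading all N pages up front (no demand paging): N * C. */
double eager_total_cost(double n_pages, double read_cost)
{
    return n_pages * read_cost;
}

/* Total cost with demand paging: each of the U used pages pays
   one disk read plus one page fault, U * (C + F). */
double demand_total_cost(double used_pages, double read_cost, double fault_cost)
{
    return used_pages * (read_cost + fault_cost);
}

/* Number of used pages at which both strategies cost the same:
   U * (C + F) = N * C  =>  U = N * C / (C + F). */
double break_even_used_pages(double n_pages, double read_cost, double fault_cost)
{
    return n_pages * read_cost / (read_cost + fault_cost);
}
```

For instance, with the made-up values N = 100, C = 10, and F = 1, eager loading costs 1000, demand paging with U = 50 costs 550, and demand paging stays cheaper in total cost as long as fewer than about 91 pages are used.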

Now that we know what demand paging is, let's take a look at what the dirty bit is. We'll use an example to understand this. One good example given in class was the Paperclip guy in MS Word. We don't want to keep writing the data of the Paperclip to disk unless the data on the page has changed. So how can we make sure that we don't waste valuable resources on this little Paperclip guy? Well, we can use the dirty bit to help us track WRITEs to the page. Since we can track the WRITEs, we know when the data on the page has changed, and therefore when the page has to be written to disk.
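
As a rough sketch of the idea, the following C fragment simulates a single page with a software dirty flag. Everything here (struct page, page_load, page_write, page_evict, the disk_writes counter, and the 4096-byte page size) is a hypothetical name or value chosen for illustration; real hardware sets the dirty bit in the page table entry, and the notes emulate it with READ_ONLY and READ/WRITE mappings.

```c
#include <stdbool.h>
#include <string.h>

#define PAGE_SIZE 4096          /* assumed page size */

struct page {
    unsigned char data[PAGE_SIZE];
    bool dirty;                 /* true once the page has been written since load */
};

static int disk_writes = 0;     /* counts simulated write-backs to disk */

/* Simulate loading a page from disk: contents are fresh, so it is clean. */
void page_load(struct page *p)
{
    memset(p->data, 0, PAGE_SIZE);
    p->dirty = false;
}

/* Every write marks the page dirty. */
void page_write(struct page *p, size_t off, unsigned char value)
{
    p->data[off] = value;
    p->dirty = true;
}

/* Evict the page, writing it back only if it is dirty.
   Returns true if a (simulated) disk write was needed. */
bool page_evict(struct page *p)
{
    bool wrote = p->dirty;
    if (wrote)
        disk_writes++;
    p->dirty = false;
    return wrote;
}
```

A page that is loaded and evicted without any page_write() call costs no disk write, which is the Paperclip case from the example.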

Copy On Write Fork()

So far we've been talking about having multiple pages in a program, but what happens if a page is shared between processes such as when a process is being fork()ed? To understand what is going on when pages are shared, let's look at the memory mapped files and the read function:

Memory Mapped Files

When a read function is called, it will first look up the file structure for a file descriptor, then read into the buffer cache the file data corresponding to the offsets from file->f_pos to file->f_pos + the size of the data, and finally call memcpy(buf, buffer_cache, sz). However, since we are dealing with both virtual and physical addresses, we would want to make the read function more flexible. Therefore, instead of using memcpy(buf, buffer_cache, sz), we would use the mmap function. Note: memcpy() is expensive for large sizes, whereas providing a mapping is cheap!
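
The notes do not show a call to mmap, so the following is only a minimal sketch that assumes "the mmap function" refers to POSIX mmap(). The helper name map_file is hypothetical. It maps a whole file read-only, so the file's bytes become addressable without an explicit copy into a user buffer.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map an entire file read-only into the caller's address space.
   On success, stores the length in *len_out and returns the mapping.
   Returns NULL on any failure, and also for an empty file,
   since a zero-length mapping is not allowed. */
const char *map_file(const char *path, size_t *len_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                  /* the mapping stays valid after close() */
    if (p == MAP_FAILED)
        return NULL;

    *len_out = (size_t)st.st_size;
    return p;
}
```

The caller releases the mapping with munmap() when done. Pages of the mapping are brought in on first access, which ties this back to the demand paging discussion earlier in the notes.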

Now that we know what's going on with a page and its process, we can look into what's happening when a page is shared between processes, such as when a process is forked. First of all, what happens when we fork? Well, basically it copies the parent's process into the child:

	fork(current)
	{
		child = new process descriptor;
		for each page va in current->pmap {
			pa = current->pmap(va);
			set the current->pmap(va) = pa (READ_ONLY); // dirty BIT doesn't help here :-X
			set child->pmap(va) = pa (READ_ONLY);
		}
	}

Now that we have a page with two processes, we need to modify pfault() again so that we don't violate the isolation of the two processes:

	pfault(va, cpl, atype) {
		if(atype == WRITE && the page is shared between processes) {
			pa = allocate page;
			copy the data from current->pmap(va) to pa;
			then set the current->pmap(va) = pa (READ/WRITE); // Copy on WRITE
		}
	}

The above implementation is called Copy On Write, since it copies the data only when a write happens! Now let's look at the numerical relationship like we did for demand paging:

  • N : # of pages in process
  • C : cost to copy
  • W : # of pages written
  • F : cost of fault

If we were to:

  • do an eager copy on fork, the total cost is NC
  • do a lazy copy on write fork, the total cost is W(C+F)

Note: The lazy copy on write fork is preferred when W(C+F) < NC, that is, when the number of pages written is sufficiently smaller than the number of pages in the process.
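
To make the lazy-copy behaviour concrete, here is a small user-space simulation in C. It is a sketch under assumptions, not the kernel code from the lecture: struct frame, struct proc, proc_init, cow_fork, cow_write, the reference count, and the sizes are all hypothetical. A reference count stands in for "the page is shared between processes", and pages_copied plays the role of the number of pages written while shared.

```c
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096          /* assumed page size */
#define NPAGES 4                /* assumed pages per process */

struct frame {
    unsigned char data[PAGE_SIZE];
    int refs;                   /* number of processes mapping this frame */
};

struct proc {
    struct frame *pmap[NPAGES]; /* virtual page number -> frame */
};

static int pages_copied = 0;    /* copies actually performed */

/* Give a process NPAGES private, zero-filled frames. Returns 0 on success.
   (A sketch: frames already allocated are not freed on failure.) */
int proc_init(struct proc *p)
{
    for (int i = 0; i < NPAGES; i++) {
        p->pmap[i] = calloc(1, sizeof(struct frame));
        if (p->pmap[i] == NULL)
            return -1;
        p->pmap[i]->refs = 1;
    }
    return 0;
}

/* Lazy fork: the child shares every frame with the parent; nothing is copied. */
void cow_fork(const struct proc *parent, struct proc *child)
{
    for (int i = 0; i < NPAGES; i++) {
        child->pmap[i] = parent->pmap[i];
        child->pmap[i]->refs++;
    }
}

/* Write one byte; if the frame is shared, copy it first (copy on write). */
int cow_write(struct proc *p, int page, size_t off, unsigned char value)
{
    struct frame *f = p->pmap[page];
    if (f->refs > 1) {
        struct frame *copy = malloc(sizeof *copy);
        if (copy == NULL)
            return -1;
        memcpy(copy->data, f->data, PAGE_SIZE);
        copy->refs = 1;
        f->refs--;              /* the writer no longer maps the shared frame */
        p->pmap[page] = copy;
        pages_copied++;
        f = copy;
    }
    f->data[off] = value;
    return 0;
}
```

After cow_fork(), a write by the child to one page copies only that page; the other pages stay shared, and the parent's copy of the written page is unchanged, so isolation is preserved.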

Case : Clever Copy on WRITE

Symantec tries to get infected
- People who open email viruses
- Unprotected machines on Internet waiting to get infected ( Honey Pots )

Virtual Machines : improve utilization by running multiple OS per machine.

Performance Metrics

From the very beginning of this class, we have been concerned with performance and modularity in our program, but the only way we could ever check for these was just by eye-balling our code. Now we can use numerical values to help us determine where we need to make improvements.

There are 3 types of performance metrics:

  • Latency : delay between input request and output of response (time)
  • Throughput : rate of request completion (1/time)
  • Utilization : fraction of capacity spent doing useful work (percent)

Let's do an example to help us understand these performance metrics a little better:

Read 40 bytes from a disk:

	while(1) {
		char buf[40];
		read 40 bytes from disk into buf;
		compute(buf);
	}

Given a 1 GHz CPU, the costs associated with each of these operations are:

  • 1 cycle = 1ns
  • PIO = 1000 cycles = 1us
  • disk latency = 50us
  • computing on buffer = 5us
  • send command = 5 PIO
  • wait for completion = 50us
  • read result = 40 PIO

And here are the calculations:

  • Latency : (send command) + (disk latency) + (read disk data) + (compute) = 5us + 50us + 40us + 5us = 100us
  • Throughput : 1/Latency = 10,000 requests/second ( Requests are serial )
  • Utilization : 5us/100us = 5%
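
The same arithmetic can be written as a small C helper. The struct and function names are hypothetical; the model is simply the one used above, in which requests run strictly one after another and only the compute step counts as useful work.

```c
struct metrics {
    double latency_us;          /* time for one request, in microseconds */
    double throughput_per_s;    /* requests completed per second */
    double utilization;         /* fraction of time spent computing */
};

/* Metrics for a loop that handles one request at a time:
   send command, wait for the disk, read the data, compute. */
struct metrics serial_metrics(double send_us, double disk_us,
                              double read_us, double compute_us)
{
    struct metrics m;
    m.latency_us = send_us + disk_us + read_us + compute_us;
    m.throughput_per_s = 1e6 / m.latency_us;   /* 1e6 us per second */
    m.utilization = compute_us / m.latency_us;
    return m;
}
```

Calling serial_metrics(5, 50, 40, 5) reproduces the figures above: 100us latency, 10,000 requests/second, and 5% utilization.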

One thing to note here is that the utilization is quite low. We want to improve the performance by increasing utilization and decreasing latency time. One method of doing this is to use Batching, which means running several requests at once to avoid paying per-request overhead. Here's the new code:

	#define BATCHSZ 21
	while(1) {
		char buf[BATCHSZ * 40];
		read 21 requests worth of data (840 bytes) into buffer;
		for(i=0; i<BATCHSZ; ++i)
			compute(&buf[i*40]);
	}

Now let's take a look at those performance metrics again:

  • Latency : (send command) + (disk latency) + (read disk data for all requests) + (compute all requests) = 5us + 50us + (21*40)us + (21*5)us = 1000us (for 21 requests)
    • Average latency is about 950us
  • Throughput : 21 requests/1000us = 21000 requests/second
  • Utilization : 105us/1000us = 10.5%
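
The 950us average can be checked with a short C sketch. It assumes, as the calculation above appears to, that all requests in a batch are issued at time zero and that request i is finished once all the data has been read and the compute steps for requests 0 through i have run. The function names are hypothetical.

```c
/* Total time for one batch of n requests:
   one command, one disk latency, n reads, n computes. */
double batch_total_us(int n, double send_us, double disk_us,
                      double read_us, double compute_us)
{
    return send_us + disk_us + n * read_us + n * compute_us;
}

/* Average latency per request within a batch, assuming all n requests
   start at time 0 and are computed one after another once I/O is done. */
double batch_avg_latency_us(int n, double send_us, double disk_us,
                            double read_us, double compute_us)
{
    double io_done = send_us + disk_us + n * read_us;
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += io_done + (i + 1) * compute_us;  /* completion time of request i */
    return sum / n;
}
```

With n = 21 and the costs from the example, the first request completes at 900us and the last at 1000us, which gives the 950us average.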

Utilization went up! That's good! But latency went up to 950us! That's not! So what can we do here to keep utilization up and latency down? Well, we could just 'hide the latency'. We can do this by overlapping it with other requests (overlapping computation with I/O). Here's the new new code:

	send command for first 40 bytes;
	while(1) {
		send command for next 40 bytes;
		wait for disk to become ready;
		read 40 bytes into buffer;
		compute(buffer);
	}

And one more time, let's take a look at those performance metrics:

  • Latency : (send command) + (disk latency) + (read disk data) + (compute) = 5us + 50us + 40us + 5us = 100us
  • Throughput : 1/95us = about 10,500 requests/second (device has buffered commands, so no need to send command again)
  • Utilization : 5us/95us = about 5.3%

Compared with the original loop, utilization is up slightly; compared with batching, latency is back down to 100us. Nice! ;-)

 
2006fall/notes/lec12.txt · Last modified: 2007/09/28 00:25 (external edit)
 