Authors: Chung Ming Hung, Edward Lien, Yan Liu
Last time we saw that allocating virtual memory in one block led to external fragmentation, so instead we allocated virtual memory in fixed-size blocks (pages) to fix this issue.
As an example, the x86 architecture uses a memory page size of 4KB (2^12 bytes), and the hierarchy is laid out as a 2-level tree.

Note: The address of the page directory is stored in the %cr3 register. The address of each page table is stored in the page directory. The addresses of the individual pages are stored in the page tables.

Note: The virtual address memory is what the process sees as the usable memory space. It is mapped to sections of the physical address memory, which is what actually exists.

Note: Each virtual address contains a 10-bit page directory index, a 10-bit page table index, and a 12-bit offset.
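The 10/10/12 split described in the note can be sketched with a few shifts and masks. This is only an illustration: the function name `split_virtual_address` and the sample address are assumptions, not something from the lecture.

```python
PAGE_OFFSET_BITS = 12   # 4KB pages, so the low 12 bits are the offset
INDEX_BITS = 10         # 1024 entries in the page directory and in each page table


def split_virtual_address(va):
    """Return (page directory index, page table index, offset) for a 32-bit address."""
    index_mask = (1 << INDEX_BITS) - 1
    offset = va & ((1 << PAGE_OFFSET_BITS) - 1)
    pt_index = (va >> PAGE_OFFSET_BITS) & index_mask
    pd_index = (va >> (PAGE_OFFSET_BITS + INDEX_BITS)) & index_mask
    return pd_index, pt_index, offset


print(split_virtual_address(0x00403004))  # prints (1, 3, 4)
```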
Q: How do we know where the page directory is located?
A: We need to start somewhere, so we store the physical address in the %cr3 register. It is similar to the bootstrapping process for the OS.
Q: How much overhead is a full page table?
A: A full page table has 4MB + 4KB of overhead. The page directory requires 4KB, and it holds 1024 (4KB / 4B) entries, each pointing to a page table that itself uses 4KB of space. So the page tables take 1024 * 4KB = 4MB.
x86-64 machines use a 4-level page table. If every level were fully populated, this would lead to a lot of overhead. To fix this, the page directory is allowed to contain blank entries to save space (as seen below). The minimum meaningful page table size for IA-32 machines is 8KB: one 4KB page directory plus one 4KB page table.
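The overhead figures for the 2-level IA-32 layout can be checked with a little arithmetic. The variable names below are made up for this sketch; the 4KB page size and 4-byte entry size come from the notes.

```python
PAGE_SIZE = 4 * 1024     # 4KB pages
ENTRY_SIZE = 4           # each directory or table entry is 4 bytes

entries_per_table = PAGE_SIZE // ENTRY_SIZE            # 1024 entries
page_directory_bytes = PAGE_SIZE                       # one 4KB page directory
all_page_tables_bytes = entries_per_table * PAGE_SIZE  # 1024 tables of 4KB = 4MB

full_overhead = page_directory_bytes + all_page_tables_bytes  # 4MB + 4KB
minimum_overhead = page_directory_bytes + PAGE_SIZE           # directory + one table = 8KB
mapped_bytes = entries_per_table * entries_per_table * PAGE_SIZE  # 4GB address space

print(full_overhead, minimum_overhead, mapped_bytes)
```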
Q: What happens if a process tries to access memory that doesn't exist?
A: This causes a page fault.

    if (addr_allowed(va, atype, cpl))
        // translation to physical address
        use physical address pmap(floor(va / PGSIZE) * PGSIZE) + va % PGSIZE;
    else
        // raise exception (handled by the kernel)
        raise exception: PAGE FAULT;

Note: The code above checks whether the access is allowed. If it is not, the access causes a page fault. A page fault that the kernel cannot resolve is what we experienced as a segmentation violation in our labs. In the older versions of Windows, this resulted in the Blue Screen of Death (BSOD). Having page faults allows processes to return a message before they die, so that the programmer or user will know what happened.
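The check-then-translate logic can be written out as a small simulation. This is a sketch under stated assumptions: `pmap` and `allowed` are plain dictionaries keyed by page-aligned virtual address, the privilege level `cpl` is left out for brevity, and the `PageFault` exception stands in for the hardware exception.

```python
PGSIZE = 4096


class PageFault(Exception):
    """Stands in for the exception the processor raises to the kernel."""


def translate(pmap, allowed, va, atype):
    """Return the physical address for va, or raise PageFault if access is denied."""
    page_va = (va // PGSIZE) * PGSIZE              # round down to the page boundary
    if atype not in allowed.get(page_va, set()):   # unmapped or wrong access type
        raise PageFault(hex(va))
    return pmap[page_va] + va % PGSIZE             # page base plus offset


pmap = {0x1000: 0x8000}          # one virtual page mapped to one physical page
allowed = {0x1000: {"read"}}     # that page is read-only

print(hex(translate(pmap, allowed, 0x1234, "read")))  # prints 0x8234
```

A write to the same page, or any access to an unmapped page, raises `PageFault` instead of returning an address.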
Processes are not allowed to alter the %cr3 register, the page table memory, the kernel's memory, or other processes' memory. All of these are protected.
To recap, utilization is the amount (or percent) of a machine's resources that are being used. We discussed utilization in terms of time before; now let's talk about utilization in terms of memory usage.
Q: Can we get higher utilization with virtual memory than without?
A: Yes. A running process only uses a fraction of its allocated memory, so the unused part can be put to work elsewhere.
Some examples of unused memory might be:
A single process is likely to have low memory utilization.
Q: How can we make use of the unused memory?
A: We can move the unused portion onto the disk. This turns memory into a cache. The term for this is paging or swapping. A couple of examples of this in real OSes are the Linux swap partition and the Windows page file.
Improve system's utilization by lending memory to other processes.
Physical memory becomes a cache for a portion of the disk.
SWAPPING/PAGING
Idea: The kernel maintains a swap map that says whether a process's memory pages are stored on disk.
To move a memory page onto disk:
On page fault:

    pfault(va, atype, cpl) {
        if (current->swapmap(va) exists) {
            // choose a victim page and block its owner while we move it
            (p, pva) = eviction_policy();
            p->state = BLOCKED;
            // write the victim page out to a free disk page
            disk_addr = find_free_disk_page();
            ppa = p->pmap(pva);
            write phys page ppa to disk @ disk_addr;
            p->addr_allowed(pva, *, *) = FALSE;
            // read the faulting page into the freed physical page
            read disk @ current->swapmap(va) into ppa;
            current->pmap(va) = ppa;
            current->addr_allowed(va, *, *) = TRUE;
            // update both swap maps
            mark current->swapmap(va) as free;
            p->swapmap(pva) = disk_addr;
            p->state = RUNNABLE;
            resume current;
        }
    }

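The handler above can be exercised with a toy model. This is a minimal sketch, not the lecture's implementation: physical memory and the disk are Python lists, the eviction policy is assumed to be "oldest resident page first", and process states (BLOCKED/RUNNABLE) are left out. The `Process` class and all variable names are hypothetical.

```python
class Process:
    def __init__(self, name):
        self.name = name
        self.pmap = {}      # virtual page -> physical frame (pages in memory)
        self.swapmap = {}   # virtual page -> disk slot (pages on disk)


def pfault(current, vpage, resident, memory, disk):
    """Swap current's vpage in, evicting the longest-resident page."""
    if vpage not in current.swapmap:
        raise MemoryError("page is not on disk: a genuine fault")
    victim, victim_vpage = resident.pop(0)   # assumed eviction policy: oldest first
    frame = victim.pmap.pop(victim_vpage)    # victim can no longer access the page
    slot = disk.index(None)                  # find a free disk slot
    disk[slot] = memory[frame]               # write the victim page to disk
    victim.swapmap[victim_vpage] = slot
    old_slot = current.swapmap.pop(vpage)    # where the wanted page lives on disk
    memory[frame] = disk[old_slot]           # read it into the freed frame
    disk[old_slot] = None                    # mark that disk slot as free
    current.pmap[vpage] = frame
    resident.append((current, vpage))


a, b = Process("A"), Process("B")
memory = ["A's page 0"]          # one physical frame, currently used by A
disk = ["B's page 0", None]      # B's page is swapped out; one free disk slot
a.pmap[0] = 0
b.swapmap[0] = 0
resident = [(a, 0)]

pfault(b, 0, resident, memory, disk)
print(memory, disk)  # prints ["B's page 0"] [None, "A's page 0"]
```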
Thrashing: A large fraction of memory accesses cause a swap. This makes performance collapse.
Reference String: A list of page accesses.
Note: Here, we have the first in, first out (FIFO) eviction algorithm using 3 physical pages of memory. This algorithm states that the page that was swapped in furthest in the past is the first page to be swapped out. In this particular scenario, we have 9 swaps.
| Process | A | B | A | C | A | B | C | |||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Reference String | -- 1 -- | -- 2 -- | -- 3 -- | -- 4 -- | -- 1 -- | -- 2 -- | -- 5 -- | -- 1 -- | -- 2 -- | -- 3 -- | -- 4 -- | -- 5 -- |
| Physical Page 1 | [1] | 1 | 1 | [4] | 4 | 4 | [5] | 5 | 5 | 5 | 5 | 5 |
| Physical Page 2 | | [2] | 2 | 2 | [1] | 1 | 1 | 1 | 1 | [3] | 3 | 3 |
| Physical Page 3 | | | [3] | 3 | 3 | [2] | 2 | 2 | 2 | 2 | [4] | 4 |
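
The FIFO table above can be reproduced with a short simulation. This is a sketch that only counts swaps (page faults); the function name `fifo_faults` is an assumption made for this example.

```python
from collections import deque

REFERENCE_STRING = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]


def fifo_faults(reference_string, num_frames):
    """Count swaps when the page loaded furthest in the past is evicted first."""
    frames = deque()              # oldest loaded page sits at the left
    faults = 0
    for page in reference_string:
        if page in frames:
            continue              # page already in memory: no swap
        faults += 1
        if len(frames) == num_frames:
            frames.popleft()      # evict the page that was loaded first
        frames.append(page)
    return faults


print(fifo_faults(REFERENCE_STRING, 3))  # prints 9
```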
- Evict the page that will be accessed furthest into the future. (optimal eviction policy)
Note: The problem with this algorithm is that we cannot know which pages will be accessed at what time. For this reference string and 3 physical pages, this algorithm uses 7 swaps, which is the lowest possible number of swaps.
| Reference String | -- 1 -- | -- 2 -- | -- 3 -- | -- 4 -- | -- 1 -- | -- 2 -- | -- 5 -- | -- 1 -- | -- 2 -- | -- 3 -- | -- 4 -- | -- 5 -- |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Physical Page 1 | [1] | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | [3] | 3 | 3 |
| Physical Page 2 | | [2] | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | [4] | 4 |
| Physical Page 3 | | | [3] | [4] | 4 | 4 | [5] | 5 | 5 | 5 | 5 | 5 |
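
The optimal policy is easy to simulate offline, because a simulation does know the whole reference string in advance. The sketch below counts swaps for the table above; `optimal_faults` is a name chosen for this example.

```python
REFERENCE_STRING = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]


def optimal_faults(reference_string, num_frames):
    """Count swaps when the page needed furthest in the future is evicted."""
    frames = []
    faults = 0
    for i, page in enumerate(reference_string):
        if page in frames:
            continue
        faults += 1
        if len(frames) == num_frames:
            future = reference_string[i + 1:]

            def next_use(p):
                # Pages never used again count as "infinitely" far away.
                return future.index(p) if p in future else len(future)

            frames.remove(max(frames, key=next_use))
        frames.append(page)
    return faults


print(optimal_faults(REFERENCE_STRING, 3))  # prints 7
```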
- Evict page loaded furthest in the past (FIFO again, this time with 4 physical pages of memory).
Note: Here, we use 10 swaps. Notice that this result is even worse than the 9 swaps we had with only 3 physical pages of memory. In this particular case, increasing the memory yields a worse result.
| Reference String | -- 1 -- | -- 2 -- | -- 3 -- | -- 4 -- | -- 1 -- | -- 2 -- | -- 5 -- | -- 1 -- | -- 2 -- | -- 3 -- | -- 4 -- | -- 5 -- |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Physical Page 1 | [1] | 1 | 1 | 1 | 1 | 1 | [5] | 5 | 5 | 5 | [4] | 4 |
| Physical Page 2 | | [2] | 2 | 2 | 2 | 2 | 2 | [1] | 1 | 1 | 1 | [5] |
| Physical Page 3 | | | [3] | 3 | 3 | 3 | 3 | 3 | [2] | 2 | 2 | 2 |
| Physical Page 4 | | | | [4] | 4 | 4 | 4 | 4 | 4 | [3] | 3 | 3 |
Belady's Anomaly:
- Some eviction algorithms (like FIFO) don't always improve performance given more memory. This can be seen when comparing the results of using the FIFO eviction algorithm on the specified reference string, with 3 versus 4 pages.
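Belady's Anomaly shows up directly if we sweep the number of physical pages for the same reference string. The sketch below includes its own small FIFO simulator so that it stands alone; the names are illustrative.

```python
REFERENCE_STRING = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]


def fifo_faults(reference_string, num_frames):
    """Count swaps under FIFO eviction with num_frames physical pages."""
    frames = []                   # oldest loaded page first
    faults = 0
    for page in reference_string:
        if page not in frames:
            faults += 1
            if len(frames) == num_frames:
                frames.pop(0)     # evict the oldest loaded page
            frames.append(page)
    return faults


swaps_by_frames = {n: fifo_faults(REFERENCE_STRING, n) for n in (3, 4, 5)}
print(swaps_by_frames)  # prints {3: 9, 4: 10, 5: 5}
```

Going from 3 to 4 physical pages raises the swap count from 9 to 10, even though 5 pages (enough to hold every page) brings it down to 5.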
- Evict page accessed furthest in the past (least recently used, LRU).
Note: The LRU eviction algorithm does not suffer from Belady's Anomaly. With 4 physical pages, it uses 8 swaps.
| Reference String | -- 1 -- | -- 2 -- | -- 3 -- | -- 4 -- | -- 1 -- | -- 2 -- | -- 5 -- | -- 1 -- | -- 2 -- | -- 3 -- | -- 4 -- | -- 5 -- |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Physical Page 1 | [1] | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | [5] |
| Physical Page 2 | | [2] | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| Physical Page 3 | | | [3] | 3 | 3 | 3 | [5] | 5 | 5 | 5 | [4] | 4 |
| Physical Page 4 | | | | [4] | 4 | 4 | 4 | 4 | 4 | [3] | 3 | 3 |
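
The LRU table above can be checked the same way. This sketch keeps pages in an `OrderedDict` ordered from least to most recently used; the function name `lru_faults` is an assumption.

```python
from collections import OrderedDict

REFERENCE_STRING = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]


def lru_faults(reference_string, num_frames):
    """Count swaps when the page accessed furthest in the past is evicted."""
    frames = OrderedDict()        # least recently used page comes first
    faults = 0
    for page in reference_string:
        if page in frames:
            frames.move_to_end(page)      # mark as most recently used
            continue
        faults += 1
        if len(frames) == num_frames:
            frames.popitem(last=False)    # evict the least recently used page
        frames[page] = True
    return faults


print(lru_faults(REFERENCE_STRING, 4))  # prints 8
print(lru_faults(REFERENCE_STRING, 3))  # prints 10
```

For this reference string, adding a fourth physical page lowers the LRU swap count from 10 to 8.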
Software Fix:

    if page in memory:
        move page to front of LRU list
        mark page accessible
        return
    else:
        swap ....

Hardware Fix:
Processor sets "accessed" bit in every accessed page table entry.

Note: In the above image, each page table entry contains a 12-bit set of flags, one of which is the accessed bit. At the granularity of a timer interrupt, the kernel can use these bits to place the accessed pages at the top of the LRU list.
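One way to picture the hardware fix is the following sketch. It is an assumption about how the accessed bits might be used, not the lecture's code: `access` plays the role of the processor setting the bit, and `timer_tick` plays the role of the kernel on a timer interrupt, moving recently accessed pages to the front of the list and clearing their bits.

```python
class PageTableEntry:
    def __init__(self, vpage):
        self.vpage = vpage
        self.accessed = False     # set by the "hardware" on every access


def access(pte):
    """What the processor does on a memory access: set the accessed bit."""
    pte.accessed = True


def timer_tick(lru_list):
    """On a timer interrupt, move accessed pages to the front and clear their bits."""
    touched = [pte for pte in lru_list if pte.accessed]
    untouched = [pte for pte in lru_list if not pte.accessed]
    for pte in touched:
        pte.accessed = False
    return touched + untouched


lru_list = [PageTableEntry(n) for n in range(3)]
access(lru_list[2])                       # only page 2 is used in this interval
lru_list = timer_tick(lru_list)
order = [pte.vpage for pte in lru_list]
print(order)  # prints [2, 0, 1]
```

The ordering is only approximate LRU: all pages touched within one timer interval look equally recent.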
When we run a process, large portions of the binary code are never accessed.

Note: Recalling the old example of the sort program, we can see in the above image that if we only need to sort a single line, the majority of the code is never used. This has lots of latency because we load the entire binary file just to sort a single line.
The above example shows us the problem with our current setup. Running large programs can be very slow, even if we only need to use a small portion of the code. Demand paging solves this by having the OS load pages only when they are requested. This reduces the latency issue that results from loading entire programs into memory. The following shows us how this would be implemented.
When we start a process:
- Empty pmap
- Init swapmap to point to binary code on disk

The OS loads pages as they are accessed.
Note: Prefetching makes all of this easier.
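The start-up steps for demand paging can be sketched as a toy model. Everything here is illustrative: the "binary on disk" is a bytes object, `pmap` holds the pages loaded so far, `swapmap` maps each page number to its offset in the binary, and the `loads` counter exists only to show how many pages were actually read.

```python
PGSIZE = 4096


class DemandPagedProcess:
    def __init__(self, binary):
        self.binary = binary      # stands in for the executable file on disk
        self.pmap = {}            # empty at start: no page is loaded yet
        num_pages = (len(binary) + PGSIZE - 1) // PGSIZE
        self.swapmap = {n: n * PGSIZE for n in range(num_pages)}
        self.loads = 0            # number of pages read from "disk"

    def read_byte(self, va):
        page = va // PGSIZE
        if page not in self.pmap:             # page fault: load on first access
            offset = self.swapmap[page]
            self.pmap[page] = self.binary[offset:offset + PGSIZE]
            self.loads += 1
        return self.pmap[page][va % PGSIZE]


binary = bytes(range(256)) * 64               # a 16KB "binary": 4 pages
proc = DemandPagedProcess(binary)
first = proc.read_byte(5)                     # touches page 0
second = proc.read_byte(PGSIZE + 7)           # touches page 1
third = proc.read_byte(6)                     # page 0 again: no new load
print(proc.loads, len(proc.swapmap))          # prints 2 4
```

Only 2 of the 4 pages are ever loaded, which is the latency saving that demand paging is after.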
We can tweak atype, so that the process can't modify its own code.
if(addr_allowed(va, atype, cpl)) // atype: read-only memory

Note: We add a W (writable) flag to each page table entry. Leaving it clear on code pages prevents the process from modifying its own code.
Demand paging: Loads disk files directly into memory for process binaries (code).
With demand paging, files that are on disk are loaded into a buffer cache at the kernel level, and then copied into the process that requested the file. If more than one process requests the same file, then multiple copies of that file are made. This mechanism clogs memory with duplicates of a single file, which is not a very good thing. A solution to this problem is memory-mapped I/O.
Memory-mapped I/O: Maps disk files directly into process memory for ANY files.
With memory-mapped I/O, the file is still loaded into the buffer cache in the kernel, but each process that requests the file has a mapping that points to the file in the buffer cache. This way, only one copy of the file exists, which saves space and improves memory utilization.
Pros/Cons:
+ Fewer copies of data in memory
- Interface is harder to use (page alignment issues: mappings work in multiples of 4096 bytes)
Example: This is useful for shared C libraries because then it is not necessary to copy the entire library into multiple processes' memory.
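Both the interface and its alignment issue can be seen with Python's standard `mmap` module. This is a sketch, not part of the lecture: the helper `read_via_mmap` is a made-up name, and it rounds the requested offset down to a multiple of `mmap.ALLOCATIONGRANULARITY` because a mapping's offset must be aligned.

```python
import mmap
import os
import tempfile


def read_via_mmap(path, offset, length):
    """Read length bytes at offset by mapping the file instead of calling read()."""
    aligned = offset - offset % mmap.ALLOCATIONGRANULARITY   # round down
    delta = offset - aligned                                 # skip inside the mapping
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), delta + length,
                       access=mmap.ACCESS_READ, offset=aligned) as mapping:
            return mapping[delta:delta + length]


# Build a file larger than two allocation units so an unaligned offset is possible.
data = bytes(i % 251 for i in range(2 * mmap.ALLOCATIONGRANULARITY + 100))
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)

wanted_offset = mmap.ALLOCATIONGRANULARITY + 10    # deliberately not aligned
chunk = read_via_mmap(tmp.name, wanted_offset, 16)
os.remove(tmp.name)
print(chunk == data[wanted_offset:wanted_offset + 16])  # prints True
```

The extra bookkeeping for `aligned` and `delta` is the kind of inconvenience the "harder to use" point refers to.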
In this lecture, we began by reviewing concepts about virtual memory that were introduced to us in earlier lectures, and the problem of poor memory utilization was brought to our attention. With poor memory utilization, a large portion of the memory is wasted storing information that is not currently needed. One way to make use of that memory is by swapping/paging. The following are different methods used to implement swapping/paging algorithms.
Here are the key terms that were introduced to us in this lecture: