====== Lecture 15 Scribe Notes ======
Authors: Chung Ming Hung, Edward Lien, Yan Liu
===== Virtual Memory II =====
Last time we saw that allocating virtual memory in one block led to external fragmentation, so instead, we allocated virtual memory in fixed size blocks to fix this issue.
As an example, the x86 architecture uses a memory page size of 4KB (2<sup>12</sup> bytes), and the hierarchy is laid out as a 2-level tree.
{{pagetablelayout.jpg|}}\\
Note: The address of the page directory is stored in the %cr3 register. The address of each page table is stored in the page directory. The addresses of the individual pages are stored in the page tables.
{{addressmapping2.jpg|}}\\
Note: The virtual address memory is what the process sees as the usable memory space. It is mapped to sections of the physical address memory, which is what actually exists.
{{virtualaddress.jpg|}}\\
Note: Each virtual address contains a 10-bit page directory index, a 10-bit page table index, and a 12-bit offset.
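The 10/10/12 split described in the note can be sketched with a few shifts and masks. This is only an illustration; the function and constant names are made up for this example and are not from the lecture.

```python
PAGE_OFFSET_BITS = 12   # 4KB pages -> 12-bit offset
INDEX_BITS = 10         # 1024 entries per page directory / page table


def split_virtual_address(va):
    """Return (page directory index, page table index, offset) for a 32-bit address."""
    index_mask = (1 << INDEX_BITS) - 1
    offset = va & ((1 << PAGE_OFFSET_BITS) - 1)
    table_index = (va >> PAGE_OFFSET_BITS) & index_mask
    directory_index = (va >> (PAGE_OFFSET_BITS + INDEX_BITS)) & index_mask
    return directory_index, table_index, offset


print(split_virtual_address(0x00403004))  # (1, 3, 4)
```

The top 10 bits select an entry in the page directory, the next 10 bits select an entry in that page table, and the low 12 bits are the byte offset within the 4KB page.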
Q: How do we know where the page directory is located?\\
A: We need to start somewhere, so we store the physical address in the %cr3 register. It is similar to the bootstrapping process for the OS.\\
\\
Q: How much overhead is a full page table?\\
A: A full page table has 4MB + 4KB of overhead. The page directory requires 4KB and holds 1024 entries (4KB / 4B), one per page table, and each page table uses 4KB of space. So the page tables take 1024 * 4KB = 4MB, plus 4KB for the page directory.
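The arithmetic in the answer above can be checked with a short calculation. The variable names are chosen for this example only.

```python
PAGE_SIZE = 4 * 1024    # 4KB pages
ENTRY_SIZE = 4          # each directory / table entry is 4 bytes

entries_per_page = PAGE_SIZE // ENTRY_SIZE        # 1024 entries fit in one 4KB page
directory_bytes = PAGE_SIZE                       # one page directory
tables_bytes = entries_per_page * PAGE_SIZE       # 1024 page tables of 4KB each
full_overhead = directory_bytes + tables_bytes

print(entries_per_page)                 # 1024
print(tables_bytes // (1024 * 1024))    # 4  (MB of page tables)
print(full_overhead)                    # 4198400 bytes = 4MB + 4KB
```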
64-bit x86 (x86-64) machines use a 4-level page table. This would lead to a lot of overhead if every table had to be present. To fix this, the page directory is allowed to contain blank entries to save space (as seen below). The minimum meaningful page table size for IA-32 machines is 8KB: one 4KB page directory plus one 4KB page table.
{{blankentries.jpg|}}
Q: What happens if a process tries to access memory that doesn't exist?\\
A: This causes a [[http://en.wikipedia.org/wiki/Page_fault|page fault]].
==== Processor's implementation of a memory access ====
* va = virtual address\\
* atype = READ, WRITE\\
* cpl = current privilege level
  if (addr_allowed(va, atype, cpl))
      use physical address pmap(floor(va / PGSIZE) * PGSIZE) + va % PGSIZE;  // translation to physical address
  else
      processor raises exception: PAGE FAULT;  // control transfers to the kernel
Note: The code above checks whether the access is allowed. If it is not, the processor raises a page fault, which transfers control to the kernel. A page fault that the kernel cannot resolve is what we experienced in our labs as a segmentation violation. In older versions of Windows, this could result in the Blue Screen of Death (BSOD). Having page faults allows processes to report a message before they die, so that the programmer or user will know what happened.
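As a rough software model of the check above, the sketch below represents the page map as a dictionary keyed by virtual page number. The entry layout (''base'', ''allowed'', ''user''), the ''PageFault'' class, and the ''access'' function are assumptions made for this illustration; real hardware keeps this information in page table entries.

```python
PGSIZE = 4096
USER_CPL = 3    # x86 user mode; 0 is kernel mode


class PageFault(Exception):
    """Hypothetical stand-in for the processor's page fault exception."""


def addr_allowed(pmap, va, atype, cpl):
    """Check whether an access of type atype at privilege level cpl may touch va."""
    entry = pmap.get(va // PGSIZE)
    if entry is None:
        return False                      # no mapping for this page
    if cpl == USER_CPL and not entry["user"]:
        return False                      # kernel-only page accessed from user mode
    return atype in entry["allowed"]


def access(pmap, va, atype, cpl):
    """Translate va to a physical address, or raise PageFault."""
    if not addr_allowed(pmap, va, atype, cpl):
        raise PageFault(f"{atype} at {va:#x} with cpl={cpl}")
    return pmap[va // PGSIZE]["base"] + va % PGSIZE


# Virtual page 1 is mapped read-only at physical address 0x5000.
pmap = {1: {"base": 0x5000, "allowed": {"READ"}, "user": True}}
print(hex(access(pmap, 0x1234, "READ", USER_CPL)))  # 0x5234
```

A ''WRITE'' to the same address, or any access to an unmapped page, raises ''PageFault'' in this model.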
==== Dangerous Operations ====
* lcr3 (changing %cr3 register)
* changing page table memory
* changing kernel or other processes' memory
Processes are not allowed to alter the %cr3 register, the page table memory, the kernel's memory, or other processes' memory. All of these are protected.
===== Utilization & Virtual Memory =====
To recap, utilization is the amount (or percent) of a machine's resources that are being used. We discussed utilization in terms of time before. Now let's talk about utilization in terms of memory usage.
Q: Can we get higher utilization with virtual memory than without?\\
A: Yes. A running process typically uses only a fraction of its allocated memory at any given time, so the unused portion can be put to work elsewhere.
Some examples of unused memory might be:
* other languages' messages
* functionality in code not being used
* portions of documents
* etc
A single process is likely to have low memory utilization.
Q: How can we make use of the unused memory?\\
A: We can move the unused portion onto the disk. This turns memory into a cache. The term for this is paging or swapping. A couple of examples of this in real operating systems are the Linux swap partition and the Windows page file.
Improve system's utilization by lending memory to other processes.\\
Physical memory becomes a __cache__ for a portion of the disk.\\
SWAPPING/PAGING
Idea: The kernel maintains a __swap map__ that says whether a process's memory pages are stored on disk.
To move a memory page onto disk:
- Choose disk offset
- Write page
- Mark swap map
- Clear page table entry
On page fault:
- Evict a page
- Read current->swapmap(va) into page
- Fill in page map
- Resume process
  pfault(va, atype, cpl) {
      if (current->swapmap(va) exists) {
          (p, pva) = eviction_policy();               // choose a victim process p and its page pva
          p->state = BLOCKED;
          disk_addr = find_free_disk_page();
          ppa = p->pmap(pva);
          write phys page ppa to disk @ disk_addr;    // save the victim page
          p->addr_allowed(pva, *, *) = FALSE;         // clear the victim's page table entry
          read disk @ current->swapmap(va) into ppa;  // bring in the faulting page
          current->pmap(va) = ppa;
          current->addr_allowed(va, *, *) = TRUE;
          mark current->swapmap(va) as free;
          p->swapmap(pva) = disk_addr;                // mark the victim's swap map
          p->state = RUNNABLE;
          resume current;
      }
  }
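The handler above can be mimicked with a small toy model. In this sketch, physical memory and the disk are plain dictionaries, and the eviction policy is left to the caller, who passes in the victim directly. The ''Process'' class and the ''handle_page_fault'' function are hypothetical names for this illustration, not kernel code.

```python
class Process:
    def __init__(self, name):
        self.name = name
        self.pmap = {}      # virtual page -> physical page (resident pages)
        self.swapmap = {}   # virtual page -> disk slot (swapped-out pages)


def handle_page_fault(current, va_page, victim, victim_page, memory, disk):
    """Swap victim's page out and current's page in, reusing one physical page."""
    if va_page not in current.swapmap:
        raise RuntimeError("page is neither resident nor swapped out")
    ppa = victim.pmap.pop(victim_page)        # clear the victim's page table entry
    disk_addr = max(disk, default=-1) + 1     # choose an unused disk slot
    disk[disk_addr] = memory[ppa]             # write the evicted page to disk
    victim.swapmap[victim_page] = disk_addr   # mark the victim's swap map
    old_slot = current.swapmap.pop(va_page)   # the faulting page's disk slot
    memory[ppa] = disk.pop(old_slot)          # read it in and free the slot
    current.pmap[va_page] = ppa               # fill in the page map


a, b = Process("A"), Process("B")
memory = {0: "B-data"}      # one physical page, currently owned by B
disk = {0: "A-data"}        # A's page 7 lives in disk slot 0
a.swapmap[7] = 0
b.pmap[3] = 0

handle_page_fault(a, 7, b, 3, memory, disk)
print(memory, a.pmap, b.swapmap)  # {0: 'A-data'} {7: 0} {3: 1}
```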
===== Eviction Policies =====
**Thrashing:** A large fraction of memory accesses cause a swap. This leads to a collapse in performance.\\
**Reference String:** A list of page accesses.
==== FIFO Eviction Algorithm ====
Note: Here, we have the first in, first out (FIFO) eviction algorithm using 3 physical pages of memory. This algorithm states that the page that was swapped in furthest in the past is the first page to be swapped out. In this particular scenario, we have 9 swaps.
^ Process | A ||| B | A || C | A ||| B | C |
^ Reference String ^ -- 1 -- ^ -- 2 -- ^ -- 3 -- ^ -- 4 -- ^ -- 1 -- ^ -- 2 -- ^ -- 5 -- ^ -- 1 -- ^ -- 2 -- ^ -- 3 -- ^ -- 4 -- ^ -- 5 -- ^
^ Physical Page 1 | **[1]** | 1 | 1 | **[4]** | 4 | 4 | **[5]** | 5 | 5 | 5 | 5 | 5 |
^ Physical Page 2 | | **[2]** | 2 | 2 | **[1]** | 1 | 1 | 1 | 1 | **[3]** | 3 | 3 |
^ Physical Page 3 | | | **[3]** | 3 | 3 | **[2]** | 2 | 2 | 2 | 2 | **[4]** | 4 |
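The table above can be reproduced with a short simulation. This is a minimal sketch that counts swaps (page faults) only; the function name ''fifo_faults'' is made up for this example.

```python
from collections import deque


def fifo_faults(reference_string, num_frames):
    """Count the swaps FIFO eviction needs for the given reference string."""
    frames = deque()    # oldest loaded page on the left
    faults = 0
    for page in reference_string:
        if page in frames:
            continue                # hit: FIFO order does not change
        faults += 1
        if len(frames) == num_frames:
            frames.popleft()        # evict the page loaded furthest in the past
        frames.append(page)
    return faults


REFS = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(fifo_faults(REFS, 3))  # 9
```

The number of physical pages is a parameter, so the same function can be rerun with other memory sizes.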
==== Belady's Optimal Algorithm ====
- Evict the page that will be accessed furthest into the future. (optimal eviction policy)\\
Note: The problem with this algorithm is that we cannot know in advance which pages will be accessed at what time. With 3 physical pages, this algorithm uses 7 swaps on this reference string, which is the lowest possible number of swaps.
^ Reference String ^ -- 1 -- ^ -- 2 -- ^ -- 3 -- ^ -- 4 -- ^ -- 1 -- ^ -- 2 -- ^ -- 5 -- ^ -- 1 -- ^ -- 2 -- ^ -- 3 -- ^ -- 4 -- ^ -- 5 -- ^
^ Physical Page 1 | **[1]** | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | **[3]** | 3 | 3 |
^ Physical Page 2 | | **[2]** | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | **[4]** | 4 |
^ Physical Page 3 | | | **[3]** | **[4]** | 4 | 4 | **[5]** | 5 | 5 | 5 | 5 | 5 |
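Although the optimal policy cannot be used by a real kernel, it can be simulated offline when the whole reference string is known. The sketch below does that; ''optimal_faults'' is a name chosen for this example, and ties between pages that are never used again are broken arbitrarily.

```python
def optimal_faults(reference_string, num_frames):
    """Count swaps when evicting the page whose next use is furthest in the future."""
    frames = []
    faults = 0
    for i, page in enumerate(reference_string):
        if page in frames:
            continue
        faults += 1
        if len(frames) == num_frames:
            future = reference_string[i + 1:]

            def next_use(p):
                # Pages never used again count as infinitely far away.
                return future.index(p) if p in future else len(future)

            frames.remove(max(frames, key=next_use))
        frames.append(page)
    return faults


REFS = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(optimal_faults(REFS, 3))  # 7
```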
==== FIFO Eviction Algorithm (4 pages) ====
- Evict page loaded furthest in the past.\\
Note: Here, we use 10 swaps. Notice that this result is even worse than the 9 swaps we had with only 3 physical pages of memory. In this particular case, increasing the memory yields a worse result.
^ Reference String ^ -- 1 -- ^ -- 2 -- ^ -- 3 -- ^ -- 4 -- ^ -- 1 -- ^ -- 2 -- ^ -- 5 -- ^ -- 1 -- ^ -- 2 -- ^ -- 3 -- ^ -- 4 -- ^ -- 5 -- ^
^ Physical Page 1 | **[1]** | 1 | 1 | 1 | 1 | 1 | **[5]** | 5 | 5 | 5 | **[4]** | 4 |
^ Physical Page 2 | | **[2]** | 2 | 2 | 2 | 2 | 2 | **[1]** | 1 | 1 | 1 | **[5]** |
^ Physical Page 3 | | | **[3]** | 3 | 3 | 3 | 3 | 3 | **[2]** | 2 | 2 | 2 |
^ Physical Page 4 | | | | **[4]** | 4 | 4 | 4 | 4 | 4 | **[3]** | 3 | 3 |
**Belady's Anomaly:**\\
- Some eviction algorithms (like FIFO) don't always improve performance given more memory. This can be seen when comparing the results of using the FIFO eviction algorithm on the specified reference string, with 3 versus 4 pages.
==== Least Recently Used (LRU) Eviction Algorithm ====
- Evict page accessed furthest in the past.\\
Note: The LRU eviction algorithm does not suffer from Belady's Anomaly. With 4 physical pages, this one has 8 swaps.
^ Reference String ^ -- 1 -- ^ -- 2 -- ^ -- 3 -- ^ -- 4 -- ^ -- 1 -- ^ -- 2 -- ^ -- 5 -- ^ -- 1 -- ^ -- 2 -- ^ -- 3 -- ^ -- 4 -- ^ -- 5 -- ^
^ Physical Page 1 | **[1]** | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | **[5]** |
^ Physical Page 2 | | **[2]** | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
^ Physical Page 3 | | | **[3]** | 3 | 3 | 3 | **[5]** | 5 | 5 | 5 | **[4]** | 4 |
^ Physical Page 4 | | | | **[4]** | 4 | 4 | 4 | 4 | 4 | **[3]** | 3 | 3 |
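The LRU table can also be reproduced by simulation. This sketch keeps pages in an ''OrderedDict'' ordered from least to most recently used; the function name ''lru_faults'' is an assumption for this example.

```python
from collections import OrderedDict


def lru_faults(reference_string, num_frames):
    """Count the swaps LRU eviction needs for the given reference string."""
    frames = OrderedDict()      # least recently used page first
    faults = 0
    for page in reference_string:
        if page in frames:
            frames.move_to_end(page)        # hit: now the most recently used
            continue
        faults += 1
        if len(frames) == num_frames:
            frames.popitem(last=False)      # evict the least recently used page
        frames[page] = True
    return faults


REFS = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(lru_faults(REFS, 4))  # 8
print(lru_faults(REFS, 3))  # 10
```

On this reference string, going from 3 to 4 physical pages reduces the swap count from 10 to 8, consistent with LRU not suffering from Belady's Anomaly.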
==== Tracking page accesses ====
**Software Fix:**
* At every timer interrupt, mark every page in each process's address space inaccessible, as if the process were starting with a blank address space
* On page fault,
If page in memory,
Move page to front of LRU list
Mark page accessible
return
else,
swap ....
**Hardware Fix:**\\
Processor sets "accessed" bit in every accessed page table entry.\\
{{accessbit1.jpg|}}\\
Note: In the above image, each page table entry contains a 12-bit set of flags, one of which is the accessed bit. At the granularity of a timer interrupt, the kernel can use this bit to move recently accessed pages to the front of the LRU list.
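One way the kernel might use the accessed bit at each timer interrupt is sketched below. The data layout (a dictionary of flags per page and a Python list as the LRU list) and the name ''timer_tick'' are assumptions for illustration only; the lecture does not specify the bookkeeping.

```python
def timer_tick(page_table, lru_list):
    """Move pages accessed since the last tick to the front, then clear their bits.

    page_table: dict mapping page -> {"accessed": bool}
    lru_list:   pages ordered from most to least recently used
    """
    touched = [p for p in lru_list if page_table[p]["accessed"]]
    untouched = [p for p in lru_list if not page_table[p]["accessed"]]
    for p in touched:
        page_table[p]["accessed"] = False   # reset so the next interval starts clean
    return touched + untouched


page_table = {
    1: {"accessed": False},
    2: {"accessed": True},
    3: {"accessed": False},
    4: {"accessed": True},
}
print(timer_tick(page_table, [1, 2, 3, 4]))  # [2, 4, 1, 3]
```

Because the bit only records whether a page was touched during the interval, the ordering is an approximation of true LRU at timer-interrupt granularity.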
===== Demand Paging =====
When we run a process, large portions of the binary code are never accessed.\\
{{demandpaging.jpg|}}\\
Note: Recalling the old example of the sort program, we can see in the above image that if we only need to sort a single line, the majority of the code is never used. Startup latency is high because we load the entire binary file just to sort a single line.
The above example shows us the problem with our current setup. Running large programs can be very slow, even if we only need to use a small portion of the code. Demand paging solves this by having the OS load pages only when they are requested. This reduces the latency issue that results from loading entire programs into memory. The following shows us how this would be implemented.
When we start a process:
- Empty pmap
- Init swapmap to point to binary code on disk
The OS loads pages as they are accessed.
Note: Prefetching makes all of this easier.
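The idea of starting with an empty page map and loading pages on first access can be sketched as follows. The ''DemandPagedBinary'' class is a hypothetical toy in which a ''bytes'' object stands in for the program file on disk.

```python
PGSIZE = 4096


class DemandPagedBinary:
    def __init__(self, binary):
        self.binary = binary    # stands in for the binary code on disk
        self.pmap = {}          # starts empty: no page is loaded yet
        self.loads = 0          # number of pages brought in so far

    def read(self, va):
        """Return the byte at virtual address va, loading its page if needed."""
        page = va // PGSIZE
        if page not in self.pmap:                       # page fault on first access
            start = page * PGSIZE
            self.pmap[page] = self.binary[start:start + PGSIZE]
            self.loads += 1
        return self.pmap[page][va % PGSIZE]


binary = bytes(range(256)) * 64     # a 16KB "program" spanning 4 pages
proc = DemandPagedBinary(binary)
print(proc.read(5000), proc.loads)  # 136 1
print(proc.read(5001), proc.loads)  # 137 1
```

Only the page containing the accessed addresses is loaded; the other three pages of the toy binary are never read.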
We can tweak atype, so that the process can't modify its own code.
if(addr_allowed(va, atype, cpl)) // atype: read-only memory
{{accessbit2.jpg|}}\\
Note: We add a W (writable) flag to each page table entry. Leaving it clear on code pages prevents the process from modifying its own code.
===== Memory-Mapped I/O =====
**Demand paging:** Loads disk files directly into memory for process binaries (code).\\
{{memorymapped1.jpg|}}
With Demand Paging, files that are on disk are loaded into a buffer cache at the kernel level, and then copied into the process that requested the file. If more than one process requests the same file, then multiple copies of that file are made. This mechanism clogs memory with duplicates of a single file, which wastes space. A solution to this problem is Memory-Mapped I/O.
**Memory-mapped I/O:** Maps disk files directly into process memory for ANY files.\\
{{memorymapped4.jpg|}}
With Memory-Mapped I/O, the file is still loaded into the buffer cache in the kernel, but each process that requests the file has a mapping that points to the file in the buffer cache. This way, only one copy of the same file exists, which saves space and improves memory utilization.
**Pros/Cons:**\\
+ Fewer copies of data in memory\\
- Interface is harder to use (page alignment issues: mappings work in multiples of the 4096-byte page size)
Example: This is useful for shared C libraries because then it is not necessary to copy the entire library into multiple processes' memory.
{{memorymapped5.jpg|}}
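From user space, memory-mapped I/O is typically reached through an interface such as ''mmap''. The sketch below uses Python's standard ''mmap'' module to read part of a file through a read-only mapping; the helper name ''read_via_mmap'' and the temporary file are assumptions for this example.

```python
import mmap
import os
import tempfile


def read_via_mmap(path, start, length):
    """Return length bytes at offset start of the file, read through a mapping."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ makes the mapping read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            return mapped[start:start + length]


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello, memory-mapped world")
    path = tmp.name

print(read_via_mmap(path, 7, 13))  # b'memory-mapped'
os.remove(path)
```

The alignment issue listed under the cons shows up here too: if a non-zero ''offset'' argument is passed to ''mmap.mmap'', it must be a multiple of ''mmap.ALLOCATIONGRANULARITY''.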
=====Summary=====
In this lecture, we began by reviewing concepts about virtual memory that were introduced to us in the earlier lectures, and the problem with memory utilization was brought to our attention. With poor memory utilization, a large portion of the memory is wasted storing information that is not currently needed. One way to make use of that memory is by swapping/paging. The following are the eviction algorithms and related techniques we covered.
* __FIFO Eviction Algorithm__ - This algorithm states that the page that was swapped in furthest in the past is the first page to be swapped out. Suffers from Belady's Anomaly.
* __Belady's Optimal Algorithm__ - Evict the page that will be accessed furthest into the future. But we cannot know in advance which pages will be accessed.
* __Least Recently Used (LRU) Eviction Algorithm__ - Evict page accessed furthest in the past.
* __Demand Paging__ - Pages of a file are loaded only when a process accesses them. Multiple copies of the same file might be present in user-level process memory if more than one process requests the same file.
* __Memory-Mapped I/O__ - The requested file is mapped to the copy stored in the buffer cache at kernel level, so only one copy exists. The interface is harder to use.
====Key terms from today’s lecture====
Here are the key terms that were introduced to us in this lecture:
* [[lec15# FIFO Eviction Algorithm| FIFO Eviction Algorithm ]]
* [[lec15# Belady's Optimal Algorithm| Belady's Optimal Algorithm]]
* [[lec15# FIFO Eviction Algorithm (4 pages)| Belady's Anomaly]]
* [[lec15#Least Recently Used (LRU) Eviction Algorithm| Least Recently Used (LRU) Eviction Algorithm]]
* [[lec15#Demand Paging| Demand Paging]]
* [[lec15#Memory-Mapped I/O| Memory-Mapped I/O]]