By: Jeff Moguillansky, Jonathan Hosenpud, George Gov
Paged Virtual Memory (Review)
The address space is divided into fixed-size units called "pages".
PS (page size) = 4096 bytes (4 KB) on the x86.
The memory management unit (MMU) translates addresses from virtual addresses (per process address space) to physical addresses (machine addresses), using a BINDING / PAGE MAP function β(va) = pa.
The function β has the following properties:
For all offset, 0 <= offset < PS, β(n*PS + offset) = β(n*PS) + offset.
That is, everything within a page has the same binding; only the offset is different.
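As a minimal sketch of this property, the snippet below stores β as a dictionary from virtual page number to physical page number. The names `translate` and `page_map` and the sample mapping are made up for illustration; real hardware uses page tables, not a dictionary.

```python
PAGE_SIZE = 4096  # PS on the x86

def translate(page_map, va):
    """Apply a binding function stored as {virtual page number: physical page number}."""
    vpn, offset = divmod(va, PAGE_SIZE)
    # Every address in the page shares the same binding; only the offset differs.
    return page_map[vpn] * PAGE_SIZE + offset

# Hypothetical mapping: virtual page 2 is bound to physical page 7.
page_map = {2: 7}
base = translate(page_map, 2 * PAGE_SIZE)
assert translate(page_map, 2 * PAGE_SIZE + 123) == base + 123
```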
How is the Binding Function (β) implemented?
In hardware: the MMU applies β on every memory access. Software can't implement β by itself; it can only fill in the page tables, and it must follow the format that the hardware defines for them.
x86 Paging:
2–level page table example:
(5–bit addresses)
The page table address must be a physical address. If it were virtual, it would go into a loop trying to determine the physical address.
The register that holds the page directory address is %cr3. To change the page directory address, just load a new value into %cr3 (via lcr3, a helper that wraps a mov into %cr3).
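A rough sketch of a two-level lookup, assuming the standard 32-bit x86 split of a virtual address into a 10-bit directory index, a 10-bit table index, and a 12-bit offset. Nested dictionaries stand in for the page directory and page tables, which on real hardware are 4 KB pages in physical memory; `split` and `walk` are hypothetical names.

```python
PAGE_SIZE = 4096

def split(va):
    """Split a 32-bit virtual address into (directory index, table index, offset)."""
    return (va >> 22) & 0x3FF, (va >> 12) & 0x3FF, va & 0xFFF

def walk(page_directory, va):
    """Return the physical address for va, or None if no mapping exists."""
    pd_index, pt_index, offset = split(va)
    page_table = page_directory.get(pd_index)   # first level: page directory
    if page_table is None:
        return None
    frame = page_table.get(pt_index)            # second level: page table
    if frame is None:
        return None
    return frame * PAGE_SIZE + offset

# Hypothetical directory: entry 1 points to a table whose entry 3 maps to frame 9.
page_directory = {1: {3: 9}}
assert walk(page_directory, 0x00403025) == 9 * PAGE_SIZE + 0x025
```

Changing %cr3 corresponds to passing a different `page_directory` to `walk`.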
Reasons for Virtual Memory
Why did we introduce virtual memory?
Virtual memory provides allocation flexibility and prevents external fragmentation in the OS.
So now, a process can use an entire address space without causing any external fragmentation.
Other benefits of virtual memory:
Isolation
Goal: To separate processes from each other & from the kernel.
To get isolation, we give different processes different β functions (one β per process) and make sure that process A's physical memory is only accessible via β_A.
Example:
A’s VA (virtual address) at address 0x80000 maps to PA (physical address) x, while B’s VA at address 0x80000 maps to PA y.
In order to provide isolation for each process, we must mark the rest of the process's address space as unusable (denoted by the red X marks in the following diagram).
Accessing a page in a process's VA space that is marked unusable will result in a segmentation fault (the actual name for the interrupt is a page fault).
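The isolation example can be sketched with one page map per process. The frame numbers 0x10 and 0x20 stand in for "x" and "y" and are invented for illustration, as are the names `beta`, `beta_a`, `beta_b`, and `PageFault`.

```python
PAGE_SIZE = 4096

class PageFault(Exception):
    """Raised when a virtual address has no usable mapping."""

def beta(page_map, va):
    vpn, offset = divmod(va, PAGE_SIZE)
    if vpn not in page_map:            # page is marked unusable
        raise PageFault(hex(va))
    return page_map[vpn] * PAGE_SIZE + offset

# One β per process: the same VA maps to different physical pages.
beta_a = {0x80: 0x10}   # process A: VA 0x80000 -> PA 0x10000
beta_b = {0x80: 0x20}   # process B: VA 0x80000 -> PA 0x20000

assert beta(beta_a, 0x80000) != beta(beta_b, 0x80000)
```

Any address outside the single mapped page, such as 0x90000, raises `PageFault` in either process.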
Now we have a β(va) that returns either a physical address or a page fault.
If a page fault occurs because a page is missing:
o The process requesting the missing page is suspended.
o The memory manager locates the missing page in secondary memory.
o The page is loaded into primary memory (which may cause another page to be unloaded).
o The page table is updated to reflect the newly loaded page.
o The suspended process is resumed.
Another advantage of marking portions of the VA space as inaccessible is reduction of the amount of memory needed for the β function (you can mark portions of the directory as unused).
Maximum amount of memory (on x86) for a single β function: the page directory and each page table take 4 KB, and the directory can point to 2^10 page tables, so the maximum β size = 4 KB + 2^10 * 4 KB = ~4 MB.
However, if they can be marked unused, we can reduce the size.
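The arithmetic can be checked with a few lines. This is only a sketch: `beta_size` is a hypothetical helper, and the 3-table case is an invented example of a process that uses only a small part of its address space.

```python
PAGE_SIZE = 4096            # bytes per page directory or page table
ENTRIES_PER_DIRECTORY = 2 ** 10

def beta_size(num_page_tables):
    """Bytes used by one page directory plus the page tables it points to."""
    return PAGE_SIZE + num_page_tables * PAGE_SIZE

full = beta_size(ENTRIES_PER_DIRECTORY)   # every directory entry in use
small = beta_size(3)                      # only 3 page tables in use

assert full == 4_198_400                  # about 4 MB
assert small == 16_384                    # 16 KB
```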
How do you change β when we run a new process?
Every time we run a new process, change β by loading that process's page directory address into %cr3 (using lcr3).
Requirements:
β is extended in the following way on the x86:
β(va, CPL) takes the current privilege level (CPL) as a second argument and is allowed to give a different result for different protection levels.
Note that the kernel space is the same for every process. However, it is locked away from the process's own code, so accessing it from the process causes a page fault.
When there is a system call, the CPL changes.
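A small sketch of the extended β, assuming that user code runs at CPL 3 and the kernel at CPL 0, and that each mapping carries a flag saying whether user-level access is allowed. The entry layout `(frame, user_ok)` and all names here are illustrative, not the x86 page table entry format.

```python
PAGE_SIZE = 4096

class PageFault(Exception):
    """Raised when the access is not allowed at the given privilege level."""

def beta(page_map, va, cpl):
    vpn, offset = divmod(va, PAGE_SIZE)
    entry = page_map.get(vpn)
    if entry is None:
        raise PageFault(hex(va))
    frame, user_ok = entry
    if cpl == 3 and not user_ok:       # user code touching a kernel page
        raise PageFault(hex(va))
    return frame * PAGE_SIZE + offset

# Hypothetical map: page 0x80 is a user page, page 0xF0000 is a kernel page.
page_map = {0x80: (0x10, True), 0xF0000: (0x01, False)}

assert beta(page_map, 0xF0000 * PAGE_SIZE, cpl=0) == 0x01 * PAGE_SIZE
```

The same kernel address faults when `cpl=3`, which is why the kernel mapping can be shared by every process.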
Extending Memory:
Goal: make the computer appear to have more memory than it actually does.
Memory (RAM) comes at about 8 ¢ / MB. Hard disk is about 0.04 ¢ / MB.
Let’s say we have a machine with memory size 1 GB, but we want to use Microsoft Word (1 GB), and Firefox (1 GB).
How do we run both programs simultaneously?
Use the disk to store some portions of the "virtual" memory! On the hard drive we have "swap space": space on the hard drive that is used as extra memory. This is PAGED VIRTUAL MEMORY: the physical memory acts like a cache, and certain portions of memory will be paged out, or swapped out, to the disk when they're not being used. When a process needs a page that has been swapped out, the page is swapped in from disk to primary memory.
What do we swap? (What data should be written to disk?)
It takes roughly 6 million cycles to load a page from disk (ouch!), compared to hundreds or thousands to make a system call.
Inside the kernel there's a routine called PF(va, cpl), the page fault handler, which is called with the faulting address and privilege level when a process causes a page fault. The handler used to always be "kill process".
Let's say that a process is running and calls malloc to allocate 1 page of memory. However, the physical memory is full. This is a cache miss, and the OS should swap out 1 page to disk. The handler becomes:
PF(va, cpl):
    If (va has been swapped out)    /* swap function returns disk address for va */
        Select a page to evict.
        Write that page to swap space.
        Mark that page as inaccessible in β.
        Record its disk address in the swap function.
        Get va's disk address from the swap function.
        Read that page in from disk.
        Install a new mapping in β and resume the process.
    Else
        Kill process
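The handler above can be sketched as a toy simulation. Everything here is an assumption made for illustration: dictionaries stand in for β (`resident`) and for the swap function (`swap`), page contents are plain strings, and the victim is simply the page that has been resident longest, since the choice of eviction policy is a separate question.

```python
class Kill(Exception):
    """Stands in for 'kill process'."""

class Memory:
    def __init__(self, num_frames):
        self.num_frames = num_frames
        self.resident = {}   # β: virtual page -> contents held in physical memory
        self.swap = {}       # swap function: virtual page -> contents on disk

    def page_fault(self, vpn):
        if vpn not in self.swap:
            raise Kill(vpn)                        # not a swapped-out page
        if len(self.resident) >= self.num_frames:
            victim = next(iter(self.resident))     # placeholder policy: oldest resident
            # Write the victim to swap space and mark it inaccessible in β.
            self.swap[victim] = self.resident.pop(victim)
        # Read the page in from disk and install the new mapping.
        self.resident[vpn] = self.swap.pop(vpn)

    def read(self, vpn):
        if vpn not in self.resident:               # inaccessible: page fault
            self.page_fault(vpn)
        return self.resident[vpn]                  # the process resumes here

mem = Memory(num_frames=2)
mem.resident = {1: "a", 2: "b"}
mem.swap = {3: "c"}
assert mem.read(3) == "c" and 1 in mem.swap
```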
Selecting the page to evict (what to swap):
Thrashing: the machine spends most of its time swapping and does little useful work.
Example: given a sequence of page accesses and memory size, what to swap? Goal: minimize # swaps.
Belady's Optimal Algorithm (BOA):
Swap out the page that will be used furthest in the future. This is optimal (produces minimum number of swaps).
The bold numbers in the following table denote the page that is swapped in.
| Page Accesses | 1 | 2 | 3 | 4 | 1 | 2 | 5 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Physical Memory | **1** | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | **3** | 3 | 3 |
| | | **2** | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | **4** | 4 |
| | | | **3** | **4** | 4 | 4 | **5** | 5 | 5 | 5 | 5 | 5 |
BOA: 7 swaps.
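A short simulation of Belady's algorithm on the access sequence from the table, as a sketch. The function name `optimal_faults` is made up, and it counts every swap-in, including the initial loads into empty memory. Ties between pages that are never used again are broken arbitrarily, so the memory contents can differ from the table while the count stays the same.

```python
def optimal_faults(accesses, num_frames):
    """Count swap-ins when evicting the page whose next use is furthest away."""
    frames = []
    faults = 0
    for i, page in enumerate(accesses):
        if page in frames:
            continue                      # hit: nothing to swap
        faults += 1
        if len(frames) < num_frames:
            frames.append(page)           # free frame available
            continue
        future = accesses[i + 1:]

        def next_use(p):
            # Pages never used again sort as "furthest in the future".
            return future.index(p) if p in future else len(future)

        victim = max(frames, key=next_use)
        frames[frames.index(victim)] = page
    return faults

ACCESSES = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
assert optimal_faults(ACCESSES, 3) == 7
```

Note that the algorithm needs the whole future access sequence, so it serves as a yardstick rather than something a kernel can run.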
First In First Out (FIFO):
Swap out the page that was loaded furthest in the past.
| Page Accesses | 1 | 2 | 3 | 4 | 1 | 2 | 5 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Physical Memory | **1** | 1 | 1 | **4** | 4 | 4 | **5** | 5 | 5 | 5 | 5 | 5 |
| | | **2** | 2 | 2 | **1** | 1 | 1 | 1 | 1 | **3** | 3 | 3 |
| | | | **3** | 3 | 3 | **2** | 2 | 2 | 2 | 2 | **4** | 4 |
FIFO: 9 swaps.
For this example, though, FIFO swaps more when it is given more memory: with 3 pages of physical memory it swaps 9 times, but with 4 pages it swaps 10 times. This is called Belady's Anomaly: for some page replacement algorithms (e.g. FIFO), adding memory can increase the number of swaps.
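The anomaly is easy to reproduce with a small FIFO simulation. This is a sketch with a made-up function name, `fifo_faults`; as before, it counts every swap-in, including the initial loads.

```python
from collections import deque

def fifo_faults(accesses, num_frames):
    """Count swap-ins when evicting the page that was loaded longest ago."""
    frames = deque()                  # leftmost entry is the oldest load
    faults = 0
    for page in accesses:
        if page in frames:
            continue                  # hit: FIFO order is not updated on use
        faults += 1
        if len(frames) == num_frames:
            frames.popleft()          # evict the oldest load
        frames.append(page)
    return faults

ACCESSES = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
assert fifo_faults(ACCESSES, 3) == 9
assert fifo_faults(ACCESSES, 4) == 10   # more memory, more swaps
```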
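In some scenarios, FIFO utterly fails. For example, if you're reading a file sequentially, each page is used only once, so as soon as you have read a page you should swap it out; FIFO instead evicts older pages that may still be in use.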
Stack algorithms: not affected by Belady's Anomaly. Belady’s algorithm is a stack algorithm, and so is LRU (least recently used).
Least Recently Used (LRU):
Swap out the page USED farthest in the past.
| Page Accesses | 1 | 2 | 3 | 4 | 1 | 2 | 5 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Physical Memory | **1** | 1 | 1 | **4** | 4 | 4 | **5** | 5 | 5 | **3** | 3 | 3 |
| | | **2** | 2 | 2 | **1** | 1 | 1 | 1 | 1 | 1 | **4** | 4 |
| | | | **3** | 3 | 3 | **2** | 2 | 2 | 2 | 2 | 2 | **5** |
LRU: 10 swaps.
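A sketch of exact LRU on the same access sequence, using an ordered dictionary to track recency. The name `lru_faults` is invented. With 3 pages of physical memory the simulation counts 10 swap-ins, one for each page swapped in across the table's 12 accesses, and with 4 pages it counts 8, consistent with LRU being a stack algorithm.

```python
from collections import OrderedDict

def lru_faults(accesses, num_frames):
    """Count swap-ins when evicting the page used longest ago."""
    frames = OrderedDict()                # first key is the least recently used
    faults = 0
    for page in accesses:
        if page in frames:
            frames.move_to_end(page)      # hit: page becomes most recently used
            continue
        faults += 1
        if len(frames) == num_frames:
            frames.popitem(last=False)    # evict the least recently used page
        frames[page] = True
    return faults

ACCESSES = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
assert lru_faults(ACCESSES, 3) == 10
assert lru_faults(ACCESSES, 4) == 8
```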
Implementing LRU: add an age function, age(pa) = length of time since the page at pa was last used.
On a context switch, erase β (all process memory becomes inaccessible). Thus, the first thing that happens when we access a page is a page fault.
PF(va, cpl):
    If (va is an erased address) {
        age(β(va)) = 0
        Reinstall the mapping into β
        Return
    }
On every quantum, age(pa)++ for all pa.
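The aging scheme can be sketched as a small class. All names here (`AgingLRU`, `tick`, `victim`) are hypothetical, a set stands in for the mappings currently installed in β, and pages are identified by plain integers. This approximates LRU only to the granularity of one quantum.

```python
class AgingLRU:
    def __init__(self):
        self.age = {}         # page -> quanta since its last observed use
        self.mapped = set()   # pages whose mapping is currently installed in β

    def context_switch(self):
        self.mapped.clear()           # erase β: the next access to any page faults

    def page_fault(self, page):
        self.age[page] = 0            # the page was just used
        self.mapped.add(page)         # reinstall the mapping

    def access(self, page):
        if page not in self.mapped:   # erased address
            self.page_fault(page)

    def tick(self):
        for page in self.age:         # on every quantum, age(pa)++ for all pa
            self.age[page] += 1

    def victim(self):
        return max(self.age, key=self.age.get)   # oldest page is evicted first

lru = AgingLRU()
lru.access(1)
lru.access(2)
lru.tick()
lru.context_switch()
lru.access(2)                         # only page 2 is touched after the switch
lru.tick()
assert lru.victim() == 1
```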