Topics:
An Address Space maps a name (an address or a pointer value) to a memory slot holding a byte of data. The size of the address space is determined by the length of the address. For example, a 32-bit machine has 2^32 bytes = 4 GB of addressable memory. Address spaces are usually much larger than the amount of actual (physical) memory in the machine.
Physical address space is the address space defined by the memory controller, which connects the bus to the physical memory. The diagram below shows address 0 mapped onto the physical memory.
A process's address space contains the following:
code/text
data/global variables
stack
heap
(kernel)
The data and global variables are divided into two sections: data and BSS. Data holds the variables that are non-zero on startup. BSS (Block Started by Symbol) holds the ones that are zero on startup.
Segments are contiguous portions of the address space. Of the above, the text, stack, heap, data, and BSS are segments.
When we attempt to run an executable (in this example, threados-app), a loader is run to read the various segments from the executable file on disk and place them into memory. The result is something like the below:
In weensyOS, each of the processes was given an assigned address space in which to place its code, data, heap and stack. The memory ended up looking something like this:
There are several problems with this interface:
The Solution: Dynamic Loading (to be discussed later).
Imagine a single process. If you can't, here's a picture:
Make the following assumptions:
In this strategy, we use a free-page bitmap. Every bit in the bitmap represents one 4096 byte block in the heap. If the bit is 0, the block has been allocated. If the bit is 1, then it is free.
In this case, the free-page bitmap is 1024 bits wide: 4 MB = 4096 KB, and 4096 KB / 4096 B = 1024.
The following is pseudocode for malloc and free.
malloc(s)
free(p)
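As a rough illustration of the free-page bitmap strategy, here is a minimal C sketch. It assumes the 4 MB heap and 4096-byte blocks from the text, keeps the convention that a 1 bit means free, and only serves requests of at most one block. The names `page_init`, `page_malloc`, `page_free`, and the static `heap` array are hypothetical stand-ins, not part of any real allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096u
#define NUM_PAGES 1024u                 /* 4 MB / 4096 B */

/* Hypothetical stand-in for the heap segment. */
static unsigned char heap[NUM_PAGES * PAGE_SIZE];
/* One bit per block: 1 = free, 0 = allocated (as in the text). */
static uint8_t bitmap[NUM_PAGES / 8];

/* Mark every block free. Call once before page_malloc. */
void page_init(void) {
    memset(bitmap, 0xFF, sizeof bitmap);
}

/* O(N): scan for the first free bit, clear it, return that block. */
void *page_malloc(size_t s) {
    if (s > PAGE_SIZE) return NULL;     /* multi-block requests are not handled */
    for (size_t i = 0; i < NUM_PAGES; i++) {
        if (bitmap[i / 8] & (1u << (i % 8))) {
            bitmap[i / 8] &= (uint8_t)~(1u << (i % 8));
            return heap + i * PAGE_SIZE;
        }
    }
    return NULL;                        /* no free block left */
}

/* O(1): compute the block index from the pointer and set its bit. */
void page_free(void *p) {
    size_t off = (size_t)((unsigned char *)p - heap);
    assert(off % PAGE_SIZE == 0 && off / PAGE_SIZE < NUM_PAGES);
    size_t i = off / PAGE_SIZE;
    bitmap[i / 8] |= (uint8_t)(1u << (i % 8));
}
```

The linear scan in `page_malloc` is where the O(N) cost comes from, while `page_free` only needs one division and one bit operation.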
Problems NOT handled by this method:
This method also has a performance issue: malloc is O(N) because of the need to search for a free bit, but free is O(1).
How can we achieve O(1) for malloc? Let's use a freelist!
A free list is a linked list of freeblock structures.
struct freeblock { struct freeblock *next; char garbage[4092]; };
In the data portion of the process, we store a pointer (freelist) which points to the first freeblock structure. The next pointer of each structure points to the next free block.
With this in mind, we can update the malloc and free algorithms.
malloc(s)
if (freelist == NULL) return NULL; p = freelist; freelist = p->next; return p;
free(p)
p->next = freelist; freelist = p;
This algorithm is good if blocks are all the same size. There are no constraints on the ordering of the blocks, since all blocks are identical. For example, given that we have the following situation:
p1 = alloc; p2 = alloc; p3 = alloc; free(p2); p4 = alloc; free(p1);
After all the allocations and frees, any new allocations for blocks with size up to 4096 bytes can happen in any free block.
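The free-list version can be sketched as a small C module. This is only an illustration under assumptions: the heap is a static array of 1024 blocks, `fl_init`, `fl_malloc`, and `fl_free` are hypothetical names, and the `garbage` array is sized from the pointer width (instead of the fixed 4092 used above) so that each block stays exactly 4096 bytes on 64-bit machines as well.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK_SIZE 4096u
#define NUM_BLOCKS 1024u

struct freeblock {
    struct freeblock *next;
    /* Padding so the whole block is BLOCK_SIZE bytes (4092 on 32-bit). */
    char garbage[BLOCK_SIZE - sizeof(struct freeblock *)];
};

/* Hypothetical stand-in for the heap segment. */
static struct freeblock heap[NUM_BLOCKS];
static struct freeblock *freelist;

/* Thread every block onto the free list, lowest address first. */
void fl_init(void) {
    for (size_t i = 0; i < NUM_BLOCKS; i++)
        heap[i].next = (i + 1 < NUM_BLOCKS) ? &heap[i + 1] : NULL;
    freelist = &heap[0];
}

/* O(1): pop the first free block. */
void *fl_malloc(size_t s) {
    if (s > BLOCK_SIZE || freelist == NULL) return NULL;
    struct freeblock *p = freelist;
    freelist = p->next;
    return p;
}

/* O(1): push the block back onto the front of the list. */
void fl_free(void *p) {
    assert(p != NULL);
    struct freeblock *b = p;
    b->next = freelist;
    freelist = b;
}
```

Running the sequence from the text on this sketch (three allocations, free p2, allocate p4, free p1) makes p4 reuse p2's block and leaves p1's block at the head of the list, ready for the next allocation.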
What if we only need to allocate 1 byte of memory many times?
Internal fragmentation is the loss of memory that occurs when a block of size N is allocated but only k (< N) bytes are needed. "Internal fragmentation" literally means that there is wasted space within allocated blocks.
In our example, the maximum internal fragmentation is 4095 KB (when we only use 1 byte in each of 1024 blocks) and the minimum is 0 (when everything is used).
We can (attempt to) solve this problem by making variable sized allocations.
Variable-size allocation means, as you might imagine, allowing memory to be allocated in variable sizes. To support this, we have to change how we hand memory to the user: instead of keeping several fixed-size blocks to allocate, we start with one big block of memory and split it as necessary to make smaller blocks.
To do this, we change our definition of a freeblock to include a size variable.
struct freeblock { struct freeblock *next; size_t size; };
Note that we need to reserve 8 bytes in each block for this header: on a 32-bit machine, 4 bytes for the next pointer and 4 bytes for the size.
And we also change how malloc and free work.
p = malloc(s)
- search the free list for a block with at least size s
- If s < block->size, split the block
- remove the block from the list
- return the address
free(p)
- Add block to the free list.
Problems:
In this method, instead of giving the whole block to the user, we use a small trick to keep the size and pointers in the block, while still giving the allocated amount of memory. Basically, what we will be doing is allocating 8 more bytes than necessary for the block and keeping the pointer and size in the first 8 bytes. First we update our freeblock structure a little:
struct freeblock { struct freeblock *next; size_t size; char data[]; };
Then we change the malloc and free algorithms a little:
malloc(s)
free(p)
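One possible shape for the header-based malloc and free is sketched below in C. It is an assumption-laden illustration, not the notes' own algorithm: it uses a first-fit search, a static 4 MB `heap` array, the hypothetical names `vs_init`, `vs_malloc`, and `vs_free`, rounds each request up to a multiple of the header size to keep headers aligned, and does not merge adjacent free blocks. The header is 8 bytes on a 32-bit machine and 16 bytes on a typical 64-bit one.

```c
#include <assert.h>
#include <stddef.h>

#define HEAP_SIZE (4u * 1024u * 1024u)   /* 4 MB, as in the running example */

struct freeblock {
    struct freeblock *next;   /* next free block; unused while allocated */
    size_t size;              /* usable bytes in data[] */
    char data[];              /* memory handed to the caller */
};

#define HDR sizeof(struct freeblock)

/* Hypothetical stand-in for the heap segment (C11 alignment). */
static _Alignas(max_align_t) unsigned char heap[HEAP_SIZE];
static struct freeblock *freelist;

/* Start with one big free block covering the whole heap. */
void vs_init(void) {
    freelist = (struct freeblock *)heap;
    freelist->next = NULL;
    freelist->size = HEAP_SIZE - HDR;
}

void *vs_malloc(size_t s) {
    if (s == 0 || s > HEAP_SIZE) return NULL;
    s = (s + HDR - 1) / HDR * HDR;            /* keep later headers aligned */
    for (struct freeblock **link = &freelist; *link; link = &(*link)->next) {
        struct freeblock *b = *link;
        if (b->size < s) continue;            /* too small, keep searching */
        if (b->size >= s + 2 * HDR) {         /* enough left over to split */
            struct freeblock *rest = (struct freeblock *)(b->data + s);
            rest->size = b->size - s - HDR;
            rest->next = b->next;
            b->size = s;
            *link = rest;                     /* remainder replaces b on the list */
        } else {
            *link = b->next;                  /* hand out the whole block */
        }
        return b->data;                       /* caller never sees the header */
    }
    return NULL;                              /* no block is large enough */
}

void vs_free(void *p) {
    if (p == NULL) return;
    /* Step back from the user pointer to the header in front of it. */
    struct freeblock *b =
        (struct freeblock *)((char *)p - offsetof(struct freeblock, data));
    assert(b->size > 0);
    b->next = freelist;                       /* size is still stored in the header */
    freelist = b;
}
```

Because the size lives in the header just before the returned pointer, `vs_free` needs no size argument.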
The more serious problem is external fragmentation.
External fragmentation is wasted space between allocations.
The following example illustrates external fragmentation. If we go through the following steps:
p = alloc(2 MB)
alloc(1 KB)
free(p)
alloc(3 MB)
As shown, we cannot allocate the 3 MB chunk contiguously because the 1 KB block sits in the middle of the free space. The request could only be satisfied by dividing the 3 MB into smaller, non-contiguous chunks.
One solution to external fragmentation is compaction: shifting allocated blocks so that the free space between them is merged. The problem with this is that it is very bug-prone and expensive.
Virtual Memory refers to the ability to map virtual addresses onto physical memory locations, so that a region of memory is accessible only by the process that owns it and by the kernel.
The idea of virtual memory was born when people started to run more than one process at a time on their computers. Before then, a process could access any memory address it needed. Since it was the only process running, it would never run into any problems. With multiple processes running, processes needed to be protected from one another, or else they would just run all over each other.
This idea has had a couple of solutions throughout the years, with both new hardware and new software being implemented.
Dynamic Loading is a software solution to the virtual memory problem that was introduced around 1970.
Dynamic loading basically allows a process to run at a new address space by using relocation symbols. These relocation symbols allow the loader to figure out how to change addresses and references so that the processes can be loaded into different address spaces. For example, a dynamic loader might've loaded threados-app like this:
The dynamic loader, in this example, would change an instruction like movl $0x19000,%eax to movl $0x29000,%eax.
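The patching step can be sketched in C as follows. This is a simplified illustration under assumptions: the relocation information is modeled as a plain array of byte offsets into the loaded image, each pointing at a 32-bit little-endian absolute address, and `relocate`, `load_le32`, and `store_le32` are hypothetical names. Real relocation formats carry more information than this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit little-endian value (x86 byte order) at p. */
static uint32_t load_le32(const uint8_t *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Write a 32-bit little-endian value at p. */
static void store_le32(uint8_t *p, uint32_t v) {
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Hypothetical relocation pass: relocs[i] is the offset within image of
 * an absolute address, and delta is how far the image was moved from the
 * address it was linked for. */
void relocate(uint8_t *image, const size_t *relocs, size_t nrelocs,
              uint32_t delta) {
    assert(image != NULL && (relocs != NULL || nrelocs == 0));
    for (size_t i = 0; i < nrelocs; i++) {
        uint8_t *slot = image + relocs[i];
        store_le32(slot, load_le32(slot) + delta);
    }
}
```

For the instruction in the text, `movl $0x19000,%eax` is encoded as the bytes B8 00 90 01 00, so a single relocation entry at offset 1 with a delta of 0x10000 turns the immediate into 0x29000.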
The dynamic loader, of course, doesn't actually provide any isolation at all by doing this: A process that references some memory that it shouldn't have access to will still be able to access that memory.
To simplify later implementations of the dynamic loader, around 1975 hardware manufacturers changed the processor to support a new kind of register: the segmentation register.
Processors now add a base address to any address that they are given for an instruction. This base address is held in the segmentation register. This means that processors can now transform any address into a physical memory address by adding the address to the segmentation register. Different processes have different values in their segmentation registers, thus providing each process with a different address space. This brings about the concept of a Virtual Address Space: a process's address space is actually different from the physical address space. So, what a process accesses in its address space at a certain address may or may not be at that address in physical memory.
Segmentation Registers alone still don't provide isolation: even though the processes' address spaces are different, if you know where another process is relative to your base address (or if you are accessing memory randomly), you can still overwrite other processes' memory.
To try and provide isolation, we can add another register called Segmentation Length. As you might guess, segmentation length contains the length of the segment of the process. Any addresses that try to access memory outside of the segmentation length are illegal.
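The base-plus-length check can be written as a few lines of C. This is a software model of what the hardware would do, with assumed names (`struct segment`, `seg_translate`) and 32-bit addresses. Returning false stands in for the fault an illegal access would raise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the two per-process segmentation registers. */
struct segment {
    uint32_t base;     /* added to every address the process issues */
    uint32_t length;   /* size of the process's segment in bytes */
};

/* Translate virtual address va. On success, stores the physical address
 * in *pa and returns true. Returns false if va lies outside the segment. */
bool seg_translate(const struct segment *seg, uint32_t va, uint32_t *pa) {
    assert(seg != NULL && pa != NULL);
    if (va >= seg->length)
        return false;              /* illegal: beyond the segment length */
    *pa = seg->base + va;
    return true;
}
```

With a base of 0x20000 and a length of 0x10000, address 0x9000 translates to 0x29000, while address 0x10000 is rejected.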
In order to completely isolate processes, both of the segmentation registers must be protected. The instructions that change the registers must be protected instructions. In order to do this, we also need a concept of protected and unprotected code space. This brings about the creation of a new register: the Current Privilege Level or CPL register. The CPL register works in the following way:
| CPL Value | Privilege Level | Protected Instructions |
|---|---|---|
| 0 | Kernel Privilege | Protected Instr OK |
| 1-3 | User Privilege | Protected Instr NOT OK |
Current Privilege Level is also a protected register. When a user-level process makes a system call or when an interrupt occurs, CPL is set to zero.
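The privilege check can be modeled in C along these lines. The `struct cpu` layout and the functions `load_segment` and `enter_kernel` are hypothetical, and a false return value stands in for the fault that real hardware would raise when user code attempts a protected instruction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the registers discussed in the text. */
struct cpu {
    unsigned cpl;          /* 0 = kernel privilege, 1-3 = user privilege */
    uint32_t seg_base;
    uint32_t seg_length;
};

/* Protected instruction: changing the segmentation registers is only
 * allowed at CPL 0. */
bool load_segment(struct cpu *c, uint32_t base, uint32_t length) {
    assert(c != NULL);
    if (c->cpl != 0)
        return false;      /* user code: registers are left untouched */
    c->seg_base = base;
    c->seg_length = length;
    return true;
}

/* System call or interrupt entry: CPL is set to zero. */
void enter_kernel(struct cpu *c) {
    assert(c != NULL);
    c->cpl = 0;
}
```

A process running at CPL 3 therefore cannot widen its own segment; only code entered through a system call or interrupt can.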
There is one problem with this: by providing the length of a segment (that has to be contiguous and has no maximum size), we introduce the possibility of External Fragmentation (which is very bad).
The way that most modern operating systems fix this is through Paged Virtual Memory.
The operating system works with fixed-size units of memory (pages), while the user is still allowed to make variable-sized allocations. Because every page is the same size, pages are interchangeable, which makes External Fragmentation disappear. It also means that a page can end up anywhere in physical memory, so we need some way of translating between virtual and physical addresses.
Operating System addresses also will no longer be the same as application addresses, so there also needs to be support for changing OS addresses into application addresses.
Hardware supports a Binding/Pagemap Function, also known by the moniker "Beta" (β). For now, we will assume that β works by mapping a virtual address (va) to a physical address (pa) like this:

β(va) => pa

β(va) => β(pagenum * 4096 + offset) => β(pagenum * 4096) + offset => physpage * 4096 + offset
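The translation above can be sketched in C. This assumes a single-level page map stored as a plain array indexed by virtual page number, 4096-byte pages, and 32-bit addresses. The function name `beta` and the `pagemap` array are illustrative, not a description of any particular hardware format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

/* Hypothetical page map: pagemap[pagenum] holds the physical page number
 * that backs virtual page pagenum. npages is the number of entries. */
uint32_t beta(const uint32_t *pagemap, size_t npages, uint32_t va) {
    uint32_t pagenum = va / PAGE_SIZE;   /* which virtual page */
    uint32_t offset  = va % PAGE_SIZE;   /* position within the page */
    assert(pagemap != NULL && pagenum < npages);
    uint32_t physpage = pagemap[pagenum];
    return physpage * PAGE_SIZE + offset;
}
```

Only the page number is looked up. The offset passes through unchanged, which is why consecutive virtual pages can live in unrelated physical pages.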