
Lecture 3 Scribe Notes

We successfully implemented Ursula Moneybag's OS requirements in class last week. Now, she's come up with a few more features for her OS that she would like us to implement.

But first, for a

Simple Review

Last week we wrote a function that reads data from a hard disk. The function went through several stages before reaching its final stage. Let's note the changes, and see why they were made.

1:

void read_ide_sector (int sectorno, uint32_t addr)

This was the initial version of read_ide_sector. It takes a sector to read, and stores the contents of that sector in the specified location of memory. This interface is not very versatile: it assumes a single disk, a disk type (IDE), and the associated sector size for an IDE disk.

2:

void read_sector (int sectorno, uint32_t addr)

This new version of read_sector abstracts away disk type. The disk is no longer assumed to be IDE. Rather, the implementation will determine the disk type and internally perform the appropriate tasks for the given disk type. This simplifies the interface dramatically: instead of having a separate function for each disk type, we now have a single, unified function. The function is more versatile and simple (2 of the 4 main goals in designing a good operating system)!

However, the implementation still expects a single disk; also, it assumes that sectors of all disk types are the same size.

3:

void read_sector (int diskno, int sectorno, uint32_t addr)

This third implementation adds support for multiple disks by adding a third parameter to select the appropriate disk, rather than assuming a single disk. This implementation still expects all sectors on all disk types to be of the same size, however.
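A minimal sketch of how version 3 might hide the disk type behind one function is shown here. It is only an illustration under stated assumptions: the disks are in-memory arrays rather than real hardware, the second disk type ("SCSI") and the names fake_media, sector_reader, ide_read, scsi_read, and drivers are all hypothetical, and addr is a pointer rather than a uint32_t so the code can run as an ordinary program.

```c
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512   /* version 3 still assumes one sector size for every disk */
#define NUM_DISKS   2
#define NUM_SECTORS 4

/* Hypothetical in-memory stand-ins for two physical disks. */
static uint8_t fake_media[NUM_DISKS][NUM_SECTORS][SECTOR_SIZE];

/* One driver routine per disk type. A real driver would talk to the device;
 * both of these simply copy from the fake media. */
typedef void (*sector_reader)(int diskno, int sectorno, uint8_t *dst);

static void ide_read(int diskno, int sectorno, uint8_t *dst) {
    memcpy(dst, fake_media[diskno][sectorno], SECTOR_SIZE);
}

static void scsi_read(int diskno, int sectorno, uint8_t *dst) {
    memcpy(dst, fake_media[diskno][sectorno], SECTOR_SIZE);
}

/* Assumption for the sketch: disk 0 is "IDE" and disk 1 is "SCSI". */
static const sector_reader drivers[NUM_DISKS] = { ide_read, scsi_read };

/* The caller names a disk and a sector; the disk type never appears in the
 * interface. Like version 3, this has no way to report an error. */
void read_sector(int diskno, int sectorno, uint8_t *addr) {
    drivers[diskno](diskno, sectorno, addr);
}
```

The caller never learns which driver ran, which is the point of the unified interface. Note that an out-of-range diskno or sectorno is simply undefined behavior here, mirroring the lack of error reporting at this stage.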

4:

void read (int diskno, off_t offset, size_t length, uint32_t addr)

Our fourth implementation fixes this final problem. Rather than reading on a sector-by-sector basis, read() is provided with an offset and a length; it reads length bytes, starting at offset, and stores them at addr. This final implementation fixes all of the interface issues we have so far identified.

We still do not have a mechanism to report to the user if there was some form of error. For example, there may be a bad sector, the disk may not exist, the provided offset may be invalid, or perhaps gremlins have taken control of the disk and are refusing to release its bits until you provide them with delicious, delicious cookies. As it stands now, read() has no way to inform the user that it failed.

The common convention in the Linux kernel (and most other places) is to report status through a function's return value. So we give read() an integer return value, which is >= 0 if the read was successful and -1 if the read fails. This makes the function much more useful and enables the user to respond appropriately.
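As a rough sketch of the final interface, the following code builds a byte-oriented read on top of a sector-oriented one and reports failure through its return value. Several things are assumptions made for illustration: the disk is a small in-memory array (fake_disk), only disk 0 exists, the function is called disk_read to avoid clashing with the POSIX read(), addr is a pointer rather than a uint32_t, and a successful call returns the number of bytes read.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>   /* off_t */

#define SECTOR_SIZE 512
#define NUM_SECTORS 8
#define DISK_BYTES  (SECTOR_SIZE * NUM_SECTORS)

/* Hypothetical in-memory disk; only disk number 0 exists in this sketch. */
static uint8_t fake_disk[NUM_SECTORS][SECTOR_SIZE];

/* Sector-level read: returns 0 on success, -1 on a bad disk or sector. */
static int read_sector(int diskno, int sectorno, uint8_t *dst) {
    if (diskno != 0 || sectorno < 0 || sectorno >= NUM_SECTORS)
        return -1;
    memcpy(dst, fake_disk[sectorno], SECTOR_SIZE);
    return 0;
}

/* Byte-level read: returns the number of bytes read (>= 0), or -1 on failure. */
int disk_read(int diskno, off_t offset, size_t length, uint8_t *addr) {
    if (diskno != 0)
        return -1;                       /* the disk does not exist */
    if (offset < 0 || offset > DISK_BYTES ||
        length > (size_t)(DISK_BYTES - offset))
        return -1;                       /* the requested range is invalid */

    uint8_t sector[SECTOR_SIZE];
    size_t done = 0;
    while (done < length) {
        off_t pos = offset + (off_t)done;
        int sectorno = (int)(pos / SECTOR_SIZE);
        size_t skip = (size_t)(pos % SECTOR_SIZE);   /* bytes to skip in this sector */
        size_t chunk = SECTOR_SIZE - skip;
        if (chunk > length - done)
            chunk = length - done;
        if (read_sector(diskno, sectorno, sector) < 0)
            return -1;
        memcpy(addr + done, sector + skip, chunk);
        done += chunk;
    }
    return (int)done;
}
```

A caller can now tell the difference between data and failure, for example by testing `disk_read(0, 510, 4, buf) < 0` before using buf. The request in that example straddles two sectors, which the caller does not need to know.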

Each subsequent implementation of the function is more general and versatile than the one preceding it. In addition, it is fairly easy to determine the abilities and usage of each function from the function prototype.

Two terms effectively describe what is accomplished by these functions.

Virtualization

A module that provides behavior similar to that of an original module.

This read() function is an example of virtualization because it provides the same functionality as the inb(), insl(), and outb() calls without exposing them in its interface.

Abstraction

A virtualization where the interface is quite different from the original (usually simpler and more versatile).

The read() function is an example of abstraction, as the read() interface is vastly different from the sequence of inb(), insl(), and outb() calls we used earlier. More importantly, the read interface is vastly simpler and more versatile.

So not only is our new read() function an example of virtualization, but it is also an abstraction. It can sometimes be easy to forget, as computer science students, that most people are not entirely computer savvy. Many do not even like computers, and use them solely as a tool, and not as a toy or hobby. This is why abstraction is important: for those "outsiders" who inevitably do something to make a program crash. They need our help, and we help by making programs simple, easy to use, and versatile.

Untrustworthy Louis

Since we have successfully implemented Ursula's initial requirement, Ursula now gives a 2nd requirement: her son Louis wants to run code on our OS. The only restriction is that he cannot overwrite Ursula's file.

What is the first thing we need to do? Put Louis's code after Ursula's file on the hard disk (as this is the first free space on the disk). After completing the initial functions of the OS (counting the words in Ursula's file), we load Louis's code in the same way our OS was loaded. We load the first 512 bytes of Louis's code into memory at 0x20000 (which is currently unused), and move the instruction pointer there. Much like the bootstrapping procedure used to load our OS, these 512 bytes of code are responsible for loading the rest of his program into memory.

Great. We now have a simple method by which we can run Louis's code. Unfortunately, we have no robustness! Louis's code can do anything it likes, including (gasp!) overwrite Ursula's file. This will not do!

This is the state of the hard-drive and memory immediately after Louis's code has been loaded:

We will use modularity to protect Ursula's data. Modularity comes in two forms, SOFT modularity and HARD modularity.

Soft Modularity

Modularity where interfaces are maintained by convention (agreement). The convention is not enforced, so any party can easily break it, if they like. As an example of this, the Mac operating system was protected entirely by soft modularity until the release of OS X.

Hard Modularity

Modularity where an interface cannot be violated.

How can we prevent Louis from overwriting Ursula's file?

The simplest approach would be to ask Louis very politely to not do anything destructive to Ursula's code. This is not a particularly safe solution, though. What if Louis, being an unruly teenage monster, decides to not listen, or what if he makes a coding mistake (perish the thought)? Simply trusting Louis to do the right thing is clearly not the best approach.

The easiest possibility would be to provide some abstraction for Louis: we already have a function to read from hard drives; why not just have Louis use that rather than writing his own IDE commands? If Louis uses the function, this is clearly an improvement: it would be much harder for him to accidentally write a bogus command, since he's using our (well tested) code. However, there's nothing preventing him from calling outb() to write to the disk if that's what he chooses to do. He can ignore our read() function all he likes. Clearly, this is an example of soft modularity.

Somehow, we need to prevent Louis from calling outb(), and force him to use the OS. We can't do this without hardware support: for performance reasons, all of Louis's code is executed directly on the CPU, rather than going through some sort of translation layer. Thus, there is no way to prevent Louis's code from containing an outb() call. Clearly we need some sort of hardware support.

Amazingly, this is how it is actually implemented. Processors have the idea of privilege, which specifies what types of commands the processor is willing to execute. We then have the operating system run in privileged mode, which allows it to execute "dangerous" commands, like I/O. Louis's code, however, runs in unprivileged mode: if he tries to execute an unsafe command like, say, outb(), the processor won't let him.

Privilege

Running code has "privilege" associated with it. High privilege code (the operating system) can run any instruction it likes. Low privilege code (user code) can only run "safe" commands.

Great! Now Louis can't call dangerous commands unless we let him. This isn't perfect yet, though. Louis could always just overwrite the portion of memory in the operating system that corresponds to the read function. He could, for example, perform the following (commented out code is from the original read() function):

outb (0x1F6, (sectorno>>24) & 255);

// outb (0x1F7, 0x20);  // used to read from disk
outb (0x1F7, 0x30);  // used to write to disk

// insl (0x1F0, addr, 128);  // used to read
outsl (0x1F0, addr, 128); // used to write

Disaster! read() has just been turned into write()! Clearly, Louis needs to be prevented from overwriting the OS's memory. We fix this by introducing virtual memory.

Virtual Memory

User code is restricted to writing to only certain, allowable portions of memory, known as virtual memory. Writing elsewhere results in an error. This memory block appears like a single, contiguous block to the code, although in reality it may be fragmented in real memory. This has to be implemented using hardware support, for performance reasons.
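One common way hardware provides this is paging, where a per-process table maps each virtual page to a physical frame. The notes do not say which mechanism is used, so the sketch below is only an assumption-laden illustration: the page_table type, the translate function, the 4 KiB page size, and the four-page address space are all hypothetical.

```c
#include <stdint.h>

#define PAGE_SIZE  4096u
#define NUM_VPAGES 4
#define UNMAPPED   (-1)

/* Hypothetical per-process table: virtual page number -> physical frame number. */
typedef struct {
    int frame[NUM_VPAGES];
} page_table;

/* Translate a virtual address to a physical one.
 * Returns 0 and fills *paddr on success, or -1 to signal a fault when the
 * process touches memory it has not been given. */
int translate(const page_table *pt, uint32_t vaddr, uint32_t *paddr) {
    uint32_t vpage  = vaddr / PAGE_SIZE;
    uint32_t offset = vaddr % PAGE_SIZE;
    if (vpage >= NUM_VPAGES || pt->frame[vpage] == UNMAPPED)
        return -1;
    *paddr = (uint32_t)pt->frame[vpage] * PAGE_SIZE + offset;
    return 0;
}
```

With a table such as `{ 7, 2, UNMAPPED, UNMAPPED }`, virtual pages 0 and 1 look contiguous to the process even though they live in frames 7 and 2, and any access to pages 2 or 3 is reported as an error. Real hardware performs this lookup on every memory access, which is why it cannot be done in software.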

Unfortunately, this still isn't safe! Louis could still ask the OS's read() function to read some unsafe code of his from disk and store it at an address inside the operating system, overwriting a portion of the OS. That code will then be run in privileged mode, as it's now part of the operating system, and Ursula's file is still not safe.

He could, for example, overwrite a portion of the read() function with code of his own, such as the write commands shown earlier.

Argh! Louis has now used our own operating system to convert read() into write()! What's more, it will write anywhere with privilege level 0! Heavens!

If Louis did this, he would be performing privilege escalation, where he indirectly gets privilege that he should not have.

The next obvious thing to do is place a check at the beginning of each dangerous function, making sure that it is only being called in a legal way (in this case, not letting Louis read() to a location that his code is not allowed to access). This by itself, though, is not enough -- Louis could always just place his instruction pointer after the check and the function would be called without the check being performed.
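The check itself might look something like the following sketch, which a function like read() could call before touching addr. The base address 0x20000 is where the notes load Louis's code; the upper limit and the names USER_BASE, USER_LIMIT, and user_range_ok are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical region of memory that Louis's code is allowed to touch. */
#define USER_BASE  0x20000u
#define USER_LIMIT 0x30000u   /* one past the last allowed byte (assumed) */

/* Returns 1 if [addr, addr + length) lies entirely inside the user region,
 * and 0 otherwise. */
int user_range_ok(uint32_t addr, size_t length) {
    if (addr < USER_BASE || addr > USER_LIMIT)
        return 0;
    /* Compare against the remaining room rather than computing
     * addr + length, which could overflow and wrap around. */
    return length <= (size_t)(USER_LIMIT - addr);
}
```

The subtraction form matters: a naive `addr + length <= USER_LIMIT` can wrap around for a huge length and wrongly accept the request.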

It seems the best way to prevent him from doing things he shouldn't is to only let him jump into the OS at specified (and safe!) places. This way, he would not be able to use privilege escalation.

We have just discovered a key concept in OS security called

Protected Control Transfer/Process Isolation

Unprivileged code can only enter privileged code in a controlled way. In other words, the user cannot move the instruction pointer to arbitrary locations in OS code; she can only move it to certain, allowable locations. Doing otherwise results in an error.
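The idea can be modeled in ordinary C as a fixed table of entry points that the caller selects by index, never by address. This is a toy model under stated assumptions, not how a real kernel is written: the names kernel_entry, gates, and enter_kernel are hypothetical, and sys_double and sys_negate are placeholder services.

```c
#define NUM_GATES 2

typedef int (*kernel_entry)(int arg);

/* Placeholder "kernel services" for the sketch. */
static int sys_double(int arg) { return arg * 2; }
static int sys_negate(int arg) { return -arg; }

/* The complete list of places where unprivileged code may enter. */
static const kernel_entry gates[NUM_GATES] = { sys_double, sys_negate };

/* The only way in: the caller names a gate by index, so it cannot jump past
 * a check or into the middle of a function.
 * Returns 0 and fills *result on success, -1 for an invalid gate. */
int enter_kernel(int gate, int arg, int *result) {
    if (gate < 0 || gate >= NUM_GATES)
        return -1;
    *result = gates[gate](arg);
    return 0;
}
```

Because every entry starts at the top of a listed function, any check placed at the beginning of that function is guaranteed to run.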

Slowly but surely, we keep restricting Louis's access in specific areas, until he can no longer do just anything he wants. We restrict his memory usage, instruction set usage, etc.

Ultimately, a process is created.

Processes

A process is a "program in execution in an isolated domain", or a "virtual computer". Code running in a process appears to have unlimited access to the computer's resources, even though in reality they are potentially being shared with many other running processes.

Here, look at this very realistic depiction of what a bunch of concurrently-running processes might look like:

All these processes can run at the same time without affecting each other because they are running in their own "virtual computer".

Kernel

The kernel is the portion of a modern OS that runs with full privilege. The kernel can do whatever it wants. In other words, it has total control.

- Inter-process communication goes through the kernel

- Hardware communication goes through the kernel

So when applications want to perform certain operations, they must go through the kernel. Applications make a system call using the

System Call Interface

The System Call Interface is the communication layer between user processes and the kernel. This is the interface that virtualizes the hardware and enables applications to use the kernel to perform otherwise restricted operations.

The Components and Concepts of a Virtual Computer

If we want to provide a virtual computer to our processes, how do we go about doing this?

-Abstraction is key. What does the interface look like, and how do processes use it?

-Implementation of the interface. How do the kernel and the hardware actually provide that interface?

Nearly all modern computers are based on the Von Neumann Architecture, which consists of the following components:

1) ALU (the arithmetic logic unit, known as the processor)

2) Registers

3) Primary memory (RAM)

4) Other I/O devices

Let us begin to virtualize each component.

How to Virtualize the ALU (CPU)

Our virtual processor should not be allowed to do everything the kernel is allowed to do. It should have only a subset of the machine's real instruction set, excluding the dangerous instructions.

This is implemented via privilege levels, as described earlier. There is a hierarchy of privilege levels, from 0 to 3. The kernel runs at privilege level 0, which gives it unlimited access to everything. User code runs at privilege levels from 1 to 3, which limit the instructions a process can execute. Each application has a Current Privilege Level (CPL).

Before executing a dangerous instruction, the processor performs a check, which looks like the following pseudo-code:

if (CPL != 0)
    raise exception;
else
    execute instruction;

*NOTE* The instruction that sets the privilege level is itself a dangerous instruction!

Examples of other dangerous function calls are:

- halt(), where one process can stop all processes from running

- inb(), where one can read directly from a device's I/O port
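The privilege check can be simulated in a few lines of C. This is a toy model, not real processor behavior: the opcode list, the cpu struct, and the names is_dangerous and step are hypothetical, and a real CPU performs the check in hardware as part of executing the instruction.

```c
#include <stdbool.h>

/* A tiny made-up instruction set: one safe opcode and two dangerous ones. */
typedef enum { OP_ADD, OP_HALT, OP_OUTB } opcode;
typedef enum { EXECUTED, EXCEPTION } outcome;

typedef struct {
    int cpl;   /* current privilege level: 0 is the kernel, 1 to 3 are user code */
} cpu;

static bool is_dangerous(opcode op) {
    return op == OP_HALT || op == OP_OUTB;
}

/* Mirrors the pseudo-code: a dangerous instruction runs only when CPL is 0.
 * Safe instructions run at any privilege level. */
outcome step(const cpu *c, opcode op) {
    if (is_dangerous(op) && c->cpl != 0)
        return EXCEPTION;
    return EXECUTED;
}
```

With `cpu user = { 3 };`, `step(&user, OP_OUTB)` yields EXCEPTION while `step(&user, OP_ADD)` yields EXECUTED; a cpu with cpl 0 can run all three opcodes.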

How to Implement Protected Control Transfer

We use the interrupt mechanism to implement protected control transfer. When user code needs a privileged operation performed, it causes an interrupt. The kernel has an interrupt handler that is called when the interrupt occurs, and this handler (which is in the kernel, and hence runs with CPL 0) invokes the appropriate system call.

int() <---this instruction causes an interrupt

What int() does:

- It saves the processor's current state.

- It loads the new state from a memory table, called the Interrupt Descriptor Table.

- Once the interrupt has been handled, the processor jumps back to its previously saved state.
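The three steps can be modeled roughly as follows. This is a simplified simulation under stated assumptions: the processor state is reduced to an instruction pointer and a privilege level, the table is a plain array of function pointers, and the names cpu_state, idt, syscall_handler, handled_at_cpl, and raise_interrupt are hypothetical. A real processor also saves more state and resumes at the instruction after the int.

```c
#include <stddef.h>

#define IDT_SIZE 4

typedef struct {
    int ip;    /* instruction pointer */
    int cpl;   /* current privilege level */
} cpu_state;

typedef void (*handler)(cpu_state *cpu);

/* Hypothetical stand-in for the Interrupt Descriptor Table: one kernel entry
 * point per interrupt number. Empty slots are NULL. */
static handler idt[IDT_SIZE];

static int handled_at_cpl = -1;   /* records the CPL the handler ran with */

/* Example kernel handler; it only records the privilege level it sees. */
static void syscall_handler(cpu_state *cpu) {
    handled_at_cpl = cpu->cpl;
}

/* Returns 0 on success, -1 if no handler is installed for this number. */
int raise_interrupt(cpu_state *cpu, int vector) {
    if (vector < 0 || vector >= IDT_SIZE || idt[vector] == NULL)
        return -1;
    cpu_state saved = *cpu;   /* 1. save the processor's current state */
    cpu->cpl = 0;             /* 2. load the new, privileged state ... */
    idt[vector](cpu);         /*    ... and run the handler from the table */
    *cpu = saved;             /* 3. jump back to the previously saved state */
    return 0;
}
```

After installing a handler with `idt[1] = syscall_handler;`, user code at CPL 3 can call `raise_interrupt(&cpu, 1)`: the handler runs with CPL 0, and the caller is back at CPL 3 afterward. The user chooses only the interrupt number, so the kernel decides where execution enters.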