====== Lecture 4 Scribe Notes ======
===Organization of a Modern Operating System===
In modern operating systems, only kernel code runs with full privilege over the hardware. By privilege, we mean the ability to execute "dangerous" instructions, for example writing directly to I/O devices. Restricting access to dangerous instructions prevents user code from behaving maliciously. In class, we saw that Louis' code could delete Ursula's file if it were run with full privilege. Between the user code and the kernel sits a system call interface, which gives user code controlled access to dangerous operations.
{{http://btjue.bol.ucla.edu/cs111/processes.jpg?600}}
Together, the kernel and system call interface enforce process isolation. As a result, modern operating systems have:
* Robustness
* Utilization
* Performance
* Protected, simple, and versatile access to hardware
The system call interface is responsible for virtualizing the physical machine. Because almost all programs must interact with it, its architecture determines whether or not our OS meets the goals above. Almost all modern computers are based on the [[wp>von Neumann architecture]], which has an ALU (part of the CPU) for processing instructions, registers for the ALU to operate on (also in the CPU), primary memory for data and code storage, and I/O devices to get data into and out of the machine. In this lecture, we examined how to virtualize each of these architectural elements.
=== ALU (CPU) Virtualization ===
The CPU provides instruction processing for both the kernel and user code, but we do not want user code to execute dangerous instructions. One possible approach to controlling user code would be to virtualize the CPU in software. In this approach, the kernel would accept instructions from the user code, check if they were safe, and pass them to the CPU if so. However, this is far too expensive so instead the check is done in hardware by the CPU. The CPU is able to differentiate between privileged and unprivileged code through the use of a special register, the Current Privilege Level (CPL). If the CPL value is 0, instructions are executed with full privilege. If it is 3, the processor will refuse to execute dangerous instructions. It is up to the kernel to set the CPL to the correct value before it jumps to user code. The instruction to set the CPL is considered dangerous, preventing user code from changing its own privilege level.
If a process tries to execute a dangerous instruction, the processor triggers a software interrupt (also called a trap or exception). This software interrupt gives control back to the operating system, which can then deal with (kill) the offending process. Specifically, the interrupt triggered is #12, the General Protection Fault (GPF). For this to work, the kernel has to set up a GPF handler in the interrupt descriptor table when it boots, which we describe in the next section.
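As a rough illustration of the privilege check described above, here is a small C model. It is only a sketch: the real check happens in hardware, and the names ''toy_cpu_t'', ''toy_insn_t'', ''toy_execute'', and the result constants are hypothetical, invented for this example.

```c
#include <stdbool.h>

/* Hypothetical software model of the hardware privilege check.
   Real CPUs do this in circuitry; nothing here is actual x86 behavior. */
enum { CPL_KERNEL = 0, CPL_USER = 3 };
enum { RESULT_OK = 0, RESULT_PROTECTION_FAULT = 1 };

typedef struct {
    int cpl;            /* current privilege level */
} toy_cpu_t;

typedef struct {
    bool dangerous;     /* e.g. direct device I/O, or changing the CPL */
} toy_insn_t;

/* Returns RESULT_OK if the instruction may run at the current privilege
   level, or RESULT_PROTECTION_FAULT if the CPU would trap to the kernel
   instead of executing it. */
int toy_execute(const toy_cpu_t *cpu, const toy_insn_t *insn)
{
    if (insn->dangerous && cpu->cpl != CPL_KERNEL)
        return RESULT_PROTECTION_FAULT;
    return RESULT_OK;
}
```

In this model, a dangerous instruction succeeds only when ''cpl'' is 0; ordinary instructions succeed at any level.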
=== Interrupt Handling ===
Because our CPU virtualization allows user processes direct access to the CPU, we have exposed ourselves to a possible violation of our goals. While the user code may not be able to execute dangerous instructions, what happens if it enters into an infinite loop? The effect would be to bring the entire machine to a halt. The system would not be robust, utilization would be granted entirely to one process, and performance of other processes would be zero. We need some way to give control back to the kernel.
We do this through timer interrupts. In modern computers, there is a timer on the motherboard that periodically sends a signal to the CPU. When the CPU receives this signal, it looks in the Interrupt Descriptor Table (IDT) for the address of the kernel's interrupt handler. The IDT is a table of addresses set up by the kernel according to a layout specified by the processor's designers (e.g. in the Pentium manual). The 0th element of this table is the timer interrupt handler address. The CPU jumps to this address, and the kernel takes control of the CPU again. At this point, the kernel would probably choose another process to run. The cost of this context switch is high: all of the registers must be copied out so that the swapped-out process can be resumed later without losing its state. As a result, the timer interrupt is relatively infrequent, every 0.1s to 0.001s (approximately every 1M instructions).
^Interrupt Descriptor Table (IDT)^
| 0 Kernel |
| ... |
| 12 General Protection Fault |
| ... |
| ... |
Using timer interrupts, we are able to solve the problem of infinite loops in user processes and reclaim control of the machine for the kernel. Thus, our CPU virtualization satisfies the goals of robustness, utilization, performance, and protected, simple access to hardware.
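One way to picture the IDT is as an array of handler addresses that the kernel fills in at boot. The sketch below models that in plain C, using the slot numbers from the table in these notes. The table size, the handler names, and ''raise_interrupt'' are assumptions made for illustration; a real IDT entry holds more than a bare function pointer.

```c
#include <stddef.h>

#define IDT_SIZE  256   /* assumed table size for this sketch */
#define VEC_TIMER 0     /* slot numbers follow the table in these notes */
#define VEC_GPF   12

typedef void (*handler_t)(void);

static handler_t idt[IDT_SIZE];   /* filled in by the kernel at boot */
int timer_ticks;                  /* counts timer interrupts handled */
int gpf_count;                    /* counts protection faults handled */

/* A real timer handler would pick the next process to run. */
static void timer_handler(void) { timer_ticks++; }

/* A real GPF handler would deal with (kill) the offending process. */
static void gpf_handler(void) { gpf_count++; }

/* Done once by the kernel before any user code runs. */
void idt_init(void)
{
    idt[VEC_TIMER] = timer_handler;
    idt[VEC_GPF]   = gpf_handler;
}

/* Models what the CPU does on an interrupt: look up the slot and jump.
   Returns 0 if a handler ran, -1 if the slot is invalid or empty. */
int raise_interrupt(int vector)
{
    if (vector < 0 || vector >= IDT_SIZE || idt[vector] == NULL)
        return -1;
    idt[vector]();
    return 0;
}
```

After ''idt_init()'', calling ''raise_interrupt(VEC_TIMER)'' runs the timer handler, which is the point where the kernel regains control from a looping process.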
=== System Calls for Protected Control Transfer ===
Now that we have a mechanism for starting user processes and interrupting them to give control back to the kernel, we need a method for protected transfer of CPU control. Operating systems re-use the interrupt/trap mechanism for this purpose. A system call is a mechanism for user code to request a service from the operating system (see [[wp>System Call]]). The x86 instruction that triggers a software interrupt is ''int''. To ensure robustness, user processes are not allowed to generate all interrupts; they must be given permission by the kernel. If they were able to generate fake kernel interrupts, they could bring the system to a halt because of the high cost of excessive context switching.
=== Registers ===
Registers in the von Neumann model can be divided into two kinds: normal registers, to which any process has direct access, and unsafe registers (such as the CPL register mentioned above), which should be accessed only by privileged code. This is accomplished with hardware support for the current privilege level.
When we switch processes, we need to save all of the registers into memory so that the process's state can be restored later. To accomplish this, we have a Process Descriptor Table:
**Structure of a Process Descriptor Table**
^procdescriptor_t^
| ... |
| **registers** |
| **address space** |
| ... |
**Sample Process Descriptor Tables**
^ PID: 1 ^ PID: 2 ^ ...^
| | | |
| Registers: | Registers: | |
| -%eax = 12 | -%eax = 9 | |
| -%eip = 15 | -%eip = 44 | |
| ... | ... | |
When an interrupt occurs, we perform the following steps:
1. An interrupt is received
2. Processor looks up the Interrupt Descriptor Table to find out where it should jump next
3. Store the current registers into memory (specified by the Interrupt Descriptor Table)
4. Transfer control to the Kernel (load new values for the Kernel into our registers)
5. Copy the registers we just saved from step 3 into the Process Descriptor Table
6. Kernel handles the interrupt.
Similar steps are taken to reverse the situation and return control back to the original process.
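The save and restore steps above can be sketched in C as follows. This is a simplified model, not kernel code: ''registers_t'' holds only three registers, ''MAX_PROCS'' is an assumed limit, and the function names are hypothetical. A real kernel does the register copying in assembly.

```c
#define MAX_PROCS 16   /* assumed limit for this sketch */

/* Only a few registers are modeled here. */
typedef struct {
    unsigned eax;
    unsigned eip;
    unsigned esp;
} registers_t;

typedef struct {
    registers_t registers;
    /* address space and other fields omitted */
} procdescriptor_t;

procdescriptor_t proc_table[MAX_PROCS];

/* Step 5: copy the registers saved at interrupt time into the
   descriptor of the process that was interrupted. */
void save_context(int pid, const registers_t *saved)
{
    proc_table[pid].registers = *saved;
}

/* The reverse direction: reload a process's saved registers. */
void restore_context(int pid, registers_t *cpu)
{
    *cpu = proc_table[pid].registers;
}

/* Switch from one process to another by saving, then restoring. */
void context_switch(int from, int to, registers_t *cpu)
{
    save_context(from, cpu);
    restore_context(to, cpu);
}
```

With the sample values from the tables above, switching from PID 1 to PID 2 leaves ''%eax = 12'' and ''%eip = 15'' stored in PID 1's descriptor and loads ''%eax = 9'' and ''%eip = 44'' into the CPU.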
=== Primary Memory ===
For performance reasons, we want to give processes direct access to primary memory. For robustness reasons, we don't want to allow processes to have complete control over memory. To meet these ends, we must have hardware support for process isolation.
To isolate processes, we have a notion of Virtual memory. Virtual memory is a rearranged subset of physical memory. Processes can only directly access their own private virtual memory. From the point of view of the process, this virtual memory appears to be a normal address space. The Kernel, on the other hand, has complete access to all physical memory.
Here is a visual idea of a potential situation:
{{http://btjue.bol.ucla.edu/cs111/memory.jpg?690}}
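To make "a rearranged subset of physical memory" concrete, here is a toy address translation in C. It assumes a page-table scheme, which is one common way to implement virtual memory but is not described in these notes; the page size, the table size, and every name in the block are assumptions for illustration.

```c
#include <stdint.h>

#define PAGE_SIZE 4096u        /* assumed page size */
#define NUM_PAGES 8u           /* tiny address space for the sketch */
#define NO_FRAME  UINT32_MAX   /* marks a virtual page with no mapping */

/* Per-process map from virtual page number to physical frame number. */
typedef struct {
    uint32_t frame[NUM_PAGES];
} toy_address_space_t;

/* Translates a virtual address to a physical one.
   Returns 0 and stores the result in *paddr, or returns -1 if the
   process has no mapping there (the hardware would trap to the kernel). */
int translate(const toy_address_space_t *as, uint32_t vaddr, uint32_t *paddr)
{
    uint32_t page   = vaddr / PAGE_SIZE;
    uint32_t offset = vaddr % PAGE_SIZE;

    if (page >= NUM_PAGES || as->frame[page] == NO_FRAME)
        return -1;
    *paddr = as->frame[page] * PAGE_SIZE + offset;
    return 0;
}
```

Two processes with different ''frame'' arrays can use the same virtual address and reach different physical memory, which is what isolates them from each other.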
=== I/O Devices ===
There are three main considerations which result in the way most operating systems handle Input/Output devices:
1. Performance - because most I/O devices are slow (especially those in direct contact with humans - keyboard, mouse) overhead when handling I/O devices tends to be acceptable.
2. Robustness - it can sometimes be dangerous to allow complete control over an I/O device. For instance, file protection on a hard drive.
3. Versatility - it is good to have a similar high-level interface that works for all devices, regardless of manufacturer.
For these reasons, modern operating systems have abstract interfaces for all I/O devices.
How these interfaces were developed was based on considerations of similarities and differences between each device. Take, for instance, four devices: CD-ROM, Hard Disk, Network, and Keyboard. The similarities between these devices are that they can read and/or write, they are slow, and they are prone to having errors. The differences are mainly between devices like the Hard Disk/CD-ROM, and devices like the Network/Keyboard:
^ Hard Disk/CD-ROM ^ Network/Keyboard ^
| Data request/respond | Spontaneous data generation |
| Finite data | Potentially infinite data |
| Random-access data | Stream of data |
===Unix's Big Idea===
These differences make accessing I/O unwieldy, because two or more different interfaces may be needed. However, Unix's "big idea" is to treat everything as a file. Every I/O resource, whether streaming or random access, is accessed through a "file descriptor."
If we modify the "read" function used previously for Random Access (specifically disks),
int read(int diskno, off_t offset, size_t len, uint32_t addr)
by removing the ''off_t offset'', we arrive at a function that is usable for streaming I/O. The ''offset'' is then moved to a separate function to handle Random Access-only operations.
The actual I/O read and write functions we will be using for the purpose of this class are as follows:
Read:
ssize_t read(int fd, void * data, size_t len)
When the file descriptor is of streaming type, the next ''len'' bytes are read. In the random access case, the read function is used in conjunction with the ''lseek'' function below to read the next ''len'' bytes at the file pointer.
off_t lseek(int fd, off_t offset, int whence)
The choices for the argument ''whence'' are ''SEEK_SET'', ''SEEK_CUR'', and ''SEEK_END''. Because a stream has no file position to move, attempts to call ''lseek'' on a streaming file will fail. To write, we use this function:
ssize_t write(int fd, const void *data, size_t len)
===Creating a new file descriptor===
{{http://btjue.bol.ucla.edu/cs111/hierarchy.jpg?600}}
Before things can be accessed, a file descriptor must be created and opened for them. The ''open'' and ''close'' functions provide an abstraction so we can readily work with file descriptors. The ''open'' function looks like this:
int open(const char *name, int mode);
The choices for ''mode'' include ''O_RDONLY'' (read only), ''O_WRONLY'' (write only), and ''O_RDWR'' (read and write). In order to open his uber-gradebook file, for example, Prof. Kohler would do ''fd=open("/home/kohler/grades.txt", O_RDWR)'' followed by ''write(fd, "A", 1)''. Hopefully he will be doing a lot of "''write(fd, "A", 1)''"!
Along with the registers and address spaces, each process's process descriptor table contains a **file descriptor table**, containing an array of active file descriptors for that process.
^procdescriptor_t^
| ... |
| registers |
| address space |
| **file descriptor table**|
| ... |
^file descriptor table^
| ... |
| 3: O_WRONLY - "grades.txt"|
| 4: O_RDONLY |
| ... |
To close file descriptor ''fd'', one simply calls ''close(fd)''. Its prototype is ''int close(int fd);''.
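Putting ''open'', ''write'', and ''close'' together, a minimal sketch might look like this. The function name ''record_grade'' is hypothetical. Unlike the two-argument ''open'' shown above, this version also passes ''O_CREAT'' and ''O_TRUNC'' with a permission argument so that it works when the file does not exist yet; those flags are standard POSIX but were not part of the lecture.

```c
#include <fcntl.h>
#include <unistd.h>

/* Opens path for writing, writes one grade character, and closes the
   descriptor. Returns 0 on success, -1 if any step fails. */
int record_grade(const char *path, char grade)
{
    /* O_CREAT and O_TRUNC are additions for this sketch: create the
       file if needed and start from an empty file. */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    ssize_t n;

    if (fd < 0)
        return -1;
    n = write(fd, &grade, 1);
    if (close(fd) < 0 || n != 1)   /* always close, then report errors */
        return -1;
    return 0;
}
```

Between the ''open'' and the ''close'', the descriptor occupies a slot in the calling process's file descriptor table; ''close'' frees that slot for reuse.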
===Waiting on I/O===
I/O devices are inherently slow relative to processor speed. The processor wastes a lot of time doing nothing while the device does its work (for example, while a ''read(fd, ...)'' on an IDE hard disk is in progress). While a sector is being read, the driver might wait like this:
void read_ide_sector(void)
{
/* inb() is the kernel's helper for reading one byte from an I/O port. */
/* Spin until the status port shows ready (0x40) set and busy (0x80) clear. */
while ((inb(0x1F7) & 0xC0) != 0x40)
/* do nothing */;
}
This loop does not let other processes execute, and it may never terminate if the device never becomes ready. It is an example of **busy waiting**. Busy waiting occurs when code repeatedly tests a condition to the exclusion of other tasks. Basically, we are stuck on this one process until the condition happens.
An alternative to busy waiting is **yielding**, in which the running process releases its control of the processor so that other processes can run. One type of yielding is **polling**, in which the waiting process remains runnable while the processor periodically checks on the condition. The other type of yielding is **blocking**. A process waiting on a condition is "blocked" and thus ceases to run until the condition is satisfied.
Whether to use polling or blocking is a question of //utilization//. Again, utilization is one of our five main goals toward a good OS. Applications should be able to use available hardware resources for useful work. Obviously, utilization takes a major hit if busy waiting is employed. As far as yielding is concerned, blocking generally allows for better utilization because a blocked process does not consume system resources.
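From user code, the difference between polling and blocking can be sketched with a pipe. This is an illustration under assumptions: the function names are hypothetical, and it uses ''O_NONBLOCK'' and ''sched_yield'', which are standard POSIX but were not covered in lecture.

```c
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <unistd.h>

/* Polling: try the read; if no data is ready, yield the CPU and try
   again later. *polls counts how many attempts were made.
   Returns the byte count from read, or -1 on a real error. */
ssize_t read_polling(int fd, void *buf, size_t len, long *polls)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;
    for (;;) {
        ssize_t n = read(fd, buf, len);
        (*polls)++;
        if (n >= 0 || (errno != EAGAIN && errno != EWOULDBLOCK))
            return n;
        sched_yield();   /* stay runnable, but let other processes go first */
    }
}

/* Blocking: a plain read on a descriptor without O_NONBLOCK. If no data
   is ready, the kernel marks the process blocked and does not run it
   again until data arrives. */
ssize_t read_blocking(int fd, void *buf, size_t len)
{
    return read(fd, buf, len);
}
```

The polling version keeps getting scheduled just to discover there is still nothing to read, while the blocking version costs nothing until the data shows up.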
In order to keep track of which processes are blocked and which are runnable, there is another item in the process descriptor table called **process state**. Among the possibilities for process state are "blocked" and "runnable," and if a process is blocked, its process descriptor table also stores info about which process it is blocked on.
^procdescriptor_t^
| ... |
| registers |
| address space |
| file descriptor table|
| **process state**|
| ... |
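The process state field is what lets the kernel skip blocked processes when it chooses what to run next. The sketch below assumes a simple round-robin policy, which these notes do not specify; the enum values, the reduced ''procdescriptor_t'', and ''pick_next'' are all hypothetical.

```c
typedef enum { PROC_RUNNABLE, PROC_BLOCKED } procstate_t;

/* Only the fields needed for this sketch are shown. */
typedef struct {
    int pid;
    procstate_t state;
} procdescriptor_t;

/* Round-robin choice: starting just after the current process, return
   the index of the first runnable process, wrapping around the table.
   Returns -1 if every process is blocked. */
int pick_next(const procdescriptor_t *procs, int n, int current)
{
    for (int i = 1; i <= n; i++) {
        int candidate = (current + i) % n;
        if (procs[candidate].state == PROC_RUNNABLE)
            return candidate;
    }
    return -1;
}
```

When the condition a process is waiting for becomes true, the kernel would set its state back to ''PROC_RUNNABLE'' so that ''pick_next'' can select it again.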
Many programs do more than take input, do some work, and produce output on their own. It is often useful to have two programs connected so that the output of one can be used as input to the other. This can be thought of as "coupling programs like a garden hose". The shell's pipe operator ("|") implements this functionality.
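Underneath, a shell can build this kind of connection with the ''pipe'' system call, which returns a pair of file descriptors: whatever is written to one can be read from the other. The sketch below is a minimal illustration, assuming the ''fork'' call summarized later in these notes; the function name ''pipe_from_child'' is hypothetical.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The child writes msg into a pipe and the parent reads it back.
   Returns the number of bytes the parent read, or -1 on error. */
ssize_t pipe_from_child(const char *msg, size_t msglen, char *out, size_t cap)
{
    int p[2];            /* p[0] is the read end, p[1] is the write end */
    pid_t pid;
    ssize_t n;

    if (pipe(p) < 0)
        return -1;
    pid = fork();
    if (pid < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    if (pid == 0) {      /* child: the upstream program */
        ssize_t w;
        close(p[0]);
        w = write(p[1], msg, msglen);
        _exit(w == (ssize_t)msglen ? 0 : 1);
    }
    close(p[1]);         /* parent: the downstream program */
    n = read(p[0], out, cap);   /* blocks until the child has written */
    close(p[0]);
    waitpid(pid, NULL, 0);
    return n;
}
```

The parent's ''read'' is a blocking read on a streaming descriptor, so this example also ties back to the waiting discussion above.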
==Topics that didn't make the cut==
The following topics were on the agenda for Lecture 4 but were not covered due to time constraints. ''fork'' and ''execvp'' were covered in the 10/11/07 discussion section, and signals were discussed in Lectures 5 and 6, but here is a quick summary.
**Fork**\\
usage: ''pid_t fork();''\\
Fork duplicates the process that calls it. The new process is a child of the originating process, which becomes its parent. The child gets its own unique pid. The ''fork()'' call returns 0 in the child process and the child's pid in the parent. ([[http://www.scit.wlv.ac.uk/cgi-bin/mansec?2+fork|Unix man page for fork]])
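A minimal sketch of the usual pattern, branching on the return value of ''fork'': the helper name ''fork_and_wait'' is hypothetical, and ''waitpid'' (standard POSIX, not covered above) is used so the parent can collect the child's exit status.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that exits immediately with child_status.
   The parent waits for it and returns the child's exit status,
   or -1 if fork or waitpid fails. */
int fork_and_wait(int child_status)
{
    pid_t pid = fork();
    int status;

    if (pid < 0)
        return -1;               /* fork failed; no child was created */
    if (pid == 0)
        _exit(child_status);     /* child: fork returned 0 */

    /* parent: fork returned the child's pid */
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Both processes continue from the same point after ''fork''; the return value is the only way the code can tell which of the two it is running in.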
**Execvp**\\
usage: ''int execvp(const char *command, char *const argv[]);''\\
When a process calls ''execvp'', the program named by ''command'' replaces the program the calling process was running and starts with the given arguments. The process keeps its pid, but its old code and data are gone. Because of this, ''execvp'' never returns to the caller if it is successful. ([[http://www.cs.biu.ac.il/cgi-bin/man?execvp+2|Unix man page for execvp]])
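Because a successful ''execvp'' does not return, it is commonly paired with ''fork'' so that the original program survives. The sketch below shows that pattern; ''run_command'' is a hypothetical helper, and the exit code 127 for a failed ''execvp'' is a convention borrowed from shells rather than anything required.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs argv[0] with the given NULL-terminated argument list in a child
   process. Returns the command's exit status, 127 if the command could
   not be executed, or -1 if fork or waitpid fails. */
int run_command(char *const argv[])
{
    pid_t pid = fork();
    int status;

    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);      /* reached only if execvp failed */
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

For example, calling it with the argument list ''{"sh", "-c", "exit 3", NULL}'' runs a shell in the child and returns 3 in the parent.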
**Kill**\\
usage: ''int kill(pid_t pid, int sig);''\\
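''kill'' sends the signal ''sig'' to the process ''pid''. Whether the target terminates depends on the signal and on how the target handles it; for many signals, the default action is to terminate the process. ([[http://www.hmug.org/man/1/kill.php|Unix man page for kill]])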
**Signals**\\
usage:
''typedef void (*sighandler_t)(int)''\\
''sighandler_t signal(int sig, sighandler_t func)''\\
''signal'' installs ''func'' as the handler to run when signal ''sig'' is delivered to the process, and returns the previous handler.
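A small sketch that uses ''signal'' and ''kill'' together: the process installs a handler and then sends a signal to itself. The names ''on_usr1'', ''got_signal'', and ''signal_self'' are hypothetical, and the choice of ''SIGUSR1'' is arbitrary. The sketch assumes a single-threaded process, in which a signal sent to oneself is delivered before ''kill'' returns.

```c
#include <signal.h>
#include <unistd.h>

/* Set by the handler; sig_atomic_t is safe to write from a handler. */
static volatile sig_atomic_t got_signal = 0;

/* Handler: just record which signal arrived. */
static void on_usr1(int sig)
{
    got_signal = sig;
}

/* Installs the handler, sends SIGUSR1 to this process, and returns the
   signal number the handler saw, or -1 if signal or kill fails. */
int signal_self(void)
{
    got_signal = 0;
    if (signal(SIGUSR1, on_usr1) == SIG_ERR)
        return -1;
    if (kill(getpid(), SIGUSR1) < 0)
        return -1;
    return got_signal;
}
```

Without the call to ''signal'', the default action for ''SIGUSR1'' would terminate the process, which is why ''kill'' is often thought of as a way to terminate processes.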
**Interrupts**\\
[[lec4#Interrupt Handling|See Interrupt Handling]]