Lecture 3 Scribe Notes

By Susan Chebotariov, Xueman Liu, and Tri Loi

Announcements

QEMU

QEMU instructions are available on the wiki. QEMU is an emulator that can be used for the labs and minilabs. Please start installing it and learning how to use it soon so that any bugs can be worked out before Lab 2 is due. Lab 2 will require modifying the kernel, so it is strongly recommended that you use QEMU or another emulator for that lab.

Operating systems goals II

Review: Computer System Design Principles

There are several Design Goals for the Application/OS Interface, which can be ensured through various means. During lecture 2, we talked about designing a quality interface that incorporates the following design principles:

First, Robustness is a form of fault tolerance which ensures that errors do not spread very far. We ensure robustness through Process Isolation, which says that 1) User applications cannot have access to kernel memory, 2) User applications cannot have direct access to disk, and 3) User applications cannot have access to other processes' memory.

Second, Generality (also known as versatility) is the level of flexibility inherently present in the system. We ensure generality through Abstraction, which virtualizes the necessary components to simplify the interface with them. A simple interface gives different types of data sources uniform access to the resource. Unix's "Big Idea" is file abstraction: everything in Unix is a file (whether it is a file on disk, the keyboard, the console, etc.).

More on Computer System Design Principles

Performance

Performance considers how efficiently an operating system uses hardware resources. Consider the following system call, which allows us to read information off of the disk:

system function:
sys_read takes a file descriptor, pointer to a buffer, and the size to be copied into that buffer. It then reads data from the file, copies the requested amount of data into the buffer, and returns the length of the portion of the file that was successfully copied.

size_t sys_read (int fd, char *buf, size_t len) {
	int sector_no = sector_no_of_file(fd);	  //fd is the file descriptor
	char temp_buf[512];			  //temporary location to store 512 bytes (one sector) worth of data
	sys_read_sector(sector_no, &temp_buf[0]); //tells disk what we want and stores it in temp_buf
	memcpy(buf, temp_buf, len);		  //copy the correct number of characters from the temporary location to the final location
	return len;				  //return the number of characters that were copied
}

This version of sys_read assumes that all of the data fits into one sector and that we do not need to worry about offset (i.e., we assume the file starts at the beginning of the sector). Although these assumptions are not practical for dealing with a real operating system, we are simplifying the process to help us understand the important concerns with respect to performance in such an example.

What is the problem with the above code? Consider the following snippet:

user code:
The user application will read from the disk, one byte at a time, until it reaches the end of the file.

while (read(fd, &c, 1) == 1)	//retrieves one byte at a time; read returns 0 at the end of the file
	;			//(use c here)

Compared to reading from memory, reading from the disk is very slow. Therefore, going to the disk for every single byte is extremely inefficient.

Caching

Consider the scenario where a user application needs to read a file that has a total of 512 bytes. How many times will read be called? Can you think of a performance issue here? (Look at how sys_read is implemented.)

Problem:

If the file has 512 bytes, the application will call read 512 times. However, sys_read retrieves an entire sector at once, which is 512 bytes. Since the entire file resides in a single sector, sys_read will repeatedly retrieve the same sector from disk, return a single byte, and discard the remaining bytes. Reading from the disk takes a long time, and repeatedly reading the same sector and throwing it away is very inefficient in terms of performance. Now can you think of a way to solve this problem? How can we change sys_read to improve performance?

Solution:

One way to fix this problem is by implementing a cache in sys_read. Caching is one way programmers improve performance. Caching makes a temporary local copy of the requested data to avoid the overhead of retrieving it from the disk if it is requested again.

/*pseudocode for sys_read with a cache*/
size_t sys_read (int fd, char *buf, size_t len) {
	int sector_no = sector_no_of_file(fd);
	//creating the cache
	if (sector_no not in cache) {
		alloc space in cache;
		sys_read_sector(sector_no, &cache[sector_no]);
	}
	memcpy(buf, cache[sector_no], len);
	return len;
}

This new sys_read uses a cache. When it is called, it first checks whether or not the sector is in the cache. If it is already available in the cache, it retrieves the data from the cache. If the sector is not in the cache, then it reads the data from the disk, stores the new sector in the cache, and returns the appropriate data.

So what would happen now if the user application wanted to read in a file that is 512 bytes long? How many times would we be retrieving data from the disk? The answer is: only once, because the file fits in a single sector. sys_read now only needs to read from the disk once to obtain the entire file, storing it in the cache. Then, for each subsequent request, sys_read would be returning data from its cache instead of reading from disk to return a new byte. Before we were using the cache, a 4KB file required us to access the disk 4,096 times. Now, we can read a 4KB file sequentially and only access the disk 8 times (4KB/512B). So, using a cache is one way to help meet our OS goal of performance. However, 8 disk reads is still really expensive.
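The count of 8 disk reads can be checked with a small simulation. The following is only a sketch under stated assumptions: the in-memory disk array, the file_pos offset, and the names sim_read_sector, cached_read, and count_disk_reads_bytewise are all hypothetical stand-ins, not part of the lecture code. Unlike the pseudocode above, it tracks a file offset so that a 4KB file can span 8 sectors.

```c
#include <stddef.h>
#include <string.h>

#define SECTOR_SIZE 512
#define NUM_SECTORS 8            /* the simulated disk holds one 4KB file */

/* Simulated disk and bookkeeping (hypothetical, for illustration only). */
static char disk[NUM_SECTORS][SECTOR_SIZE];
static int disk_reads = 0;       /* how many times we went to the "disk" */

static char cache[NUM_SECTORS][SECTOR_SIZE];
static int cached[NUM_SECTORS];  /* 1 if cache[i] holds sector i */
static size_t file_pos = 0;      /* current offset into the file */

/* Stand-in for sys_read_sector: copy one sector out of the fake disk. */
static void sim_read_sector(int sector_no, char *dst) {
    memcpy(dst, disk[sector_no], SECTOR_SIZE);
    disk_reads++;
}

/* Read up to len bytes at the current file position through the cache.
 * A single call never crosses a sector boundary. Returns 0 at end of file. */
size_t cached_read(char *buf, size_t len) {
    if (file_pos >= (size_t)NUM_SECTORS * SECTOR_SIZE)
        return 0;
    int sector_no = (int)(file_pos / SECTOR_SIZE);
    size_t offset = file_pos % SECTOR_SIZE;
    if (len > SECTOR_SIZE - offset)
        len = SECTOR_SIZE - offset;            /* stay inside this sector */
    if (!cached[sector_no]) {                  /* miss: go to the disk once */
        sim_read_sector(sector_no, cache[sector_no]);
        cached[sector_no] = 1;
    }
    memcpy(buf, cache[sector_no] + offset, len);
    file_pos += len;
    return len;
}

/* Read the whole file one byte at a time; return the number of disk reads. */
int count_disk_reads_bytewise(void) {
    char c;
    disk_reads = 0;
    file_pos = 0;
    memset(cached, 0, sizeof cached);
    while (cached_read(&c, 1) == 1)
        ;                                      /* discard each byte */
    return disk_reads;
}
```

Calling count_disk_reads_bytewise() issues 4,096 one-byte reads but touches the simulated disk only 8 times, once per sector.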

Reading from the Disk

We will take a brief detour and review how reading from the disk works (it's like a record player).

The disk spins rapidly, and the head can move in toward the center of the disk or out toward its outer edge. The most time-consuming part of reading from a disk is waiting for the head to be positioned over the desired sector. As a result, once the head is positioned at the first sector, sequential reads on the disk are fast. For example, sequential reads run at about 60MB/sec while random disk access runs at about 50KB/sec, so sequential reads are roughly 1000 times faster.
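The "roughly 1000 times" figure follows directly from the two rates. As a minimal sketch of the arithmetic (the function names are hypothetical, and MB and KB are assumed to be decimal units):

```c
/* Seconds needed to transfer `bytes` at a rate of `bytes_per_sec`. */
double transfer_seconds(double bytes, double bytes_per_sec) {
    return bytes / bytes_per_sec;
}

/* How many times faster the first rate is than the second. */
double speedup(double fast_bytes_per_sec, double slow_bytes_per_sec) {
    return fast_bytes_per_sec / slow_bytes_per_sec;
}
```

With the rates quoted above, speedup(60e6, 50e3) gives 1200. Put another way, transferring 60MB takes about 1 second sequentially but about 1200 seconds (20 minutes) at the random-access rate.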

Speculation (Batching & Pre-fetching)

In sys_read_sector we make programmed I/O calls such as:

void sys_read_sector(int secno, int32_t addr) {
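	while( (inb(0x1F7) & 0xC0) != 0x40)	//poll the disk's status port (0x1F7) until it is ready
		/* Do Nothing */;		//the semicolon makes this an empty loop body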
	outb(0x1F2, 1); 	//# of sectors
	outb ...
	.			// more outb statements that we will ignore
	.
	.
	insl(0x1F0, addr, 128);	//128 represents the number of words (512B/(4B/word))
}

This reads one sector and returns 128 words (512 bytes). So, to read more than one sector we will need to make multiple calls to sys_read_sector.

However, this brings us to another major area where our OS takes a performance hit: sending commands to the disk over the bus. This is a problem because the bus is also quite slow compared with reading from memory.

So, to read a 4KB file, we need to make 8 calls to sys_read_sector, but we would like to do them all at once to save time. This is called Batching. Formally, Batching is combining several requests into one to reduce per-request overhead. We need our disk to support batching... but it already does (we just need to make proper use of it)! Previously, our sys_read_sector function only read one sector at a time (see the line outb(0x1F2, 1);). Now, we will change it into a sys_read_8_sectors function which reads 8 sectors at a time.

void sys_read_8_sectors(int secno, int32_t addr) {
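	while( (inb(0x1F7) & 0xC0) != 0x40)	//poll the disk's status port (0x1F7) until it is ready
		/* Do Nothing */;		//the semicolon makes this an empty loop body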
	outb(0x1F2, 8); 	//Change 1: now we read 8 sectors at a time
	outb ...
	.			// more outb statements that we will ignore
	.
	.
	insl(0x1F0, addr, 1024);//Change 2: now we read 1024 words ((8sectors * (512B/sector))/(4B/word))
}

Batching is one form of Speculation (acting on the assumption that something will be needed before it is requested, so that it is ready when the request arrives). More formally, Speculation is performing potential requests in advance so the results are immediately available when they are requested. Speculation for read requests is called Pre-fetching.
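To see batching and pre-fetching work together with a cache, here is a simulated sketch. It assumes an in-memory disk, and every name in it (sim_read_sectors, get_sector, commands_for_sequential_read) is hypothetical. Fetching the aligned group of 8 sectors that contains the missing one is just one possible pre-fetch policy, chosen here for simplicity.

```c
#include <string.h>

#define SECTOR_SIZE 512
#define NUM_SECTORS 64
#define BATCH 8                      /* sectors fetched per disk command */

static char disk[NUM_SECTORS][SECTOR_SIZE];   /* simulated disk */
static char cache[NUM_SECTORS][SECTOR_SIZE];
static int cached[NUM_SECTORS];
static int disk_commands = 0;        /* one per command sent over the "bus" */

/* Stand-in for sys_read_8_sectors: one command, `count` consecutive sectors. */
static void sim_read_sectors(int first, int count) {
    for (int i = 0; i < count && first + i < NUM_SECTORS; i++) {
        memcpy(cache[first + i], disk[first + i], SECTOR_SIZE);
        cached[first + i] = 1;
    }
    disk_commands++;
}

/* Return the cached copy of a sector. On a miss, pre-fetch the whole aligned
 * group of BATCH sectors that contains it, using a single command. */
const char *get_sector(int sector_no) {
    if (!cached[sector_no]) {
        int first = sector_no - sector_no % BATCH;
        sim_read_sectors(first, BATCH);
    }
    return cache[sector_no];
}

/* Touch sectors 0..n-1 in order (n <= NUM_SECTORS) and return the number
 * of disk commands that were needed. */
int commands_for_sequential_read(int n) {
    memset(cached, 0, sizeof cached);
    disk_commands = 0;
    for (int s = 0; s < n; s++)
        (void)get_sector(s);
    return disk_commands;
}
```

In this simulation, reading the 8 sectors of a 4KB file in order costs a single disk command instead of 8, because the first miss brings in the other 7 sectors before they are requested.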

Are there any concerns related to caching and speculation (batching and pre-fetching)?

Intuitively, speculation may not always be a good idea. For example, if further system calls are not necessary or if the sectors we pre-fetched are not needed, then the work performed is wasted. However, we have observed over time a phenomenon called Locality of Reference. The principle of Locality of Reference states that when requests are clustered in time they tend to refer to data that is clustered in space. In other words, if an application requests data from sector A, then that application is likely to need sector A+1 in the near future. Therefore, in practice, caching and pre-fetching work well to enhance performance.

Another concern with caching is the assumption that the cache contains the most recent copy of the data; in other words, that the data on the disk has not changed since it was cached. This property is called Cache Coherence: a cache is coherent if it always contains the most up-to-date copy of the data. The cache in sys_read may not be coherent if sys_write does not use the same cache as sys_read. We assume the disk is attached to a single processor and that all applications use system calls rather than accessing the disk directly. This means that we require process isolation (ensuring that a write goes through the cache rather than directly to the disk). Recall that process isolation is hard modularity. Therefore, if we are careful and enforce process isolation, then we can guarantee cache coherence.
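The difference between a coherent and an incoherent write path can be sketched with a simulated disk. All names below are hypothetical, and updating the cache and the disk together (write-through) is an assumption made for this sketch, not something the lecture prescribes.

```c
#include <string.h>

#define SECTOR_SIZE 512
#define NUM_SECTORS 16

static char disk[NUM_SECTORS][SECTOR_SIZE];   /* simulated disk */
static char cache[NUM_SECTORS][SECTOR_SIZE];  /* cache shared by reads and writes */
static int cached[NUM_SECTORS];

/* Load a sector into the shared cache if it is not already there. */
static char *lookup(int sector_no) {
    if (!cached[sector_no]) {
        memcpy(cache[sector_no], disk[sector_no], SECTOR_SIZE);
        cached[sector_no] = 1;
    }
    return cache[sector_no];
}

/* Read one byte through the cache. */
char read_byte(int sector_no, int offset) {
    return lookup(sector_no)[offset];
}

/* Coherent write: update the cached copy and the disk together. */
void write_byte(int sector_no, int offset, char value) {
    lookup(sector_no)[offset] = value;
    disk[sector_no][offset] = value;
}

/* Incoherent write, for contrast: changes the disk but not the cache. */
void write_byte_bypassing_cache(int sector_no, int offset, char value) {
    disk[sector_no][offset] = value;
}
```

After write_byte, a later read_byte sees the new value. After write_byte_bypassing_cache on a sector that is already cached, read_byte keeps returning the stale cached value, which is exactly the incoherence that process isolation is meant to rule out.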

One final method that has been used to enhance performance (but does not apply to reads) is Dallying. The basic idea of dallying is to not execute a request immediately. Instead, wait a while and execute that request along with other operations in a batch. We will address dallying later in the quarter.
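As a rough preview of the idea, here is a sketch of dallying applied to writes, assuming a simulated disk that stores one byte per sector. The queue, its size, and all function names are hypothetical; the real mechanism covered later in the quarter may look quite different.

```c
#define MAX_PENDING 8
#define NUM_SECTORS 64

struct pending_write { int sector_no; char value; };

static struct pending_write queue[MAX_PENDING];
static int num_pending = 0;
static int disk_commands = 0;             /* batched commands actually issued */
static char disk_first_byte[NUM_SECTORS]; /* simulated disk: one byte per sector */

/* Issue every queued write as one batch and empty the queue. */
void flush_writes(void) {
    if (num_pending == 0)
        return;
    for (int i = 0; i < num_pending; i++)
        disk_first_byte[queue[i].sector_no] = queue[i].value;
    num_pending = 0;
    disk_commands++;
}

/* Dally: remember the write instead of performing it right away. */
void dallying_write(int sector_no, char value) {
    if (num_pending == MAX_PENDING)
        flush_writes();                   /* queue is full, so stop waiting */
    queue[num_pending].sector_no = sector_no;
    queue[num_pending].value = value;
    num_pending++;
}

/* Accessors so callers can observe the simulated disk. */
int commands_issued(void) { return disk_commands; }
char byte_on_disk(int sector_no) { return disk_first_byte[sector_no]; }
```

Five calls to dallying_write followed by one flush_writes reach the simulated disk as a single batched command rather than five separate ones.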

Utilization

Recall that Utilization is the amount of a module or service's capacity that is used for useful work. For example, while(...) /* do nothing */ does no useful work. This does not provide full utilization of the computer's resources and is considered Busy Waiting. Busy Waiting is waiting for a condition that involves repeatedly checking the condition (AKA polling or spinning) to the exclusion of other processes.

Yielding

The opposite of busy waiting is Yielding. Yielding occurs when an application releases shared resources for use by another application when useful work is not being accomplished. Thus, instead of repeatedly checking for a condition, yielding to other applications allows other code to run and thereby gains more utilization from the processor. This is called Parallelism or Concurrency and overcomes busy waiting.

So, we can replace the /*Do Nothing*/ line with schedule(); which, for now, we will consider "Magic" (this is the same as "context switching," which will be used in Minilab 1).

void sys_read_sectors(int secno, int32_t addr) {
	while( (inb(0x1F7) & 0xC0) != 0x40)	//poll the disk's status port (0x1F7) until it is ready
		schedule(); //instead of do nothing (consider as "Magic" for now)
	outb...
	.
	.
	.
}
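The effect of swapping the empty loop body for schedule() can be illustrated with a simulation. This is only a sketch: disk_ready stands in for the status check in the while loop, schedule here merely counts a unit of work done by some other process, and the delay parameter is a made-up number of failed checks before the disk becomes ready.

```c
static int polls_until_ready;   /* simulated disk: ready after this many checks */
static int other_work_done;     /* units of work finished by another process */

/* Stand-in for the status check: is the simulated disk ready yet? */
static int disk_ready(void) {
    if (polls_until_ready > 0) {
        polls_until_ready--;
        return 0;
    }
    return 1;
}

/* Stand-in for schedule(): let another process do one unit of useful work. */
static void schedule(void) {
    other_work_done++;
}

/* Busy waiting: spin until the disk is ready. Returns other work completed. */
int wait_busy(int delay) {
    polls_until_ready = delay;
    other_work_done = 0;
    while (!disk_ready())
        ;                        /* do nothing */
    return other_work_done;
}

/* Yielding: give up the processor after every failed check. */
int wait_yielding(int delay) {
    polls_until_ready = delay;
    other_work_done = 0;
    while (!disk_ready())
        schedule();
    return other_work_done;
}
```

With a delay of 100 checks, wait_busy(100) returns 0 because nothing else ran, while wait_yielding(100) returns 100 because another process made progress during every check that would otherwise have been wasted.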

Multitasking

Another method that can be used to optimize utilization is Multitasking. Multitasking schedules several processes on the same processor in a way that improves system utilization and performance.

Recall the picture of what our memory looks like when an application is running.
[Figure: memory1-1.jpg]

For multitasking, we run this kernel code two times and copy the application code into two different locations.
[Figure: memory2.jpg]

Due to process isolation, we do not need to change the program code in order to make multitasking work. By using process isolation, we ensure that a process cannot access other processes' memory. Note: this does assume that the processes being run do not share resources (such as memory) with each other.

For the word count program with the /*Do Nothing*/ loop, here is a graph of the processor's utilization:
[Figure: processor1.jpg]

Now, look at the graph for the word count program sharing the processor with a program that calculates pi:
[Figure: processor2.jpg]

Notice that we still do not have complete utilization due to the overhead for executing kernel code (overhead is NOT useful work).
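Utilization can be expressed as the fraction of total time spent on useful work. The helper below is a minimal sketch with a hypothetical name, and the numbers in the note after it are invented purely for illustration; they are not read off the graphs.

```c
/* Fraction of total processor time spent on useful work (0.0 to 1.0). */
double utilization(double useful_time, double total_time) {
    if (total_time <= 0.0)
        return 0.0;              /* no elapsed time means nothing to measure */
    return useful_time / total_time;
}
```

Suppose, hypothetically, that word count computes for 2 ms out of every 10 ms and spends the rest waiting for the disk: utilization(2, 10) is 0.2. If the pi program fills 7 of the remaining 8 ms and the kernel spends the last 1 ms switching between the two, utilization(9, 10) is 0.9, which is better but still short of 1.0 because of the overhead.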


Process

What is a process?

(1) A program in isolated execution or (2) A virtual computer
[Figure: processes.jpg]

Note: Each process behaves as if it were running alone on the machine.

Von Neumann Machine/Architecture

  1. ALU (Arithmetic Logic Unit)
  2. At least one register
  3. Primary Memory
  4. A stored program
  5. I/O Devices

Visualizing the Von Neumann Machine

[Figure: vonneumann.jpg]
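To make the components concrete, here is a toy machine in C with one register, a small primary memory, and a program stored in that same memory. The four-instruction encoding is invented for this sketch and does not correspond to any real instruction set; I/O devices are left out to keep it short.

```c
/* A made-up four-instruction machine; the encoding is purely illustrative. */
enum { OP_HALT = 0, OP_LOAD = 1, OP_ADD = 2, OP_STORE = 3 };

#define MEM_SIZE 32

/* Run the program stored in mem, starting at address 0. Each instruction
 * occupies two cells: an opcode followed by a memory address. Returns the
 * final value of the accumulator register. */
int run(int mem[MEM_SIZE]) {
    int acc = 0;                   /* the single register */
    int pc = 0;                    /* program counter */
    while (pc + 1 < MEM_SIZE) {
        int op = mem[pc];          /* fetch the instruction from memory */
        int addr = mem[pc + 1];
        pc += 2;
        if (op == OP_HALT || addr < 0 || addr >= MEM_SIZE)
            break;
        if (op == OP_LOAD)
            acc = mem[addr];       /* memory -> register */
        else if (op == OP_ADD)
            acc += mem[addr];      /* the ALU at work */
        else if (op == OP_STORE)
            mem[addr] = acc;       /* register -> memory */
        else
            break;                 /* unknown opcode */
    }
    return acc;
}
```

For example, if memory begins with the cells LOAD 20, ADD 21, STORE 22, HALT 0 and cells 20 and 21 hold 2 and 3, then run returns 5 and leaves 5 in cell 22. The program and its data share one memory, which is the point of the stored-program idea.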

Turing Machine

  1. Head
  2. Current state
  3. Tape
  4. State Transition Table

[Figure: turing.jpg]
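The four parts listed above map directly onto a small simulator. The following is a sketch with hypothetical names (struct rule, run_tm, HALT_STATE). It assumes a finite tape of TAPE_LEN cells and that the machine starts in state 0, both simplifications compared with the unbounded tape of a true Turing machine.

```c
#include <stddef.h>

#define TAPE_LEN 32
#define HALT_STATE -1

/* One row of the state transition table. */
struct rule {
    int state;        /* current state */
    char read;        /* symbol under the head */
    char write;       /* symbol to write in its place */
    int move;         /* -1 = move head left, +1 = move head right */
    int next_state;   /* state to enter next */
};

/* Run until the machine halts, no rule applies, or the head leaves the tape.
 * Returns the number of steps taken. */
int run_tm(const struct rule *table, int num_rules,
           char tape[TAPE_LEN], int head) {
    int state = 0;
    int steps = 0;
    while (state != HALT_STATE && head >= 0 && head < TAPE_LEN) {
        const struct rule *match = NULL;
        for (int i = 0; i < num_rules; i++) {
            if (table[i].state == state && table[i].read == tape[head]) {
                match = &table[i];
                break;
            }
        }
        if (match == NULL)
            break;                 /* no applicable rule: stop */
        tape[head] = match->write;
        head += match->move;
        state = match->next_state;
        steps++;
    }
    return steps;
}
```

As a usage example, a three-rule table that rewrites '0' as '1' and '1' as '0' while moving right, and halts on '_', turns the tape "0110_" into "1001_" in 5 steps.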