Scribe Notes for October 3rd, 2006.
by Arash Nasibi, Paul Salzman, Brian Mathiyakom
Recall the example from the previous lecture where our goal was to read a line from a file, with the open file being represented by a file descriptor fd.
Also recall some of the desirable characteristics of a good module:
Now, let’s consider an interface. We’ll call this Interface #1, and successive interfaces will be #2, #3, etc. Please note that the following code is application code, and should not be confused with anything lower-level.
Interface #1:
int length_of_line(int fd); // returns the number of characters in the line starting at the current cursor
int readc(int fd);          // returns either the current character or EOF if the end of file has been reached
This interface certainly allows us to read a line from the file, like this:
char *readline(int fd) {
    int i, l = length_of_line(fd);
    char *line = malloc(l);
    for (i = 0; i < l; i++)
        line[i] = readc(fd);
    /* Don't bother to null-terminate the line for now */
    return line;
}
(Some of you haven't seen malloc before. The malloc(n) function returns n bytes of memory. It is something like C++'s new, but unlike the C++ version, you pass it a number of memory bytes, not the name of a type. A common usage is
T *t = (T *) malloc(sizeof(T));
This is C's version of C++'s T *t = new T;. The sizeof(T) expression means "the number of bytes it takes to store an object of type T".)
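As a minimal sketch of this malloc pattern, the code below allocates one object, uses it, and releases it. The struct point type and the make_point and point_demo functions are hypothetical names chosen only for illustration. Note that malloc can fail and return NULL, and that memory obtained from malloc should eventually be passed to free.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical example type; any struct would do. */
struct point {
    int x;
    int y;
};

/* Allocate and initialize one struct point on the heap.
   Returns NULL if malloc cannot provide the memory. */
struct point *make_point(int x, int y) {
    struct point *p = (struct point *) malloc(sizeof(struct point));
    if (p == NULL)
        return NULL;
    p->x = x;
    p->y = y;
    return p;
}

/* Example use: allocate a point, read its fields, and release it.
   Returns x + y. */
int point_demo(void) {
    struct point *p = make_point(3, 4);
    int sum;
    assert(p != NULL);   /* a real program would handle failure more gracefully */
    sum = p->x + p->y;
    free(p);             /* give the memory back */
    return sum;
}
```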
So what are some problems with Interface #1 with respect to the desirable characteristics listed above?
Problems:
Sidebar: How do we determine neutrality? Consider the concepts of policy vs. mechanism. Basically, a "policy" is a system goal, and a "mechanism" is used to achieve that goal. A neutral interface, then, is an interface that provides mechanisms that can support many policies. We don’t usually want to encode policies into interfaces, because doing so reduces neutrality.
A real-life example to help us illustrate this point would be traffic lights and traffic laws. The traffic light is a mechanism, used to indicate a particular policy of the California Drivers Handbook: namely, go on green, slow down on yellow, and stop on red. But the mechanism is lower-level than the policy, and could be used in other ways to define different policies. For example, imagine an alternate traffic signal system that used words. "Stop" would mean stop (red light), "Prepare To Stop" would mean slow (yellow light), and "Go" would mean go (green light). This mechanism is less neutral because it is bound to the policy. How can we express the meaning of "flashing yellow light", which means "go with caution"? No combination of the phrases "stop", "prepare to stop", and "go" means the right thing; you'd have to buy a new sign. End Sidebar
Number of system calls to read a line for Interface #1: l + 1. (Here l is the length of the line: one call to length_of_line, plus l calls to readc to read the characters.)
Interface #2:
Let's first fix the robustness problem with Interface #1. Rather than implement length_of_line as a system call, let's try to implement it "at user level", meaning as a library function inside the application. If this works, we can use the existing implementation of readline without change! (Thanks, modularity!) It turns out we can almost implement length_of_line using readc.
int length_of_line(int fd) {
    int l = 0, c;
    /* Stop at end of file too, so a missing newline cannot loop forever */
    while ((c = readc(fd)) != '\n' && c != EOF)
        l++;
    return l;
}
The problem with the implementation above is that it moves the cursor, so we can’t read the actual line afterwards!
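Suggestion for improvement: add a backc system call, which rewinds the cursor by one character. We can use this to calculate the line length and then put the cursor back where it started.
int length_of_line(int fd) {
    int l = 0, c, i, consumed;
    while ((c = readc(fd)) != '\n' && c != EOF)
        l++;
    /* Rewind over everything we consumed: the l characters counted,
       plus the newline if one was read */
    consumed = l + (c == '\n');
    for (i = 0; i < consumed; i++)
        backc(fd);
    return l;
}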
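Number of system calls to read a line for Interface #2: about 3l (l calls to readc to find the length, l calls to backc to rewind, and l more calls to readc to read the line).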
But the cost of a system call is much greater than the cost of a function call! We'd like to minimize the number of system calls.
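To make the count for Interface #2 concrete, here is a small simulation. It is not a real kernel interface: readc and backc are modeled as ordinary functions over an in-memory string, and the counter syscalls (a hypothetical name, like sim_open) stands in for the number of system calls made. This sketch also rewinds over the newline and null-terminates the result, so the exact count comes out to 3l + 2.

```c
#include <assert.h>
#include <stdio.h>    /* for EOF */
#include <stdlib.h>
#include <string.h>

/* Simulated "kernel" state: one in-memory file with a cursor. */
static const char *sim_data = "";
static size_t sim_len = 0, sim_pos = 0;
static long syscalls = 0;           /* simulated system calls made so far */

void sim_open(const char *contents) {
    sim_data = contents;
    sim_len = strlen(contents);
    sim_pos = 0;
    syscalls = 0;
}

/* Simulated system call: return the current character, or EOF. */
int readc(int fd) {
    (void) fd;                      /* only one simulated file exists */
    syscalls++;
    if (sim_pos == sim_len)
        return EOF;
    return (unsigned char) sim_data[sim_pos++];
}

/* Simulated system call: move the cursor back one character. */
void backc(int fd) {
    (void) fd;
    syscalls++;
    if (sim_pos > 0)
        sim_pos--;
}

/* Interface #2: compute the line length at user level, then rewind. */
int length_of_line(int fd) {
    int l = 0, c, i, consumed;
    while ((c = readc(fd)) != '\n' && c != EOF)
        l++;
    consumed = l + (c == '\n');     /* the newline was consumed too */
    for (i = 0; i < consumed; i++)
        backc(fd);
    return l;
}

char *readline(int fd) {
    int i, l = length_of_line(fd);
    char *line = malloc(l + 1);
    assert(line != NULL);
    for (i = 0; i < l; i++)
        line[i] = (char) readc(fd);
    line[l] = '\0';                 /* terminated here so the result is a C string */
    return line;
}
```

For the five-character line "hello", the simulation makes 6 readc calls to find the newline, 6 backc calls to rewind, and 5 readc calls to copy the line.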
Interface #3:
off_t getcursor(int fd);
void setcursor(int fd, off_t pos);
// side note: in Linux we use lseek for both of these calls

int length_of_line(int fd) {
    int l = 0, c;
    off_t pos = getcursor(fd);       // remember where the line starts
    while ((c = readc(fd)) != '\n' && c != EOF)
        l++;
    setcursor(fd, pos);              // put the cursor back
    return l;
}
Number of system calls to read a line for Interface #3: 2 + 2l
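On a POSIX system, the two cursor calls can be sketched on top of lseek, as the side note suggests. The names getcursor and setcursor come from these notes and are not real system calls. The int return value of setcursor and the peek_demo function are assumptions made for this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Return the current cursor position of fd, or (off_t) -1 on error. */
off_t getcursor(int fd) {
    return lseek(fd, 0, SEEK_CUR);   /* move 0 bytes from the current position */
}

/* Move the cursor of fd to absolute position pos.
   Returns 0 on success, -1 on error (an assumed convention). */
int setcursor(int fd, off_t pos) {
    return lseek(fd, pos, SEEK_SET) == (off_t) -1 ? -1 : 0;
}

/* Demo: read 3 bytes from a temporary file, then put the cursor back.
   Returns the final cursor position (expected 0), or -1 on failure. */
off_t peek_demo(void) {
    char buf[3];
    off_t saved, now;
    int fd;
    FILE *tmp = tmpfile();
    if (tmp == NULL)
        return -1;
    fd = fileno(tmp);
    if (write(fd, "abcdef", 6) != 6 || setcursor(fd, 0) != 0) {
        fclose(tmp);
        return -1;
    }
    saved = getcursor(fd);                /* remember where we are */
    if (read(fd, buf, 3) != 3) {
        fclose(tmp);
        return -1;
    }
    assert(getcursor(fd) == saved + 3);   /* reading moved the cursor */
    setcursor(fd, saved);                 /* undo the movement */
    now = getcursor(fd);
    fclose(tmp);
    return now;
}
```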
Interface #4:
We've tried to read a line using length_of_line, but that seems to cause performance problems. Why not get rid of l of the system calls by reading the line and calculating its length in parallel? Here's how that would work.
char *readline(int fd) {
    int l = 0, c;
    int capacity = 512;
    char *buf = malloc(capacity);
    while ((c = readc(fd)) != '\n' && c != EOF) {
        if (l == capacity) {
            /* Buffer is full: double it before storing the next character */
            buf = realloc(buf, capacity * 2);
            capacity *= 2;
        }
        buf[l++] = c;
    }
    /* Don't bother with null-terminating the line for now */
    return buf;
}
(The realloc(ptr, size) function extends ptr into a bigger region of memory, of size size. It is like doing another malloc, then copying the old data into the new region, freeing the old region, and returning the new one.)
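One caveat with the pattern buf = realloc(buf, ...): if realloc fails, it returns NULL and leaves the old region allocated, so overwriting buf directly would lose the only pointer to it. Below is a minimal sketch of a more defensive growth step. The grow_buffer helper is a hypothetical name, not part of any library.

```c
#include <assert.h>
#include <stdlib.h>

/* Double the capacity of *buf.
   On success, updates *buf and *capacity and returns 0.
   On failure, leaves both unchanged and returns -1. */
int grow_buffer(char **buf, size_t *capacity) {
    size_t new_capacity;
    char *bigger;
    assert(buf != NULL && capacity != NULL && *capacity > 0);
    new_capacity = *capacity * 2;
    bigger = realloc(*buf, new_capacity);
    if (bigger == NULL)
        return -1;             /* the old buffer is still valid */
    *buf = bigger;             /* old contents were copied if the block moved */
    *capacity = new_capacity;
    return 0;
}
```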
Number of system calls to read a line for Interface #4: just l.
We are now able to achieve l system calls from our version of readline(). Can we do better? Of course! So far, we've been limited to reading one character at a time with readc(). We can read in chunks of characters with read()!
The read() system call takes a file descriptor fd, a buffer buf, which is the memory area to read into, and a size, size, which is the maximum number of bytes to read. A call to read() returns the number of bytes actually read, nread, which may be less than size; it returns 0 at end of file and -1 on error.
ssize_t read(int fd, void *buf, size_t size)
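(size_t and ssize_t are other names for specific integer types. size_t is an unsigned type, often the same as unsigned or unsigned long, and is used to refer to object sizes. ssize_t is its signed counterpart, which lets read() return -1 to signal an error.)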
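Because read() may return fewer bytes than requested, callers that need a specific amount of data usually call it in a loop. Here is a minimal sketch of that loop. The read_full helper is a hypothetical name, and the sketch treats every negative return as a fatal error.

```c
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/* Keep calling read until size bytes have arrived, end of file is
   reached, or an error occurs.
   Returns the number of bytes read, or -1 on error. */
ssize_t read_full(int fd, void *buf, size_t size) {
    size_t total = 0;
    assert(buf != NULL);
    while (total < size) {
        ssize_t nread = read(fd, (char *) buf + total, size - total);
        if (nread < 0)
            return -1;         /* error */
        if (nread == 0)
            break;             /* end of file: return what we have */
        total += (size_t) nread;
    }
    return (ssize_t) total;
}
```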
Now, let's rewrite readline() using read() and see what performance gains we get.
char *readline(int fd) {
    int l = 0, capacity = 512;
    char *buf = malloc(capacity);
    while (1) {
        int nread = read(fd, buf + l, capacity - l);
        int i;
        if (nread <= 0)     // End of file or error: return what we have
            goto done;
        for (i = 0; i < nread; i++) {
            if (buf[l] == '\n')     // We're at end of line, exit
                goto done;
            else
                l++;
        }
        if (capacity == l) {        // Buffer is full: double it
            capacity *= 2;
            buf = realloc(buf, capacity);
        }
    }
 done:
    return buf;
}
Number of system calls to read a line of length l: just O(log l), since the buffer doubles each time it fills!
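The logarithmic count follows from the doubling. As a rough model, assume that every read fills all the space it is offered (real reads may return less, which only adds calls) and that the line ends with a newline. The reads_needed function below, a hypothetical helper, counts the read calls under those assumptions.

```c
#include <assert.h>

/* Count the read() calls needed to see a line of l characters plus its
   newline, starting from a 512-byte buffer that doubles when it fills. */
int reads_needed(long l) {
    long capacity = 512;    /* current buffer size */
    long filled;            /* bytes read so far */
    int calls = 0;
    assert(l >= 0);
    while (1) {
        filled = capacity;  /* one read fills the rest of the buffer */
        calls++;
        if (filled > l)     /* the newline at index l has been read */
            return calls;
        capacity *= 2;      /* buffer full, no newline yet: double it */
    }
}
```

In this model a line of 100 characters needs a single call, and a line of a million characters needs only 12.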
We've now gotten a high-performing interface, but all that realloc and malloc is annoying to program. Can we keep the good performance but return to something as simple as readc? Yes!
Buffered input/output is a form of caching. Recall that readc() is a system call, so reading a line of length l with it costs about l expensive system calls. To reduce that cost, we can implement readc() as an ordinary function, entirely in the application, that calls read() to fetch a chunk of characters into a buffer. Each later call to readc() just grabs the next character from the buffer, and only when the buffer is empty do we need another system call. Since readc() is now a library function, each call also costs far less than a system call.
The library function fgetc() implements this idea. fgetc() takes a pointer to a FILE structure, which contains a file descriptor and a buffer.
int fgetc(FILE *stream);
When fgetc() is called and the buffer in the FILE is empty, it fills the buffer using read() (here, up to 32K bytes, i.e. 32768 characters). It then returns the next character in the buffer.
With fgetc() the performance is (l + 2) library function calls (not system calls)! And because of the cache, there are approximately (l / 32768) system calls!
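The following is a minimal sketch of this buffering idea, not the actual C library implementation. The struct bfile type and the bopen and bgetc functions are hypothetical names, the nreads counter exists only to make the number of system calls visible, and the 32768-byte buffer follows the figure used above.

```c
#include <assert.h>
#include <stdio.h>    /* for EOF */
#include <stdlib.h>
#include <unistd.h>

#define BBUFSIZE 32768

struct bfile {
    int fd;                        /* underlying file descriptor */
    ssize_t pos;                   /* index of the next unread byte in buf */
    ssize_t len;                   /* number of valid bytes in buf */
    long nreads;                   /* read() calls made so far */
    unsigned char buf[BBUFSIZE];
};

/* Wrap an already-open file descriptor. Returns NULL if out of memory. */
struct bfile *bopen(int fd) {
    struct bfile *f = malloc(sizeof(struct bfile));
    if (f == NULL)
        return NULL;
    f->fd = fd;
    f->pos = f->len = 0;
    f->nreads = 0;
    return f;
}

/* Return the next character, or EOF at end of file or on error.
   The read() system call happens only when the buffer is empty. */
int bgetc(struct bfile *f) {
    assert(f != NULL);
    if (f->pos == f->len) {                     /* buffer empty: refill it */
        f->len = read(f->fd, f->buf, BBUFSIZE);
        f->nreads++;
        f->pos = 0;
        if (f->len <= 0) {
            f->len = 0;
            return EOF;
        }
    }
    return f->buf[f->pos++];                    /* served from the buffer */
}
```

Reading a short line through bgetc costs one read() call for the whole buffer, no matter how many characters are then handed out one at a time.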
An abstraction reduces complexity by stripping an element down to its core attributes or purpose. The three fundamental systems abstractions are memory, interpreters, and links.
Memory stores data (e.g. a file). We can do two basic actions with memory: read a value from an address and write a value to an address.
Types of memory differ in size, e.g. modern hard drives have enormous capacity while RAM is much smaller.
Types of memory differ in speed (or latency), e.g. random access memory (RAM) has the same latency for all addresses, while a disk's latency depends on where the data is located.
Memory can be either volatile or durable. Volatile memory is temporary (i.e. when the power is turned off, the data is lost). Durable memory is the opposite: its data survives after the power source is cut off.
Types of memory are addressed differently, e.g. associatively by name (files) or by number (addresses).
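The two basic actions, and the difference between numbered and associative addressing, can be sketched with two toy memories. All names here (mem_read, mem_write, assoc_read, assoc_write) are hypothetical, and the fixed 16-entry size is an arbitrary choice for the sketch.

```c
#include <assert.h>
#include <string.h>

#define MEM_SIZE 16

/* Numbered memory: values are located by a numeric address. */
static int cells[MEM_SIZE];

void mem_write(int addr, int value) {
    assert(addr >= 0 && addr < MEM_SIZE);
    cells[addr] = value;
}

int mem_read(int addr) {
    assert(addr >= 0 && addr < MEM_SIZE);
    return cells[addr];
}

/* Associative memory: values are located by name, like files. */
static struct { const char *name; int value; } named[MEM_SIZE];
static int nnamed = 0;

void assoc_write(const char *name, int value) {
    int i;
    for (i = 0; i < nnamed; i++)
        if (strcmp(named[i].name, name) == 0) {
            named[i].value = value;       /* overwrite an existing entry */
            return;
        }
    assert(nnamed < MEM_SIZE);            /* toy memory: fixed capacity */
    named[nnamed].name = name;
    named[nnamed].value = value;
    nnamed++;
}

/* Returns the stored value, or -1 if the name is unknown. */
int assoc_read(const char *name) {
    int i;
    for (i = 0; i < nnamed; i++)
        if (strcmp(named[i].name, name) == 0)
            return named[i].value;
    return -1;
}
```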
An interpreter executes instructions. An operating system kernel is an interpreter.
A link allows data to be transferred across different media. A bus, which carries information between computer components (CPU, disk, and RAM), is a link.
Layering is a design technique where a more complex and useful abstraction is built by using lower level abstractions.
A good example can be found in our memory abstraction:
The separations between layers do not have to be as simple. Take a look at the layering in our interpreter abstraction:
The disk is a durable (non-volatile) memory with large capacity and high latency. Disks are physically addressed, with a location defined by its platter, track, and sector. In order for data to be read or written, the read head must move into position over the correct track and then wait for the desired sector to rotate under it. During this process of “seeking”, the read head can accelerate at 40 times the acceleration of gravity.
The disk latency is generally much higher than that of a random access memory. Here are some example times for a disk memory:
On average we’ll wait 12,000,000 cycles for a piece of data to be read from the disk!
Price Comparison:
Earlier we wanted to make readc() faster. We decided to read in a larger buffer and to cache that buffer to speed up future calls to readc(). This cut the number of system calls by a factor of up to 32,768. So why not apply this idea to disk latency?
Operating systems do implement caching and prefetching in order to combat disk latency. They use the buffer cache, a portion of memory where a large, contiguous buffer of data from the disk is stored and can be easily accessed later. With larger sets of data stored in memory, we can drastically reduce the number of times we read from the disk.
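Here is a toy sketch of a buffer cache, far simpler than any real kernel's. The disk is simulated by an in-memory array, disk_reads counts the slow accesses, and sectors are placed in cache slots by a simple modulo rule. All names and sizes are hypothetical, and prefetching is not modeled.

```c
#include <assert.h>
#include <string.h>

#define SECTOR_SIZE 512
#define NSECTORS    64      /* size of the simulated disk */
#define NCACHE      4       /* number of cache slots */

static unsigned char disk[NSECTORS][SECTOR_SIZE];   /* simulated disk */
static long disk_reads = 0;                         /* slow accesses so far */

struct slot {
    int valid;              /* does this slot hold a sector? */
    int sector;             /* which sector it holds */
    unsigned char data[SECTOR_SIZE];
};
static struct slot cache[NCACHE];

/* Simulated slow read of one sector from the disk. */
static void disk_read(int sector, unsigned char *out) {
    disk_reads++;
    memcpy(out, disk[sector], SECTOR_SIZE);
}

/* Return a pointer to the cached copy of a sector, going to the
   disk only on a miss. Each sector maps to slot (sector % NCACHE). */
const unsigned char *cache_get(int sector) {
    struct slot *s;
    assert(sector >= 0 && sector < NSECTORS);
    s = &cache[sector % NCACHE];
    if (!s->valid || s->sector != sector) {   /* miss: fetch and remember */
        disk_read(sector, s->data);
        s->sector = sector;
        s->valid = 1;
    }
    return s->data;                           /* hit: no disk access */
}
```

Repeated requests for the same sector reach the simulated disk only once, until another sector that maps to the same slot evicts it.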
The kernel executes instructions like inb, outb, and insl to send commands and data over a bus, communicating with the disk as if it were a memory. The bus is a communication line and falls under the link abstraction.
Here is some sample code of how the kernel communicates a read operation:
while ((inb(0x1F7) & 0xC0) != 0x40)   // waiting for the disk to be ready
    /* spin */;
outb(0x1F2, 1);       // number of sectors to read
outb(0x1F3, 0);       // bits 0-7 (low bits) of 28-bit offset
outb(0x1F4, 0);       // bits 8-15 of 28-bit offset
outb(0x1F5, 0);       // bits 16-23 of 28-bit offset
outb(0x1F6, 0xE0);    // bits 24-27 of 28-bit offset
                      // bit 28 (= 0) means Disk 0
                      // other bits (29-31) must be set to one
outb(0x1F7, 0x20);    // READ SECTORS command
while ((inb(0x1F7) & 0xC0) != 0x40)
    /* spin */;
insl(0x1F0, 0x7C00, 128);   // get results as 128 "long words"
                            // 1 long word == 4 bytes; 128 * 4 == 512 bytes,
                            // the length of a sector
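The sample code reads the sector at offset 0. For a general 28-bit sector offset, the four offset registers receive successive slices of the number. The sketch below only computes those byte values and does not touch any hardware. The struct ide_regs type and the ide_encode function are hypothetical names, and the 0xE0 bits follow the comments in the sample code.

```c
#include <assert.h>
#include <stdint.h>

/* Values to write to ports 0x1F3 through 0x1F6 for a sector offset. */
struct ide_regs {
    uint8_t r1f3;   /* bits 0-7 of the offset */
    uint8_t r1f4;   /* bits 8-15 */
    uint8_t r1f5;   /* bits 16-23 */
    uint8_t r1f6;   /* bits 24-27, plus 0xE0 as in the sample code */
};

struct ide_regs ide_encode(uint32_t offset) {
    struct ide_regs r;
    assert(offset < (1u << 28));              /* only 28 bits are available */
    r.r1f3 = offset & 0xFF;
    r.r1f4 = (offset >> 8) & 0xFF;
    r.r1f5 = (offset >> 16) & 0xFF;
    r.r1f6 = ((offset >> 24) & 0x0F) | 0xE0;  /* Disk 0, upper bits set */
    return r;
}
```

With offset 0 this produces 0, 0, 0, and 0xE0, the same values the sample code writes.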
Note: The disk is acting both like a memory through its addressing and an interpreter through the instructions it receives.
So our current model of reading data from the disk to memory is as follows:
In our model, data travels over the bus two times. This is rather unnecessary, so we will use Direct Memory Access (DMA) and make a path from the disk directly to memory. Now take a look at how data is read from the disk: