
Lecture 5 Scribe Notes

By Matthew David, Jeanne Lopez, and Justin Moore

If you receive stack errors for the Minilab in Ubuntu, compile with:
-fno-stack-protector

PROCESS INTERACTIONS

- I/O Devices - the portholes through which processes communicate with one another
- Processes have distinct address spaces
- The only holes between processes are virtual I/O devices
- Topics: Blocking, Inter-Process Communication and Streams, Waiting for Process Exit, Threads

Process Descriptor (so far)

  1. registers
  2. process ID
  3. current privilege level (CPL)
  4. address space
  5. file descriptor table

Asynchronous I/O - Event-Driven Programming

int main(int c, const char *v[]) {
	int fd = open(...);
	char buff[1000];
	ssize_t amt;
	amt = read(fd, buff, 1000);
}
* What if fewer than 1000 chars are available to read, or the disk has not yet loaded the data?

One possibility would be to Busy Wait: spin in a loop until all 1000 characters are available. The problem with this, however, is poor utilization. We aren't using the system's resources efficiently by simply waiting for the condition. If we measure work by time, then 99% of the time we're not doing anything! Another problem is robustness: there is a lack of process isolation, since a spinning process holds the processor while other processes wait.

Another possibility would be to Yield. This is when a process releases control of the processor so that other code can run. This improves utilization compared with spinning (busy waiting). There are two methods of yielding: polling and blocking.

Polling

Polling is a form of yielding where the process remains runnable. It checks a condition every time it runs to see if it can continue execution. (Think of a nervous cook asking every second "Has the water boiled yet? How about now? Okay, now?")

So how do we build a polling read interface?

amt = read(fd, buf, siz);
if (amt == siz)
	// read succeeded
if (amt > 0 && amt < siz)
	// read succeeded, but only amt bytes were available at the moment
if (amt == 0)
	// we have reached the end of the file
if (amt == -1 && errno == EWOULDBLOCK)
	// no data available at the moment - try again later
// amt == -1 : indicates an abnormal return; some error happened, so look at the errno value to find which one occurred
// errno : a global variable that stores the numeric code of the most recent error
// EWOULDBLOCK : symbolic name of the error code (all error names start with an 'E')
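
The interface above only reports EWOULDBLOCK when the file descriptor is in non-blocking mode; otherwise read() simply waits. Below is a minimal sketch of how that mode could be turned on with fcntl() and O_NONBLOCK, plus a helper that classifies one read attempt. The names set_nonblocking and poll_once are hypothetical, made up for illustration, and are not from the lecture.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: put fd into non-blocking mode so that read()
   returns -1 with EWOULDBLOCK/EAGAIN instead of waiting for data. */
int set_nonblocking(int fd)
{
	assert(fd >= 0);
	int flags = fcntl(fd, F_GETFL);
	if (flags == -1)
		return -1;
	return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical helper: attempt one read and classify the outcome.
   Returns 1 if some data was read, 0 at end of file,
   -1 if no data is available yet, and -2 on any other error. */
int poll_once(int fd, char *buf, size_t siz)
{
	ssize_t amt = read(fd, buf, siz);
	if (amt > 0)
		return 1;
	if (amt == 0)
		return 0;
	if (errno == EWOULDBLOCK || errno == EAGAIN)
		return -1;
	return -2;
}
```

A caller would invoke set_nonblocking(fd) once, then call poll_once(fd, buf, siz) whenever it gets a chance to run.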

The polling version may have to call the read() system call arbitrarily many times, however.

ssize_t pos = 0;
while (pos < 1000) {
	ssize_t amt = read(fd, &buf[pos], 1000 - pos);
	if (amt > 0)				// if you read some amount
		pos += amt;			// remember how many you have read in so far
	else if (amt == 0)			// if you didn't read any amount
		break;				// you're at the EOF
	else if (errno != EWOULDBLOCK)		// if you received a real error
		break;				// give up
	// otherwise errno == EWOULDBLOCK: no data at the moment, so loop and try again
}

Cost of System Calls

1. Application executes an interrupt instruction
2. Processor looks up IDT (interrupt descriptor table) and saves some registers on a kernel stack
3. Kernel saves registers into process descriptor
4. Kernel has to run system call
5. Load a new address space if switching to a different process
6. Reload process' registers, return via cross-privilege "iret" instruction
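
To get a rough feel for this cost, one could time a trivial system call in a loop. The sketch below is only an assumption about how such a measurement might look: ns_per_syscall is a hypothetical name, getppid() is chosen simply because it does almost no work inside the kernel, and the result also includes loop overhead, so treat it as an estimate.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: average wall-clock nanoseconds per call of a
   trivial system call (getppid), measured over `iterations` calls. */
double ns_per_syscall(long iterations)
{
	struct timespec start, end;

	assert(iterations > 0);
	clock_gettime(CLOCK_MONOTONIC, &start);
	for (long i = 0; i < iterations; i++)
		(void)getppid();	/* each call crosses into the kernel and back */
	clock_gettime(CLOCK_MONOTONIC, &end);

	double elapsed = (end.tv_sec - start.tv_sec) * 1e9
	               + (end.tv_nsec - start.tv_nsec);
	return elapsed / (double)iterations;
}
```

The number it returns depends heavily on the machine and kernel, so no particular value should be expected.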

- A context switch is the loading of a new address space, which takes thousands of instructions.
- The problem with polling is potentially many context switches, resulting in low utilization.
- So to fix this problem, we turn to another method of yielding called blocking.

Blocking

Blocking is a form of yielding where a process is no longer runnable until the condition is true.
- In our example, this condition is having 1000 characters.
- On end-of-file or a disk crash, we might as well return instead of continuing to check.
- In polling, we may call read multiple times; in blocking, we call it once.
- To implement blocking, we add PROCESS STATE to our process descriptor, with possible values of BLOCKED or RUNNABLE. A "BLOCKED" process is placed on a wait queue.
- So blocking provides us with fewer system calls, which result in fewer crossings between user and kernel (context switches), and thus improved performance and utilization.
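
The process state and wait queue could be sketched as follows. This is a simplified illustration, not any real kernel's data structure: the struct layout and the names block_on and wake_one are hypothetical, and a real process descriptor holds much more (registers, CPL, address space, file descriptor table).

```c
#include <assert.h>
#include <stddef.h>

enum proc_state { RUNNABLE, BLOCKED };

/* Hypothetical, simplified process descriptor. */
struct process {
	int pid;
	enum proc_state state;
	struct process *next_waiter;	/* link in a wait queue */
};

/* Hypothetical FIFO queue of processes blocked on one condition. */
struct wait_queue {
	struct process *head;
	struct process *tail;
};

/* Mark p BLOCKED and append it to the queue; a scheduler would skip it. */
void block_on(struct wait_queue *q, struct process *p)
{
	assert(q != NULL && p != NULL);
	p->state = BLOCKED;
	p->next_waiter = NULL;
	if (q->tail)
		q->tail->next_waiter = p;
	else
		q->head = p;
	q->tail = p;
}

/* Called when the condition becomes true: make the longest-waiting
   process RUNNABLE again, or return NULL if nobody is waiting. */
struct process *wake_one(struct wait_queue *q)
{
	struct process *p = q->head;
	if (p == NULL)
		return NULL;
	q->head = p->next_waiter;
	if (q->head == NULL)
		q->tail = NULL;
	p->next_waiter = NULL;
	p->state = RUNNABLE;
	return p;
}
```

In this sketch, a blocking read would call block_on when no data is available, and the device's completion handler would call wake_one when data arrives.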

Inter-Process Communication

Advice from Doug McIlroy (a manager on the Bell Labs team that invented Unix):
"We should have some ways of connecting programs like garden hose--screw in another segment when it becomes necessary to massage data in another way. This is the way of IO also."

Pipes: We should arrange for processes to talk to one another. We want the OUT of one process to be the IN of another.

ex: English-to-French translator into Word Count
e2ftrans file output: tempfile794321.x
wordcount input: tempfile794321.x

However, this gets complicated as the number of temporary files and file names grows. Performance goes down as we write useless data to the disk, and there is also the possibility of the file being modified by a different process.

A FILE is a linear array of bytes with a known, finite size, allowing random access (ex: a book or TiVo).
A STREAM is a sequence of bytes of unknown and possibly infinite length, allowing ONLY sequential access (ex: TV before TiVo).

Read/write/close work the same on files and streams. Remember that everything is a file in Linux, so a stream is defined as a file where:
1. lseek is an error
2. we need to create a stream
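
The "lseek is an error" property can be checked directly: on a pipe, lseek() fails with errno set to ESPIPE. The helper below is a small sketch of that check; the name is_seekable is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if fd allows random access,
   0 if it is a stream (lseek fails with ESPIPE), -1 on other errors. */
int is_seekable(int fd)
{
	assert(fd >= 0);
	/* Ask for the current offset without moving it. */
	if (lseek(fd, 0, SEEK_CUR) != (off_t)-1)
		return 1;
	return errno == ESPIPE ? 0 : -1;
}
```

Calling is_seekable on either end of a pipe should report a stream, while an ordinary disk file should report random access.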

So the question then becomes, how can we connect processes with a stream?
The most intuitive answer is something called a "pipe."

Specifications of a pipe:
int pipe(int pfd[2]);
pfd[0] : read end of the pipe
pfd[1] : write end of the pipe

Inside the pipe, there is something called a "pipe buffer."
How big is it, you might ask?
Well, it’s not of size 1, because of performance issues.
It’s not of infinite size, because of space issues (robustness).
It turns out it's usually a few thousand characters.
In addition, the smaller the buffer, the more read/write requests we must make;
the larger the buffer, the more we can batch read/write requests.

Specifications of the “Bounded Buffer:”
• Robustness - finite buffer size
• Buffer full - writes block
• Buffer empty - reads block
• Default - a few thousand chars
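
One way to observe the bounded buffer is to fill an empty pipe until a write would block. The sketch below assumes a non-blocking write end (so that a full buffer shows up as EAGAIN rather than a blocked process); pipe_capacity is a hypothetical name, and the number it reports depends on the operating system and its configuration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: count how many bytes fit into an empty pipe
   before a write would block. Returns -1 on failure. */
long pipe_capacity(void)
{
	int pfd[2];
	long total = 0;
	char byte = 'x';
	ssize_t n;

	if (pipe(pfd) == -1)
		return -1;

	/* Non-blocking write end: a full buffer yields EAGAIN, not a block. */
	int flags = fcntl(pfd[1], F_GETFL);
	if (flags == -1 || fcntl(pfd[1], F_SETFL, flags | O_NONBLOCK) == -1) {
		close(pfd[0]);
		close(pfd[1]);
		return -1;
	}

	while ((n = write(pfd[1], &byte, 1)) == 1)
		total++;
	assert(n == -1);	/* a 1-byte write either succeeds or fails */

	int full = (errno == EAGAIN || errno == EWOULDBLOCK);
	close(pfd[0]);
	close(pfd[1]);
	return full ? total : -1;
}
```

With a blocking write end, the write that hits the limit would instead put the writer to sleep until a reader drains the buffer.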

The following code puts character "x" into the write-end of the pipe, then reads from the read-end, and asserts that the same "x" was read back.
int pfd[2];
pipe(pfd);
char x = 'x';
write(pfd[1], &x, 1);
char y;
read(pfd[0], &y, 1);
assert(y == 'x');

So, how exactly does a pipe work?
The following figures should help explain it.

But wait, when is a pipe at “end of file?”
Answer: When every copy of the write-end is closed (and any buffered data has been read).
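
A small sketch of this behavior within a single process: data written before the close is still delivered, and only the following read reports end of file by returning 0. The function name read_after_writer_closes is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <unistd.h>

/* Hypothetical demo: returns the result of the final read() on a pipe
   whose write end is closed and whose buffered data has been drained. */
ssize_t read_after_writer_closes(void)
{
	int pfd[2];
	char buf[16];

	if (pipe(pfd) == -1)
		return -1;

	ssize_t w = write(pfd[1], "hi", 2);
	assert(w == 2);
	close(pfd[1]);					/* no writers remain */

	ssize_t first = read(pfd[0], buf, sizeof buf);	/* buffered bytes still arrive */
	assert(first == 2);

	ssize_t last = read(pfd[0], buf, sizeof buf);	/* empty and no writers: EOF */
	close(pfd[0]);
	return last;
}
```

If any process still held the write end open, that last read would block instead of returning 0.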

Let's say Process P is going to start up Processes Q & R, with Q’s stdout writing to R’s stdin.

A function that is incredibly helpful for manipulating pipes is something called "dup2."
Specification:
int dup2(int oldfd, int newfd);
dup2 closes the current "newfd" (if it is open), and makes "newfd" point to the same file structs as "oldfd"
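
Putting pipe, fork, and dup2 together, P's setup of Q and R might look like the sketch below. It is an illustration under assumptions, not the lecture's code: the name pipe_q_into_r is hypothetical, Q and R run inline instead of calling exec, and R reports its byte count through its exit status (so the message must be shorter than 256 bytes).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: P forks Q and R with Q's stdout piped to R's stdin.
   Q writes msg; R counts bytes until EOF and exits with that count.
   Returns R's exit status, or -1 on failure. */
int pipe_q_into_r(const char *msg)
{
	int pfd[2];

	assert(msg != NULL);
	if (pipe(pfd) == -1)
		return -1;

	pid_t q = fork();
	if (q == -1)
		return -1;
	if (q == 0) {				/* Q: stdout becomes the write end */
		dup2(pfd[1], STDOUT_FILENO);
		close(pfd[0]);
		close(pfd[1]);
		if (write(STDOUT_FILENO, msg, strlen(msg)) == -1)
			_exit(1);
		_exit(0);
	}

	pid_t r = fork();
	if (r == -1)
		return -1;
	if (r == 0) {				/* R: stdin becomes the read end */
		char buf[64];
		int count = 0;
		ssize_t n;
		dup2(pfd[0], STDIN_FILENO);
		close(pfd[0]);
		close(pfd[1]);			/* else R never sees end of file */
		while ((n = read(STDIN_FILENO, buf, sizeof buf)) > 0)
			count += (int)n;
		_exit(count);
	}

	close(pfd[0]);				/* P must close its copies too */
	close(pfd[1]);

	int status = 0;
	waitpid(q, NULL, 0);
	if (waitpid(r, &status, 0) == -1 || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

The close calls matter as much as dup2: if P or R kept the write end open, R's read would never return 0.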

Another useful function is "exit."
Specification:
void exit(int status);
This function terminates the calling process, with an integer status as its parameter.
The fact that the process has exited is stored in the process descriptor.
Because the process descriptor is still around, it’s called a "Zombie" process. We need this "Zombie" categorization because we must make sure that someone (the parent) realizes it’s dead and can collect its status.
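
A sketch of how a parent might collect that status, assuming the standard waitpid() call: between the child's exit and the parent's waitpid, the child is a zombie. The function name reap_child is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: fork a child that exits with `code`, then reap it.
   Returns the exit status the parent observes, or -1 on failure. */
int reap_child(int code)
{
	assert(code >= 0 && code <= 255);	/* only the low 8 bits reach the parent */

	pid_t child = fork();
	if (child == -1)
		return -1;
	if (child == 0)
		_exit(code);		/* child stays a zombie until waited for */

	int status = 0;
	if (waitpid(child, &status, 0) == -1)	/* blocks until the child exits */
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Once waitpid returns, the kernel is free to discard the child's process descriptor.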

Unix is very powerful.
For example, using pipes, the following pipeline prints the 10 most frequent words in a document, an otherwise complex operation:

tr -cs A-Za-z '\n' |   # formats the document to have one word per line, removing non-letters
 tr A-Z a-z |       # changes capitals to lowercase
 sort |             # sorts
 uniq -c |          # counts how many copies there are of each word
 sort -rn |         # sorts in reverse numeric order
 head -n 10         # prints the first 10 lines