Andrew Ackerman, Dae-ki Cho, Johann Ly
12 October 2006
Interfaces and Threads
To be able to substitute the stdin, stdout, and stderr file descriptors with other files.
When the parent process calls fork, it creates an exact duplicate child process. This child process's file descriptor table is initially copied from the parent's. But remember that a file descriptor table is just an array of references to file structures, which can include regular disk files, terminals (like your screen), network connections, and other types of virtual I/O devices. (Unix's big idea: everything is a file!) These file structures are not duplicated. (Files are virtual I/O devices, and we wouldn't expect real I/O devices, such as disks, to magically copy themselves when a new process was started!) So the parent and the child have different references to the same structures.
However, if the command includes a redirection, the stdin, stdout, or stderr descriptor in the child should instead refer to the file named in the redirection.
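One way to see that the file structure is shared rather than copied is to watch the file offset, which lives in the file structure. The sketch below is only an illustration: the function name `shared_offset_after_fork` and the scratch file path are made up for this example. The child writes 5 bytes through its copy of the descriptor, and the parent, which never writes anything, then asks where the offset is.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the file offset the parent observes after the child has written
 * 5 bytes through its own copy of the descriptor, or -1 on error. */
long shared_offset_after_fork(void)
{
    char path[] = "/tmp/fork_demo_XXXXXX";  /* hypothetical scratch file */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                 /* the file goes away once it is closed */

    pid_t c = fork();
    if (c < 0) {
        close(fd);
        return -1;
    }
    if (c == 0) {
        /* Child: same file structure, so this advances the shared offset. */
        ssize_t n = write(fd, "hello", 5);
        _exit(n == 5 ? 0 : 1);
    }

    int status;
    if (waitpid(c, &status, 0) != c) {
        close(fd);
        return -1;
    }
    off_t pos = lseek(fd, 0, SEEK_CUR);  /* the parent itself wrote nothing */
    close(fd);
    return (long) pos;
}
```

If the file structure had been duplicated by fork, the parent would still see offset 0 here.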
    int open(const char *pathname, int flags);
    int open(const char *pathname, int flags, mode_t mode);
open returns the lowest-numbered file descriptor that is not in use. It chooses the lowest one so that system calls behave predictably.
    pid_t c = fork();
    if (c == 0) {                          // in child
        close(0);                          // close stdin
        int fd = open("input", O_RDONLY);  // lowest free descriptor, so fd == 0
        ...
        execvp(...);
    }
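The lowest-free-descriptor rule can be checked directly. This is a small sketch, not part of the lecture code: the helper name `open_reuses_lowest_fd` is hypothetical, and it assumes `/dev/null` exists and can be opened for reading.

```c
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if open() hands back the descriptor number that was just closed,
 * 0 if it does not, and -1 if any open() fails. */
int open_reuses_lowest_fd(void)
{
    int a = open("/dev/null", O_RDONLY);
    if (a < 0)
        return -1;
    int b = open("/dev/null", O_RDONLY);   /* b is some number above a */
    if (b < 0) {
        close(a);
        return -1;
    }

    close(a);                              /* a is now the lowest free slot */
    int c = open("/dev/null", O_RDONLY);
    int reused = (c == a);

    close(b);
    if (c < 0)
        return -1;
    close(c);
    return reused;
}
```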
The above code can be used to redirect stdin to a file named "input". Since open is used after closing the stdin, it is guaranteed to open at the 0 file descriptor. However, this method gets trickier when trying to redirect stdout and stderr, especially when it is unknown whether the two file descriptors are already in use.
int dup2(int oldfd, int newfd);
dup2 changes newfd into a copy of oldfd. That is, after dup2 returns, oldfd and newfd point to the same file structure. If newfd was already open, it is closed before the copy. (Exception: if oldfd and newfd are equal and valid, as in dup2(x, x), nothing is closed and the call simply returns x.)
dup2 can be used to simplify redirection, as shown in the following code:
    int fd;
    pid_t c = fork();
    if (c == 0) {                       // in child
        fd = open("input", O_RDONLY);
        if (fd != 0) {                  // STDIN_FILENO == 0
            dup2(fd, 0);
            close(fd);                  // only close the extra descriptor
        }
        ...
        execvp(...);
    }
The above code redirects stdin to read from a file named "input" by use of dup2.
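The same pattern works for stdout, which is the case the earlier close-then-open trick made awkward. The following is a sketch under assumptions: the function name `run_with_stdout_to` is invented for this example, and the child writes a fixed message itself where a real shell would call execvp.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs a child whose stdout is redirected to `path`.
 * Returns the child's exit status, or -1 on error. */
int run_with_stdout_to(const char *path)
{
    pid_t c = fork();
    if (c < 0)
        return -1;
    if (c == 0) {                                   /* in child */
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
            _exit(1);
        if (fd != STDOUT_FILENO) {
            if (dup2(fd, STDOUT_FILENO) < 0)
                _exit(1);
            close(fd);                              /* keep only descriptor 1 */
        }
        /* A shell would call execvp() here; this sketch just writes. */
        const char msg[] = "hello\n";
        ssize_t n = write(STDOUT_FILENO, msg, sizeof msg - 1);
        _exit(n == (ssize_t) (sizeof msg - 1) ? 0 : 1);
    }

    int status;
    if (waitpid(c, &status, 0) != c)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Note that it does not matter which descriptors happen to be free in the child: dup2 names the target descriptor explicitly.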
To get the output of process A to the input of process B, while keeping A and B isolated.
The need for a device to put the output of one process to the input of another process was evident to authors of early command shells. However, since the processes are meant to be isolated, they probably should not change each other's virtual computer state while they are running! That would make it hard for programs to get consistent, predictable output: how would a child process know when it was safe to start using its virtual I/O devices (that is, when the parent was done fiddling with the virtual I/O device layout)? Therefore, the Unix interface designers decided that the output of the first process must be linked to the input of the second process before the processes begin execution.
Pipes are bounded I/O buffers that are used to "pipe" one file descriptor to another file descriptor. They are extremely powerful devices because they can be used to connect the stdout of one process to the stdin of another process.
$ ls | sort -r
Ospsh is the main program, which forks to create the two child processes (ls and sort).
    #include <unistd.h>
    int pipe(int fd[2]);
The pipe system call creates two file structures representing a bounded buffer, one for the read end and one for the write end. It returns two file descriptors for the two ends. The input values in fd[] are ignored, but on output, fd[0] is the read end and fd[1] the write end of the newly created pipe. pipe returns 0 on completion or -1 on error (with proper errno set).
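Before involving a second process, the two ends can be exercised from a single process. This is a minimal sketch with a hypothetical helper name, `pipe_roundtrip`. It assumes the message is small enough to fit in the pipe's buffer, since otherwise the write would block with nobody reading.

```c
#include <string.h>
#include <unistd.h>

/* Writes `msg` into a new pipe and reads it back into `out` (at most `cap`
 * bytes). Returns the number of bytes read, or -1 on error. */
ssize_t pipe_roundtrip(const char *msg, char *out, size_t cap)
{
    int fd[2];
    if (pipe(fd) < 0)
        return -1;

    ssize_t n = write(fd[1], msg, strlen(msg));  /* fd[1] is the write end */
    close(fd[1]);
    if (n >= 0)
        n = read(fd[0], out, cap);               /* fd[0] is the read end */
    close(fd[0]);
    return n;
}
```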
Setting up a pipe to get to the goal state above is a multi-step process with stages occurring both in the parent and in the child.
The Process
1. pipe: Create the pipe in the parent process (shown in red).
2. fork(): Fork the parent process to create an exact copy child process (shown in green above).
3. dup2: Make the stdout of the child process refer to the write end of the pipe (shown in blue above).
4. close: Close the other connections to the pipe in the child process (shown in orange above).
5. close: Close the parent's write end connection to the pipe.

This brings us to (the black parts):
Process Continued
1. fork(): Fork the parent process to create another exact copy child process (shown in green above).
2. dup2: Make the stdin of the 2nd child process refer to the read end of the pipe (shown in red above).
3. close: Close the other connections to the pipe in the 2nd child process (shown in blue above).
4. close: Close the parent's read connection to the pipe (shown in orange above).
5. execvp: Execute the first child's command (shown in brown above).
6. execvp: Execute the 2nd child's command (shown in grey above).

Removing the clutter, the diagram is now the goal state!
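The steps above can be sketched in C roughly as follows. The helper names `spawn` and `run_pipeline` are invented for this example, error handling is minimal, and the ordering of the closes follows the list above rather than any particular shell's source.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that runs argv with `in` as stdin and `out` as stdout.
 * `unused` is a pipe end the child must not keep open (-1 if none). */
static pid_t spawn(char *const argv[], int in, int out, int unused)
{
    pid_t c = fork();
    if (c != 0)
        return c;                       /* parent, or fork failure */
    if (unused >= 0)
        close(unused);
    if (in != STDIN_FILENO) {
        dup2(in, STDIN_FILENO);
        close(in);
    }
    if (out != STDOUT_FILENO) {
        dup2(out, STDOUT_FILENO);
        close(out);
    }
    execvp(argv[0], argv);
    _exit(127);                         /* only reached if execvp fails */
}

/* Runs `left | right`. Returns the exit status of `right`, or -1. */
int run_pipeline(char *const left[], char *const right[])
{
    int fd[2];
    if (pipe(fd) < 0)
        return -1;

    pid_t a = spawn(left, STDIN_FILENO, fd[1], fd[0]);
    close(fd[1]);                       /* parent's write end */
    pid_t b = spawn(right, fd[0], STDOUT_FILENO, -1);
    close(fd[0]);                       /* parent's read end */

    int status = 0, result = -1;
    if (a > 0)
        waitpid(a, NULL, 0);
    if (b > 0 && waitpid(b, &status, 0) == b && WIFEXITED(status))
        result = WEXITSTATUS(status);
    return result;
}
```

For the example pipeline, `left` would be `{"ls", NULL}` and `right` would be `{"sort", "-r", NULL}`. Because the parent closes its write end before forking the second child, that child never inherits a write descriptor for the pipe.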
Deadlock occurs when two or more processes are waiting on the other to release a resource. The system stops making progress because of a circular wait.
Consider the following diagram:
The above diagram is a result of the programmer forgetting to close the file descriptors pointing to the pipe in the parent process. This will cause a deadlock. After ls is done running it closes its connection to the write end of the pipe. However, sort -r sees that the ospsh is still connected to the write end of the pipe and waits for it. At the same time, ospsh is waiting for sort -r to finish executing. Therefore, ospsh is waiting on sort -r and vice versa. This causes the system to hang due to a circular wait.
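The rule behind this hang is that a reader sees end-of-file on a pipe only once every descriptor for the write end has been closed. The sketch below shows the rule in one process so that it cannot actually hang: `extra` stands in for the copy the parent forgot to close, and the function name is hypothetical.

```c
#include <unistd.h>

/* Returns what read() reports on an empty pipe after all write ends are
 * closed (0 means end-of-file), or -1 on error. */
int read_after_writers_closed(void)
{
    int fd[2];
    char c;
    if (pipe(fd) < 0)
        return -1;

    int extra = dup(fd[1]);   /* plays the role of the parent's stray copy */
    close(fd[1]);             /* the writer (ls) is done */

    /* A read() at this point would block forever if `extra` is still open:
     * the kernel cannot know that nobody intends to write through it. */
    if (extra >= 0)
        close(extra);         /* the fix: drop the stray write end */

    ssize_t n = read(fd[0], &c, 1);
    close(fd[0]);
    return (int) n;
}
```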
Waitpid lets a parent wait for a child process to exit and collect its exit status, which in turn allows the child's process ID to be reused.
    pid_t waitpid(pid_t pid, int *status, int options);
DESCRIPTION
The waitpid() function will suspend execution of the calling thread until status information for one of its terminated child processes is available, or until delivery of a signal whose action is either to execute a signal-catching function or to terminate the process. If more than one thread is suspended in waitpid() or wait(2) awaiting termination of the same process, exactly one thread will return the process status at the time of the target process termination. If status information is available prior to the call to waitpid(), return will be immediate. (Citation: The Single Unix Specification)
Why do we need waitpid()? A parent process often needs to know that one of its children has exited. For example, a shell shouldn't print its prompt again until the current command completes. But how can it tell that the right process has exited? Processes are identified by their process IDs (the pid_t returned by fork()), but process IDs are a finite resource (there are only 2^32 of them). What if a parent started a child, and then 2^32 other processes ran before the parent had a chance to check up on its child? Is there a risk that some other process will have used the child's old ID in the meantime? (This may sound unlikely -- it is unlikely, actually! -- but computers keep getting faster and faster!) The solution Unix chooses is to keep the process descriptor around until the parent calls waitpid() (or a similar system call, such as wait()). The process descriptor for an exited process is marked as a so-called "zombie": despite its death, it continues to take up space. The child's process ID will never get reused until the parent calls waitpid() for that child. This correctly reports that the child has exited, and then frees the zombie (cuts off its head) so the process ID can be reused. This means that waitpid() will succeed (report true) at most once per child.
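The "at most once per child" behavior is easy to observe. In this sketch (the function name `reap_child` and its out-parameter are made up for illustration), the first waitpid collects the exit status and frees the zombie. The second call for the same pid fails, and the errno it sets is reported back to the caller.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that exits with `code` and reaps it. Returns the exit status
 * seen by the parent, or -1 on error. *second_errno receives the errno of a
 * second waitpid() on the same pid (0 if that call unexpectedly succeeds). */
int reap_child(int code, int *second_errno)
{
    pid_t c = fork();
    if (c < 0)
        return -1;
    if (c == 0)
        _exit(code);                      /* child exits, becomes a zombie */

    int status;
    if (waitpid(c, &status, 0) != c)      /* reports the exit, frees zombie */
        return -1;

    int ignored;
    errno = 0;
    pid_t again = waitpid(c, &ignored, 0);   /* no such child any more */
    *second_errno = (again == -1) ? errno : 0;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```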
Example:
    while (1) {                // Fork bomb
        if (fork() == 0)
            exit(0);
    }
The fork bomb is a classic example of the Denial-of-Service (DoS) attack. The attack works by creating unlimited forked processes within a while-loop, which can saturate the operating system's process table with replicated processes. Using waitpid() is useless against such an attack because the forked processes may never exit. As a result, the computer's performance becomes very slow and a large number of process IDs become occupied. To recover from such a fork bomb, the user has to free all such running processes and reclaim their IDs; however, this may not be possible without rebooting the system. A fork bomb can be prevented by reducing the maximum number of processes which can be run by a program or user at any time, or by reusing process IDs dynamically. Additionally, there exists a Linux kernel module to detect these attacks and clean up the forked processes appropriately.
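On Linux and the BSDs, one way to reduce the maximum number of processes is the `RLIMIT_NPROC` resource limit, which caps how many processes the calling user may have. It is not part of POSIX, so treat the sketch below as system-specific. The helper name `cap_process_limit` is hypothetical.

```c
#include <sys/resource.h>

/* Lowers the soft limit on the number of processes to `max_procs`, clamped
 * to the hard limit. Returns 0 on success, -1 on error. */
int cap_process_limit(rlim_t max_procs)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NPROC, &rl) < 0)
        return -1;

    /* The soft limit may never exceed the hard limit. */
    if (rl.rlim_max != RLIM_INFINITY && max_procs > rl.rlim_max)
        max_procs = rl.rlim_max;

    rl.rlim_cur = max_procs;
    return setrlimit(RLIMIT_NPROC, &rl);
}
```

Once the limit is reached, further fork calls by that user fail with an error instead of filling the process table.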
Apache HTTP Server is a free software/open source web server for Unix-like systems, Microsoft Windows, Novell NetWare and other operating systems.
Here's an idea for how Apache might be written at a high level. Is anything wrong with it?
    while (1) {
        accept_new_connection();
        read_command_from_connection();
        handle_connection();    /* read file, write to connection */
        close_connection();
    }
Yes, something is wrong: a single malicious client can stop the server from servicing any other client! This is called a denial-of-service attack: The server wasn't exactly hacked, but it is prevented from doing its job. How can a client do this? By simply sending its command very, very, very, very slowly, or even by sending half of a command then stopping. The server will block on read_command_from_connection(), waiting for the client to finish! To fix this, the server must introduce some sort of per-connection timeout, where a very slow connection is forcibly killed. But setting such a timeout is hard, and still the server is unusable for quite some time while it waits for the timeout.
What we have here is a failure to virtualize. We want the two connections to exhibit independent failure rather than the current propagation of effects (one client is slow => the whole server stops working). How do we get independent failures in computer software? With virtualization: processes!
    while (1) {
        accept_new_connection();
        if (fork() == 0) {
            read_command_from_connection();  /* a slow client blocks only this child */
            handle_connection();             /* read file, write to connection */
            close_connection();
            exit(0);
        }
        close_connection();                  /* parent drops its copy */
    }
So each connection is handled in its own process, independently of the others.
Is this approach good enough? If not, what are its problems?
Any ideas? The answer is to use the "thread" abstraction, since a thread is lighter than a process.
What is a Thread?
A thread is sometimes called a "semi-process" or "lightweight process". Threads are commonly used in server programming, for example in the client/server model, and are also a good way for an application to split its work into several jobs that run simultaneously. Threads and processes differ in how they are created and in how they share resources. The threads of a single process typically share that process's state information, and share memory and other resources directly. How, then, do they share a single CPU? Through context switching, which we will discuss a bit later. The following table shows which resources threads share and which each thread keeps for itself.
| Process descriptor | Threads |
|---|---|
| Address space | Shared |
| virtual I/O (file descriptor) | Shared |
| Process instructions | Shared |
| Signals and signal handlers | Shared |
| current working directories | Shared |
| registers | Different |
| State(Blocked, Ready, Zombie) | Different |
| Stack | Different |
| Thread ID | Different |
| Signal Mask | Different |
| Return Value such as errno | Different |
Do we copy the old thread? We are NOT going to copy the old thread.
Why not?
So far, we have looked at the general idea of threads. How, then, are threads implemented in practice? Since we are using POSIX systems (i.e., Linux) in our class, let's talk about Pthreads.
The first job is to create a thread. Here is the call for doing that.
    #include <pthread.h>
    int pthread_create(pthread_t *thread, const pthread_attr_t *attr,
                       void *(*start_routine)(void *), void *arg);
Now the call has created a thread. The thread has its own registers and stack, which look like the following.
Is a thread sufficient by itself? Can we just use a thread and close it? The answer is no. We still need to take care of synchronization. One nice thing about Pthreads is that they provide "mutex", "join", and "condition variable" mechanisms. Let's look at these mechanisms.
Mutexes are used to prevent race conditions. A race condition often occurs when two or more threads need to perform operations on the same memory area. Using a mutex is similar to what we did in class. Declare a variable "pthread_mutex_t m" and lock it with the call "pthread_mutex_lock(&m)". Unlock the mutex with "pthread_mutex_unlock(&m)".
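As a small illustration, here is a sketch in which several threads increment one shared counter. The names `add_many`, `count_with_mutex`, and `MAX_THREADS` are invented for this example. The POSIX calls are `pthread_mutex_lock` and `pthread_mutex_unlock`. Without the lock, increments from different threads could overwrite each other and the final count could come out short.

```c
#include <pthread.h>

#define MAX_THREADS 16

static long counter;
static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

/* Thread body: adds 1 to the shared counter `iters` times. */
static void *add_many(void *arg)
{
    long iters = *(long *) arg;
    for (long i = 0; i < iters; i++) {
        pthread_mutex_lock(&m);     /* one thread at a time past here */
        counter++;
        pthread_mutex_unlock(&m);
    }
    return NULL;
}

/* Runs up to MAX_THREADS threads and returns the final counter value. */
long count_with_mutex(int nthreads, long iters)
{
    pthread_t t[MAX_THREADS];
    int started = 0;

    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;
    counter = 0;
    for (int i = 0; i < nthreads; i++)
        if (pthread_create(&t[started], NULL, add_many, &iters) == 0)
            started++;
    for (int i = 0; i < started; i++)
        pthread_join(t[i], NULL);
    return counter;
}
```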
We use "join" when one thread wants to wait for another thread to finish. This works in the same manner as what we did in "weensyos1". The syntax for a join is as follows.
    #include <pthread.h>
    int pthread_join(pthread_t thread, void **value_ptr);
Condition variables come with several functions. With these functions, we can create and destroy a condition variable, wait on a condition, and wake threads waiting on a condition. Their declarations are as follows.
    #include <pthread.h>
    int pthread_cond_init(pthread_cond_t *cond, const pthread_condattr_t *attr);
    int pthread_cond_destroy(pthread_cond_t *cond);
    pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
    int pthread_cond_wait(pthread_cond_t *cond, pthread_mutex_t *mutex);
    int pthread_cond_timedwait(pthread_cond_t *cond, pthread_mutex_t *mutex,
                               const struct timespec *abstime);
    int pthread_cond_signal(pthread_cond_t *cond);
    int pthread_cond_broadcast(pthread_cond_t *cond);
We are almost there. We now know how to create a thread and how to use it properly. The last step is exiting the thread. Closing a thread is really simple. The following call terminates the calling thread.
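Here is a minimal sketch of create plus join, with hypothetical names `square` and `square_in_thread`. The value the thread routine returns is what `pthread_join` stores through `value_ptr`. Passing a small integer through the `void *` argument is a common shortcut and is an assumption of this example, not something the interface requires.

```c
#include <pthread.h>
#include <stdint.h>

/* Thread body: squares the integer smuggled in through the void * argument.
 * Returning a value here has the same effect as passing it to pthread_exit. */
static void *square(void *arg)
{
    intptr_t n = (intptr_t) arg;
    return (void *) (n * n);
}

/* Computes n * n in a separate thread and waits for the result.
 * Returns -1 if the thread cannot be created or joined. */
long square_in_thread(long n)
{
    pthread_t t;
    void *result;

    if (pthread_create(&t, NULL, square, (void *) (intptr_t) n) != 0)
        return -1;
    if (pthread_join(t, &result) != 0)   /* blocks until `square` finishes */
        return -1;
    return (long) (intptr_t) result;
}
```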
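A typical use pairs a condition variable with a mutex and a plain flag. In the sketch below (all names are invented for this example), the main thread waits until a second thread sets `ready`. The wait sits in a `while` loop because `pthread_cond_wait` may return even when the condition is not yet true.

```c
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t ready_cond = PTHREAD_COND_INITIALIZER;
static int ready;

/* Thread body: sets the flag and wakes one waiter. */
static void *producer(void *arg)
{
    (void) arg;
    pthread_mutex_lock(&lock);
    ready = 1;
    pthread_cond_signal(&ready_cond);
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Starts the producer and blocks until the flag is set.
 * Returns the flag value seen (1), or -1 if the thread cannot be created. */
int wait_until_ready(void)
{
    pthread_t t;
    ready = 0;
    if (pthread_create(&t, NULL, producer, NULL) != 0)
        return -1;

    pthread_mutex_lock(&lock);
    while (!ready)                               /* re-check after each wakeup */
        pthread_cond_wait(&ready_cond, &lock);   /* releases lock while asleep */
    int seen = ready;
    pthread_mutex_unlock(&lock);

    pthread_join(t, NULL);
    return seen;
}
```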
pthread_exit
    #include <pthread.h>
    void pthread_exit(void *value_ptr);
As mentioned earlier, threads are often used to build servers. How, then, can a server handle all of its connections using threads? First, we need a big "while(1)" loop, since the ideal server should run forever. Inside the loop, we accept a connection and create a thread for it. The following is the naive pseudo-Apache with threads.
    while (1) {
        int fd = accept conn;
        pthread_create(.,.., handle_conn, (void *) fd);
    }
The question is how to handle each connection. The answer is to use a file descriptor (fd). Each thread receives the fd for its connection and uses it to read the request and write the response. When it has finished reading the client's request and responding, it closes the fd and exits the thread. The following is pseudocode for handling a connection.
handle_conn
    void *handle_conn(void *x) {
        int fd = (int) (intptr_t) x;   // intptr_t comes from <stdint.h>
        read from fd;
        read file requested;
        write file to fd (connection);
        close(fd);
        pthread_exit(NULL);
    }
This approach looks pretty good, and most web servers use it.
We have seen two different approaches to a web server: processes and threads. We can also see that threads are a better fit than fork. Why is that?
In actual measurements, creating a thread is much faster. The following table, from POSIX Threads Programming, compares timing results for the fork() subroutine and the pthread_create() subroutine.
| Platform | fork() real | fork() user | fork() sys | pthread_create() real | pthread_create() user | pthread_create() sys |
|---|---|---|---|---|---|---|
| IBM 375 MHz POWER3 | 61.94 | 3.49 | 53.74 | 7.46 | 2.76 | 6.79 |
| IBM 1.5 GHz POWER4 | 44.08 | 2.21 | 40.27 | 1.49 | 0.97 | 0.97 |
| IBM 1.9 GHz POWER5 p5-575 | 50.66 | 3.32 | 42.75 | 1.13 | 0.54 | 0.75 |
| INTEL 2.4 GHz Xeon | 23.81 | 3.12 | 8.97 | 1.70 | 0.53 | 0.30 |
| INTEL 1.4 GHz Itanium 2 | 23.61 | 0.12 | 3.42 | 2.10 | 0.04 | 0.01 |
The idea of a context switch is to share a single CPU among several threads or processes. The following shows what a context switch looks like.
A context switch is a lot slower than a function call. Why is that? The software interrupt and the context switch require heavy and complex operations, so a system call implemented through a hardware trap can be very slow. On a recent Pentium 4, the software interrupt and context switch are about 132 times slower than a mere function call. [citation: Linux Journal].
Threads can be a lot better than processes in terms of resource usage. Since multiple threads in the same process share a single address space and file descriptor table, creating a new thread can be faster and cheaper than creating a new process. Nevertheless, thread creation is relatively heavyweight. Different threads have different stacks, and allocating a new stack is pretty time consuming (and occupies non-negligible memory). Switching between threads in a single process may require a system call. There are several sources of overhead and expense. It turns out the fastest real servers don't use the threaded approach, exactly. Instead they use a variant of I/O calls that allow a single thread to handle multiple connections without blocking on any individual connection. That avoids the monopolization problem, without costing much in terms of overhead!
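The notes do not name the calls involved. One standard example is `poll`, which lets a single thread ask which of several descriptors can be read without blocking. The sketch below uses a hypothetical helper, `first_readable`, and caps the number of descriptors at 16 purely to keep the example short.

```c
#include <poll.h>
#include <unistd.h>

#define MAX_FDS 16

/* Waits up to `timeout_ms` for any of the first `n` descriptors to become
 * readable. Returns the index of the first readable one, or -1 on timeout
 * or error. */
int first_readable(const int fds[], int n, int timeout_ms)
{
    struct pollfd p[MAX_FDS];

    if (n > MAX_FDS)
        n = MAX_FDS;
    for (int i = 0; i < n; i++) {
        p[i].fd = fds[i];
        p[i].events = POLLIN;    /* we only care about readability */
        p[i].revents = 0;
    }
    if (poll(p, (nfds_t) n, timeout_ms) <= 0)
        return -1;
    for (int i = 0; i < n; i++)
        if (p[i].revents & POLLIN)
            return i;
    return -1;
}
```

A server built this way reads only from connections that poll reports as ready, so one slow client cannot stall the others.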
Non-blocking I/O is an approach to file input/output that doesn't require a process to wait for I/O operations to complete before continuing with the rest of its work. Compared to blocking I/O, which leaves system resources idle until the physical task of I/O has completed, non-blocking I/O offers an improvement in information throughput while decreasing latency. For instance, when a process finds that a file descriptor can't be used since it would block, the process can continue any processing which does not depend on that file descriptor in the meantime. The file descriptor must be polled until it is ready, however, which can consume CPU time. This behavior is controlled by status flags on the open file. These flags are passed as an argument to the open() function, together with the access-mode flags that determine whether the new file descriptor can be used for reading, writing, or both.
- O_NONBLOCK - Enables non-blocking I/O mode for the file.
- O_RDONLY - Opens the file for read access. (POSIX-specified)
- O_WRONLY - Opens the file for write access. (POSIX-specified)
- O_RDWR - Opens the file for both read and write access. (POSIX-specified)
- O_READ - Opens the file for read access. (GNU-specified)
- O_WRITE - Opens the file for write access. (GNU-specified)
- O_EXEC - Opens the file for execution. (GNU-specified)

- EAGAIN - The file/resource is unavailable for non-blocking I/O. This happens when an operation attempts something on an object that would block although the file/resource is in non-blocking mode. (lit: "Error: try again")
- EWOULDBLOCK - The resource is unavailable for non-blocking I/O. This happens when an operation attempts something on an object that would block although the file/resource is in non-blocking mode. (lit: "Error: this operation would block", identical to EAGAIN except where specified on older UNIX systems)
- EINTR - A signal interrupted the operation.
- EPERM - Operation not allowed due to insufficient permissions.
- EBUSY - The file is busy/locked.
- ENOMEM - There isn't enough memory for the operation.
- EIO - A general error occurred.
- EACCES - Illegal access.
- EINVAL - An illegal value was passed to the function.
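To see O_NONBLOCK and EAGAIN together, the sketch below puts the read end of an empty pipe into non-blocking mode with `fcntl` and then tries to read from it. The function name is hypothetical. A blocking descriptor would make this read sleep until data arrived, whereas here it fails immediately.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Reads from an empty pipe in non-blocking mode. Returns the errno that
 * read() set, 0 if read() did not fail, or -1 if the setup itself failed. */
int errno_of_nonblocking_read(void)
{
    int fd[2];
    char c;

    if (pipe(fd) < 0)
        return -1;

    /* Add O_NONBLOCK to the read end's existing status flags. */
    int flags = fcntl(fd[0], F_GETFL);
    if (flags < 0 || fcntl(fd[0], F_SETFL, flags | O_NONBLOCK) < 0) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }

    errno = 0;
    ssize_t n = read(fd[0], &c, 1);   /* nothing to read, writer still open */
    int err = (n < 0) ? errno : 0;

    close(fd[0]);
    close(fd[1]);
    return err;
}
```

The caller should accept either EAGAIN or EWOULDBLOCK, since systems are allowed to give the two names different values.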