====== Lecture 5 Scribe Notes ======

//October 15, 2007//

//by Justin Meza, Xiangping Qiu, and Tom Mallery//

===== Orthogonality As a Design Paradigm =====

Orthogonality in its most literal sense refers to two or more vectors which are all at right angles to one another. A combination of such vectors, each with its own unique direction, can be used to define any other vector in the space they span.

{{notes:lec5:scribe-notes_lec5-fig1.png|}}

**Orthogonality** in the context of operating systems is related: it means that each module accomplishes one particular function and no other. We introduce six such fundamental operating system operations, originally implemented in UNIX, which in conjunction with one another provide a multitude of functionality. These operations are //orthogonal// with respect to one another because each brings with it a unique functionality not provided by the others. The beauty of these operations lies in their simplicity and versatility. These operations are:

  * ''open''
  * ''read''
  * ''write''
  * ''close''
  * ''fork''
  * ''exec''

A recurring theme in operating systems programming is that of orthogonality, namely, making the various interfaces of an operating system as orthogonal to one another as possible. To understand why one might want to do this, we turn to an example.

==== Creating a File ====

Let's suppose we wanted to implement a new system call designed expressly for the creation of a file. Its prototype might look like:

  int create(const char* filename)

This is a perfectly reasonable function, but is it the //best// way to create a file from the perspective of orthogonality? The answer is "no" for the simple reason that implementing such a function would require accomplishing nearly the same task as our //existing// ''open'' operation. Therefore, for the sake of orthogonality, we might instead add flags to the ''open'' operation to include the functionality embodied in creating a file.
In fact, this is the case on a UNIX-like operating system. **Open flags** such as

  * ''O_CREAT'' (open and create a new file)
  * ''O_RDONLY'' (open a file in read-only mode)
  * ''O_WRONLY'' (open a file in write-only mode)
  * ''O_RDWR'' (open a file for read and write)

exist so that the operation embodied in each of them need not be implemented as a separate call. In fact, with the bitwise OR operator (''|'') we are able to do //even more// by combining these operations into one single call to ''open''. Therefore, if we wished to create a new file with the name "temp" we might do the following:

  file = open("temp", O_CREAT);

\\
**Side Note**\\
In addition to the traditional two-argument ''open()'' function, under Unix-based operating systems one can call ''open()'' with three arguments when a new file is being made. The third argument is the mode, which represents the permissions that the file is going to have after being made.

  int open(const char *filename, int ORdOptions, mode_t mode);

Some of the mode options are:

  * S_IRUSR - gives the file's owner read permission
  * S_IWUSR - gives the file's owner write permission
  * S_IRGRP - gives the file's group read permission
  * S_IROTH - gives everyone else read permission

**End Side Note**\\
\\
However, this method of creating a file leaves one thing to be desired: process isolation. Under this scheme, one process could potentially overwrite another process's file. Consider the following sequence of events:

  - Start Process 1 and Process 2, each on a separate processor.
  - Process 1 creates a file called "temp" using ''open("temp", O_CREAT)'', writes a bunch of data to that file, and then does some other work...
  - Meanwhile, Process 2 starts running and //also// wants to create a file called "temp" with ''open("temp", O_CREAT)'', and as a result, Process 1's file is overwritten without its knowledge or permission.
This sort of condition, in which the //timing// of two processes affects the outcome of an operation, is known as a **race condition**, as both processes seemingly "race" to affect the outcome. We might naïvely attempt to fix this problem through the use of another function, ''exists'', with the following prototype:

  int exists(const char* filename)

Let's pretend Process 2 in our example above used the following code while it attempted to create //its// temp file to try and prevent our race condition:

  ...
  char filename[100];
  generate_random_filename(filename, 100); // This does what one would expect
  while (exists(filename))
      generate_random_filename(filename, 100);
  // (*)
  open(filename, O_CREAT);
  ...

Let's try and understand what is going on here. Process 2 uses a function, ''generate_random_filename'', to try and pick a completely random name for its file in an attempt to avoid collisions with any previously-created files. Using our proposed ''exists'' function, we check to see if the randomly-named file already exists, and stop checking when we have picked a (thus-far) unique name. We then create our file and proceed.

Does this solve our race condition? The answer is "no" because (however unlikely) Process 1 might create an //identically-named// file at ''(*)'', which would be promptly overwritten by Process 2's subsequent ''open'' call (remember: our two processes are running asynchronously from each other). So, we might encounter:

  - There's a file called ''a3r52'' already there. Both Process 1 and Process 2 run the ''generate_random_filename'' line.
  - They both generate ''h84rf'', run the ''exists'' line, and find that this new file does not exist.
  - They both execute the ''open'' line, causing a conflict.

or

  - Process 1 creates a file named ''a3r52''.
  - Process 2 checks to see if a file by the name of ''a3r52'' exists before creating it.
  - It does exist (Process 1 just created it) so Process 2 now checks if a file named ''h84rf'' exists.
  - It does not, so Process 2 is satisfied it has found a unique name and stops checking.
  - Before Process 2 actually creates ''h84rf'' with ''open'', the operating system switches to Process 1.
  - Process 1 also generates ''h84rf''. Since the file has not been created yet, Process 1 calls ''open'' and writes a bunch of data to it.
  - Process 2 (still thinking its random name is unique) creates //its// ''h84rf'' file and wipes out Process 1's data.

Now, Process 2 might acquire some sort of lock on temp file creation before it generates its name and release it after it creates its file. //This// would, in fact, solve the race condition:

  ...
  char filename[100];
  acquire_lock(TEMPFILES);
  generate_random_filename(filename, 100); // This does what one would expect
  while (exists(filename))
      generate_random_filename(filename, 100);
  open(filename, O_CREAT);
  release_lock(TEMPFILES);
  ...

But this methodology is still not optimal. If Process 2 were to enter an infinite loop while holding the lock, it might prevent other processes from performing their useful tasks. Suddenly, we have lost process isolation once again.

What if, however, we chose to move the functionality of ''exists'' into our existing ''open'' operation? It is only a small change from our existing operation. In fact, this is how UNIX-like systems behave with the use of the ''O_EXCL'' flag:

  * ''O_EXCL'' (if a file already exists and this flag is OR'd with ''O_CREAT'', the ''open'' call fails)

We return to our previous example, using ''O_EXCL'' to solve our shortcomings:

  - Start Process 1 and Process 2, each on a separate processor.
  - Process 1 creates a file called "temp" using ''open("temp", O_CREAT | O_EXCL)'', writes a bunch of data to that file, and then does some other work...
  - Meanwhile, Process 2 starts running and //also// wants to create a file called "temp" with ''open("temp", O_CREAT | O_EXCL)'', and as a result, is unable to! Process 1 has already created a file called "temp" and Process 2 cannot gain exclusive rights to the file.
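The ''O_EXCL'' scenario above can be sketched in C roughly as follows. This is a minimal sketch assuming a POSIX system; ''create_exclusive'' is a hypothetical helper name that does not appear in the lecture, and the access mode and permission bits are choices made for the illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: try to create `path` for writing, failing rather
 * than reusing a file that already exists.
 * Returns a file descriptor on success, or -1 with errno set
 * (errno == EEXIST means some other process created the file first). */
int create_exclusive(const char *path)
{
    /* O_CREAT | O_EXCL folds "check that the file does not exist" and
     * "create it" into a single step inside the kernel, so no other
     * process can slip in between the check and the creation.
     * The third argument gives the owner read and write permission. */
    return open(path, O_WRONLY | O_CREAT | O_EXCL, S_IRUSR | S_IWUSR);
}
```

A caller that gets -1 with ''errno'' set to ''EEXIST'' knows the name is taken and can, for instance, pick another name and try again, without ever having touched the other process's file.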
The writers of Process 2 can figure out what to do at this point.

{{notes:lec5:scribe-notes_lec5-fig3.png|}}

**Side Note:** If different processes need to access a single file simultaneously, and only a certain part of the file must stay undisturbed (for example, while one process writes some bits and another reads from a different part), one could call:

  int fcntl(int filedescriptor, int cmdPurpose, struct flock *flockptr);

to put a lock on a segment of that file. This function has a few different uses, but for this example it is just used to lock portions of a file. The first argument is the file descriptor of the file that needs to have some locks put in place. The second is an integer that represents why you are calling it; in this case it would be F_GETLK, F_SETLK, or F_SETLKW. These are used for testing whether a lock could be placed, setting/clearing a lock without blocking, and setting/clearing a lock with waiting. The last argument points to a structure that contains a start position, an offset from that position, and the length of the segment to lock, among other things. For a detailed description of all the possible functionality of fcntl refer to the man page, "man fcntl".

Locks can cause problems too. For example, if a process locks a file and then goes into an infinite loop, nothing else could use that file, since the remove-lock command would never be called. To solve this problem, leases are used: if the lock has not been removed by the time its lease expires, the operating system removes the lock and the file is fair game again.

**End Side Note**

==== Temporary Files ====

What if our files are merely temporary? It would be advantageous for us to provide the operating system with some context regarding why we require a file. An operating system might choose to place the temporary file on a faster disk.
{{notes:lec5:scribe-notes_lec5-fig2.png|}}

Some advantages of operating systems that support the creation of temporary files are that they have more freedom to place them where they will be most beneficial for the user (on a faster disk, for example) and that the operating system may delete these temporary files when they are no longer in use. Also, making sure the data is //persistent// (written to the disk in an atomic manner) is not necessarily important for temporary files, and operations such as //flushing// (pushing data from the buffer to its output destination instead of waiting for the buffer to be full) may be neglected.

===== Fork and Exec =====

Two of our basic operations, ''fork'' and ''exec'', handle the creation and execution of processes, respectively. Processes are directly related to the process that created them. We might think of their relationship as something similar to that of a parent and its child. We call created processes //children// and the processes that created them //parents//.

It is important to understand that after a call to ''fork'', what was once //one// process is now //two//, each executing from the same line of code right after the call. On success, ''fork'' returns either of two values: a Process ID, or zero (it returns -1 if the fork fails). Thus, our parent process might identify itself (and reference its newly-created child) by checking for a non-zero return value from ''fork'', while a child would know it was so by checking for a zero return value from ''fork''.

{{notes:lec5:scribe-notes_lec5-fig4.png|}}

One might fork a process, creating a parent and child, using the following code:

  pid_t pid;     // Will hold Process IDs
  pid = fork();  // Do the fork!
  if (pid == 0) {
      // Child
      // Do child code.
  } else {
      // Parent
      // Do parent code.
  }
  return;

Of course, a shell is a process.
Consider the following shell command:

  ls | sort -r

We might envision the parent/child relationship of the shell and its processes in the following manner:

  - Fork our shell into two individual processes.
  - Execute the ''ls'' program in the child process.
  - Execute the ''sort'' program in the parent process (what was the shell).

A corresponding process tree might look like:

  shell -> sort
    |
    +-> ls

However, we would lose our shell if we proceeded in this manner. Let us, then, do the following:

  - Fork our shell into two individual processes.
  - Execute the ''ls'' program in the child process.
  - Fork our parent shell process again.
  - Execute the ''sort'' program in the new child.
  - Our shell has two children now.

  shell
    |
    +-> ls
    |
    +-> sort

Of course, there are a multitude of ways to approach the ''fork''ing and ''exec''ing of processes. We might have wound up with the following process tree for the equivalent result:

  shell
    |
    +-> shell -> ls
          |
          +-> sort

Notice that ''fork'' and ''exec'' have very little overlap: we say they are orthogonal.

==== A Comparison of Process Creation ====

We will introduce another way of creating a new process, ''spawn'', which combines some of the functionality of ''fork'' and ''exec'', and compare and contrast these two approaches for creating processes.

=== ''Spawn'' ===

In Windows operating systems, ''spawn'' starts a new process that runs a specified program. It has less flexibility than ''fork'' and ''exec'', however, since users sometimes want to run the new process in an environment similar to, but different from, that of the original process. A good example is the shell pipe. In the child process for the command on the right side of the pipe, we close the writing end of the pipe and do other preparation. In the parent process, we close the reading end of the pipe. We cannot do this before the fork because doing so would result in both ends of the pipe being closed for our parent and child.
We don't want to do it in the child program either, since we don't want the child program to know how it is invoked or how its I/O is connected, i.e. we want them to be orthogonal! With separate ''fork'' and ''exec'', the connection code between ''fork'' and ''exec'' is also localized, more flexible, and easier to debug. (Note that Windows-based systems expose pipes through a different interface.)

People may ask: since ''fork'' is often soon followed by ''exec'', isn't most of the copying done in ''fork'' a waste of resources? Well, the copy-on-write approach of ''fork'' takes care of this problem. It makes the parent and child share the data segment, consisting of pages, until the segment is modified. As soon as the child or parent tries to write to a page of the segment, that page is copied to give each its own version. Since ''exec'' follows ''fork'' quickly in most cases, very few pages, if any, will have to be copied.

==== From ''fork'' to ''exec'' ====

We now take a moment to analyze some of the changes which take place internally to a process between a ''fork'' and an ''exec'' call.

After a ''fork'', the child has a new Process ID and a new Parent PID. The child's accumulated execution times are also reset to zero, since it has just started. Note that the file descriptor table is copied, so that later on the child can open new descriptors independently of the parent. The existing entries in the table, however, point to the same open files, so if either process moves a file's position (using ''lseek'', for example), the other sees the change too.

As for ''exec'', it is more or less the opposite. The code, data, stack, and registers are replaced, but the pid and ppid stay the same. A note on signals: if the parent has arranged to catch any signals, the signal handler code is gone after ''exec'', so the handlers are no longer available and those signals are reset to their default actions.

==== Destroying Processes ====

Letting one process destroy another seems like a big violation of process isolation. It is, but it is needed. Think of the case in which a process goes into an infinite loop on a time-shared system. The resources it holds cannot be used effectively by other processes.
For robustness we need to have some way to kill the process. The system call is

  int kill(pid_t pid, int sig)

called with ''SIGKILL'' as the signal.

==== Discovering Process Exit Status ====

**Option 1**: We can assign a unique process id to every process and use

  int is_process_running(pid_t p)

to tell its status.

**Option 2**: When the id for a process is no longer needed, release it for others to use.

So when a process exits, is its pid released? There is a complication here. Suppose a child exits but its parent has not yet issued ''waitpid'', and the OS releases the child's pid anyway. That pid can then be used by another process, and if the parent calls ''waitpid'' afterwards, it is waiting for the wrong process and can get the wrong return value as well. Therefore, when a child exits, most of its resources are freed, but its process descriptor structure stays until the parent issues a wait for it. Until then, the exited child is in a ZOMBIE state.

**Aside**\\
Option 2 is the only logical choice for systems with a finite amount of memory, which is all of them. With Option 1, only a finite number of processes could ever be run on the system, and once that number was hit no more processes could be created and the system would be stuck.

How do we know for sure there will be no ''waitpid'' for a child process? When the parent exits: time to collect zombies. But what if the parent already exited before the child? Such children are reparented to the system ''init'' process.

A note about reparenting: it is also seen in tracer tools, which can trace the system calls made by a program. There is a system call, ''ptrace'', that can attach a process to a different parent.

Is this interface perfect? No. In lab1b, to implement the background ''after'' statement, the ''after'' command must somehow know whether the job it is waiting on is done. That job's parent is the shell, so the ''after'' command, itself another child of the shell, has no direct way of knowing that job's status. So this is imperfect. So how do we handle that with this model?
As we'll cover soon, we can ask the shell to send a signal to the ''after'' command when the shell knows the job has terminated.

In Windows, this is handled differently. Every kernel object, such as a process, thread, or mutex, is assigned a handle, much like a file descriptor. Pretty much any process can query the state of another process.

===== Signals =====

We talked about hardware interrupts before, where the processor uses the function pointer listed in the interrupt vector table to jump to the interrupt handler. A similar approach is used for signals. Processes can register signal handlers for signals.

Back to lab1b. We said three paragraphs earlier that we can ask the shell to send a signal to the ''after'' command process when the shell knows the job has terminated. But how does the shell know the background job is not there anymore? It turns out the kernel delivers a SIGCHLD signal to the parent when that happens. But the default action for SIGCHLD is to ignore it. So we need to add a signal handler for SIGCHLD. The system call is

  sighandler_t signal(int signum, sighandler_t handler);

See the following SIGCHLD handler,

  #include <signal.h>
  #include <stdlib.h>
  #include <sys/wait.h>
  #include <unistd.h>
  
  void collect_all_zombies(int signalno) {
      int status;
      (void) waitpid(0, &status, WNOHANG);
      write(STDOUT_FILENO, "C", 1);
  }
  
  int main(int argc, char *argv[]) {
      signal(SIGCHLD, collect_all_zombies);
      if (fork() == 0) {
          sleep(10); // sleep for 10 seconds
          exit(0);
      }
      while (1) {
          sleep(1);
          write(STDOUT_FILENO, "W", 1);
      }
  }

What is the result of the program? It resembles the following,

  WWWWWWWWWWCWWWWWWWWWW...

See the 'C' after a bunch of 'W's. It is printed from the SIGCHLD handler. So this is really asynchronous. We have seen the issues asynchrony causes when talking about temp files. Will they also occur in signal handling? Let's look at the following example,

  #include <stdio.h>
  #include <stdlib.h>
  #include <sys/wait.h>
  #include <unistd.h>
  
  int main(int argc, char *argv[]) {
      int status;
      pid_t pid;
      pid = fork();
      if (pid == 0) {
          sleep(10); // sleep for 10 seconds
          exit(0);
      }
      waitpid(pid, &status, 0);
      printf("DONE");
  }

Does it always wait for the child to exit before printing ''DONE''?
Not if the ''waitpid'' call is interrupted by a signal, in which case ''waitpid'' returns -1 and sets ''errno'' to ''EINTR''. To make the program more robust, try this,

  #include <errno.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <sys/wait.h>
  #include <unistd.h>
  
  int main(int argc, char *argv[]) {
      int status, ret;
      pid_t pid;
      pid = fork();
      if (pid == 0) {
          sleep(10); // sleep for 10 seconds
          exit(0);
      }
      // Retry for as long as waitpid fails only because a signal interrupted it
      while ((ret = waitpid(pid, &status, 0)) == -1 && errno == EINTR) {
          printf("waitpid returned: %d\n", ret);
      }
      printf("DONE");
  }

===== Summary =====

  * We looked at the concept of //orthogonality// from the perspective of the core operations of a UNIX-like operating system: ''open'', ''read'', ''write'', ''close'', ''fork'', and ''exec''. Because these operations are orthogonal to one another, we are able to provide a vast variety of functionality from a very small operation set.
  * We focused on what it takes for an operation to be orthogonal, and on steps we can take to make one so, using ''open'' as an example. This included looking at how much overlap exists between an existing operation like ''open'' and a proposed separate call such as ''create''. We also saw the //race conditions// that result if the ''open'' call does not implement ''O_EXCL'' within itself.
  * We then spent some time looking at the core of process creation and execution, ''fork'' and ''exec''. We compared them with another way of starting processes which embodies both of their tasks, ''spawn'', and concluded that greater power lies in the separation of these actions.
  * Once we have ''fork''ed, the following ''exec'' call changes some process-specific information. In a nutshell, the call to ''exec'' //**keeps**// the file descriptor table, the Process ID, and the state of the process (blocked, running, etc.) and replaces the program code.
  * We then went over methods for destroying processes using signals, specifically ''SIGKILL'', and ways to discover the exit status of a process through ''waitpid''.
  * We ended with a look at signals, specifically, collecting //zombie// processes (those which are done executing, but still unacknowledged children of another process) through the use of the ''signal'' function.
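
To tie the ''fork'', ''exec'', and pipe discussion together, the ''ls | sort -r'' example with a shell that keeps two children could be sketched roughly as below. This is only an illustration under stated assumptions: ''spawn_stage'', ''exited_ok'', and ''run_ls_sort'' are hypothetical names that do not come from the lecture, and the sketch assumes a POSIX system with ''ls'' and ''sort'' on the ''PATH''.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork one child, rewire its stdin/stdout, then exec.
 * in_fd / out_fd of -1 mean "leave unchanged"; unused_fd is the pipe end
 * this child does not need (or -1). Returns the child's pid to the parent. */
static pid_t spawn_stage(char *const argv[], int in_fd, int out_fd, int unused_fd)
{
    pid_t pid = fork();
    if (pid != 0)
        return pid;               /* parent, or -1 if the fork failed */

    /* Child: the plumbing happens here, between fork and exec, so the
     * program being run never needs to know how it was connected. */
    if (unused_fd >= 0) close(unused_fd);
    if (in_fd >= 0)  { dup2(in_fd, STDIN_FILENO);   close(in_fd); }
    if (out_fd >= 0) { dup2(out_fd, STDOUT_FILENO); close(out_fd); }
    execvp(argv[0], argv);
    _exit(127);                   /* reached only if exec failed */
}

/* Wait for one child; returns 1 if it exited normally with status 0. */
static int exited_ok(pid_t pid)
{
    int status;
    if (pid <= 0 || waitpid(pid, &status, 0) < 0)
        return 0;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}

/* Runs "ls | sort -r" as two children of the caller.
 * Returns 0 if both children succeeded, -1 otherwise. */
int run_ls_sort(void)
{
    char *ls_argv[]   = { "ls", NULL };
    char *sort_argv[] = { "sort", "-r", NULL };
    int fds[2];                   /* fds[0] = read end, fds[1] = write end */

    if (pipe(fds) < 0)
        return -1;

    pid_t ls_pid   = spawn_stage(ls_argv, -1, fds[1], fds[0]);
    pid_t sort_pid = spawn_stage(sort_argv, fds[0], -1, fds[1]);

    /* The "shell" uses neither end. sort sees end-of-file only once
     * every copy of the write end has been closed, including this one. */
    close(fds[0]);
    close(fds[1]);

    int ls_ok = exited_ok(ls_pid);
    int sort_ok = exited_ok(sort_pid);
    return (ls_ok && sort_ok) ? 0 : -1;
}
```

Calling ''run_ls_sort()'' prints the current directory listing in reverse order. The pipe ends are closed after ''fork'' and before ''exec'', which is the kind of in-between setup that a combined ''spawn''-style call leaves less room for.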