====== Lab 1b: CS 111, Spring 07 ====== **Due Friday, October 19** [[http://courseweb.seas.ucla.edu/classAssignment.php?term=07F&srs=187336200|Skeleton code available on SEASnet]] \\ **October 14**: New lab1b-tester.pl available on SEASnet! ===== Overview ===== In this second part of Lab 1 you will build upon your command line parser to make a complete **shell** which can actually execute the parsed commands. You'll implement support for all commands that can be parsed, which may use features such as I/O redirection, pipes, and conditional, sequential, and background execution. You'll also implement the two internal commands ''cd'' and ''exit'', which change the working directory and exit the shell, respectively. (Puzzle: why do these two commands have to be built into the shell, when all other commands are executed by running the corresponding program?) We've provided extensive skeleton code for these parts of the lab. But in addition, you must complete an **open-ended problem** for which we have not provided skeleton code, namely //background command chaining//. ==== What is a shell? ==== A shell is a program whose main purpose is to run other programs. The shell parses commands into lists of arguments, then executes those commands using ''fork()'' and ''execvp()''. Here's a simple shell command: $ echo foo (The initial ''$'' is not part of the command. That is shorthand for the //prompt// that the shell prints before reading a command.) The shell parses this command into two arguments, ''echo'' and ''foo''. The ''echo'' argument names the binary that should be executed. So the shell forks a child process to execute ''echo'' with those two arguments. The ''echo'' program has a very simple job: it just prints its arguments to the console. (''echo foo'' will just print ''foo''.) Meanwhile, the parent waits for the child to finish; when it does, the parent returns to read another command. You may be interested in [[http://www.linuxcommand.org/learning_the_shell.php|a reasonable tutorial for Unix shells]]. You can find others by searching for, e.g., [[http://www.google.com/search?q=shell+tutorial|''shell tutorial'' on Google]]. Let us know if you find one you really like. Each program has //standard input//, //standard output//, and //standard error// file descriptors, whose numbers are 0, 1, and 2, respectively. (You know them in %%C++%% as ''cin'', ''cout'' and ''cerr''; in C the files are called ''stdin'', ''stdout'', and ''stderr''.) The ''echo'' program writes its output to the standard output file descriptor. Normally this is the same as the shell's standard output, which is the terminal (your screen). But the shell lets you //redirect// these file descriptors to point instead to other files. For example: $ echo foo > output.txt This command doesn't print anything to the screen. But let's use the ''cat'' program, which reads a file and prints its contents to standard output, to see what's in ''output.txt'': $ cat output.txt foo The ''> //filename//'' operator redirects standard output, ''< //filename//'' redirects standard input, and ''2> //filename//'' redirects standard error. (The syntax varies from shell to shell; we generally follow the syntax of the //Bourne shell// or ''bash''.) Shells offer many ways to chain commands together. For example, the '';'' operator says "do one command, then do another". This shell command prints two lines: $ echo foo ; echo bar foo bar The ''&&'' and ''%%||%%'' operators chain commands together based on their //exit status//. If a command accomplishes its function successfully, that command generally exits with status 0, by calling ''exit(0)''. (This is also what happens when the program runs off the end of its ''main'' function.) But if there's an error, most commands will exit with status 1. For example, the ''cat'' command will exit with status 0 if it reads its files successfully, and 1 otherwise: $ cat output.txt foo [[exit status 0]] $ cat doesnotexist.txt cat: doesnotexist.txt: No such file or directory [[exit status 1]] Now, ''&&'' says "execute the command on the right only if the command on the left exited with status 0". And ''%%||%%'' says "execute the command on the right only if the command on the left exited with status NOT equal to 0". For example: $ cat output.txt && echo "output.txt exists!" foo Output.txt exists! $ cat doesnotexist.txt && echo "doesnotexist.txt exists!" cat: doesnotexist.txt: No such file or directory [[Note: does not run echo!]] $ cat output.txt || echo "output.txt does not exist." foo $ cat doesnotexist.txt || echo "doesnotexist.txt does not exist." cat: doesnotexist.txt: No such file or directory doesnotexist.txt does not exist. Parentheses ''( )'' allow you to run a set of commands in a //subshell//. This, in turn, lets you redirect the standard input, output, and error of a //group// of commands at once. For example: $ ( echo foo ; echo bar ) > output.txt $ cat output.txt foo bar The exit status of a subshell is the same as the exit status of the last command executed. Finally, you can also execute a command in the //background// with the ''&'' operator. Normally, the shell will not read a new command until the previous command has exited. But the ''&'' operator tells the shell not to wait for the command. $ echo foo & $ foo (''foo'' is printed on top of the next shell prompt.) You may find the following commands particularly useful for testing your shell. Find out what they do by reading their manual pages. Be creative with how you combine these! (Also see the Lab 1B tester, below.) * ''cat'' (print one or more files to standard output) * ''echo'' (print arguments to standard output) * ''true'' (exit with status 0) * ''false'' (exit with status 1) * ''sleep'' (wait for //N// seconds then exit) * ''sort'' (sort lines) In addition to running programs from the file system, shells have builtin commands that provide functionality that could not be obtained otherwise. Our shell will implement two such builtin commands, to change directories (''cd'') and to exit the shell (''exit''). The ''cd'' command changes the shell's //current directory//, which is the default directory that the shell uses for files. So ''cd dir'' changes the current directory to ''dir''. (You can think of the current directory as the directory that shows up by default when you use an "Open" or "Save" dialog in a GUI program.) Of course, files can also be manipulated using //absolute pathnames//, which do not depend on the current directory; ''/home/cs111/cmdline.c'' is an example. There may also come a time when you would like to leave the wonderful world of your shell; the ''exit'' command instructs the shell to exit with a given status. (''exit'' alone exits with status 0.) (Why are ''cd'' and ''exit'' part of the shell instead of standalone programs?) ===== Lab materials ===== The skeleton code you will get for this second part of the lab consists of an updated ''Makefile'' and three additional files beyond what Lab 1A provided: ''ospsh.c'', ''ospsh.h'', and ''main-b.c''. They are contained in the file ''lab1b.tar.gz'' which you can extract just like you did in Lab 1A. You will need to copy your version of ''cmdline.c'' from your ''lab1a'' directory (with the ''cp'' command). Most of the instructions for Lab 1B are included as comments in ''ospsh.c'', but there is one exercise to complete in ''main-b.c''. Additionally, you will need to add simple job control features, as described below. You will likely find it helpful to learn more about the following functions (their manual pages are fairly descriptive): ''fork()'', ''WEXITSTATUS()'', ''execvp()'', ''waitpid()'', ''open()'', ''close()'', ''dup2()'', ''pipe()'', and ''chdir()''. Remember that you can read manual pages by using the ''man'' program. Our solution to the labeled lab exercises takes 109 lines of code above & beyond the lab handout (not counting ''cmdline.c'' and ''cmdline.h'' from Lab 1A). ===== Running ===== The updated ''Makefile'''s default target is now the program ''ospsh'', rather than ''cmdline''. (You can still compile just the parsing parts of your program by running "''make cmdline''" to tell ''make'' to build that target.) To run your shell, you can type "''./ospsh''" at the command prompt. It will display its prompt, just like ''cmdline'' did. Now you can type commands to it, and after each command you type, it will display the parser output and then attempt to execute the command. Initially, it won't execute anything, but as you complete the lab, more and more kinds of commands will work. To quit your shell, you can press control-C to kill it, and after you implement the ''exit'' command, you will be able to type "''exit''" to terminate your shell. ===== Open-Ended Problem ===== Most of the exercises in the lab are specified in the source code as ''EXERCISE'' comments. However, there is one exercise which is left up to to you: **background job chaining**. Background job chaining is a specialized form of process control that lets a shell user execute one command //after a previous background command completes//. Background job chaining has two parts. - You must assign a //"job ID"// to each background command. - You must implement the ''after'' command for chaining commands. First, assign a job ID to each background command. This will let you distinguish different background commands so you can set up chains. Job IDs are integers assigned starting from 1. For each background command, you should print out the command's job ID and process ID after executing the command. For example: cs111_fall07$ sleep 20 & [2 args "sleep" "20"] & [1] 4980 cs111_fall07$ Notice the ''**[1] 4980**'': this is the job ID (''1'') and process ID ''4980''. Job IDs are reused over time. For instance, in Unix shells, each new background command is assigned the minimum positive job ID //x// so that all active background jobs have job ID less than //x//. For example: cs111_fall07$ sleep 500 & [1] 4980 cs111_fall07$ sleep 10 & [2] 4982 ## Job ID 1 is still active, so pick 2. cs111_fall07$ ... wait 15 seconds ... cs111_fall07$ sleep 10 & [2] 4984 ## The previous Job ID 2 exited, so job ID 2 is available. cs111_fall07$ sleep 1000 & [3] 4986 ## Job IDs 1 and 2 are active, so pick 3. You may use a different job ID reuse strategy, but you must reuse all job IDs eventually. Second, implement the ''after'' command to chain jobs together. ''after'' works like this: A command ''after //JOBID// //COMMAND//'' waits until job ID ''//JOBID//'' exits, then executes the given ''//COMMAND//''. For example: cs111_fall07$ sleep 500 & [1] 4980 cs111_fall07$ after 1 echo Hello, world ... waits about 500 seconds for job 1 to exit, then prints: Hello, world If ''after'' is called with too few arguments, or bad arguments, it should write ''after: Syntax error'' to standard error. For example: cs111_fall07$ after X after: Syntax error cs111_fall07$ after after: Syntax error Note that ''after'' commands can have redirections just like normal commands. cs111_fall07$ after X 2> /tmp/x cs111_fall07$ cat /tmp/x after: Syntax error ''after'' commands can be chained together by putting them in the background. For example: cs111_fall07$ sleep 50 & [1] 49283 cs111_fall07$ after 1 echo Hello & [2] 49284 cs111_fall07$ after 2 echo World ... waits about 50 seconds, then prints: Hello World cs111_fall07$ You can also specify two or more commands to run after a background command completes. These commands are executed in any order. For example: cs111_fall07$ sleep 50 & [1] 49182 cs111_fall07$ after 1 echo Hello & [2] 49183 cs111_fall07$ after 1 echo World ... waits about 50 seconds, then prints: Hello World ... or maybe it might print: World Hello A command like "''after 1''" (with nothing after the ''1'') just means "wait until that job ID completes, then exit with status 0". If an ''after'' command's job ID has already completed, then the ''after'' command should run immediately. Thus: cs111_fall07$ echo Hello & [1] 49182 Hello cs111_fall07$ after 1 echo Goodbye Goodbye ## prints immediately The exit status of an ''after'' command equals 1 on syntax error, and otherwise equals the status of the contained command. ''after'' commands should "wake up" immediately after the relevant job ID completes, //even if the shell is waiting for something else at the time//. For example: cs111_fall07$ sleep 50 & [1] 19823 cs111_fall07$ after 1 echo First & [2] 19825 cs111_fall07$ sleep 100 ; echo Second ... waits about 50 seconds, then prints: ... First ... waits another 50 seconds or so, then prints: ... Second **Subshell note:** Background commands //inside// subshells (parentheses) do not get their own job IDs. cs111_fall07$ ( echo Hello ) & ## background subshell gets a job ID [1] 40193 Hello cs111_fall07$ ( echo Hello & ) ## background command IN subshell does not Also, you should not worry about ''after'' commands inside subshells. ''after'' does not mean anything special inside a subshell. Implementing the ''after'' command will be a bit tricky. Questions to ponder: How does the shell know when a background process completes? How can the shell tell a waiting ''after'' command to start running? For full credit, you should also avoid [[http://en.wikipedia.org/wiki/Race_condition#Computing|race conditions]]. Some useful system calls for avoiding race conditions include ''pselect''. Although ''after'' is, as far as we know, a feature unique to ospsh, features like ''after'' are common in real Unix shells. In general features like ''after'' are given the name //job control//. See [[http://people.ischool.berkeley.edu/~kevin/unix-tutorial/section13.html|this reference]] for a good introduction to job control. ===== Testing ===== We have provided a ''lab1b-tester.pl'' script that will help you to test your shell by giving it a set of sample inputs designed to test many of the features of your shell. **However, ''lab1b-tester.pl'' does not test all of the behavior the lab requires.** You should also design your own test cases to run either by hand or automatically. You may compare your shell's behavior to that of the shell ''bash'', which might even be your default system shell. If you're not sure, you can always run it by typing ''bash'' at your prompt. All the syntax your shell is required to handle is also accepted by ''bash''. (In a few isolated cases, such as ''> out'' with no command, your shell should produce an error on syntax that is OK for ''bash''. The lab1a tester caught those cases.) We will be using scripts very similar to the tester scripts included with Lab 1B to actually grade your labs. To run the Lab 1A tester, you may need to run "''make cmdline''" first in order to compile the Lab 1A portions of the lab separately. These tester scripts expect to be run on a Linux machine; although you can do Lab 1 on other Unix machines or even Windows with Cygwin installed, the tester scripts will incorrectly report errors for some test cases when they are run on platforms other than Linux. It is probably easiest to just use a Linux machine, since later labs will require it anyway. The Lab 1B tester does not test ''after'' commands at all. You should design your own tests for that functionality. ===== Design problem ideas ===== If you have been assigned a design problem for Lab 1, you may implement one of the following problems, or design your own, as long as it's similar in difficulty and the course staff approves. (If you want to design your own, //do it early// and get approval from us early.) [[design|How to do design problems.]] ==== Control structures ==== Design and implement analogues of the ''if'', ''for'', and ''while'' control structures common to many programming languages. For example, your ''if'' structure should execute several commands in order, but only if some condition is true -- for example, only if a command exits with status 0. You may investigate current shells, such as bash, to see how they implement these structures, but your design must differ from bash/tcsh in at least one way. Point out that difference and describe why you think your version is better. ==== Shell functions ==== Design and implement a way for shell users to write their own "functions". Once a function ''f'' is defined, typing ''f'' at the command line will execute the function, rather than a binary program named ''f''. For example, the user might write a function ''echo_twice'' that printed its arguments twice, by running the ''echo'' command twice. Discuss how other command line arguments will be passed to the shell function. As above, you may investigate current shells, such as bash, to see how they implement these structures, but your design must differ from bash/tcsh in at least one way. Point out that difference and describe why you think your version is better. ==== Complex redirections ==== Design and implement a new shell syntax that allows more complex forms of redirection. In particular, your syntax should support arbitrary redirection graphs, including: * Run five commands, ''a'', ''b'', ''c'', ''d'', and ''e'', where: * ''a'' stdout ==> ''b'' stdin (this is a conventional pipe). * ''a'' stderr ==> ''c'' stdin. * ''b'' file descriptor 3 ==> ''d'' file descriptor 4. * ''d'' stdout ==> ''e'' stdin. * Run two commands, ''a'' and ''b'', where: * ''a'' stdout ==> ''b'' stdin. * ''b'' stdout ==> ''a'' stdin. (This is a "bidirectional" pipe.) You may investigate bash's implementation for some ideas (look for ''>&''), but bash does not completely support this feature. ==== Tab completion ==== You may have noticed that in the default shell on the Linux machines in the lab, you can use the tab key to automatically complete partially typed commands, if the shell can figure out what you were trying to type (by searching for programs or files starting with what you have typed so far). Add support for this tab completion feature. Your tab completion should handle both program names in the default path (use the ''PATH'' environment variable) and special commands, such as ''cd'' and ''exit''. You may want to investigate the GNU ''readline'' library, which can help. ===== Hand-in ===== When you are finished, edit the file named ''answers.txt'' and follow the instructions at the top of the file to fill in your name(s), student ID(s), email address(es), short descriptions of any extra credit or challenge problems you did, any known limitations of your code (including known bugs), and any other information you'd like us to have. Then run "''make tarball''" which will generate a file ''lab1b-''//yourusername//''.tar.gz'' inside the lab1b directory. Upload this file to CourseWeb using a web browser to turn in the project. Remember to upload it only once if you are working in a team - the ''answers.txt'' file will allow us to give both team members credit. **Good luck!**