You are expected to understand this. CS 111 Operating Systems Principles, Fall 2006
You are here: CS111: [[2006fall:lab2]]
 
 
 

Lab 2

Update, 11/2/2006

Some of you have noticed that Linux can block some open attempts before your code gets control. This behavior will help out your bounded wait implementation! However, we want your bounded wait implementation to work independently, so you'll get partial credit if you depend on this behavior. You can avoid some of this behavior (but not all of it) by adding the following lines right at the top of osprd_open:

     int exclusive = (filp->f_flags & O_EXCL);
     filp->f_flags &= ~O_EXCL;

Then wherever you said "if (filp->f_flags & O_EXCL)", say "if (exclusive)" instead.

Overview

In Lab 2, you'll actually work inside the Linux kernel! You will write a kernel module -- code that can be dynamically downloaded into the kernel -- that implements a ramdisk: an in-memory block device. (A block device is basically a device that acts like a disk. It supports random-access read and write operations of fixed-sized units called sectors. Random-access means the disk can read or write sectors in any order -- unlike a pipe, for example.)

Your ramdisk will support the usual read and write operations. You can read and write the ramdisk by reading and writing a file, for instance one named /dev/osprda. (In fact, you could also initialize your ramdisk to contain a Linux file system!) After one process writes to the disk, another process can read out the results. Your ramdisk will also support a special discretionary locking feature, where a process can gain exclusive access to the ramdisk. This, of course, leads to some interesting synchronization issues!

Like most device drivers, your module can support several of these devices simultaneously; in particular, it supports four simultaneous ramdisks.

This lab has a handful of purposes, including:

  • To introduce you to kernel programming.
  • To give you experience in writing a device driver for a real operating system. (Although you don't have to implement any interaction with real hardware controllers.)
  • To give you some interesting synchronization problems to handle.

This lab requires that you use a Linux machine; additionally, you must have the necessary permission to load kernel modules on that machine. The Linux lab machines in 4405 Boelter have been set up for this, but you can also use your own machine.

Lab materials

The skeleton code you will get for this lab consists of:

  • a Makefile
  • two module source files (osprd.c and osprd.h)
  • the source code for a utility to read and write a ramdisk (optionally locked) (osprdaccess.c)
  • a header file implementing the instrumented locking code described below (spinlock.h)
  • a script (create_devs) which will come in handy for creating the special files in /dev used to access your devices

We have already created the special files for your devices on the lab machines, so you will only need create_devs if you are using your own machine. When you extract lab2.tar.gz you will get a directory lab2 which contains these files. Since you will be using Linux for this lab, the command tar xvzf lab2.tar.gz will suffice to extract the archive.

Writing kernel code

Writing kernel code is actually not as scary as it sounds. The Linux kernel is a very large piece of software, but you only have to use a subset for your module. Remember that the kernel is providing the abstractions we usually take for granted in userspace, so often it cannot take advantage of them itself (sort of a chicken-and-egg thing). We will try to list a few of the most important things for this lab here.

Each process in Linux has its own kernel stack, which means that while the kernel is executing on behalf of that process, it uses a stack which is specific to that process. This is convenient, because we can do things like invoke the scheduler from this stack, and the scheduler will "return" to the same place on that process' kernel stack next time the process is scheduled. Additionally, many recent Linux kernels are preemptible which means that just like threaded user code, code in the kernel has to worry about race conditions. Therefore, it is very important in the kernel to use proper synchronization mechanisms like locks to avoid unpredictable behavior and deadlocks. It's not just one process that might get stuck as the result of a bug -- the whole system can crash!

While the lab machines in 4405 Boelter do not have preemptible kernels, kernel modules still need to handle these synchronization issues so that they will work correctly when used with a preemptible kernel. In the lab skeleton code, we have instrumented the kernel's locks to help you detect incorrect lock usage even on non-preemptible kernels where ordinarily incorrect use of the locks does not cause problems. This may be easier than testing using a preemptible kernel, where incorrect locking often deadlocks the kernel (making the whole machine lock up). However, you won't be able to detect missing locks (though that's hard even with a preemptible kernel), so you'll have to think carefully about where they will be necessary.

Memory protection is also not the same as it is in userspace. In userspace, if you follow a bad pointer, your program may be killed, but that's about it. In the kernel, the kernel itself will likely crash. This is obviously a bad thing. Even worse, userspace programs can pass invalid pointers into the kernel through system calls. If the kernel dereferences such a pointer, the machine is in trouble! To help the kernel protect itself against such abuses, there are special kernel functions for dealing with userspace-supplied pointers in a safe way. You won't need to worry about these functions for this lab, but they will become important in Lab 3. You will find them, and other useful kernel functions, documented here.

Another important thing to keep in mind is that you can't call system calls like open, read, or write from inside the kernel. You are implementing these functions for your block device, so calling them would likely cause an infinite loop (and also the kernel just won't let you). Similarly, you can't call internal kernel functions like send_sig from userspace.

What you have to do

To complete the lab you need to complete all EXERCISE sections in osprd.c. This section of the lab manual gives an overview of the exercises and related background knowledge.

Your ramdisk will be a block device, meaning it will implement the Linux block device interface. A block device is, basically, a device that acts like a disk. Concretely:

  • Block devices store data. They support read and write operations.
  • That data is stored in fixed-size units, called sectors, which the system can access in any order.

Even though the ramdisk could be more flexible, it conforms to the block device interface so that it can be used everywhere a physical block device can be used. The Linux block device interface also includes an ioctl function to perform special per-device operations. You will use the ioctl interface to implement the locking features of your block device.

Reading and writing

We suggest you first implement read and write support in your block device. This is the osprd_process_request function. When you're done, you'll be able to read and write data from each ramdisk by reading and writing the /dev/osprdX files! To make this easier, we've provided a program called osprdaccess that can read or write the block device.

To write data into the block device, use a command like

  echo DataForRamDisk | ./osprdaccess -w

The osprdaccess -w command reads data from standard input and writes it to the ramdisk. It writes to /dev/osprda by default, but you can also supply a ramdisk name:

  echo DataForRamDisk | ./osprdaccess -w /dev/osprdb

You can also write the contents of a file into the ramdisk.

  ./osprdaccess -w < whatever.txt

You can limit the amount of data written; this writes at most 128 bytes:

  ./osprdaccess -w 128 < whatever.txt

And you can write starting at an offset into the file. This writes the third sector (bytes 1024-1535).

  ./osprdaccess -o 1024 -w 512 < whatever.txt

The ./osprdaccess -r command reads data from the ramdisk and writes it to standard output. It takes very similar arguments. For instance, this command reads bytes 12-15 of /dev/osprdc and prints them to standard output.

  ./osprdaccess -o 12 -r 4 /dev/osprdc

Here are some more examples.

 % ./osprdaccess -r 512 | hexdump -C
 00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
 *
 00000200
 % ./osprdaccess -w 512 < osprd.c
 % ./osprdaccess -r 512 | hexdump -C
 .......
 00000000  23 69 6e 63 6c 75 64 65  20 3c 6c 69 6e 75 78 2f  |#include <linux/|
 00000010  76 65 72 73 69 6f 6e 2e  68 3e 0a 23 69 6e 63 6c  |version.h>.#incl|
 .......

Discretionary locking

Next, you will implement support for discretionary locking, where a user can obtain exclusive access to a ramdisk. We've designed the interface as follows:

  • The user decides whether to open a ramdisk with exclusive access at open time. The O_EXCL flag indicates exclusive access.
    • Opening a ramdisk for exclusive reading -- open("/dev/osprda", O_RDONLY | O_EXCL) -- will block until no processes have the ramdisk open for writing.
    • Opening a ramdisk for exclusive writing -- open("/dev/osprda", O_WRONLY | O_EXCL) (or O_RDWR) -- will block until no processes have the ramdisk for writing or reading.
  • This locking is called "discretionary" because opening the ramdisk normally -- i.e., without O_EXCL -- will always succeed. Thus, say that process P1 has a ramdisk open for exclusive writing. Process P2 can successfully open the ramdisk for nonexclusive writing (open("/dev/osprda", O_WRONLY)); if this happens, both P1 and P2 can write to the ramdisk simultaneously. However, if P2 opens the ramdisk for exclusive writing, then P2's open will block until P1 is done.
  • Exclusive open requests are served in the order in which they are received. Thus, for example, if P1 opens a ramdisk for exclusive writing, and then P2 opens the same ramdisk for exclusive reading, then P2 will not begin until P1 has closed its ramdisk. This is bounded wait. Note: It is easier here to get mutual exclusion than bounded wait. Make sure you get mutual exclusion working first.

To implement these semantics, you will need to maintain a wait queue per ramdisk, so that processes opening the ramdisk with exclusive access can block. You also keep track of how many files are open for a given ramdisk for reading and for writing.

The osprdaccess program's -x option will open the ramdisk file with the O_EXCL flag. You can use this to test discretionary locking.

    # First, zero out the ramdisk.  We do this after each set of commands.
 % ./osprdaccess -w -z
    # This command writes "foo" into the first four bytes of the ramdisk,
    # as we can see when we read those bytes back out.
 % echo foo | ./osprdaccess -w
 % ./osprdaccess -r 4
 foo
    # This command has the same effect, but the "-d 5" option tells the first
    # osprdaccess to wait 5 seconds after opening the ramdisk, but before
    # writing.
 % echo foo | ./osprdaccess -w -d 5
 [[WAITS 5 SECONDS]]
 % ./osprdaccess -r 4
 foo
    # Now let's play with some discretionary locks!  Run the first osprdaccess
    # in the background, then try to run the second.  It should block until
    # the write is done.  It is easiest to get this to work if you "tee up"
    # both commands in adjacent terminals, then press return on the two
    # terminals in the correct order.  Or you can make the delay bigger so you
    # have time to type (or paste) the second command.
 % echo foo | ./osprdaccess -w -d 5 -x &
 [1]
 % ./osprdaccess -r 4 -x
 [[WAITS ~5 SECONDS]]
 foo
    # If we don't use exclusive locking, the read and write proceed in parallel.
 % ./osprdaccess -w -z
 % echo foo | ./osprdaccess -w -d 5 &
 % ./osprdaccess -r 4
 [[returns four 0 characters right away]]
    # Note that two exclusive reads can proceed in parallel.
 % echo foo | ./osprdaccess -w
 % ./osprdaccess -r 4 -d 5 -x &
 [1]
 % ./osprdaccess -r 4 -x
 foo      [[returns right away, then after 5 seconds:]]
 foo
    # But an exclusive read will block an exclusive write.
 % ./osprdaccess -r 4 -d 5 -x &
 [1]
 % echo bar | ./osprdaccess -w -x
 [[WAITS FOR 5 SECONDS]]
    # A ramdisk can write into itself.
 % echo foo | ./osprdaccess -w
 % ./osprdaccess -w 3 -o 3 < /dev/osprda
 % ./osprdaccess -r 6
 foofoo
    # But not when the lock is exclusive, since that would cause deadlock!
    # (opening the ramdisk again)
 % ./osprdaccess -w 3 -o 3 -x < /dev/osprda
 open: Resource deadlock avoided

First, keep track of each ramdisk's number of open files. You will have to change the ramdisk info structure (osprd_info_t) to store the right information. Change the osprd_open and osprd_release functions to maintain the counts. osprd_open is called when a process opens a ramdisk device, and osprd_release is called when the process closes the ramdisk device. (A process can dup2 the file in between, so that multiple file descriptors refer to the same file; osprd_release is called when the last file descriptor is closed.) A single count of open files will not be enough, you need at least two counts to implement the right semantics. Check out filp->f_mode and filp->f_flags.

Make sure you protect any access to shared state with the osprd_info_t's mutex member. This is a spin lock, or polling mutex. Remember that it is illegal in Linux for a process to lock a spin lock, and then block! (This is because another thread could then attempt to lock the spin lock, and spin forever.) You will have to arrange your code in later steps so that you always release the spin lock before blocking and re-acquire it afterwards.

Next, change osprd_open to implement O_EXCL. Instructions are included in the skeleton code.

You can either implement O_EXCL support all at once, or you can go more gradually. A more gradual plan would look like this. First, try to implement mutual exclusion: make sure that exclusive opens don't complete until there are no other readers and/or writers. You can do this without blocking. Instead, call schedule() if an exclusive open is not ready. You only need to change osprd_open to get this functionality. (schedule and other important kernel functions are defined here.) Second, support blocking, using the wait_queue functions. To implement blocking, you need two pieces of code: osprd_open must block, and osprd_release must wake up any blocking processes. Finally, support bounded waiting, where exclusive-open processes manage to open the file in the order that they arrive.

Signals are a tricky aspect of this interface. Remember that signals are a way to send a running process asynchronous events, or to kill a running process. If the process is blocked in the kernel -- for instance, if it is blocked in osprd_open -- then the kernel must wake up so the process can process the signal. Since we want to kill blocked processes, they had better be able to handle signals. This means:

  • When blocked, you should put processes into an interruptible wait. This means that signal delivery takes place as usual. (In an uninterruptable wait, you would not be able to kill a waiting processes except by causing the event it was waiting for -- and this would, in turn, prevent your module from being unloaded because it is currently blocked by code in your module. You'd have to reboot a lot!) Putting a process into interruptible wait is a two-step process: you prepare to wait (prepare_to_wait), then you schedule (schedule). The schedule function blocks the current process, which will not run again until much later, when the process wakes up.
  • When your process receives a signal, it will wake up (the schedule function will return). You should use the signal_pending function to check whether a signal has occurred. If so, osprd_open should immediately return to its caller. (See the comments to tell which return value.)

Test your code using test cases like those above (AND OTHERS you come up with yourself).

Deadlock

So far, so good! But there's a problem in store: A file operation might cause a known deadlock. That is, a process would be effectively waiting for itself -- and thus would never wake up. Your job is to detect these deadlocks. Instead of blocking forever (or until a signal), your code should return -EDEADLK, a special error code that indicates a deadlock was avoided.

We aren't going to describe these deadlock conditions precisely, although there are a couple hints scattered in the lab manual and the code -- including this:

  • You may find the for_all_files helper function useful (definition here). This function calls a little callback function you supply, once for each open file in a given process.

Compiling your module

On the lab machines in 4405 Boelter, we have already installed the Linux kernel source code necessary for building kernel modules for the revision of the kernel they are running. Therefore, all you have to do to compile your code into a loadable kernel module (a .ko file) is run make. However, if you are using your own machine, you may need to get the kernel source code first. Try compiling the skeleton module; if it does not work, go here for instructions on how to download and install the kernel source code.

Loading and unloading your module

Ordinarily, loading or unloading Linux kernel modules requires that you be the administrator (the root user). If you are using your own machine, you can just change to the root user with su. On the lab machines, you are not allowed to do this, so we have set up a special mechanism for you to execute the commands needed to load and unload modules.

After changing to the root user, you can use the insmod program to load .ko files: just run insmod with the name of the .ko file to load as an argument. The name of the module is the part before the .ko, thus, for this lab, the module will be named osprd and the module file will be osprd.ko. To unload a module, use the rmmod command with the name of the module as an argument. Note that when you try to load a module, if there is already a module by the same name loaded, it will fail -- so you will have to unload it first.

To do this on the lab machines, there is a program called sudo which allows you to run specific programs as root (as specified by the system administrator). We have set up sudo to let you run insmod and rmmod, though because sudo is a little paranoid you have to specify the full path to the commands. Here are the commands you should use:

$ sudo /root/fixMachine
[[You only need to do this command once per session, and you may not
need to do it at all -- but if you see a funny error, try this.]]
$ sudo /sbin/insmod osprd.ko
$ sudo /sbin/rmmod osprd

For security reasons, sudo will ask you to type your password before it will execute the commands. However, for convenience, if you have used sudo within the last 10 minutes, it will not bother to ask you for your password again. The first time you run sudo, it will display a short message about talking to the administrator about local security policies.

Once your module is loaded, it becomes a part of the currently executing kernel. Your code executes as part of the kernel, and can easily crash the machine if you dereference bad pointers or do other things that would only kill your process in userspace. Save your files and run sync before you load your module to avoid losing your work! Even after unloading your module, there may be residual effects of its having been loaded if there were serious bugs in it, so you should save your work frequently.

Getting debugging output

Messages you print with printk or eprintk from your module may not be shown on the screen by default, which can make debugging a little difficult. If you find that this is the case, you can get a list of the messages by running the dmesg program. On the lab machines, you may need to run it via sudo as described above for insmod and rmmod; if so, you may also need to specify its full path: /bin/dmesg. We found that eprintk messages were actually sent to all open terminals on the lab machines, so you may want to use printk and dmesg instead.

On the lab machines, and on any other machines with non-preemptible kernels, you will see some lock debugging information by using dmesg as well. On machines running preemptible kernels, the system will probably lock up instead of giving any debugging information. This is because the locks aren't really necessary on non-preemptible kernels, so we've replaced them with special debugging versions that detect deadlocks for you (but don't actually lock, which is why the machine doesn't lock up). On a preemptible kernel, the locks are actually needed, so we can't replace them like this. (But don't think the synchronization exercises aren't necessary - even on a non-preemptible kernel, there is still a real race condition that you will need to fix during the lab.)

Using your module

After loading your module, you'll probably want to try using it. To do this, you will have to access the block special files in /dev which represent the devices it controls. If you are using your own machine, these files may not exist yet (depending on a variety of things). Look in /dev to see if you have files named osprda, osprdb, osprdc, and osprdd. If so, you're all set. If not, run the create_devs script as root to create them.

These files are not like other files in your filesystem. They do not actually contain any data; they just "represent" the devices your driver controls. The only reason they are there is so programs can call open on them to get file descriptors which the kernel knows are connected to particular devices. However, this property means that many programs can use them just as if they were normal files. This is part of the UNIX design philosophy that "everything is a file."

So, to use your devices and thus your module, you just need a program which will open the block special files and read from or write to them. We have included the osprdaccess.c code for this purpose, but you can also write your own program to do it. If you write your own program, you may want to look into the read (man 2 read) and write (man 2 write) system calls.

Design problems

Disk Emulation

Since ramdisks are just blocks of memory, they behave much nicer than physical disks in lots of ways. Real disks have advantages (persistence, cheap per-byte cost), but they are also slow, disk requests can cause contention (multiple outstanding disk requests are often executed in some order, rather than in parallel), and disks tend to fail (I/O errors, bad sectors).

In this design problem you will design support for emulating real disk characteristics on your ramdisk. Your ramdisk should be able to run slowly (requests will block for some time before they complete), and/or report errors (where ramdisk sectors "go bad": all writes and reads to a bad sector should fail with an EIO error), and/or support for other characteristics of a real disk as you see fit.

This might seem like the least useful thing ever, but in fact it's quite useful for testing file system implementations. To see whether a file system can handle disk errors, a programmer might put a file system image on an error-simulating ramdisk and see how it behaves. This is a lot more convenient (and cheaper) compared to putting the file system on a real hard drive and shaking it until an error occurs! Similarly, you can test how a file system behaves for different disk speeds.

Implementing this design problem will require more hacking into the osprd_process_request function, plus designing a way to specify disk characteristics (speed/errors -- the ioctl function may be useful here).

Change Notification

Design a mechanism by which a process can request notification whenever data on a ramdisk changes. Specifically, a process P1 can open a ramdisk and request notification on any change. When another process writes to the ramdisk, P1 should receive notification.

We are being purposefully vague about how "notification" should be implemented: it is your job to decide. But a waiting process P1 should be able to block on something -- such as a system call like ioctl -- until the notification arrives.

The waiting process should be able to wait for changes to specific areas of the disk. For instance, maybe there will be separate notifications for each sector of the disk, or for ranges of sectors, all the way down to individual bytes. Again, you should decide what type of notification to implement, and justify your choice. Think about time and space overheads.

Encrypted Ramdisk

Recent operating systems have allowed users to store data encrypted on the filesystem. The filesystem is readable and writable only if the user provides the right password when they log in.

Implement an encrypted ramdisk, where data is stored on the ramdisk in encrypted format. If a user opens the ramdisk normally, they should see encrypted gobbledegook. But if a user provides the right password at open time, then read operations on that open file should transparently decrypt the disk's data. Furthermore, if the user writes to the file, the data they write should be encrypted before it is sent to the ramdisk.

You will need to implement ramdisk-specific read and write operations for this challenge problem; we can help you set this up. (You must change the osprd_blk_fops structure's read and write operations to point to your code.) You should decide which forms of encryption to support. One choice might be to support any form of encryption Linux can handle. These are described in include/linux/crypto.h and Documentation/crypto/*.

Hand-in

When you are finished, edit the file named LAB and follow the instructions at the top of the file to fill in your name(s), student ID(s), email address(es), short descriptions of any extra credit or challenge problems you did, any known limitations of your code (including known bugs), and any other information you'd like us to have.

Then run "make tarball" which will generate a file lab2-yourusername.tar.gz inside the lab2 directory. Upload this file to CourseWeb using a web browser to turn in the project. Remember to upload it only once if you are working in a team - the LAB file will allow us to give both team members credit.

Good luck!

 
2006fall/lab2.txt · Last modified: 2006/11/03 00:39 (external edit)
 
Recent changes RSS feed Driven by DokuWiki