Quick link: Getting kernel source
In Lab 2, we used a RAM disk block device to teach you about the Linux kernel setting and about synchronization. In Lab 3, you'll learn about file systems by writing your own file system driver for Linux.
Operating systems generally support many different file systems, from old-school FAT (common in DOS machines) to new formats such as Linux's ext3. Each file system type is written to fit a common interface, called the VFS
layer. (File systems are one of the object oriented aspects of the OS we discussed in class.) You will see how a real VFS layer works by filling out a simple file system implementation. You will load that file system module into the kernel and mount the file system, allowing you to access your file system with usual Unix commands like cat, dd, emacs, and even firefox. The file system driver stores its data in a built-in RAM disk, kind of like your RAM disk from Lab 2.
Like Lab 2, this lab requires that you use a Linux machine; additionally, you must have the necessary permission to load kernel modules on that machine. The Linux lab machines in 4405 Boelter have been set up for this, but you can also use your own machine.
The skeleton code you will get for this lab consists of the following files:
ospfs.h | Commented header file defining file system structures and constants. |
ospfsmod.c | The file system module's source code. All exercises in the lab involve changing this code. |
truncate.c | Utility: Truncates a file to a specified length. |
base/ | The contents of this directory are copied into your file system's initial RAM disk. You can change the contents of base/, then type make clean; make, to add files to or remove files from your RAM disk. |
runospfs.sh | This script automates the procedure of loading your module code into the kernel. |
Makefile | |
ospfsformat.c | Utility: Creates a RAM disk image from the base/ directory. |
fsimgtoc.c | Utility: Links the disk image into your module. |
When you unpack lab3.tar.gz (with the command tar xvzf
lab3.tar.gz) you will get a directory lab3 which contains these files. Do your work in this directory.
When you are finished, edit the LAB file and follow the instructions to fill in your name(s), student ID(s), email address(es), short descriptions of any challenge problems you did, and any other information you'd like us to have. Then run "make tarball" which will generate a file lab3-yourusername.tar.gz inside the lab3 directory. Upload this file to CourseWeb using a web browser to turn in the project. (Please do not upload .zip files!)
The file system you will work with, OSPFS, is much simpler than most "real" file systems, but it is powerful enough to read and write files organized in a hierarchical directory structure. OSPFS currently does not support symbolic links, timestamps, permissions, or special device files like most UNIX file systems do. Additionally, your module is only capable of using an in-memory file system, meaning that the data "on disk" is actually stored entirely in main memory. This means that changes made to the file system are not saved across reboots (or across module loads and unloads).
Most UNIX file systems divide available disk space into two main types of regions: inode regions and data regions. UNIX file systems assign one inode to each file in the file system; a file's inode holds critical metadata about the file such as its attributes and pointers to its data blocks. The data regions are divided into large data blocks (typically 4KB or more), within which the file system stores file data and directory metadata. Directory entries contain file names and pointers to inodes; a file is said to be hard-linked if multiple directory entries in the file system refer to that file's inode. OSPFS is designed in much the same way.
Both files and directories logically consist of a series of data blocks, which may be scattered throughout the disk much like the pages of a process's virtual address space can be scattered throughout physical memory.
Modern disks perform reads and writes in units of sectors, which today are almost universally 512 bytes each. File systems actually allocate and use disk storage in units of blocks. Be wary of the distinction between the two terms: sector size is a property of the disk hardware, whereas block size is an aspect of the operating system using the disk. A file system's block size must be at least the sector size of the underlying disk, but could be greater (often it is a power of two multiple of the sector size).
The original UNIX file system used a block size of 512 bytes, the same as the sector size of the underlying disk. Most modern file systems use a larger block size, however, because storage space has gotten much cheaper and it is more efficient to manage storage at larger granularities. Our file system will use a block size of 4096 bytes.
File systems typically reserve certain disk blocks, at "easy-to-find" locations on the disk (such as the very start or the very end), to hold meta-data describing properties of the file system as a whole: the block size, disk size, any meta-data required to find the root directory, the time the file system was last mounted, the time the file system was last checked for errors, and so on. These special blocks are called superblocks.
Our file system will have exactly one superblock, which will always be at block 1 on the disk. Its layout is defined by struct ospfs_super in ospfs.h. Block 0 is typically reserved to hold boot loaders and partition tables, so file systems generally never use the very first disk block. Most "real" file systems maintain multiple superblocks, replicated throughout several widely-spaced regions of the disk, so that if one of them is corrupted or the disk develops a media error in that region, the other superblocks can still be found and used to access the file system. OSPFS omits this feature for simplicity.
Just as a kernel must manage the system's physical memory to ensure that each physical page is used for only one purpose at a time, a file system must manage the blocks of storage on a disk to ensure that each block is used for only one purpose at a time. In file systems it is common to keep track of free disk blocks using a bitmap rather than a linked list, because a bitmap is more storage-efficient than a linked list and easier to keep consistent on the disk. ("FAT" file systems use a linked list.) Searching for a free block in a bitmap can take more CPU time than simply removing the first element of a linked list, but for file systems this isn't a problem because the I/O cost of actually accessing the free block after we find it dominates performance.
To set up a free block bitmap, we reserve a contiguous region of space on the disk large enough to hold one bit for each disk block. For example, since our file system uses 4096-byte blocks, each bitmap block contains 4096*8 = 32768 bits, or enough bits to describe 32768 disk blocks. In other words, for every 32768 disk blocks the file system uses, we must reserve one disk block for the block bitmap. A given bit in the bitmap is set (value 1) if the corresponding block is free, and clear (value 0) if the corresponding block is in use. The block bitmap in our file system always starts at disk block 2, immediately after the superblock. We'll reserve enough bitmap blocks to hold one bit for each block in the entire disk, including the blocks containing the boot sector, the superblock, and the bitmap itself. We will simply make sure that the bitmap bits corresponding to these special, "reserved" areas of the disk are always clear (marked in-use).
Each file and directory on the system corresponds to exactly one inode. The inode stores metadata information about the file, such as its size, its type (file or directory), and the locations of its data blocks. Directory information, such as the file's name and directory location, is not stored in the inode; instead, directory entries refer to inodes by number. This allows the file system to safely move files from one directory to another, and also allows for "hard links". Not all file systems have inodes, but OSPFS does.
The layout of OSPFS's inodes is described by struct ospfs_inode in ospfs.h. The oi_direct array in struct ospfs_inode contains space to store the block numbers of the first 10 (OSPFS_NDIRECT) blocks of the file, which we call the file's direct blocks. For small files up to 10*4096 = 40KB in size, this means that the block numbers of all of the file's blocks will fit directly within the ospfs_inode structure itself. For larger files, however, we need a place to hold the rest of the file's block numbers. For any file greater than 40KB in size, therefore, we allocate an additional disk block, called the file's indirect block, to hold up to 4096/4 = 1024 additional block numbers. The 10th block number is the 0th slot in the indirect block. Our file system therefore allows files to be up to 1034 blocks, or a bit more than four megabytes, in size. To support larger files, "real" file systems typically support double- and triple-indirect blocks as well. See the picture at right.
Inodes are relatively small structures; OSPFS's take up 64 bytes each. They are generally stored in a contiguous table on disk. The size of this table is specified when the file system is created, and in most file systems it cannot later be changed. Just like blocks, an inode can be either in use or free, and just like blocks, a file system can run out of inodes. It is actually possible to run out of inodes before running out of blocks, and in this case no more files can be created even though there is space available to store them. For this reason, and because inodes themselves do not take up much space, the inode table is often created with far more available inodes than the file system is expected to need. (Remember, each file uses an inode, so the maximum number of files and directories the file system can store at once equals the number of inodes.)
In OSPFS, the inode table is allocated immediately after the free block bitmap, and its blocks are also marked as in use in the free block bitmap. The number of inodes is stored in the superblock, and for convenience the block number of the first block of the inode table is also stored in the superblock. (Why is this only for convenience? How could we determine this value based only on the size of the disk?)
Even though your kernel module will not support creating new hard links (unless you do that challenge problem), each inode in OSPFS has a "link count," which is a number indicating how many directory entries refer to it (since the ospfsformat utility knows how to create hard links when it creates the initial OSPFS file system for your module to use). If there are no hard links to a file, then the link count will be 1 (since only one directory entry refers to the inode). For each hard link created in a "real" UNIX file system, the link count increases by 1, and for each hard link removed, the link count decreases by 1. Technically speaking, even the file's original name is called a "hard link" -- it is no different than later directory entries created for the inode except that it was created first. If the link count reaches 0, it means that the inode no longer has any hard links, and its storage space (both the blocks that make up the file and the inode itself) can be marked as free and reused. You will need to support this behavior when you delete files from OSPFS.
You may find the -i and -l options to the ls program useful. The -i option causes ls to display the inode number of each file. The -l option causes ls to display a "long listing" which includes, among other information, the link count and size of each file. These options can also be combined.
Here's an OSPFS file system corresponding to the base/ directory we handed out. (Note that the hello.txt data block is also linked to world.txt -- there is a file with two hard links!)
An ospfs_inode structure in our file system can represent either a regular file or a directory; these two types of "files" are distinguished by the type field in the ospfs_inode structure. The file system manages regular files and directory-files in exactly the same way, except that it does not interpret the contents of the data blocks associated with regular files at all, whereas the file system interprets the contents of a directory-file as a series of ospfs_direntry structures describing the files and subdirectories within the directory. Each ospfs_direntry structure just contains a filename and an inode number. All other information about the file -- including size, type, and pointers to data blocks -- is stored in the inode.
Inode number 1 in our file system is special: it is the inode for the root directory of the file system. (Inode number 0 is reserved and must never be used.)
Your assignment will be to implement the routines that handle free block management, changing file sizes, read/write to the file, reading a directory file, and deleting files.
You will need to implement exercises in the following functions for this lab. One plausible order is this:
ospfs_readospfs_dir_readdirallocate_block, free_blockchange_sizeospfs_writeospfs_unlink
All of these functions have comments explaining exactly what you must do. Also read the comments in ospfs.h and towards the top of ospfsmod.c to get oriented.
inode_operations structure (see the function call for create). You may want to look into the d_instantiate() function.-s option of the ln utility.)link operation. You may want to look into the d_instantiate() function.
The Makefile builds an initial OSPFS image using the contents of the base directory. If you want to add files to your OSPFS image, simply add files and directories to base, then run make clean and make to rebuild the image. If make reports an error that the resulting file system is too big, raise the number 512 in the Makefile to something larger.
By default, the image created by ospfsformat contains two directory entries which refer to the same file (that is, they are hard links): hello.txt and world.txt. You can add more files like this by adding hard links to the base directory using the ln command. Read its manual page by typing man ln at a shell prompt. (Note: the link counts on directories are always larger than one. Why might this be?)
The resulting module knows how to provide access to the linked-in OSPFS image, but just loading the module doesn't make the file system available. The script runospfs.sh will perform the necessary steps to make the file system available. After cleaning up any existing copies of the module, it runs make, and then mounts the file system on an existing directory. This hides the directory's existing contents, turning the directory into a "gateway" into the new file system. After loading OSPFS, the runospfs.sh script mounts that file system onto the test directory. When your file system is ready, ls base and ls test should return the same results (except for things like ownership, permissions, and inode numbers).
It is worth discussing how the functionality of the script is implemented. The following list shows the commands used:
insmod ospfs.kocat /proc/modules or lsmodtest directory: /bin/mount -t ospfs none testtest: umount testrmmod ospfs
If you are not using your own machine, you will need to use the following command instead of running runospfs.sh directly:
sudo /root/runospfs.sh (This has to be done in your lab3 directory)
After implementing these functions, you should be able to do the following in your filesystem: read and write to files, change the size of files, append to existing files, and read the contents of directories. Note: even after the code is implemented, you will not be able to create files, unless you have completed the challenge portions of the project.
Here is a list of the test cases we will run on your code. Feel free to write a testing script that checks these things automatically for you! You will probably find the dd utility very useful to perform many of the tests. You can read its documentation (man dd) for details of how to use it: check out its if=, of=, bs=, seek=, skip=, and conv=notrunc options. Also the truncate utility we included with the skeleton code will help you test some of the cases. (Suggestion: compare the results of performing operations on your OSPFS file system to doing the same operations on the normal Linux file system.)
Good luck!