Lecture 15 notes

by Wei Diana Chiang, Daniel Leung, and Andy Choi

File Systems (continued)

History

Hard Links

[Figure: hardlink.png]

Hard links can be used as a sharing mechanism, or as a cheap “copy”. There are two possible ways to save changes to a file that is hard linked to another file. The obvious method is to open the file and write the changes into it. The problem with this method is that if the computer crashes while writing to the file, the file will contain half old data and half new data. Another idea is to save the new contents to a new file and then rename it over the old one, atomically replacing the old version with the new one in one step. This avoids the previous problem: the file will contain either only the new data or only the old data. A new problem arises with this method, however:

Let's say /Bob/report.txt and /Alice/report.txt are hard linked.

Bob's editor then saves a new /Bob/report.txt. But Alice still sees the old version of the file. This is because Bob's editor saves a new copy of the file, with a new inode, and the rename points Bob's directory entry at that new inode. Alice's directory entry still refers to the old inode, so the two names are no longer linked.
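
The two save strategies can be tried out with a minimal Python sketch, run in a temporary directory. The file names and the `.tmp` suffix are assumptions made for illustration, and the sketch relies on `os.replace` being atomic, which holds on POSIX file systems.

```python
import os
import tempfile

def save_in_place(path, data):
    """Overwrite the existing inode; every hard link sees the change."""
    with open(path, "w") as f:
        f.write(data)

def save_by_rename(path, data):
    """Write a new file, then atomically swap it in under `path`."""
    tmp = path + ".tmp"          # hypothetical temp-file naming scheme
    with open(tmp, "w") as f:
        f.write(data)
    os.replace(tmp, path)        # atomic rename on POSIX file systems

def demo():
    with tempfile.TemporaryDirectory() as root:
        bob = os.path.join(root, "bob_report.txt")
        alice = os.path.join(root, "alice_report.txt")
        save_in_place(bob, "v1")
        os.link(bob, alice)                      # two names, one inode
        same_before = os.path.samefile(bob, alice)

        save_by_rename(bob, "v2")                # Bob's editor saves
        same_after = os.path.samefile(bob, alice)
        with open(alice) as f:
            alice_sees = f.read()
        return same_before, same_after, alice_sees
```

Calling `demo()` reports that the two names share an inode before the save, no longer share one afterwards, and that Alice's file still holds the old contents.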

Symbolic Links

[Figure: symlink2.png]

Symbolic links add another layer of indirection. A symbolic link is a special type of file whose contents are the name of another file. The kernel opens a symbolic link by reading this name and repeating the lookup with the new name. One problem with symbolic links arises if someone renames the file a symbolic link points to: the link still holds the old name and no longer resolves to the file. Another problem is circular symbolic links; the simplest example is a symbolic link that points to itself. This can be solved by following only a fixed number of symbolic links during a lookup before giving up with an error.
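
The lookup loop and the hop limit can be sketched with a toy namespace held in a Python dictionary. This is not how a real kernel stores links; the `resolve` function, the `TooManyLinks` exception, and the limit of 8 are all assumptions chosen for illustration.

```python
MAX_HOPS = 8  # hypothetical limit; real kernels choose their own value

class TooManyLinks(Exception):
    """Raised when a lookup follows more symbolic links than allowed."""

def resolve(namespace, name, max_hops=MAX_HOPS):
    """Follow symbolic links until a regular file is reached.

    namespace maps a name to ("file", contents) or ("symlink", target).
    """
    hops = 0
    while True:
        kind, value = namespace[name]   # KeyError models a dangling link
        if kind == "file":
            return name
        hops += 1
        if hops > max_hops:
            raise TooManyLinks(name)
        name = value                    # repeat the lookup with the new name
```

With `{"a": ("symlink", "b"), "b": ("file", "data")}`, resolving `"a"` returns `"b"`. A link that points to itself raises `TooManyLinks` instead of looping forever, and a link whose target was renamed away raises `KeyError`.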

Multiple File Systems

How do we knit multiple file systems together into a single user image? Typically, a single user image may contain many different file systems, including FAT floppies, a Unix hard drive, and a Joliet CD-ROM. We need a single namespace for multiple file systems.

Examples:

In Windows, each file name contains a disk, e.g. A:\ or C:\
In Unix, there is a single hierarchical namespace with trapdoors (or mount points) from one file system to another. A trapdoor is basically a gateway between different file systems.
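
One way to picture mount points is as a table that maps path prefixes to file systems, where the longest matching prefix wins. The sketch below is a simplification under that assumption (a real kernel crosses mount points while walking the path component by component). The `find_mount` helper and the mount table contents are hypothetical, and the table is assumed to always contain the root `/`.

```python
def find_mount(mounts, path):
    """Return (file_system, path_within_fs) for the longest matching mount point.

    mounts maps mount-point paths to file system names and must contain "/".
    """
    best = "/"
    for point in mounts:
        prefix = point.rstrip("/") + "/"
        matches = path == point or path.startswith(prefix)
        if matches and len(point) > len(best):
            best = point
    rest = path[len(best):].lstrip("/")
    return mounts[best], "/" + rest
```

For example, with `/mnt/floppy` mounted from a FAT floppy, `/mnt/floppy/notes.txt` is handed to the FAT file system as `/notes.txt`, while `/home/bob/report.txt` stays on the root file system.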

Virtual File System

Neutrality

A virtual file system can help maintain neutrality. What we want is one set of file system system calls for many file system types (such as Unix or FAT). We can use an object-oriented approach to multiplex many file system types onto one set of system calls.

[Figure: vfs.png]

Note: pipes also have to implement the VFS layer (for instance, to allow piping between different file systems).
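
The object-oriented approach described above can be sketched as one abstract interface with one implementation per file system type. The class names, the single `read` operation, and the in-memory layouts are assumptions made for illustration, not the interface of any real kernel.

```python
from abc import ABC, abstractmethod

class FileSystem(ABC):
    """The interface every file system type must implement."""

    @abstractmethod
    def read(self, path):
        """Return the full contents of the file at `path`."""

class ToyUnixFS(FileSystem):
    def __init__(self, inodes, directory):
        self.inodes = inodes          # inode number -> contents
        self.directory = directory    # path -> inode number

    def read(self, path):
        return self.inodes[self.directory[path]]

class ToyFatFS(FileSystem):
    def __init__(self, fat, blocks, directory):
        self.fat = fat                # block -> next block, None at end of chain
        self.blocks = blocks          # block -> data
        self.directory = directory    # path -> first block

    def read(self, path):
        data, block = [], self.directory[path]
        while block is not None:      # walk the FAT chain
            data.append(self.blocks[block])
            block = self.fat[block]
        return "".join(data)

def vfs_read(fs, path):
    """The single 'system call': dispatches to whichever type `fs` is."""
    return fs.read(path)
```

The caller of `vfs_read` never needs to know whether the file lives in an inode or in a FAT chain, which is the neutrality the virtual file system is meant to provide.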

Performance

To maximize performance, we want to avoid seeks as much as possible. Ideally, file system operations should run at close to memory speed rather than disk speed. Caching keeps file system data in memory: a read copies data from the cache, and a write copies data to the cache.

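Prefetching is loading data into the cache before it's needed. This is useful when the system correctly predicts which data will be read next, and harmful when the prediction is wrong, since disk time and cache space are then spent on data that is never used.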

In a modern OS, there is a mandatory buffer cache for all disk reads and writes.
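
A toy buffer cache with sequential read-ahead illustrates caching and prefetching together. It is only a sketch: the `BufferCache` class is hypothetical, a Python list stands in for the disk, the cache never evicts anything, and prefetching simply guesses that the following blocks will be read next.

```python
class BufferCache:
    def __init__(self, disk, prefetch=1):
        self.disk = disk          # list of blocks standing in for the slow device
        self.cache = {}           # block number -> data
        self.dirty = set()        # blocks written but not yet flushed to disk
        self.prefetch = prefetch  # how many following blocks to read ahead
        self.disk_reads = 0

    def _load(self, n):
        if n not in self.cache and 0 <= n < len(self.disk):
            self.cache[n] = self.disk[n]
            self.disk_reads += 1

    def read(self, n):
        """Return (data, hit) where hit says whether the block was cached."""
        hit = n in self.cache
        self._load(n)
        for k in range(1, self.prefetch + 1):   # sequential read-ahead
            self._load(n + k)
        return self.cache[n], hit

    def write(self, n, data):
        self.cache[n] = data      # the write goes to the cache only
        self.dirty.add(n)

    def flush(self):
        for n in sorted(self.dirty):
            self.disk[n] = self.cache[n]
        self.dirty.clear()
```

Reading block 0 and then block 1 results in a miss followed by a hit, because block 1 was prefetched. A write is visible on the disk only after `flush()`, which is exactly the window in which a power failure can lose data.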

[Figure: modernos2.jpg]

Disk Head Scheduling

Multiple concurrent processes may each request disk access for a read or a write. Disk head scheduling algorithms determine the order in which requests are processed. The objective is to minimize the seek time between requests.

In what order should reads and writes be executed?

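First Come, First Served (FCFS): Disk access requests are processed in order of arrival.

Process   Block
A         1
B         2000
C         2
D         2001

FCFS serves these requests in the order A, B, C, D, so the head crosses nearly the whole range of blocks three times.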
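
The cost of FCFS on the requests above can be measured with a short sketch. The helper names are hypothetical, seek cost is approximated as the difference in block numbers, and the head is assumed to start at block 1.

```python
def total_seek(start, order):
    """Sum of head movements when blocks are served in the given order."""
    distance, head = 0, start
    for block in order:
        distance += abs(block - head)
        head = block
    return distance

def fcfs(requests):
    """Serve requests exactly in arrival order."""
    return list(requests)
```

Under these assumptions, `total_seek(1, fcfs([1, 2000, 2, 2001]))` adds up 1999 + 1998 + 1999 blocks of head movement.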

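Shortest Seek Time First (SSTF): The next block read is the pending block closest to the last block read (that is, closest to the current position of the disk head).

Process   Block
A         1
B         2000
C         2
D         2001

With the head starting at block 1, SSTF serves these requests in the order A, C, B, D.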
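
A minimal sketch of SSTF, assuming all requests are known up front and that seek cost is the difference in block numbers. The `sstf` function is hypothetical; it returns both the service order and the total head movement.

```python
def sstf(start, requests):
    """Repeatedly serve the pending block closest to the current head position."""
    pending, head = list(requests), start
    order, distance = [], 0
    while pending:
        nearest = min(pending, key=lambda b: abs(b - head))
        pending.remove(nearest)
        distance += abs(nearest - head)
        head = nearest
        order.append(nearest)
    return order, distance
```

Starting at block 1, `sstf(1, [1, 2000, 2, 2001])` visits blocks 1, 2, 2000, 2001 for a total movement of 1 + 1998 + 1 blocks, since the two nearby pairs are served together.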

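Elevator Scheduling (SCAN): The next block read is the closest pending block in the current direction of head movement. When there are no more requests in that direction, the direction is reversed. This is similar to the way an elevator operates.

Process   Block
start     5
A         6
B         2
C         2000
D         2001

If the head starts at block 5 moving toward higher block numbers, the requests are served in the order A, C, D, B.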
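
The elevator policy can be sketched for a fixed set of requests as follows. The `scan` function and its `direction` parameter (+1 for toward higher block numbers, -1 for lower) are assumptions for illustration; a real scheduler would also have to handle requests that arrive while the head is moving.

```python
def scan(start, requests, direction=1):
    """Serve blocks in the current direction, then reverse for the rest."""
    ahead = sorted(b for b in requests if (b - start) * direction >= 0)
    behind = sorted(b for b in requests if (b - start) * direction < 0)
    if direction > 0:
        return ahead + behind[::-1]   # sweep up, then come back down
    return ahead[::-1] + behind       # sweep down, then come back up
```

With the head at block 5 moving upward, `scan(5, [6, 2, 2000, 2001])` serves 6, 2000, 2001 and only then turns around for block 2. Starting in the downward direction instead serves block 2 first.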

Robustness

File Systems Robustness – Day 1

The third goal of file systems is robustness.

File system robustness is the ability of a file system to tolerate faults, and can be defined as adherence to four invariants:

1.  A referenced block is not free.
2.  A non-referenced block is free.
3.  Each disk block is used for exactly one purpose.
4.  All referenced blocks are initialized.

In general, we would like users to be able to rest easy knowing that their data, whether it be mellow jazz tunes, important documents for the office, or haikus about honeycombs, will be safe from unexpected hiccups during otherwise normal operation. Many things can go wrong, but our primary motivation for file system robustness in this initial summary is events such as power failure. Creating a file system that can recover from or withstand such events, without violating these invariants, is the goal of file system robustness.

Now imagine...

... you are an entry-level worker bee for a top typing-stuff-on-the-computer company, and while you are working, you decide to start moving files around on your file system, occasionally writing things here and deleting things there, when all of a sudden, the power goes out. All eight of your colleagues in the adjacent cubicles panic: “All my work is lost!” However, you sit there content. You’ve taken a course on operating systems, and you know that today’s systems protect against things like power failures. You know that your data is fine and dandy. And so, while waiting for the power to come back on, you decide to think back and recall the different ways your file system could possibly recover from such a not-so-catastrophic event.

The first approach to file systems robustness you learned in class, you remember, was...

fsck - File System Check

The main principle of fsck is to enforce the four invariants at boot time. The file system is assumed to adhere to the invariants during normal operation, up until there is a crash, a disk failure, or a power failure. After such an event, it is likely that one or more of the invariants will be violated. On restart, the operating system will be aware that the computer was not shut down properly and will run fsck to find and correct any violations.

fsck runs a series of tests to check that the invariants hold.

For example, in the FAT file system, fsck can scan through the FAT and check that every block is referenced at most once. This type of check deals with invariant #3, that each block is used for exactly one purpose. If fsck finds a violation, there are several ways to fix it. For this example, it would not be wise to free the block, since the data in it could be important. It would be better to allocate another block and copy the data into it, so that there exist two copies of the block in question, one for each reference. There are many ways to deal with invariant violations. Some are better than others, but the choice depends on the file system and what its architects ultimately decide.

In another example, fsck could follow the inode number in each directory entry and check that the blocks referenced by that inode are not marked free. This type of check deals with invariant #1, that a referenced block is not free. If fsck finds a violation, it could delete the file, or mark the block as not free. Again, there are many ways to deal with such violations, and the choice depends on the file system.
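
The kinds of checks described above can be sketched on a toy disk image. The model is an assumption made for illustration: a set of block numbers marked free, and a table mapping each file to the blocks it references. Metadata blocks and invariant #4 are ignored, and the `repair` policy shown is only one of the possible choices.

```python
def fsck(num_blocks, free, files):
    """Report invariant violations in a toy disk image.

    free  -- set of block numbers marked free in the bitmap
    files -- maps each file name to the list of blocks it references
    """
    owners = {}
    for name, blocks in files.items():
        for b in blocks:
            owners.setdefault(b, []).append(name)
    referenced = set(owners)
    all_blocks = set(range(num_blocks))
    return {
        "referenced_but_free": sorted(referenced & free),           # invariant 1
        "leaked": sorted(all_blocks - referenced - free),           # invariant 2
        "shared": sorted(b for b, o in owners.items() if len(o) > 1),  # invariant 3
    }

def repair(free, report):
    """One possible policy: never free data that a file still points to."""
    fixed = set(free) - set(report["referenced_but_free"])  # mark as in use
    fixed |= set(report["leaked"])                          # reclaim leaked space
    return fixed
```

Note that `fsck` has to visit every file and every block number, which is why this style of recovery becomes slow as the disk grows.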

With the basic idea of fsck in mind, you can imagine many other tests that fsck could run to enforce the invariants at boot time. You think for a second: fsck does a pretty good job. It ensures that data is never really lost. But then again, there are advantages and disadvantages to using fsck to ensure file system robustness. An advantage of fsck is that extra work is needed only when an event like a power failure actually occurs. In all other cases, it can be assumed that the file system is true to the four invariants. However, the apparent disadvantage of fsck is the enormous cost during recovery. Many of the tests require scanning through the entire file system, block by block, and because reading from disk is slow, fsck will most likely take a long time. As disks become cheaper and larger, fsck becomes more and more expensive, and less of a viable solution.

So you conclude, fsck is really slow. What else can you do, maybe while the computer is running normally, to ensure file system robustness? The second approach, you recall, was...

Write-Ordering

Write ordering is a run-time solution to the power failure problem. The idea is to write information to the disk in such an order that the disk image always stays true to the four invariants, except for a possible violation of invariant #2: a storage leak (disk blocks that are neither referenced nor free). Write ordering tolerates leaks because, compared with violations of the other three invariants, leaking disk space is the least destructive to the file system, and leaked disk space can be reclaimed by background processes. Write ordering also depends on the assumption that all block writes are atomic: a write either happens fully or not at all, and writing half a block to disk is not a possibility. This can be achieved with a last-ditch power source, such as a small capacitor, that allows the disk to finish writing the current block despite a power failure.

To explain how write ordering works, you remember an example from class in which the steps for removing a file in a Unix-like file system with inodes are reordered to achieve file system robustness.

Removing a file consists of these three atomic steps:

	1.  Mark in the free block bitmap that the data blocks are free
	2.  Clear the inode
	3.  Remove the directory entry
	(assume that there is only one link to this file’s inode)

Each of these steps is atomic; however, a power failure can occur anywhere in between. Thus, it is important that the ordering be correct, such that the invariants (except for #2) are maintained at all times. The order listed above is incorrect, as marking the data blocks free while they are still referenced violates invariant #1. This would result in data corruption if a power failure were to occur after step 1 but before step 2. In addition, clearing the inode while it is still referenced by the directory entry would also result in data corruption if a power failure were to occur after step 2 but before step 3. It so happens that the only orderings that result in a safe and robust file system are those in which the directory entry is removed first. If a power failure were to occur after the directory entry is removed, there would be a leak: unreferenced blocks that are not free. However, leaked disk space can be reclaimed, so the result is only a temporary loss of space, not a loss of data.
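
The argument about orderings can be checked by brute force: try every permutation of the three steps, crash after every prefix, and see whether the disk image is still safe. The sketch below rests on a modeling assumption that is not spelled out in the notes: a block or inode counts as referenced only if it is reachable from the directory entry. The step names and the `consistent` predicate are hypothetical.

```python
from itertools import permutations

STEPS = ("free_blocks", "clear_inode", "remove_dirent")

def consistent(done):
    """True if a crash after the steps in `done` leaves no dangerous state."""
    dirent = "remove_dirent" not in done
    inode_valid = "clear_inode" not in done
    blocks_free = "free_blocks" in done
    if dirent and not inode_valid:
        return False      # directory entry names a cleared inode
    if dirent and inode_valid and blocks_free:
        return False      # reachable file references free blocks (invariant 1)
    return True           # at worst, unreachable space is leaked (invariant 2)

def safe_orderings():
    """Return every ordering that is consistent after each possible crash point."""
    safe = []
    for order in permutations(STEPS):
        # a crash may happen after any prefix of the ordering
        if all(consistent(order[:k]) for k in range(len(order) + 1)):
            safe.append(order)
    return safe
```

Under this model, `safe_orderings()` returns exactly the two orderings that begin with `remove_dirent`, and every crash point in those orderings leaves at worst leaked space.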

So after thinking about write ordering, you realize that it could definitely be a good and inexpensive solution to file system robustness. It doesn’t involve much extra work. After all, you would be performing all three steps to delete a file regardless of the ordering. However, you also realize that write ordering can be tricky and that different operations may require different orderings. Also, leaking disk space is not at all desirable, and it would be a worthy goal to not lose any disk space at all.

And just as you are beginning to remember another, more systematic way to ensure file system robustness, the lights suddenly turn on. The power is back; you restart your computer and get back to work, leaving file system robustness to be continued. Journaling, another approach to file system robustness, is covered in the next lecture.

***