CS 111 Operating Systems Principles, Spring 2006

Lecture 14 notes

Filesystems II: Design Motivations

A Simplified FAT Design

As in most filesystems, the first disk sector of our simplified FAT FS is reserved for boot sector code. We assume 4096-byte blocks and 512-byte disk sectors, so one block spans eight sectors. The remainder of the first block stores system-wide information such as the block size, the number of blocks in the FS, the block number containing the root directory, etc. The block containing this system-wide information is called the Super Block.
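As an illustration, block 0 could be laid out as the C struct below. This is a sketch under assumptions: the field names (block_size, num_blocks, root_dir_block, free_list) are hypothetical, every field is a 32-bit integer, and the rest of the block is left unused. free_list corresponds to the Free_List field described later in these notes.

```c
#include <stdint.h>

#define SECTOR_SIZE 512   /* bytes per disk sector */
#define BLOCK_SIZE  4096  /* bytes per FS block */

/* Hypothetical layout of block 0 (the Super Block). */
struct super_block {
    uint8_t  boot_code[SECTOR_SIZE];  /* first sector: boot sector code */
    uint32_t block_size;              /* bytes per block, e.g. 4096 */
    uint32_t num_blocks;              /* N, the number of blocks in the FS */
    uint32_t root_dir_block;          /* block number of the root directory */
    uint32_t free_list;               /* Free_List: first block on the free list */
    /* pad the struct so it fills exactly one block */
    uint8_t  unused[BLOCK_SIZE - SECTOR_SIZE - 4 * sizeof(uint32_t)];
};
```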

Following the Super Block is the File Allocation Table, or FAT. The FAT is an array of block numbers. In our model, we'll assume that blocks 1-9 are used for the FAT (the Super Block has block number 0). If block numbers are four bytes each, then we can store 9 * 4096 / 4 = 9 * 2^10 = 9216 block numbers in the FAT. Since each FAT entry describes one 4096-byte block, this limits the size of the FS to 9216 * 4 KB = 36 MB.
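The same capacity arithmetic as a small C sketch. The macro and function names are illustrative; the constants are the ones assumed in the text (9 FAT blocks, 4096-byte blocks, 4-byte block numbers).

```c
#include <stdint.h>

#define BLOCK_SIZE     4096u  /* bytes per block */
#define FAT_BLOCKS     9u     /* blocks 1-9 hold the FAT */
#define FAT_ENTRY_SIZE 4u     /* one 32-bit block number per entry */

/* Number of FAT entries, i.e. the number of blocks the FS can describe. */
static uint64_t fat_entries(void) {
    return (uint64_t)FAT_BLOCKS * BLOCK_SIZE / FAT_ENTRY_SIZE;  /* 9 * 1024 */
}

/* Largest FS: one FAT entry per block, each block BLOCK_SIZE bytes. */
static uint64_t max_fs_bytes(void) {
    return fat_entries() * BLOCK_SIZE;  /* 9216 * 4096 bytes = 36 MB */
}
```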

The FAT is treated like a singly linked list. Each element of the FAT represents one block. For example, FAT[0] would represent the Super Block. If we assume we have N blocks, then FAT[N-1] would represent the last block in the FS. We initialize the FAT entries for the first 10 blocks to some special value, say X, because those blocks hold the Super Block and the FAT itself: FAT[0, ..., 9] = X. This indicates that these blocks are not available for storing file or directory data. The rest of the blocks in the FS are free, which means they can be used to store file or directory data (we'll assume we don't have a root directory yet, so all the rest of the blocks are free). The first free block is then block number 10. This is the first block in the free list. We'll have an area in the Super Block called Free_List initialized to 10. Since the FAT is a linked list, FAT[10] will point to the next free block. In this case, FAT[10] = 11. Likewise, FAT[11] = 12, FAT[12] = 13, and so on. The last block in the FS terminates the list. Since 0 is an invalid block number (it holds the boot sector and Super Block), we can write FAT[N-1] = 0.
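A minimal sketch of this initialization in C, assuming the FAT is held in memory as an array of 32-bit block numbers. The value 0xFFFFFFFE standing in for X is a hypothetical choice; any value that is not a valid block number would do.

```c
#include <stdint.h>

#define FAT_X            0xFFFFFFFEu  /* hypothetical encoding of the special value X */
#define FIRST_DATA_BLOCK 10u          /* block 0 = Super Block, blocks 1-9 = FAT */

/* Build the initial FAT for an N-block FS (n > 10) with no files and no
   root directory. Returns the initial Free_List value. */
static uint32_t fat_init(uint32_t *fat, uint32_t n) {
    for (uint32_t b = 0; b < FIRST_DATA_BLOCK; b++)
        fat[b] = FAT_X;              /* FAT[0..9] = X: never handed out */
    for (uint32_t b = FIRST_DATA_BLOCK; b + 1 < n; b++)
        fat[b] = b + 1;              /* each free block links to the next one */
    fat[n - 1] = 0;                  /* 0 terminates the free list */
    return FIRST_DATA_BLOCK;         /* Free_List = 10 */
}
```

For a 16-block FS, fat_init returns 10 and leaves FAT[10] = 11, ..., FAT[14] = 15, and FAT[15] = 0.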

Now we really need a root directory, so we'll use the first block on the free list, block 10. A directory stores a set of entries, one for each file in the directory. The entry stores file metadata. This is data describing the file itself. It includes things like the file size, type, owner and access permissions. In a later model we'll move the metadata into the inode, but for now we'll keep it in the directory entry. Also stored in the entry is the filename and the block number of the first block containing the file data. A directory is just a special type of file, and this is what goes into a directory's data. Initially, however, the root directory will be empty, so there are no entries in it yet.
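One possible fixed-size layout for such a directory entry, as a hedged C sketch. The field widths, the 24-byte name limit and the type codes are assumptions for illustration, not part of the design described in lecture. An all-zero entry reads as an unused slot, which fits with zeroing out an entry when a file is removed.

```c
#include <stdint.h>

enum file_type { FT_UNUSED = 0, FT_REGULAR = 1, FT_DIRECTORY = 2 };

/* Hypothetical directory entry: metadata lives here in this design. */
struct dir_entry {
    char     name[24];      /* file name, NUL-padded */
    uint32_t size;          /* file size in bytes */
    uint32_t first_block;   /* first data block, or 0 if the file has none */
    uint16_t owner;         /* owner id */
    uint16_t permissions;   /* access permission bits */
    uint8_t  type;          /* enum file_type; FT_UNUSED marks an empty slot */
    uint8_t  reserved[3];   /* explicit padding to a multiple of 4 bytes */
};

/* How many entries fit in one 4096-byte directory block (102 here). */
#define ENTRIES_PER_BLOCK (4096 / sizeof(struct dir_entry))
```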

Now that we're using block 10 for the root directory entries, Free_List will need to be set to 11, and FAT[10] will be set to 0. We say that block 10 is allocated. A file or directory may use more than one block for its data. If it does, then the FAT entry for the first block of the file will contain the block number of the second block, and the FAT entry for the second block will contain the number of the third block, and so on. This is similar to the way the free list works, but in this case it is a list of blocks used by a file or directory. The last block of data will contain the block number 0 indicating the end of the list.

So far we have a Super Block (block 0) to store system-wide information, the FAT (blocks 1-9) to indicate which blocks are free and which blocks go together, and a directory with no entries using block 10. Free_List is 11, as it is the first free block. FAT[0, ..., 9] = X, FAT[10] = 0, FAT[K] = K+1 for 11 <= K <= N-2, and FAT[N-1] = 0. Now let's see how to add a file to the root directory. Suppose we type

$ echo "foo" > /a.txt

When this command is executed, the first thing that happens is that the shell creates the file "/a.txt". The FS adds an entry to the root directory. This will contain the file's name "a.txt", size (initially 0 since nothing's been written yet), and type (regular file). Since the file is initially empty, we'll assume no blocks need to be allocated yet, so the file's first block is the invalid number 0. Once the file is created, we return to the shell and attempt to write the four bytes "foo\n" into the open file. Now the FS needs to allocate a block for the file. It grabs this block by setting Block = Free_List, Free_List = FAT[Block], and FAT[Block] = 0. When this is all done, Block = 11, the newly allocated block; Free_List = 12, the new first free block; and FAT[11] = 0, since block 11 is now allocated to "a.txt". Once we have a block, we copy "foo\n" into it. Lastly we set the block pointer in the directory entry to Block and update the file size field to 4.
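The three assignments that take a block off the free list can be sketched as a C function. The struct fat_fs wrapper and the name alloc_block are hypothetical; returning 0 when the free list is empty is an added assumption (the notes don't handle that case), and it works because 0 is never a valid data block.

```c
#include <stdint.h>

/* In-memory model: the FAT array plus the Super Block's Free_List field. */
struct fat_fs {
    uint32_t *fat;
    uint32_t  free_list;   /* 0 means the free list is empty */
};

/* Pop the first block off the free list and mark it as the end of an
   allocated chain. Returns 0 (an invalid block number) if the FS is full. */
static uint32_t alloc_block(struct fat_fs *fs) {
    uint32_t block = fs->free_list;    /* Block = Free_List */
    if (block == 0)
        return 0;                      /* no free blocks left */
    fs->free_list = fs->fat[block];    /* Free_List = FAT[Block] */
    fs->fat[block] = 0;                /* FAT[Block] = 0 */
    return block;
}
```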

Now let's add a really big file so we have to allocate more than one block for it. Say we execute

$ perl -e 'print "a" x (1 << 22)' > /big.txt

This will create a file called "/big.txt" containing the letter "a" 2^22 = 4,194,304 times (try it, just don't cat it). The file size will be precisely 4 MB, which is 1024 blocks of 4096 bytes. Let's see how the FS handles this. As above, the FS first creates the directory entry for "big.txt". We'll assume that as the data is written, the FS allocates one block at a time. The procedure is essentially the same as above, but now we need to append each new block to the end of the file.

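Block = Super_Block.Free_List;          // take the first block on the free list
Super_Block.Free_List = FAT[Block];     // the next free block becomes the head
FAT[Block] = 0;                         // Block now ends an allocated chain
// Copy the user's data into the block Block
if (Entry.First_Block == 0) {
        Entry.First_Block = Block;      // empty file: Block is its first block
} else {
        Next = Entry.First_Block;       // walk the file's chain to its last block
        while (FAT[Next] != 0) {
                Next = FAT[Next];
        }
        FAT[Next] = Block;              // append Block after the old last block
}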

This first grabs a free block off the head of the free list (we don't consider the case where the list is empty). Then it updates the free list so that the head points to the next free block after the one we just took. Then it puts a 0 in the FAT for the block we just took to indicate that it is at the end of some list of allocated blocks. Finally, if the file doesn't contain any blocks yet, we just give it this one. Otherwise, we have to traverse the list of allocated blocks, starting from the beginning, until we find the end so we can append the block we just allocated. Of course, to make this more efficient we could just put a pointer to the end of the allocated block list in the entry as well. In general, though, seeking to an arbitrary location in the file is slow in the FAT design.
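Here is a sketch of that optimization, assuming the directory entry also caches the last block of the file (the hypothetical last_block field). Appending then costs O(1) instead of a walk over the whole chain. alloc_block repeats the free-list logic so the block compiles on its own; all names are illustrative.

```c
#include <stdint.h>

/* In-memory model: the FAT array plus the Super Block's Free_List field. */
struct fat_fs {
    uint32_t *fat;
    uint32_t  free_list;   /* 0 means the free list is empty */
};

/* Hypothetical directory entry that also caches the file's last block. */
struct file_entry {
    uint32_t first_block;  /* 0 if the file has no blocks yet */
    uint32_t last_block;   /* valid only when first_block != 0 */
};

/* Pop the head of the free list, or return 0 if it is empty. */
static uint32_t alloc_block(struct fat_fs *fs) {
    uint32_t block = fs->free_list;
    if (block == 0)
        return 0;
    fs->free_list = fs->fat[block];
    fs->fat[block] = 0;                  /* new block ends the chain */
    return block;
}

/* Append a newly allocated block to the file in O(1): no chain walk. */
static uint32_t append_block(struct fat_fs *fs, struct file_entry *e) {
    uint32_t block = alloc_block(fs);
    if (block == 0)
        return 0;                        /* disk full */
    if (e->first_block == 0)
        e->first_block = block;          /* first block of an empty file */
    else
        fs->fat[e->last_block] = block;  /* link the old tail to the new block */
    e->last_block = block;               /* the new block is now the tail */
    return block;
}
```

The cost of the cached pointer is one more field to keep consistent on every append and truncate.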

Now let's add a directory:

$ mkdir /dir

This is just the same as adding a file except in the entry we need to mark it as a directory instead of a regular file. The block allocated for directory data will hold entries of its own.

Lastly, let's remove "big.txt":

$ rm /big.txt

In order to do this, all we need to do is remove its entry in the root directory and prepend the file's list of allocated blocks to the free list. Removing the directory entry is easy. One method is to just zero out the entry. In order to deallocate the blocks the file used, we find the end of the allocated list, and set the FAT entry for that block to Free_List, and update Free_List to point to the beginning of the chain we just deallocated. For example, "big.txt" should be using blocks 12-1035. We can get the first block, 12, from the directory entry. We then traverse the list of blocks in the FAT until we reach the end (FAT[1035] = 0). Then we set FAT[1035] = Free_List and Free_List = 12. This has the effect of placing all the blocks the "big.txt" used back on the free list, so they can be reused by the FS. Visualize it as prepending the allocated list to the free list.
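The deallocation steps, sketched in C under the same in-memory assumptions (a uint32_t FAT array and a Free_List value; free_chain is a hypothetical name). Finding the end of the chain takes time proportional to the file's length, but the splice itself is only two assignments.

```c
#include <stdint.h>

/* In-memory model: the FAT array plus the Super Block's Free_List field. */
struct fat_fs {
    uint32_t *fat;
    uint32_t  free_list;
};

/* Prepend a file's whole chain of blocks to the free list, as "rm" does. */
static void free_chain(struct fat_fs *fs, uint32_t first_block) {
    if (first_block == 0)
        return;                          /* empty file: nothing to free */
    uint32_t last = first_block;
    while (fs->fat[last] != 0)
        last = fs->fat[last];            /* find the end, e.g. FAT[1035] == 0 */
    fs->fat[last] = fs->free_list;       /* FAT[last] = Free_List */
    fs->free_list = first_block;         /* Free_List = first block of the file */
}
```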

[Figure: figure1mf.gif]

[Figure: figure2mf.gif]

Locality of reference implies proximity on disk

When two objects (blocks, files, …) are accessed together, placing them close together on disk minimizes seek time, which is desirable.

Q: How can we change the FAT design so that we can allocate a free block near a given location on disk, minimizing seeks?
A: Record for each block whether it is free (a single bit of information per block).
Example: store -1 in the FAT entry of every free block. Then it's easy to find a free block close to a given location by checking nearby entries, whereas before we had to traverse the linked list starting from the free list head pointer, which hands out blocks in list order rather than by position.
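A sketch of searching outward from a target block, assuming free blocks are marked with -1 (stored as 0xFFFFFFFF in an unsigned 32-bit FAT) as in the example above. find_free_near is a hypothetical helper. The scan is still linear in the worst case, but it returns the free block nearest to goal, which popping the free-list head cannot do.

```c
#include <stdint.h>

#define FAT_FREE 0xFFFFFFFFu   /* "-1" stored in the FAT entry of a free block */

/* Find the free block closest to `goal` (goal < n) by scanning outward in
   both directions. Returns 0 if no block is free; block 0 is never free,
   so 0 is safe as a "not found" value. */
static uint32_t find_free_near(const uint32_t *fat, uint32_t n, uint32_t goal) {
    for (uint32_t d = 0; d < n; d++) {
        if (goal + d < n && fat[goal + d] == FAT_FREE)
            return goal + d;             /* at or just after goal */
        if (d <= goal && fat[goal - d] == FAT_FREE)
            return goal - d;             /* just before goal */
    }
    return 0;
}
```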

Q: How much time does it take to find the nth block of a file in this design?
A: O(n), because the FAT stores each file's blocks as a singly linked list that must be traversed from the first block. Yet random access is a common operation.
Example: demand paging of executables reads parts of the file in arbitrary order. We need a better design.
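The O(n) cost is visible in a direct translation to C (nth_block is a hypothetical name; blocks are numbered from 0 within the file). To serve a read at byte offset off, a file system would call it with n = off / 4096 under the 4096-byte-block assumption.

```c
#include <stdint.h>

/* Return the disk block holding the n-th block (0-based) of a file whose
   chain starts at first_block, or 0 if the file has fewer than n+1 blocks.
   Each step follows one FAT link, so the cost is O(n). */
static uint32_t nth_block(const uint32_t *fat, uint32_t first_block, uint32_t n) {
    uint32_t block = first_block;
    while (n > 0 && block != 0) {
        block = fat[block];   /* follow one link per block skipped */
        n--;
    }
    return block;
}
```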

Hypothetical example (straw man)
Singly linked list for file data (FAT) -> O(filesize) random access
Idea: Let's try storing all of a file's data block pointers in its directory entry!

[Figure: figureamf.gif]

Problem: We need to find free blocks faster (in less than O(filesize) time), but we can't tell whether a block is free without additional information.
Solution: FREE BLOCK BITMAP

[Figure: figurebmf.gif]
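A sketch of a free block bitmap in C, assuming a hypothetical 4096-block disk and the convention 1 = free, 0 = in use. alloc_near scans forward from a hint and wraps around, so it tends to return a block near the hint, which is the locality property the design is after. All names are illustrative.

```c
#include <stdint.h>

#define NUM_BLOCKS 4096u   /* hypothetical disk size in blocks */

/* One bit per block: 1 = free, 0 = in use. Zero-initialized = all in use. */
static uint8_t free_map[NUM_BLOCKS / 8];

static int  is_free(uint32_t b)   { return (free_map[b / 8] >> (b % 8)) & 1; }
static void mark_free(uint32_t b) { free_map[b / 8] |= (uint8_t)(1u << (b % 8)); }
static void mark_used(uint32_t b) { free_map[b / 8] &= (uint8_t)~(1u << (b % 8)); }

/* Allocate the first free block at or after `hint`, wrapping around to
   block 0. Returns NUM_BLOCKS if every block is in use. */
static uint32_t alloc_near(uint32_t hint) {
    for (uint32_t i = 0; i < NUM_BLOCKS; i++) {
        uint32_t b = (hint + i) % NUM_BLOCKS;
        if (is_free(b)) {
            mark_used(b);
            return b;
        }
    }
    return NUM_BLOCKS;
}
```

The whole map for this disk takes 512 bytes, small enough to keep cached in memory.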

Problem: We have external fragmentation because we allowed directory entries to store a variable number of direct block pointers.

Solution: Make each directory entry a fixed size and introduce the concept of direct and indirect block pointers. This eliminates external fragmentation.

[Figure: figurecmf.gif]
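With a fixed number of pointers per entry, finding which pointer covers a given logical block becomes arithmetic instead of a list walk. The sketch below assumes the layout worked out in these notes: 10 direct pointers, then single, double and triple indirect blocks of 1024 pointers each (4096-byte blocks, 4-byte pointers). indirection_level is a hypothetical helper.

```c
#include <stdint.h>

#define NDIRECT      10u     /* direct block pointers per entry */
#define PTRS_PER_BLK 1024u   /* 4096-byte block / 4-byte pointer */

/* Which pointer covers logical block `lbn` of a file? Returns 0 for a
   direct pointer, 1 for the indirect block, 2 for the doubly indirect
   block, 3 for the triply indirect block, or -1 if lbn is too large. */
static int indirection_level(uint64_t lbn) {
    if (lbn < NDIRECT)
        return 0;
    lbn -= NDIRECT;
    uint64_t span = PTRS_PER_BLK;    /* blocks reachable at the current level */
    for (int level = 1; level <= 3; level++) {
        if (lbn < span)
            return level;
        lbn -= span;                 /* skip everything this level covers */
        span *= PTRS_PER_BLK;        /* next level reaches 1024x more blocks */
    }
    return -1;
}
```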

Problem: The remaining problem is internal fragmentation: the unused space at the end of partially filled blocks adds up.

Given: 1 block = 4 KB and block pointers are 32 bits = 4 bytes, so one block holds 4096 / 4 = 1024 pointers.
Largest file using only 10 direct blocks: 10 blocks * 4 KB/block = 40 KB
Largest file using 10 direct blocks + 1 indirect block: (1024 block pointers/indirect block) * 4 KB/block + 40 KB = 4 MB + 40 KB
If we need more than about 4 MB, add another level of indirection:
doubly indirect (indirect^2): 1024^2 * 4 KB = 4 GB, so the largest file is 4 GB + 4 MB + 40 KB
triply indirect (indirect^3): 1024^3 * 4 KB = 4 TB, so the largest file is 4 TB + 4 GB + 4 MB + 40 KB

What's the largest disk with 32-bit block pointers and 4 KB blocks? 2^32 blocks x 2^12 bytes (4 KB) = 2^44 bytes (16 TB)
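The sizes above can be checked with a few lines of C. The names are illustrative; the constants are the ones given (4 KB blocks, 1024 four-byte pointers per block, 10 direct pointers).

```c
#include <stdint.h>

#define BLOCK   4096ull   /* bytes per block */
#define PTRS    1024ull   /* 4-byte block pointers per 4 KB block */
#define NDIRECT 10ull     /* direct pointers */

/* Largest file with NDIRECT direct pointers plus `levels` levels of
   indirection (0 = direct only, 1 = + indirect, 2 = + doubly indirect, ...). */
static uint64_t max_file_bytes(int levels) {
    uint64_t blocks = NDIRECT, span = 1;
    for (int l = 1; l <= levels; l++) {
        span *= PTRS;         /* blocks reachable through an l-level pointer */
        blocks += span;
    }
    return blocks * BLOCK;
}

/* Largest disk addressable with 32-bit block numbers: 2^32 blocks of 2^12 bytes. */
static const uint64_t max_disk_bytes = (1ull << 32) * BLOCK;   /* 2^44 = 16 TB */
```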

[Figure: figure3mf.gif]

File System Invariants

What types of errors can we have on a disk?
There are 4 invariants that must always hold for the file system to work correctly.
(In parentheses: what happens when the invariant is violated.)

1. Referenced blocks are not free - (data corruption)
2. Unreferenced blocks are free - (disk space leaks, like "memory" leaks)
3. Every block is used for exactly one purpose
4. All referenced blocks are initialized - (otherwise a file may expose garbage left over from earlier use) -> covered later
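Invariants 1 to 3 can be checked mechanically, much as a consistency checker such as fsck does. The sketch below assumes a free block bitmap (true = free) and a list of the blocks each file references; blocks the FS itself uses (Super Block, bitmap) must be passed as an extra pseudo-file so they don't look leaked. check_invariants and struct file_blocks are hypothetical names. Invariant 4 is not checked, because whether a block was initialized is not recorded in this metadata.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* The blocks referenced by one file (or by the FS's own metadata). */
struct file_blocks {
    const uint32_t *blocks;
    size_t count;
};

/* Check invariants 1-3 against a free map (true = free). Returns 0 if they
   all hold, otherwise the number of the first violated invariant found,
   or -1 if memory allocation fails. Block numbers must be < nblocks. */
static int check_invariants(const bool *is_free, uint32_t nblocks,
                            const struct file_blocks *files, size_t nfiles) {
    uint32_t *refs = calloc(nblocks, sizeof *refs);  /* references per block */
    if (refs == NULL)
        return -1;
    for (size_t f = 0; f < nfiles; f++)
        for (size_t i = 0; i < files[f].count; i++)
            refs[files[f].blocks[i]]++;

    int result = 0;
    for (uint32_t b = 0; b < nblocks && result == 0; b++) {
        if (refs[b] > 0 && is_free[b])
            result = 1;                  /* referenced block marked free */
        else if (refs[b] == 0 && !is_free[b])
            result = 2;                  /* unreferenced block not free: leak */
        else if (refs[b] > 1)
            result = 3;                  /* block used for more than one purpose */
    }
    free(refs);
    return result;
}
```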

Q: How do you rename a file safely?
"Safely" means that the invariants are maintained at all times.
The bad case is moving a file between directories, e.g. rename("./cats.gif", "/subdir/foo").
A: Inodes!

  • An inode stores a file's data block pointers and its metadata
  • Directory entries store a file name and a pointer to an inode
  • An inode may be linked from multiple directory entries, so it keeps a link count
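A hypothetical C layout for the inode-based design: metadata and block pointers move into the inode, and a directory entry shrinks to a name plus an inode number. Field names and widths are assumptions. drop_link shows why the link count matters: the inode may be freed only when its last directory entry is removed.

```c
#include <stdint.h>

#define NDIRECT 10

/* Hypothetical inode: metadata and block pointers now live here. */
struct inode {
    uint32_t size;               /* file size in bytes */
    uint16_t type;               /* regular file, directory, ... */
    uint16_t link_count;         /* number of directory entries naming this inode */
    uint16_t owner;
    uint16_t permissions;
    uint32_t direct[NDIRECT];    /* direct block pointers */
    uint32_t indirect;           /* singly indirect block */
    uint32_t double_indirect;    /* doubly indirect block */
};

/* A directory entry is now just a name plus an inode number. */
struct dir_entry {
    char     name[28];
    uint32_t inode_number;       /* 0 = unused slot */
};

/* Remove one directory entry's reference. Returns 1 if no entry names the
   inode any more, so the inode and its blocks may be freed. */
static int drop_link(struct inode *ino) {
    ino->link_count--;
    return ino->link_count == 0;
}
```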

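Example: Why doesn't safe renaming work with our current design? The file's metadata and block pointers live in its directory entry, so moving the file means copying the whole entry into the new directory and then deleting the old one. If power fails in between, two entries claim the same blocks (violating invariant 3), and deleting either file would free blocks the other still references (violating invariant 1).
[Figure: figure4mf.gif]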

Example: Safe Rename
rename("/a.txt","/subdir/a.txt")

  1. update link count (increment it first, counting the new entry in advance)
  2. add new dir entry
  3. remove old directory entry
  4. update link count (decrement it)

Q: Which invariants are violated if the power fails between two steps?

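1. Update (increment) link count
   If the power fails here, invariant 2 is violated: the link count is one too high, so the inode and its blocks will never be freed (a leak).
2. Write new entry
   If the power fails here, all the invariants are preserved.
3. Delete old entry
   If the power fails here, invariant 2 is violated again, since the link count is still one too high.
4. Update (decrement) link count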
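The crash analysis above can be replayed with a toy model in C. Everything here is hypothetical: two directory slots, one inode link count, and a steps argument that simulates losing power after that many of the four writes reach the disk. invariant1_ok checks that the link count never falls below the number of entries naming the inode; leaks reports the invariant 2 violations described above.

```c
#include <stdbool.h>

/* Toy on-disk state: two directory slots that may each name inode `ino`
   (0 = empty slot), plus that inode's link count. */
struct toy_disk {
    int old_entry;     /* inode number in "/a.txt"'s slot, 0 once removed */
    int new_entry;     /* inode number in "/subdir/a.txt"'s slot, 0 if absent */
    int link_count;    /* link count stored in the inode */
};

/* Safe rename of inode `ino`, with power lost after `steps` of the four
   writes have reached the disk (steps = 4 means no crash). */
static void safe_rename(struct toy_disk *d, int ino, int steps) {
    if (steps < 1) return;
    d->link_count++;              /* 1. increment link count */
    if (steps < 2) return;
    d->new_entry = ino;           /* 2. write new directory entry */
    if (steps < 3) return;
    d->old_entry = 0;             /* 3. delete old directory entry */
    if (steps < 4) return;
    d->link_count--;              /* 4. decrement link count */
}

/* Number of directory entries that currently name the inode. */
static int references(const struct toy_disk *d) {
    return (d->old_entry != 0) + (d->new_entry != 0);
}

/* Invariant 1 holds while the link count is never below the reference
   count; otherwise the inode could be freed while still named. */
static bool invariant1_ok(const struct toy_disk *d) { return d->link_count >= references(d); }

/* A link count above the reference count is a leak (invariant 2). */
static bool leaks(const struct toy_disk *d) { return d->link_count > references(d); }
```

Running every crash point from 0 to 4 shows that invariant 1 always holds, and the only damage is a leaked link count after steps 1 or 3, which wastes space but never corrupts data.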
 
2006spring/notes/lec14.txt · Last modified: 2006/09/26 11:42 (external edit)
 