Compiled by: Gerald Chang, Yiwei Guo, Eric Marcin-Cuddy
Last class: We talked about file systems and briefly began discussing characteristics of the FAT file system.
The FAT file system stores data in 1-kilobyte blocks, which are linked together through the file allocation table (FAT). The FAT contains one entry per block of data on the disk, and therefore must be large enough to hold an entry for each block address on the disk. Because a file's blocks are not required to be contiguous, we avoid external fragmentation. The sample FAT file system below is 2MB in size and has 4-byte addresses. (For comparison, the standard FAT32 also has 32-bit addresses.)
| Block No. | 0 | 1 | 2 to 9 | 10 to 2047 |
|---|---|---|---|---|
| Name | Boot sector | Superblock | File allocation table | Data blocks |
| Information | Contains file system type | FAT implementation data | 1 entry per block | Actual data |
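The block ranges in the table follow from the stated sizes. A minimal sketch of that arithmetic is below; the constant and function names are illustrative, not part of any FAT specification.

```python
BLOCK_SIZE = 1024             # 1-kilobyte blocks, as in the notes
DISK_SIZE = 2 * 1024 * 1024   # the 2MB sample disk
ENTRY_SIZE = 4                # 4-byte FAT entries

def fat_layout(disk_size=DISK_SIZE, block_size=BLOCK_SIZE, entry_size=ENTRY_SIZE):
    """Return (total blocks, FAT size in bytes, FAT size in blocks)."""
    total_blocks = disk_size // block_size
    fat_bytes = total_blocks * entry_size       # one entry per block on the disk
    fat_blocks = -(-fat_bytes // block_size)    # ceiling division
    return total_blocks, fat_bytes, fat_blocks

total_blocks, fat_bytes, fat_blocks = fat_layout()
first_fat_block = 2                             # after boot sector (0) and superblock (1)
first_data_block = first_fat_block + fat_blocks

print(total_blocks, fat_bytes, fat_blocks, first_data_block)  # 2048 8192 8 10
```

The FAT needs 8 blocks (blocks 2 to 9), which leaves blocks 10 to 2047 for data.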
For each block, the FAT contains the address of the next block in the same file, or the number zero if that block is the last block of the file. The number zero can also mean that the block may not be written, as is the case for the boot sector, superblock, and FAT blocks. An entry contains -1 if the block is free for writing. For example, if we had a 4-kilobyte file that occupied blocks 101 to 104 and a 2-kilobyte file that occupied blocks 100 and 105, the FAT would look something like this:
| Block No. | 0-9 | 10-99 | 100 | 101 | 102 | 103 | 104 | 105 | 106-2047 |
|---|---|---|---|---|---|---|---|---|---|
| Address | all 0 | all -1 | 105 | 102 | 103 | 104 | 0 | 0 | all -1 |
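To make the table concrete, here is a small sketch that builds the same FAT in memory and follows the links from a file's first block. The list-based representation and the names `FREE`, `END`, `build_sample_fat`, and `file_blocks` are assumptions made for illustration only.

```python
FREE = -1   # entry value for a free block
END = 0     # entry value for the last block of a file (also used for reserved blocks)

def build_sample_fat(total_blocks=2048):
    """Build the example FAT: blocks 0-9 reserved, two files, everything else free."""
    fat = [FREE] * total_blocks
    for block in range(10):
        fat[block] = END                  # boot sector, superblock, FAT blocks
    fat[100], fat[105] = 105, END         # 2-kilobyte file: 100 -> 105
    fat[101], fat[102], fat[103], fat[104] = 102, 103, 104, END   # 4-kilobyte file
    return fat

def file_blocks(fat, first_block):
    """Follow the chain starting at first_block and return every block in the file."""
    blocks, seen = [], set()
    block = first_block
    while True:
        if fat[block] == FREE:
            raise ValueError(f"block {block} is marked free but is part of a chain")
        if block in seen:
            raise ValueError(f"cycle detected at block {block}")
        seen.add(block)
        blocks.append(block)
        if fat[block] == END:
            return blocks
        block = fat[block]

fat = build_sample_fat()
print(file_blocks(fat, 101))  # [101, 102, 103, 104]
print(file_blocks(fat, 100))  # [100, 105]
```

Reading the n-th block of a file requires walking n links, which is the linked-list behavior described in these notes.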
The first block for actual file system data is block #10. A file does not necessarily fill its final block; as the file grows, it fills up that block and then continues onto the next free block. The advantage of this scheme is that it is simple and few special data structures are stored on the disk. Furthermore, the FAT is unambiguous as to which blocks are free and how to get from one block to the next. However, there is less locality of reference because a file's blocks do not have to be contiguous, which means that optimizations like prefetching do not work as well. The file system is similar to a linked list, but the links are stored in the FAT rather than in the data blocks.
Block Address Characteristics:
Advantages: Few data structures are stored on disk; the FAT tells us which blocks are free and where each file's data is stored
Disadvantages: Poor locality of reference; prefetching is less effective
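The trade-off above can be seen in a sketch of how a file might grow by one block. This is an illustration under assumptions: the first-fit search and the names `find_free_block` and `append_block` are hypothetical, and a real implementation would also have to order its disk writes carefully.

```python
FREE, END = -1, 0   # same entry conventions as the FAT described in these notes

def find_free_block(fat, first_data_block=10):
    """Return the first free data block (simple first-fit scan of the FAT)."""
    for block in range(first_data_block, len(fat)):
        if fat[block] == FREE:
            return block
    raise OSError("disk full")

def append_block(fat, last_block):
    """Extend a file whose current final block is last_block by one block."""
    if fat[last_block] != END:
        raise ValueError("last_block is not the end of a chain")
    new_block = find_free_block(fat)
    fat[new_block] = END          # the new block becomes the end of the file
    fat[last_block] = new_block   # link the old final block to it
    return new_block

# The 2-kilobyte example file occupies blocks 100 -> 105.
fat = [END] * 10 + [FREE] * 2038
fat[100], fat[105] = 105, END
print(append_block(fat, 105))  # 10
```

The new block lands at block 10, far from blocks 100 and 105. No extra free list was needed, but the file is now scattered across the disk, which is why prefetching is less effective.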
An invariant is a statement that must always be true for a data structure to remain consistent. Here are some invariants for FAT:
For Directories:
Directories map names to files. They are stored in the data blocks like normal files and, just like regular files, can span multiple blocks if necessary. Directories contain entries of the following format:
| Name | Pointer to the first block | File size | File type | Additional information (FS-specific) |
|---|---|---|---|---|
| Unique | Cannot be boot sector, superblock, or FAT block | | Regular/Directory | Permissions, user ID, other flags |
Ex: How to find "/home/kohler/grades"?
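One way to picture this lookup is as a walk over directory entries, one path component at a time. The sketch below is hypothetical: the `DirEntry` fields mirror the entry format above, while `ROOT_BLOCK`, the `DIRECTORIES` table, and all block numbers are made-up stand-ins for reading directory blocks from disk.

```python
from dataclasses import dataclass

@dataclass
class DirEntry:
    name: str
    first_block: int   # pointer to the first block of the file or directory
    size: int          # file size in bytes
    is_dir: bool       # file type: directory or regular file

ROOT_BLOCK = 10        # assumption: the root directory starts at the first data block

# Hypothetical directory contents, keyed by each directory's first block.
DIRECTORIES = {
    10: [DirEntry("home", 20, 1024, True)],
    20: [DirEntry("kohler", 30, 1024, True)],
    30: [DirEntry("grades", 100, 2048, False)],
}

def lookup(path, directories=DIRECTORIES, root_block=ROOT_BLOCK):
    """Resolve an absolute path to its directory entry, one component at a time."""
    current = DirEntry("/", root_block, 0, True)
    for component in path.strip("/").split("/"):
        if not component:
            continue                        # handles "/" and doubled slashes
        if not current.is_dir:
            raise NotADirectoryError(path)
        for entry in directories[current.first_block]:
            if entry.name == component:
                current = entry
                break
        else:
            raise FileNotFoundError(path)   # no entry with this name
    return current

print(lookup("/home/kohler/grades").first_block)  # 100
```

Each component costs at least one directory read, so deep paths take several disk accesses before the file's first block is even known.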
The Unix file system introduced two major ideas:
For the example below, which is the same 2MB disk as above, the free block bitmap stores one bit of data per block, which takes up only 256 bytes of space (2048 blocks at 8 bits per byte). Your actual mileage may vary.
| Block No. | 0 | 1 | 2 (256B used) | 3 upward |
|---|---|---|---|---|
| Name | Boot sector | Superblock | Free block bitmap | Data blocks |
| Information | Contains file system type | File system implementation data | Says whether block is free | Actual data |
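A free block bitmap of this size can be sketched with a `bytearray`. The bit convention here (1 means in use, 0 means free) and the helper names are assumptions for illustration; the notes do not specify them.

```python
TOTAL_BLOCKS = 2048   # 2MB disk with 1-kilobyte blocks

def new_bitmap(total_blocks=TOTAL_BLOCKS):
    """One bit per block, rounded up to whole bytes; all blocks start out free."""
    return bytearray((total_blocks + 7) // 8)

def is_free(bitmap, block):
    return ((bitmap[block // 8] >> (block % 8)) & 1) == 0

def mark_used(bitmap, block):
    bitmap[block // 8] |= 1 << (block % 8)

def mark_free(bitmap, block):
    bitmap[block // 8] &= ~(1 << (block % 8)) & 0xFF

def allocate(bitmap, total_blocks=TOTAL_BLOCKS):
    """Find the first free block, mark it used, and return its number."""
    for block in range(total_blocks):
        if is_free(bitmap, block):
            mark_used(bitmap, block)
            return block
    raise OSError("no free blocks")

bitmap = new_bitmap()
for reserved in (0, 1, 2):   # boot sector, superblock, and the bitmap's own block
    mark_used(bitmap, reserved)

print(len(bitmap))       # 256
print(allocate(bitmap))  # 3
```

Compared with the 8KB FAT, the bitmap answers only "is this block free?", so the links between a file's blocks have to be stored somewhere else.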
Maximum File Size:
direct: 10 x 1KB = 10KB
indirect: 10KB + 1KB x 256 = 266KB
indirect2: 266KB + 1KB x 256 x 256 = 266KB + 65536KB = 65802KB
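The figures above can be reproduced with a short calculation. It assumes 4-byte block addresses, which is what makes a 1KB indirect block hold 256 pointers; the constant names are illustrative.

```python
BLOCK_SIZE = 1024        # bytes per block
POINTER_SIZE = 4         # assumption: 4-byte block addresses
DIRECT_POINTERS = 10     # direct block pointers, as in the notes

pointers_per_block = BLOCK_SIZE // POINTER_SIZE   # 256

# Cumulative maximum file size in KB at each level (each block holds 1KB).
direct_kb = DIRECT_POINTERS
indirect_kb = direct_kb + pointers_per_block
indirect2_kb = indirect_kb + pointers_per_block ** 2

print(direct_kb, indirect_kb, indirect2_kb)  # 10 266 65802
```

The doubly indirect block contributes almost all of the capacity, roughly 64MB of the total.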
nlink: How many directory entries refer to this file?
2 Ways to Move the File:
Remove first:
Add first:
*Invariant broken: 2 directories point to the same block; more invariants will be broken...
NEITHER order works. If power is pulled between steps 1 and 2, invariants are broken. Removing first is the better of the two only because the file is lost rather than the entire disk being put at risk.
*The 4th invariant can sometimes be broken, because only disk space is lost.
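The two orderings can be compared with a toy simulation. This is a sketch under assumptions, not the procedure from lecture: it assumes the operation moves a file's entry from one directory to another, that steps 1 and 2 are the two directory updates, and that a directory can be modeled as a dict from name to first block. The `crash` flag stands in for power being pulled between the steps.

```python
def move_remove_first(src, dst, name, crash=False):
    first_block = src.pop(name)   # step 1: remove the entry from the old directory
    if crash:
        return                    # power pulled between steps 1 and 2
    dst[name] = first_block       # step 2: add the entry to the new directory

def move_add_first(src, dst, name, crash=False):
    dst[name] = src[name]         # step 1: add the entry to the new directory
    if crash:
        return                    # power pulled between steps 1 and 2
    del src[name]                 # step 2: remove the entry from the old directory

def references(directories, block):
    """Count how many directory entries point at the given first block."""
    return sum(1 for d in directories for b in d.values() if b == block)

old_dir, new_dir = {"grades": 100}, {}
move_remove_first(old_dir, new_dir, "grades", crash=True)
print(references([old_dir, new_dir], 100))  # 0

old_dir, new_dir = {"grades": 100}, {}
move_add_first(old_dir, new_dir, "grades", crash=True)
print(references([old_dir, new_dir], 100))  # 2
```

With zero references the file is unreachable and its blocks are wasted, but the rest of the disk stays consistent. With two references, deleting the file through one entry frees blocks that the other entry still points to, so the damage can spread to other files.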