By Yige Li, Xiaohang Zhang, and Henry Su.
Here is a physical disk with three platters, six surfaces, and thousands of tracks of sectors. The performance characteristics of the disk depend on its physical characteristics: how fast it spins and how long it takes to move from one place on the disk to another. But what the file system sees when it looks at a disk is something much simpler: a big array of sectors, numbered from sector 0 up to some large number, each of which contains 512 bytes. These sectors are laid out on the disk in some way, so there is a mapping function from this array view of the disk onto the physical view; the disk controller, which is part of the hardware, manages this mapping. There is one more view of what is on the disk, namely all the data that is stored in the file system. Files are mapped onto the array view of disk sectors, and this mapping is controlled by the operating system. In particular, the file system layout determines how this data is mapped onto the array view.
Good Performance means avoiding seeks.*
Good Utilization means using most of the disk space for file data.
Good Robustness means keeping the disk consistent at all times.
*Avoid Seeks Strategy: the disk tries to lay out sectors consecutively, and the file system lays out file blocks in consecutive sectors.
RT-11 (RT stands for Real Time) is a single-user, real-time operating system implemented in 1970 on the PDP-11 series of processors. Its file system stores files in contiguous sectors.
All files are stored in contiguous sectors.
A file's location is defined by:
starting sector
size
Sector 0 is the boot sector.
Sector 1 is the superblock.
Superblock contains:
As in every file system, the content of a directory is stored on disk in the same way as a file. The difference between a directory and a file is how that data is interpreted. Files are generally just long strings of bytes, but the file system imposes requirements on the contents of a directory, because it uses the directory data to find other files. So what is in the directory? At root, every entry in the directory has a file name and the information you need to find that file, that is, where the file is located. So what does a directory entry contain in this file system? A name, a starting sector, and a size. That is the minimal directory entry for this file system.
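The minimal directory entry described above can be sketched as follows. This is only an illustration, not RT-11's actual on-disk format: the names `DirEntry`, `sectors_used`, and `locate` are hypothetical, and the 512-byte sector size is taken from the array view of the disk described earlier.

```python
from dataclasses import dataclass

SECTOR_SIZE = 512  # bytes per sector, as in the array view of the disk


@dataclass
class DirEntry:
    """Hypothetical minimal directory entry for contiguous allocation."""
    name: str
    start_sector: int  # first sector of the file
    size: int          # file size in bytes


def sectors_used(entry):
    """Return the contiguous range of sectors the file occupies."""
    count = (entry.size + SECTOR_SIZE - 1) // SECTOR_SIZE  # round up
    return range(entry.start_sector, entry.start_sector + count)


def locate(entry, offset):
    """Map a byte offset in the file to (sector, offset within sector)."""
    if not 0 <= offset < entry.size:
        raise ValueError("offset outside file")
    return entry.start_sector + offset // SECTOR_SIZE, offset % SECTOR_SIZE
```

Because the file is contiguous, finding any byte takes one addition and one division; no other metadata has to be read.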
External Fragmentation – Utilization problem with contiguous allocation where unused space goes to waste because it is divided into small fragments.
Example: On a 2 GB disk (don’t worry about how free space is stored on this system)
Let’s say that I allocate a 1 GB file, called X, starting at sector 5 and going up to sector 2^21 + 4 (1 GB is 2^21 sectors of 512 bytes). That corresponds to some directory entry in the root directory (which says: Name X, Starting Sector 5, Size 2^30 bytes).
Create a new file named small, starting at sector 2^21 + 5 and occupying 2 sectors (directory entry: Name small, Starting Sector 2^21 + 5, Size 1024 bytes).
Delete X, returning all of its sectors to free space and erasing its directory entry. There is approximately 2 GB of free space in the file system at this point.
The largest file we can allocate is 1 GB. We cannot allocate any file larger than 1 GB, because small divides the free space into two regions: one is exactly 1 GB, the other is slightly less than 1 GB. So we have about 2 GB of free space, and I can use at most half of it for any one allocation. This is a general problem with all sorts of allocation strategies, called Fragmentation.
Fragmentation – performance and/or utilization problem caused by an allocation strategy.
External Fragmentation becomes a problem when there are no consecutive sectors large enough for a large file.
The particular type of fragmentation that you see here is called External Fragmentation. External Fragmentation is a utilization problem with contiguous allocation where unused space goes to waste because it is divided into small fragments. The disk in the example has roughly 2 GB of free space, divided into 2 smaller fragments. If we want to allocate a file larger than 1 GB, we can't, because the space is fragmented. However, we can still allocate small files. So External Fragmentation only becomes a problem when allocating large files, or when the disk is almost full.
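The arithmetic of the example can be checked with a short sketch. It assumes 512-byte sectors and, as an assumption not stated in the notes, that sectors 0 through 4 are reserved for metadata (the notes only say X starts at sector 5). The helper `free_extents` is hypothetical.

```python
SECTOR_SIZE = 512
GB = 2**30


def free_extents(total_sectors, allocated):
    """Return free (start, length) runs, given allocated (start, length) runs."""
    extents = []
    cursor = 0
    for start, length in sorted(allocated):
        if start > cursor:
            extents.append((cursor, start - cursor))
        cursor = max(cursor, start + length)
    if cursor < total_sectors:
        extents.append((cursor, total_sectors - cursor))
    return extents


total = 2 * GB // SECTOR_SIZE   # sectors on the 2 GB disk
x_sectors = GB // SECTOR_SIZE   # sectors that the 1 GB file X occupied
reserved = (0, 5)               # assumption: sectors 0-4 hold metadata
small = (5 + x_sectors, 2)      # small sits right after where X used to be

# X has been deleted, so only the reserved area and small are allocated.
extents = free_extents(total, [reserved, small])
total_free = sum(length for _, length in extents) * SECTOR_SIZE
largest = max(length for _, length in extents) * SECTOR_SIZE
```

Here `total_free` is just under 2 GB, while `largest` is exactly 1 GB: the two-sector file caps the biggest contiguous allocation at half of the free space.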
Compaction:
Solves External Fragmentation by moving data until unused space is contiguous.
Problem with Compaction - It is very expensive. (Takes a long time for large disks, and uses a lot of seeks.)
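A minimal sketch of compaction, assuming files are described by hypothetical `(name, start_sector, sector_count)` tuples. It slides every file toward the start of the disk and counts how many sectors must be copied, which is where the cost comes from.

```python
def compact(files, first_free_sector):
    """Slide files toward the start of the disk so free space is one run.

    files: list of (name, start_sector, sector_count) tuples.
    Returns (new layout, number of sectors that had to be copied).
    """
    layout = []
    moved = 0
    cursor = first_free_sector
    for name, start, count in sorted(files, key=lambda f: f[1]):
        if start != cursor:
            moved += count  # every sector of this file is read and rewritten
        layout.append((name, cursor, count))
        cursor += count
    return layout, moved
```

In the worst case nearly every allocated sector on the disk is moved, and each affected directory entry has to be updated as well.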
The File Allocation Table (FAT) is an array of numbers, one per block, that records the status of each block and links the blocks of each file into a list.
Let n = FAT[a]
Directory Entry contains filename, first block number, size.
FAT allocates a file by going through the File Allocation Table and looking for -1’s.
Trade-off between FAT and RT-11 - Although FAT solves external fragmentation, it needs more information to represent a file.
Internal Fragmentation: Utilization problem where allocated chunks can contain unused space.
Expensive Seek time: Seek time for nth block of a file is O(n).
Bad Utilization: Best case is (Disk size – FAT size) / Disk size. Worst case is less than 1/1024.
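The FAT behavior described above can be sketched with a plain list standing in for the table. The value -1 for a free block comes from the notes; using 0 as the end-of-file marker is an assumption made for this sketch, and the function names are hypothetical.

```python
FREE = -1  # from the notes: a free block is marked -1
EOF = 0    # assumption for this sketch: 0 marks the last block of a file


def allocate_block(fat):
    """Scan the table for a free entry, claim it, and return its index."""
    for block, status in enumerate(fat):
        if status == FREE:
            fat[block] = EOF
            return block
    raise OSError("disk full")


def append_block(fat, last_block):
    """Allocate a new block and link it after the file's current last block."""
    new_block = allocate_block(fat)
    fat[last_block] = new_block
    return new_block


def nth_block(fat, first_block, n):
    """Follow n links from first_block; this costs O(n) table lookups."""
    block = first_block
    for _ in range(n):
        block = fat[block]
        if block == EOF:
            raise IndexError("file has fewer blocks than requested")
    return block
```

A file's blocks do not need to be adjacent, which removes external fragmentation, but reaching block n means walking n links first.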
An invariant is an expression that is always true of correct code.
Distribute block pointers
Indirect blocks give O(log n) seeks
Free block bitmap
The purpose of the free block bitmap is to keep track of free blocks on the disk. The size of the bitmap depends on the number of blocks on the disk. Each bit represents the availability of a corresponding block on the disk. If a bit is 0, then the block is not free. If a bit is 1, then the block is free for allocation. The blocks containing the boot sector, the superblock, the bitmap, and the inodes are never free, so the bits representing these blocks must always be 0. The blocks are represented sequentially in the free block bitmap (block 0 is the first bit, block 1 is the second bit, etc.).
Advantages:
Disadvantages:
Every time the file system needs a new block, it must perform a linear search through the bitmap, checking bits one by one until it finds one that is free. This takes time linear in the number of blocks, which becomes slow on large disks.
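A small sketch of a free block bitmap, following the convention above (1 means free, 0 means not free). The class name and the `reserved` parameter, which stands for the blocks holding the boot sector, superblock, bitmap, and inodes, are hypothetical.

```python
class FreeBlockBitmap:
    """One bit per block: 1 = free, 0 = not free."""

    def __init__(self, num_blocks, reserved):
        self.num_blocks = num_blocks
        self.bits = bytearray((num_blocks + 7) // 8)  # all zero: nothing free
        # Blocks below `reserved` stay 0 forever; the rest start out free.
        for block in range(reserved, num_blocks):
            self.set_free(block, True)

    def is_free(self, block):
        return bool((self.bits[block // 8] >> (block % 8)) & 1)

    def set_free(self, block, free):
        mask = 1 << (block % 8)
        if free:
            self.bits[block // 8] |= mask
        else:
            self.bits[block // 8] &= ~mask & 0xFF

    def allocate(self):
        """Linear search for the first free block, as described above."""
        for block in range(self.num_blocks):
            if self.is_free(block):
                self.set_free(block, False)
                return block
        raise OSError("no free blocks")
```

The bitmap costs only one bit per block, but `allocate` may inspect every bit before it finds a free one.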
An inode is a data structure containing basic information (size, permissions, mode, type, and the number of links pointing to it) about a file, directory, or other file system object. For smaller files, we only need the direct block pointers. To support larger files, we use the indirect and indirect2 (doubly indirect) block pointers.
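To illustrate how direct, indirect, and indirect2 pointers divide up a file, here is a sketch that reports which pointer path reaches a given block of the file. The constants are assumptions chosen for illustration (10 direct pointers, 1024 block numbers per indirect block); the notes do not give the real values.

```python
NDIRECT = 10           # assumption: number of direct pointers in the inode
PTRS_PER_BLOCK = 1024  # assumption: block numbers that fit in one block


def classify(block_index):
    """Return the pointer path that reaches file block number block_index."""
    if block_index < 0:
        raise ValueError("negative block index")
    if block_index < NDIRECT:
        return ("direct", block_index)
    block_index -= NDIRECT
    if block_index < PTRS_PER_BLOCK:
        return ("indirect", block_index)
    block_index -= PTRS_PER_BLOCK
    if block_index < PTRS_PER_BLOCK ** 2:
        # First index selects an indirect block, second selects the data block.
        return ("indirect2", block_index // PTRS_PER_BLOCK,
                block_index % PTRS_PER_BLOCK)
    raise ValueError("file too large for this inode layout")
```

Under these assumptions a small file never touches an indirect block, and even the largest file needs at most two extra block reads to find any data block.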
There are two reasons why we have hard links on the file systems:
Historical one
Back in the day, in the early 70s, there was no real idea of directories and subdirectories. The hierarchical directory idea that we have now came about only a little bit later. At that time, every user of a computer had their own directory, and that was the only directory they could access. For example, if Justin and I shared access to a computer, I couldn’t read Justin’s files and Justin couldn’t read mine, because the files were in different areas of the disk and couldn’t be touched. So what happened if Justin and I were working on something together? The original thought of the system designers was that when two people work on the same file, the file should appear in both places. But how do you support a file located in two directories? The solution was the hard link: a superuser would create two directory entries that both point to the same file.
In my region of the disk there is presentation.ppt, which refers to inode number 4. In Justin’s region of the disk he has his own entry for the presentation with the same inode number. Both entries point to the same definition of the file: the inode, which records a size of 40000 bytes and the direct pointers. Once multiple directory entries can point to the same file, there is something we have to worry about. What happens if I remove my entry for the file? We run into a robustness issue. If I delete my entry, what happens to the inode? Can we just remove the inode? No, because Justin's entry still points to that inode. There are only a couple of things we can do. Either we find Justin’s entry and destroy it, even though he didn’t remove the file, or we keep the file around until Justin removes it. If we add a link count to the inode, we can detect exactly when the file is no longer used, that is, when the last entry is removed.
Robustness one
Operating systems very rarely use hard links to share files (it is bad interface design), but they frequently use hard links to solve robustness issues.
Let's say a user wants to move the file “hello.txt” into subdir. We have to write block 300 to remove the old directory entry, and we have to write block 500 to add the new one. But the disk does not write blocks in parallel; it has to write one of those two blocks before the other. Consider this situation: the power is turned off at some point during the operation. Assume the power never fails in the middle of writing a block, so it can only go off between block writes. Our goal is to never lose the file when the power goes out. First attempt: write the new directory entry first, and then remove the old one. Suppose we write the new entry for hello.txt into subdir, the system crashes, and we reboot. Now the invariant is violated: there are two references to the file but the link count is one. A user could exploit this to corrupt the file system: removing one entry drops the link count to zero and frees the inode while the other entry still points to it, and the freed inode may later be overwritten with garbage. Second attempt: erase the entry in the old directory first. But if the power goes off right after we erase the old entry, we lose the file.
Objective - Support a file that is in two directories.
Hard Link: Multiple directory entries refer to the same file.
Using Hard Link to solve the problem:
Increase the link count
Write the new entry
Delete the old entry
Reduce the link count
Hard Link uses nlink (link count) to keep track of how many directory entries link to the file. When one user deletes the file (its directory entry), nlink is decremented.
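The four steps listed above can be checked against the two failed attempts with a toy model. This is only a sketch: the file system is a hypothetical dictionary with one inode's link count and two directories, and a crash is modeled as stopping after some number of completed writes.

```python
def fresh_fs():
    """Toy state: hello.txt (inode 4) lives in root with link count 1."""
    return {"nlink": 1, "root": {"hello.txt": 4}, "subdir": {}}


def inc_nlink(fs):
    fs["nlink"] += 1


def write_new(fs):
    fs["subdir"]["hello.txt"] = 4


def delete_old(fs):
    del fs["root"]["hello.txt"]


def dec_nlink(fs):
    fs["nlink"] -= 1


def state_after_crash(order, completed):
    """Run only the first `completed` writes, then count entries and nlink."""
    fs = fresh_fs()
    for step in order[:completed]:
        step(fs)
    entries = len(fs["root"]) + len(fs["subdir"])
    return entries, fs["nlink"]


def survives_any_crash(order):
    """True if, at every crash point, the file is still reachable and the
    link count is at least the number of directory entries."""
    for completed in range(len(order) + 1):
        entries, nlink = state_after_crash(order, completed)
        if entries == 0 or nlink < entries:
            return False
    return True


SAFE_ORDER = [inc_nlink, write_new, delete_old, dec_nlink]
```

With `SAFE_ORDER`, a crash can leave the link count too high, which at worst leaks the file's space, but it never leaves the count too low and never loses the file.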
Aspects of hard link:
File systems will not allow the creation of a hard link that targets a directory. The reason is to prevent cycles in the directory structure. Example: assume hard links to directories were allowed, and we have directory A and directory B. In directory A, b is a hard link to directory B. In directory B, a is a hard link to directory A. We perform a search for c.txt on the file system. Given that c.txt does not exist, the search may loop forever between directory A and directory B through the hard links.
Hard links to a file are preserved when the file is renamed or moved.
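In Linux, the rm command does not delete the file directly. It removes a directory entry and decrements the file’s hard link count; the file's data is freed only once the count reaches zero.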
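The link count can be observed from user space. The sketch below uses Python's standard `os.link`, `os.unlink`, and `os.stat`; the file names are made up, and it assumes the temporary directory is on a file system that supports hard links.

```python
import os
import tempfile


def link_counts():
    """Create a file, hard-link it, remove the original name, and record
    the link count after each step plus the data read via the other name."""
    results = []
    with tempfile.TemporaryDirectory() as d:
        mine = os.path.join(d, "presentation.ppt")
        justins = os.path.join(d, "justin-presentation.ppt")
        with open(mine, "w") as f:
            f.write("slides")
        results.append(os.stat(mine).st_nlink)     # one directory entry
        os.link(mine, justins)                     # second entry, same inode
        results.append(os.stat(mine).st_nlink)
        os.unlink(mine)                            # removes only this entry
        results.append(os.stat(justins).st_nlink)
        with open(justins) as f:
            results.append(f.read())               # data is still there
    return results
```

Removing the first name leaves the data reachable through the second name; the space is reclaimed only when the last link is gone.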