by Kou, Min Sup Jin, Tung Dang
Remember last time we discussed two ways to make file systems robust.
Still, problems can exist even with these two robustness mechanisms.
Why? Because both mechanisms try to promote file system integrity, but they operate only from the file system's perspective. They don't concern themselves with the disk, merely with how the file system is organized. What kinds of problems do these two mechanisms fail to address?
3. Journaling
What we need are file system ATOMIC TRANSACTIONS.
An ATOMIC TRANSACTION is a series of operations on persistent state, possibly but not necessarily including changes to persistent state, that happen either ALL AT ONCE (and permanently), or NOT AT ALL. This all-or-nothing property is what makes a transaction useful for maintaining disk consistency.
Put another way, an atomic transaction is a set of changes to persistent state that obeys the four A C I D properties:
Atomicity, Consistency, Isolation (NB not idempotence, see below), Durability
(Parallels the Synchronization problem.)
Each atomic transaction ends in a single COMMIT POINT. This ends the group of operations. If the system crashes before the commit point, then none of the changes in the transaction can be stored permanently, due to the Atomicity property (all-or-nothing). If the system crashes AFTER the commit point -- after the commit returns -- then ALL of the changes in the transaction must be stored permanently, because of the Durability property; for instance, the reboot process might complete the transaction.
Question: Would ACID properties be helpful when we’re trying to maintain a robust and correct file system?
Answer: What if all the changes to the file system happened inside transactions? If the file system plug is pulled, the transaction mechanism takes care of the consistency issue, as the transactions ensure that the change either happens completely or doesn’t happen at all.
File system changes in transactions -> File system is always consistent!
Problem solved! \ (^_^) /
Question: How do we implement transactions?
Guess: Hardware? Unfortunately, a hardware implementation of transactions would be much more difficult, and it is outside the scope of the lecture.
Answer/Idea: Do everything twice: keep a log (journal) of all file system changes. The log is a record of every file system change, and it is circular, so it does not grow infinitely. Journal mechanism characteristics:
A common implementation technique for atomic transactions is called WRITEAHEAD LOGGING; we used this today in the context of journaling. "Writeahead" means that the log is written BEFORE the transaction commits. Here's how writeahead logging works in the context of journaling.
1. All changes are first written to the log.
2. After the transaction's log entries are written, a COMMIT RECORD
is written to the log. Writing this commit record is the commit point.
3. After the commit record is written, the changes are written to the main
part of the file system.
4. When these are done, a COMPLETION RECORD is written to the log, marking the
transaction as complete.
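The four steps above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the "disk" is a dictionary from block numbers to contents, the log is a list of tuples, and the names run_transaction and crash_after are hypothetical, invented for this illustration rather than taken from any real file system.

```python
# Toy in-memory model of write-ahead logging (all names are hypothetical).
# disk: dict mapping block number -> contents; log: list of journal records.

def run_transaction(disk, log, txid, changes, crash_after=None):
    """Apply `changes` ({block: new_data}) using the four steps.

    `crash_after` (1, 2 or 3) simulates a crash right after that step;
    it exists only so the recovery cases can be experimented with.
    """
    # Step 1: write every change to the log first.
    for block, data in changes.items():
        log.append(("change", txid, block, data))
    if crash_after == 1:
        return
    # Step 2: write the commit record. This is the commit point.
    log.append(("commit", txid))
    if crash_after == 2:
        return
    # Step 3: write the changes to the main part of the file system.
    for block, data in changes.items():
        disk[block] = data
    if crash_after == 3:
        return
    # Step 4: write the completion record.
    log.append(("complete", txid))


disk = {0: "old inode", 1: "old data"}
log = []
run_transaction(disk, log, txid=1, changes={0: "new inode", 1: "new data"})
print(disk)
print(log)
```

Note that nothing touches disk until the commit record is in the log; a crash simulated with crash_after=1 leaves the main file system unchanged.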
After a crash, the reboot procedure *recovers* the log by *replaying* any committed transactions:
For each entry in the log:
    If the entry has a completion record:
        Skip it
    Else if the entry has a commit record:
        Write the logged changes to the main part of the file system
        Then write a completion record
    Else (the entry has no completion or commit record):
        Skip it (do not perform its changes)
This recovery might write the logged changes to the main part of the file system multiple times. For instance, the computer might crash after writing changes to the main part of the file system, but before writing the completion record; the recovery procedure would then write those changes again. For this reason, it is important that log entries be IDEMPOTENT: that performing them two or more times has the same effect as performing them once. (I accidentally promoted this requirement to an ACID property, but it is really just an implementation technique.)
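As a rough illustration of replay and idempotence, here is a minimal Python sketch. It assumes a toy log format (tuples tagged "change", "commit", "complete") and a dictionary standing in for the disk; the function names pending, replay and recover are hypothetical. Each logged change has the form "set block N to this data", which is what makes it idempotent.

```python
# Toy recovery sketch (log format and function names are assumptions).

def pending(log):
    """Transaction ids that have a commit record but no completion record."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    completed = {rec[1] for rec in log if rec[0] == "complete"}
    return committed - completed

def replay(disk, log):
    """Write the logged changes of pending transactions to the disk."""
    todo = pending(log)
    for rec in log:
        if rec[0] == "change" and rec[1] in todo:
            _, _txid, block, data = rec
            disk[block] = data  # "set block to data": safe to repeat
    return todo

def recover(disk, log):
    """Replay, then mark the replayed transactions as complete."""
    for txid in sorted(replay(disk, log)):
        log.append(("complete", txid))


# Transaction 1 committed but crashed midway through step 3;
# transaction 2 never reached its commit record.
log = [
    ("change", 1, 0, "new inode"),
    ("change", 1, 1, "new data"),
    ("commit", 1),
    ("change", 2, 2, "half-written"),
]
disk = {0: "new inode", 1: "old data", 2: "untouched"}

replay(disk, log)          # pretend we crash before the completion record
after_once = dict(disk)
replay(disk, log)          # the next reboot replays the same entries again
after_twice = dict(disk)
recover(disk, log)         # this time recovery finishes
print(after_once == after_twice, disk)
```

A log entry such as "append these bytes to block 1" would not have this property: replaying it twice would append twice.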
There are other ways to design a transaction log. For example, you might store BOTH the old data AND the new data for each change. This would allow steps 1 and 3 above to happen somewhat in parallel; the reboot procedure can UNDO any changes that were written to disk too early. Such designs are common in databases.
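A hedged sketch of the old-data-plus-new-data idea, again in Python. The log format (each change record carries both old and new contents), the function name recover_undo_redo, and the assumption that concurrent transactions touch disjoint blocks are all made up for this illustration; real database logs are considerably more involved.

```python
# Toy undo/redo recovery (hypothetical log format:
# ("change", txid, block, old_data, new_data) and ("commit", txid)).

def recover_undo_redo(disk, log):
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    # Redo: make sure every committed change is on disk, in log order.
    for rec in log:
        if rec[0] == "change" and rec[1] in committed:
            _, _txid, block, _old, new = rec
            disk[block] = new
    # Undo: roll back uncommitted changes that reached disk too early,
    # newest first, by restoring the saved old data.
    for rec in reversed(log):
        if rec[0] == "change" and rec[1] not in committed:
            _, _txid, block, old, _new = rec
            disk[block] = old


log = [
    ("change", 1, 0, "A", "B"),   # transaction 1 never committed
    ("change", 2, 1, "C", "D"),   # transaction 2 committed
    ("commit", 2),
]
# Block 0 was written early; block 1 had not been written yet.
disk = {0: "B", 1: "C"}
recover_undo_redo(disk, log)
print(disk)
```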
Non-Journal:
Requires 3 actions:
Pro: Simpler, efficient, and doesn't waste as much memory as journaling.
Con: Only guarantees block-by-block atomic writes. Leak errors are possible, since only 3 of the 4 invariants hold at any given time.
Journal:
Problem: Our file system changes modify more than one block, yet we can only change one block at a time safely. The whole change should still happen atomically, either all at once or not at all, no matter when a hardware failure occurs.
Solution: Keep a log or a journal of our transactions.
Question: How do we defend against mechanical failure?
Solution 1: Replication – Most basic way of achieving robustness.
Replication ensures that a fault does not necessarily imply a failure. You need more than one fault to have a failure.
A replication strategy helps avoid failure on the initial, rarer faults.
Replication is actually the simplest version of RAID.
Solution 2: Redundant Array of Independent Disks (RAID).
RAID uses multiple disks for performance and robustness. (RAID is defined by multiple levels, with each level having its own setup and designation. Add to that, no one can really agree on what levels should exist, or what some of the more obscure levels mean…)
Different levels of RAID differ not only in terms of performance and efficiency but also in terms of Mean Time to Fault and Mean Time to Failure.
MTTFault/Failure is the average length of time it takes a component to fault/fail.
RAID 1: Mirroring.
You might actually be able to get more performance and have faster SEEKS with a mirrored setup. (By having two independent seek heads.)
RAID 0: Striping
RAID 1+0 (RAID 10): Mirroring + striping combination
RAID 4: Parity
Parity Bit Explanation
Given bits A, B, C, D and a parity bit X = A XOR B XOR C XOR D, you can design a recovery system.
You can take away any one of the bits and still recover the value of all the bits, as long as you know which bit was taken away: the missing bit is the XOR of the remaining four.
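The parity trick can be checked with a short Python sketch. The values here are 4-bit numbers rather than single bits, so the XOR works on every bit position at once; the helper names parity and recover_missing and the sample values are made up for the illustration.

```python
from functools import reduce
from operator import xor

def parity(blocks):
    """XOR of all the data blocks: this is the parity value X."""
    return reduce(xor, blocks, 0)

def recover_missing(surviving, parity_value):
    """Rebuild the one missing block from the survivors and the parity."""
    return reduce(xor, surviving, parity_value)


A, B, C, D = 0b1011, 0b0110, 0b1110, 0b0001
X = parity([A, B, C, D])

# Suppose the disk holding C dies, and we know it was C that died.
recovered_C = recover_missing([A, B, D], X)
print(bin(X), bin(recovered_C), recovered_C == C)
```

If the parity disk itself is the one lost, X is simply recomputed from A, B, C and D.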
RAID 5: Distributed parity bits
“A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable.” – Leslie Lamport
The main problem of distributed systems: FAILURE, and dealing with it.
What is so different about a networked system compared to our own computers?
Question: We have a mini network inside our computers: the BUS. It networks the CPU, the disk, memory, etc. Yet failure isn't as frequent or as big a problem there as in distributed systems. Why?
Answer: Correlated failure. Inside one computer, if one part fails, it is likely that the other parts will fail too. On a distributed network, however, this locality of failure does not hold, and there are many more parts that can fail independently. So we have to cope with a very short mean time to fault.
Additional Problems with Distributed Systems:
Question: How does the network receive a packet?
Answer: The network card sends an interrupt whenever a packet is available.
Roughly 10000 packets a second are possible, which means… 10000 interrupts a second. How do we service 10000 interrupts a second? We use some sort of DMA queue for received data.
If the number of interrupts grows so large that all of our time goes to handling interrupts, we are unable to service any of the packets.
Under high load, we actually want to switch from interrupts to POLLING; this also helps against high-load attacks.
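A toy model in Python makes the argument concrete. Everything here is an assumption chosen for illustration: the CPU has a fixed budget of work units per time slice, each interrupt costs one unit, processing a packet costs one unit, and interrupts always run first. The function names and the threshold are hypothetical and do not describe a real driver.

```python
# Toy model of interrupt overload (all costs and names are assumptions).

def packets_serviced(arrivals, budget, mode, irq_cost=1, work_cost=1):
    """Packets fully processed in one time slice with `budget` work units."""
    if mode == "interrupt":
        # Every arriving packet raises an interrupt, and interrupts
        # take priority over the actual packet processing.
        irq_time = min(budget, arrivals * irq_cost)
        left_over = budget - irq_time
        return min(arrivals, left_over // work_cost)
    # Polling: no per-packet interrupt, so the whole budget goes to
    # draining the receive queue.
    return min(arrivals, budget // work_cost)

def choose_mode(arrivals, threshold):
    """Switch to polling once the arrival rate passes a threshold."""
    return "polling" if arrivals > threshold else "interrupt"


budget = 10000
for arrivals in (2000, 10000):
    mode = choose_mode(arrivals, threshold=5000)
    print(arrivals,
          packets_serviced(arrivals, budget, "interrupt"),
          mode,
          packets_serviced(arrivals, budget, mode))
```

In this model, at 10000 arrivals per slice the interrupt-driven path spends its entire budget on interrupts and services nothing, while polling still makes progress.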